cuda - How does the ABI define the number of registers in a GPU?


There is a line in the CUDA compiler driver (nvcc) documentation, among the options for steering GPU code generation, that is ambiguous to me:

"A value less than the minimum number of registers required by the ABI will be bumped up by the compiler to the ABI minimum limit."

Does the ABI have a standard or any limitations on the number of registers that __global__ and __device__ functions can use?

I think (I can't find the reference right now) the CUDA ABI requires at least 16 registers. If you specify a lower register count (e.g. via -maxrregcount), the compiler will bump the specified limit up to the minimum required by the ABI, and print an advisory message stating that it did so. The maximum number of 32-bit registers available per thread is GPU architecture dependent: 124 registers for sm_1x, 63 registers for sm_2x, and 254 registers for sm_3x.
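As an illustration, here is a minimal sketch (the kernel and file name are my own, and the exact advisory wording varies by toolkit version) of what happens when you cap registers below the ABI minimum:

    // Building with a cap below the ABI minimum, e.g.
    //     nvcc -arch=sm_20 -maxrregcount=8 -c saxpy.cu
    // makes ptxas print an advisory and raise the limit to the ABI minimum (16).
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];  // few live values, so a low register cap looks tempting
    }

If you only want to constrain a single kernel rather than the whole compilation unit, the __launch_bounds__() qualifier on the kernel is usually the better tool than a file-wide -maxrregcount.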

Generally speaking, an ABI (application binary interface) is an architecture-specific convention for storage layout, the passing of arguments to functions, the passing of function results back to the caller, etc. ABIs (including those for x86_64 and ARM) designate specific registers for specific tasks, such as the stack pointer, function return value, function arguments, and so on. Since the GPU architecture allows a variable number of registers per thread, use of the ABI requires a minimal number of registers to be present to fill these defined roles. If I recall correctly, CUDA introduced its ABI with version 3.0, the first version to support Fermi-class GPUs.
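To make those register roles concrete, here is a minimal sketch (the function names are my own) of a non-inlined __device__ function call; with the ABI in effect, the arguments, the return value, and the per-thread stack pointer all live in registers designated by the calling convention:

    __device__ __noinline__ float scale(float v, float s)
    {
        return v * s;  // result comes back in the ABI-designated return register(s)
    }

    __global__ void apply_scale(float *data, float s, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] = scale(data[i], s);  // a true call: arguments passed per the ABI
    }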

The ABI requires compute capability 2.0 or higher; older GPU architectures lacked hardware features required by the ABI. Many of the newer CUDA features, such as device-side printf() and malloc(), called (non-inlined) functions, separate compilation, etc., rely on and require the use of the ABI, and it is used by default in compiler-generated code for sm_20 and above. You can disable the use of the ABI with -Xptxas -abi=no. I advise against doing that.
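For example, device-side printf() is implemented as a real function call into a runtime routine and therefore depends on the ABI; a minimal sketch (sm_20 target assumed) looks like this:

    #include <cstdio>

    __global__ void hello()
    {
        // Device-side printf() is a real call, only possible with the ABI
        // (the default for sm_20 and above).
        printf("hello from thread %d\n", threadIdx.x);
    }

    int main()
    {
        hello<<<1, 4>>>();
        cudaDeviceSynchronize();  // flush the device-side printf buffer
        return 0;
    }

Built normally (e.g. nvcc -arch=sm_20 hello.cu) this works; building with -Xptxas -abi=no would break features like this, which is one more reason to leave the ABI enabled.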

