sp314
Adept II

How do I read and interpret the contents of GPR_ALLOC?

I wrote some code that uses 32 VGPRs in GCN asm, and ran it on Hawaii/R290x through OpenCL using clCreateProgramWithBinary()/clBuildProgram()/clCreateKernel() with global work size = 64 (threads per wavefront) * 44 (compute units on R290x) * 8 (waves per CU) = 22,528 and local work size = 64. This corresponds to one OpenCL workgroup per wave, and using 32 VGPRs and not exceeding the other limits, such as SGPR count and LDS alloc, the entire graphics card should be able to accommodate all of these threads at once.
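The launch arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the post; the function names `launch_geometry` and `workgroup_of` are made up for illustration and are not part of any OpenCL API.

```python
WAVEFRONT_SIZE = 64   # work-items per wavefront, as stated in the post
COMPUTE_UNITS = 44    # compute units on the R290x, as stated in the post
WAVES_PER_CU = 8      # waves per CU chosen in the post


def launch_geometry(wave_size=WAVEFRONT_SIZE, cus=COMPUTE_UNITS,
                    waves_per_cu=WAVES_PER_CU):
    """Return (global_work_size, local_work_size, total_workgroups)."""
    local_size = wave_size               # one workgroup per wave
    total_groups = cus * waves_per_cu    # one wave per workgroup
    global_size = local_size * total_groups
    return global_size, local_size, total_groups


def workgroup_of(global_id, local_size=WAVEFRONT_SIZE):
    """Workgroup index of a 1-D global id when local size equals wave size."""
    return global_id // local_size


if __name__ == "__main__":
    print(launch_geometry())      # (22528, 64, 352)
    print(workgroup_of(11264))    # 176, the first workgroup of the second half
```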

Following the Sea Islands ISA document found at http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf, table 5.9 on page 48, I've tried querying the base VGPR and the number of VGPRs assigned to each wavefront like so:

    s_getreg_b32     s0, HWREG(GPR_ALLOC, 0, 6)     /* vgpr base ofs, ofs = 4*result ? */
    s_nop 0
    s_nop 0
    store s0 to vgprBaseOutputArray[global_thread_id]

and

    s_getreg_b32     s0, HWREG(GPR_ALLOC, 8, 6)     /* vgpr size, num VGPRs = 4*(size+1) ? */
    s_nop 0
    s_nop 0
    store s0 to vgprSizeOutputArray[global_thread_id]

The ISA doc says that when reading VGPR_BASE, I should be getting the 'Physical address of first VGPR assigned to this wavefront, as [7:2]', and I should be getting the 'Number of VGPRs assigned to this wavefront, as [7:2], 0=4 VGPRs, 1=8 VGPRs, etc.' when reading VGPR_SIZE.

I'm getting 7 for all threads when reading VGPR_SIZE, which corresponds to the correct number of 32 VGPRs. That is fine. However, I'm getting VGPR base set to 0 for threads with global_id = 0..11263 (exactly the first half of 64*44*8 = 22,528 total threads) and VGPR base set to 8 (the value returned by s_getreg_b32) * 4 (due to the [7:2] format) = 32 for threads with global_id = 11,264..22,527 (the second half, exactly).
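For reference, the conversion applied to the raw `s_getreg_b32` results can be written out as below. This is a minimal sketch that follows the reading of the "[7:2]" format used in this post (the 6-bit field counts in units of 4 registers); the helper names are hypothetical.

```python
FIELD_MASK = 0x3F  # both fields are read as 6 bits via HWREG(GPR_ALLOC, offset, 6)


def vgpr_base_from_field(raw):
    """Physical index of the first VGPR, assuming the field is in units of 4."""
    return (raw & FIELD_MASK) * 4


def vgpr_count_from_size_field(raw):
    """Number of VGPRs, using the doc's encoding 0 = 4 VGPRs, 1 = 8 VGPRs, etc."""
    return ((raw & FIELD_MASK) + 1) * 4


if __name__ == "__main__":
    print(vgpr_count_from_size_field(7))  # 32, the VGPR_SIZE value seen for all threads
    print(vgpr_base_from_field(0))        # 0, the base seen for the first half
    print(vgpr_base_from_field(8))        # 32, the base seen for the second half
```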

To my understanding of the ISA doc, the first 64 threads/wave 0 should be reporting VGPR base set to 0, the next 64 threads/wave 1 set to 32 (after multiplying s_getreg_b32 output by 4), etc. Yet, I'm getting 0 for a whole bunch of threads, and 32 for a whole bunch of other threads.

I feel like I'm making a rookie mistake. What am I missing or doing wrong, and what did I not understand correctly?

If it matters, I'm on Windows 10 Home 64-bit with 17.10.2 drivers, and I'm using CL Radeon Extender as the assembler. Note that there are two assembler directives, .pgmrsrc1 and .pgmrsrc2, that I didn't specify in the source code (I've specified .sgprsnum 11 and .vgprsnum 32 though), and when I disassemble the compiled kernel, I get

        .pgmrsrc1 0x00ac0047
        .pgmrsrc2 0x00000090

in the output. Perhaps these are necessary, and they somehow make the register assignments clash between the waves? How do these work, and are they documented anywhere?

Thank you for your help in advance!

1 Solution
matszpk
Adept III

'pgmrsrc1' and 'pgmrsrc2' are special registers that the driver sets up before starting the kernel. They are described in https://www.x.org/docs/AMD/old/SI_3D_registers.pdf as COMP:COMPUTE_PGM_RSRC1 and COMP:COMPUTE_PGM_RSRC2. Bits 0-5 of PGMRSRC1 encode the number of VGPRs used by the kernel (0 = 4, 1 = 8, 2 = 12, ...), and bits 6-9 encode the number of SGPRs used by the kernel (0 = 8, 1 = 16, 2 = 24, ...). These registers are also described on this site: ROCm-ComputeABI-Doc/AMDGPU-ABI.md at master · ROCm-Developer-Tools/ROCm-ComputeABI-Doc · GitHub

The '.pgmrsrc1' and '.pgmrsrc2' pseudo-ops in the assembler source set the values of these registers; however, the assembler also sets up these values itself from other pseudo-ops. These two pseudo-ops were added so that the register values can be corrected by hand, but in many cases they are obsolete.
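As an illustration, the two PGMRSRC1 fields described above can be pulled out of the disassembled value 0x00ac0047 from the question. This is only a sketch covering the two fields mentioned here; the other bits are left uninterpreted, and `decode_pgmrsrc1` is a made-up helper name.

```python
def decode_pgmrsrc1(value):
    """Decode the VGPR and SGPR count fields of a PGMRSRC1 value.

    Only bits 0-5 (VGPRs, granule of 4) and bits 6-9 (SGPRs, granule of 8)
    are interpreted, following the encoding described in the reply.
    """
    vgpr_field = value & 0x3F         # bits 0-5
    sgpr_field = (value >> 6) & 0xF   # bits 6-9
    return {
        "vgprs": (vgpr_field + 1) * 4,   # 0 = 4, 1 = 8, 2 = 12, ...
        "sgprs": (sgpr_field + 1) * 8,   # 0 = 8, 1 = 16, 2 = 24, ...
    }


if __name__ == "__main__":
    print(decode_pgmrsrc1(0x00AC0047))  # {'vgprs': 32, 'sgprs': 16}
```

The 32 VGPRs match the .vgprsnum 32 directive. The 16 SGPRs are at least consistent with .sgprsnum 11 being rounded up to the next multiple of 8, although whether the assembler also counts extra registers on top of the 11 is an assumption here.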

Hi matszpk,

very belated thanks. I don't know how I missed these docs, but they're just what I needed. Everything works now, thanks!

Best,

sp314
