ryta1203
Journeyman III

Getting thread index from cs kernel

I'm trying to get the thread index from a cs kernel and put it into that index's address in the global buffer as follows:

    il_cs_2_0
    dcl_num_thread_per_group 64
    dcl_cb cb0[1]
    dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
    mov g[vaTid.x], vAbsTidFlat.x
    ret_dyn
    end

Any ideas how to do this properly?

6 Replies
lipi
Journeyman III

I'm doing something similar but I move the thread ID to a register first and use that register to index into the global buffer. I don't remember if it was necessary to move Tid to a register or I did it only because I actually have to multiply the Tid to get the proper global buffer offset.

 


Well, that's not working for me either. The following code only produces zero. I'm pretty stumped.

    const char * HILKernel =
        "il_cs_2_0\n"
        "dcl_num_thread_per_group 64\n"
        "mov r7.z, vaTid.x\n"
        "mov g[vaTid.x], r7.zzzz\n"
        "ret_dyn\n"
        "end\n";

Also, this doesn't work:

    const char * HILKernel =
        "il_cs_2_0\n"
        "dcl_num_thread_per_group 64\n"
        "mov r7.z, vaTid.x\n"
        "mov g[r7.z], r7.zzzz\n"
        "ret_dyn\n"
        "end\n";


Originally posted by: lipi I'm doing something similar but I move the thread ID to a register first and use that register to index into the global buffer. I don't remember if it was necessary to move Tid to a register or I did it only because I actually have to multiply the Tid to get the proper global buffer offset.

 

Can you post your kernel or at least the important parts... I have no idea why I can't simply get the thread ID and put it into the global buffer.


So this seems to work, in case anyone is interested. I'd be interested to hear from AMD why this is the case.

    const char * HILKernel =
        "il_cs_2_0\n"
        "dcl_num_thread_per_group 64\n"
        "itof r7.z, vAbsTidFlat.x\n"
        "mov g[vAbsTidFlat.x], r7.zzzz\n"
        "ret_dyn\n"
        "end\n";


I'm using vTid currently but I remember using vaTid at one point and it worked. The relevant parts of my code are in the attached code.

Have you checked the disassembly to see if something got optimized away? I had to add fences to keep global memory access within the loops.

 

 

    il_cs_2_0
    dcl_num_thread_per_blk 64

    ;;; r1 -- per-thread constants
    ;;;
    ;;; x: thread ID
    ;;; y: 2 x thread ID
    mov r1.x, vTid0.x
    ishl r1.y, r1.x, l0.x

    whileloop
        whileloop
            fence_memory ; keep RD_SCATTER in the loop
            mov r2, g[r1.y] ; RD_SCATTER
            ;; code omitted
            break_logicalz r2.w
        endloop
        ;; code omitted
        mov g[r1.y].y, r2.y
        iadd r4.x, r1.y, l0.x
        mov g[r4.x], r3
        fence_memory
    endloop
    end


lipi,

  Thanks for your replies. I pretty much got it working (at least for a 64x1 block size). I think I just forgot that the output needed to be a float and that the addressing needed to be ints.
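
One plausible reading of the float/int point, sketched on the host side in plain C: if the kernel copies the integer thread ID into the buffer with `mov`, the host reads those raw bits as a float, and small integers reinterpreted that way are denormals that print as zero. Converting the value first (what `itof` does in the kernel) gives the expected number. The function names `bits_as_float` and `value_as_float` are made up for this illustration and are not part of CAL or IL; this only shows the bit-level effect, not how the hardware handles denormals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the raw bits of an integer thread ID
   as a float. This mimics what the host sees when the kernel stores the
   ID with a plain "mov" and the buffer is read back as float. */
float bits_as_float(uint32_t tid)
{
    float f;
    assert(sizeof f == sizeof tid);   /* both must be 32 bits wide */
    memcpy(&f, &tid, sizeof f);       /* copy the bits, no conversion */
    return f;
}

/* Hypothetical helper: convert the integer value to the nearest float,
   analogous to running "itof" before the store. */
float value_as_float(uint32_t tid)
{
    return (float)tid;
}
```

For thread ID 5, `bits_as_float(5)` is a denormal on the order of 1e-44, which shows up as 0.000000 under `printf("%f")`, while `value_as_float(5)` is 5.0. The index inside `g[...]` is a different matter: it has to stay an integer, which is why the working kernel indexes with `vAbsTidFlat.x` directly and only converts the stored value.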
