6 Replies Latest reply on Sep 16, 2009 11:45 PM by ryta1203

    Getting thread index from cs kernel

    ryta1203

      I'm trying to get the thread index from a cs kernel and write it to the global buffer at the address given by that same index. Any ideas how to do this properly?

      My kernel is as follows:

      il_cs_2_0
      dcl_num_thread_per_group 64
      dcl_cb cb0[1]
      dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
      dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
      dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
      mov g[vaTid.x], vAbsTidFlat.x
      ret_dyn
      end

        • Getting thread index from cs kernel
          lipi

          I'm doing something similar but I move the thread ID to a register first and use that register to index into the global buffer. I don't remember if it was necessary to move Tid to a register or I did it only because I actually have to multiply the Tid to get the proper global buffer offset.

            • Getting thread index from cs kernel
              ryta1203

              Well, that's not working for me either. The following code produces only zeros. I'm pretty stumped.

              const char * HILKernel =
                  "il_cs_2_0\n"
                  "dcl_num_thread_per_group 64\n"
                  "mov r7.z, vaTid.x\n"
                  "mov g[vaTid.x], r7.zzzz\n"
                  "ret_dyn\n"
                  "end\n";

              Also, this doesn't work:

              const char * HILKernel =
                  "il_cs_2_0\n"
                  "dcl_num_thread_per_group 64\n"
                  "mov r7.z, vaTid.x\n"
                  "mov g[r7.z], r7.zzzz\n"
                  "ret_dyn\n"
                  "end\n";

              • Getting thread index from cs kernel
                ryta1203

                 

                Originally posted by: lipi I'm doing something similar but I move the thread ID to a register first and use that register to index into the global buffer. I don't remember if it was necessary to move Tid to a register or I did it only because I actually have to multiply the Tid to get the proper global buffer offset.

                 

                Can you post your kernel, or at least the important parts? I have no idea why I can't simply get the thread ID and put it into the global buffer.

                  • Getting thread index from cs kernel
                    ryta1203

                    So this seems to work, in case anyone is interested. I'd be interested to hear from AMD why this is the case.

                    const char * HILKernel =
                        "il_cs_2_0\n"
                        "dcl_num_thread_per_group 64\n"
                        "itof r7.z, vAbsTidFlat.x\n"
                        "mov g[vAbsTidFlat.x], r7.zzzz\n"
                        "ret_dyn\n"
                        "end\n";
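
One possible explanation, not confirmed by anyone in this thread: the thread ID registers hold integers, and a plain mov copies the raw bits into the global buffer. If the host then reads the buffer as 32-bit floats (an assumption about the host code, which is not shown here), the bit patterns for IDs 0 to 63 are denormal values on the order of 1e-44, which print as zero with a format such as %f. The itof instruction converts the value numerically, so the host sees 0.0, 1.0, 2.0 and so on. The small C sketch below reproduces the two cases on the CPU; the helper names bits_as_float and int_to_float are made up for illustration, and the sketch assumes float is 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the 32 bits of an integer as a float. This mimics a host
   that reads, as float, an integer the kernel stored with a plain mov.
   Assumes sizeof(float) == 4. */
static float bits_as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Numeric conversion of the integer value. This mimics what itof does
   before the value is stored. */
static float int_to_float(uint32_t value)
{
    return (float)value;
}
```

For example, bits_as_float(63) is a denormal far below anything %f would display, while int_to_float(63) is exactly 63.0. If the host instead read the buffer as unsigned integers, the original mov-only kernels might already show the expected IDs.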

                      • Getting thread index from cs kernel
                        lipi

                        I'm using vTid currently, but I remember using vaTid at one point and it worked. The relevant parts of my code are attached.

                        Have you checked the disassembly to see if something got optimized away? I had to add fences to keep global memory access within the loops.

                        il_cs_2_0
                        dcl_num_thread_per_blk 64
                        ;;; r1 -- per-thread constants ;;;
                        ;;; x: thread ID
                        ;;; y: 2 x thread ID
                        mov r1.x, vTid0.x
                        ishl r1.y, r1.x, l0.x
                        whileloop
                            whileloop
                                fence_memory ; keep RD_SCATTER in the loop
                                mov r2, g[r1.y] ; RD_SCATTER
                                ;; code omitted
                                break_logicalz r2.w
                            endloop
                            ;; code omitted
                            mov g[r1.y].y, r2.y
                            iadd r4.x, r1.y, l0.x
                            mov g[r4.x], r3
                            fence_memory
                        endloop
                        end
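
The index arithmetic in the kernel above can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the literal l0 is not declared in the posted excerpt, so l0.x == 1 is inferred from the comment "y: 2 x thread ID", and the helper names first_slot and second_slot are hypothetical. Under that reading, each thread owns two consecutive g[] elements, 2*tid and 2*tid + 1.

```c
#include <stddef.h>

/* Assumption: l0.x == 1 in the kernel, so "ishl r1.y, r1.x, l0.x"
   computes tid << 1, which is 2 * tid. */

/* Index of the first g[] element used by a thread (r1.y in the kernel). */
static size_t first_slot(size_t tid)
{
    return tid << 1;
}

/* Index of the second g[] element used by a thread
   (r4.x = r1.y + l0.x in the kernel, with l0.x assumed to be 1). */
static size_t second_slot(size_t tid)
{
    return first_slot(tid) + 1;
}
```

With 64 threads per group this layout touches elements 0 through 127, and no two threads share an element, which is consistent with the multiplication of the thread ID mentioned earlier in the thread.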