4 Replies Latest reply on Aug 16, 2009 6:11 AM by empty_knapsack

    Possible IL bug

    lipi
      vObjIndex0 in scatter_IL

       

      The following IL code from the scatter_IL example seems to write only half of the global buffer locations:

      const CALchar* ILKernel =
      "il_ps_2_0\n"
      "dcl_input vObjIndex0\n" // vObjIndex starts at 0 and increments by 1.
      "mov g[vObjIndex0.x], vObjIndex0.x\n"
      "end\n";

      With a global buffer preinitialized to 0xffffffff, the output is:

      00000000 00000000 00000000 00000000
      00000001 00000001 00000001 00000001
      ffffffff ffffffff ffffffff ffffffff
      ffffffff ffffffff ffffffff ffffffff
      00000004 00000004 00000004 00000004
      00000005 00000005 00000005 00000005
      ffffffff ffffffff ffffffff ffffffff
      ffffffff ffffffff ffffffff ffffffff

       

        • Possible IL bug
          MicahVillmow
          Due to how the hardware handles vObjIndex in pixel shader mode, you will see this behavior if the height of your buffer is not a multiple of 2.
            • Possible IL bug
              lipi

               

              Thanks Micah, I had a height of 1 indeed.

              I changed the kernel to use a compute shader and the thread ID instead of the object index, and now it works correctly.
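
For readers following along, the compute shader variant described here might look roughly like the sketch below. This is an assumption, not the kernel lipi actually used: the thread-group size of 64 and the temporary register r0 are illustrative choices, vaTid0 is the absolute thread ID register named later in this thread, and the string has not been checked against the IL compiler. Plain char is used instead of CALchar so the snippet compiles without the CAL headers.

```c
/* Hypothetical compute shader counterpart of the scatter_IL kernel.
   Assumptions: 64 threads per group, r0 used as a scratch register. */
const char *ILKernelCS =
    "il_cs_2_0\n"
    "dcl_num_thread_per_group 64\n"  /* group size is an illustrative choice */
    "mov r0.x, vaTid0.x\n"           /* absolute thread ID, starting at 0 */
    "mov g[r0.x], r0.x\n"            /* each thread writes its own ID to its own slot */
    "end\n";
```

As far as I know, a compute shader kernel is launched with calCtxRunProgramGrid rather than with a CALdomain, so the host code changes as well.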

              • Possible IL bug
                empty_knapsack

                >you will see this behavior if you do not have a multiple of 2 height in your buffer.

                 

                As I understand it, it is not the buffer size that must be a multiple of 2 but the domain invocation size, am I right?

                I.e., if I'm running Grid as:

                 CALdomain domain = {0, 0, 4096, 4096};
                 if (calCtxRunProgram(&e, ctx, func, &domain) != CAL_RESULT_OK) { ... }

                will vObjIndex0 be valid for all values from 0 to 4096^2 - 1, no matter how the buffers were declared?

                 

                It's really easier to go to compute shader mode and use vaTid0 instead of vObjIndex0... unfortunately there are other problems with cs, so I was forced to return to ps.

                 

                  • Possible IL bug
                    empty_knapsack

                    As we have absolutely terrific support from AMD/ATI at this forum, I'll answer it myself:

                     

                    No, vObjIndex0 won't be valid. With a domain size bigger than 512 elements, vObjIndex0 behaviour becomes totally unpredictable: any element can be skipped, not just odd/even ones. If you aren't checking the whole buffer, you can easily miss this error. So there is no point in using vObjIndex0 at all.

                     

                    And no, it's not corrupted memory on the GPU, as switching to compute shader mode and using vaTid0 gives completely correct results.
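
Since skipped elements are easy to miss unless the whole buffer is checked, a host-side check along the following lines may help. It is a minimal sketch and not part of the CAL API: the function name count_bad_elements is made up for illustration, and it assumes the global buffer has already been read back into host memory as four 32-bit components per element, with every component expected to equal the element index, as in the scatter kernel at the top of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COMPONENTS 4  /* assumption: each g[] element holds four 32-bit values */

/* Hypothetical helper: scans every element of the read-back buffer and counts
   those whose components are not all equal to the element index.
   If first_bad is non-NULL and at least one bad element exists, the index of
   the first one is stored in *first_bad. */
size_t count_bad_elements(const uint32_t *buf, size_t n_elems, size_t *first_bad)
{
    size_t bad = 0;
    assert(buf != NULL || n_elems == 0);

    for (size_t i = 0; i < n_elems; ++i) {
        int ok = 1;
        for (size_t c = 0; c < COMPONENTS; ++c) {
            if (buf[i * COMPONENTS + c] != (uint32_t)i) {
                ok = 0;  /* still holds the 0xffffffff fill or some other value */
            }
        }
        if (!ok) {
            if (bad == 0 && first_bad != NULL) {
                *first_bad = i;
            }
            ++bad;
        }
    }
    return bad;
}
```

Run over the eight elements shown in the first post, this would report four bad elements, the first at index 2. Checking every element rather than a sample is the point: with large domains the skipped elements reportedly do not follow a fixed pattern.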