3 Replies Latest reply on Jun 25, 2015 6:16 AM by dipak

    __local variable inconsistency issue


      Hi everyone,


      I'm having issues when using __local variables in the attached kernel. When the kernel is run in a loop with constant input data, the output differs between iterations at semi-random locations. On a Tahiti GPU, the first difference appears in a random iteration at a random location. On a Tonga GPU, it appears in the second iteration at a fixed location. In both cases, the inconsistency starts at memory addresses written by work-items with get_local_id(1) >= 64. For my use case, I'd expect the contents of 'sums' and 'textures' to be the same in each iteration.
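For reference, the comparison between iterations can be sketched roughly as below. This is only a minimal illustration, not the actual test harness: the helper name `first_mismatch` is hypothetical, and it compares raw bytes because the element types of 'sums' and 'textures' are not stated here.

```c
#include <stddef.h>

/* Hypothetical helper: compares a reference buffer (read back after the
 * first kernel run) with the buffer read back after a later run.
 * Returns the index of the first differing byte, or n if the two
 * buffers are identical. Bytes are used because the real element type
 * of the output buffers is an unknown here. */
size_t first_mismatch(const unsigned char *ref,
                      const unsigned char *cur,
                      size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (ref[i] != cur[i]) {
            return i;
        }
    }
    return n;
}
```

Calling this once per iteration on the read-back copy of each output buffer gives the iteration and byte offset at which the output first diverges from the first run.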


      Here is the relevant input data:


      IMAGE_WIDTH is defined to be 680

      IMAGE_HEIGHT is defined to be 512

      NUM_DISP is defined to be 112

      WINDOW_SIZE is defined to be 5

      Work size is (IMAGE_HEIGHT, NUM_DISP); work-group size is (1, NUM_DISP).

      left, right, sums, textures, and prefilterCap are identical for each kernel run.

      Both GPUs use the latest non-beta Catalyst drivers.


      The inconsistency disappears for NUM_DISP <= 64, and also when the kernel runs on a CPU device. Did I miss a barrier call somewhere? As far as I can see, all work-items hit every barrier, and the local variables' contents are only read after the barrier.
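To make the NUM_DISP <= 64 observation concrete, here is a small sketch of the launch geometry. It assumes a wavefront width of 64 work-items (the GCN value, which applies to both Tahiti and Tonga) and assumes that work-items are packed into wavefronts in order of their flattened local ID, which is typical but not something the OpenCL specification guarantees. The function names are made up for illustration.

```c
#include <stddef.h>

#define IMAGE_HEIGHT   512
#define NUM_DISP       112
#define WAVEFRONT_SIZE 64   /* assumption: GCN wavefront width */

/* Flattened local ID of a work-item in a 2-D work-group:
 * id0 + size0 * id1, where size0 is the local size in dimension 0. */
size_t linear_local_id(size_t id0, size_t id1, size_t size0)
{
    return id1 * size0 + id0;
}

/* Wavefront index within the work-group, assuming work-items are
 * packed into wavefronts in flattened-ID order. */
size_t wavefront_of(size_t id0, size_t id1, size_t size0)
{
    return linear_local_id(id0, id1, size0) / WAVEFRONT_SIZE;
}

/* Number of wavefronts needed for a size0 x size1 work-group
 * (rounded up). */
size_t wavefronts_per_group(size_t size0, size_t size1)
{
    return (size0 * size1 + WAVEFRONT_SIZE - 1) / WAVEFRONT_SIZE;
}
```

Under these assumptions, a (1, NUM_DISP) work-group with NUM_DISP = 112 spans two wavefronts, and the work-items with get_local_id(1) >= 64 are exactly the ones in the second wavefront. With NUM_DISP <= 64 the whole group fits in one wavefront, so a missing or misplaced barrier would likely go unnoticed there.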