__local variable inconsistency issue

Question asked by lerlinghagen on Feb 15, 2015
Latest reply on Jun 25, 2015 by dipak

Hi everyone,


I'm having issues when using __local variables in the attached kernel. When the kernel is run in a loop with constant input data, the output differs between iterations at semi-random locations. On a Tahiti GPU, the data starts to differ in a random iteration at a random location. On a Tonga GPU, it differs in the second iteration at a fixed location. In both cases, the inconsistency starts at memory addresses written by work items with get_local_id(1) >= 64. For my use case, I'd expect the contents of 'sums' and 'textures' to be the same in each iteration.
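To make "differs at semi-random locations" concrete, here is a minimal host-side sketch of how the result of one iteration could be compared against the first one. The helper name first_mismatch is hypothetical and not part of the attached kernel or host code; it makes no assumption about the element type of 'sums' or 'textures' beyond the element size passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first element at which two
 * result buffers differ, or -1 if all `count` elements are identical.
 * Elements are compared bytewise, so any fixed-size element type works. */
long first_mismatch(const void *reference, const void *current,
                    size_t count, size_t elem_size)
{
    const unsigned char *ref = reference;
    const unsigned char *cur = current;

    assert(ref != NULL && cur != NULL && elem_size > 0);

    for (size_t i = 0; i < count; ++i) {
        if (memcmp(ref + i * elem_size, cur + i * elem_size, elem_size) != 0)
            return (long)i;
    }
    return -1;
}
```

Reading 'sums' back after every iteration and passing it to this function together with the copy from the first iteration would give the first differing element index, which can then be mapped back to the work item that wrote it.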


Here is the relevant input data:


IMAGE_WIDTH is defined to be 680

IMAGE_HEIGHT is defined to be 512

NUM_DISP is defined to be 112

WINDOW_SIZE is defined to be 5

Global work size is (IMAGE_HEIGHT, NUM_DISP), work group size is (1, NUM_DISP).

left, right, sums, textures, and prefilterCap are identical for each kernel run.

Both GPUs use the latest non-beta Catalyst drivers.


The inconsistency disappears for NUM_DISP <= 64, and also when the kernel runs on a CPU device. Did I miss a barrier call somewhere? As far as I can see, all work items hit every barrier and only read the local variables' contents after the barrier.
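One observation about the threshold of 64: on AMD GCN GPUs such as Tahiti and Tonga, a wavefront is 64 work items wide, so a work group of 112 spans two wavefronts, while a work group of 64 or fewer fits in one. The sketch below only shows that arithmetic; the function name is hypothetical, and this is not a diagnosis of the attached kernel.

```c
#include <assert.h>

/* Hypothetical helper: number of wavefronts needed to hold one work group,
 * computed as a ceiling division of the group size by the wavefront size. */
unsigned wavefronts_per_group(unsigned group_size, unsigned wavefront_size)
{
    assert(wavefront_size > 0);
    return (group_size + wavefront_size - 1) / wavefront_size;
}
```

With the values from this post, wavefronts_per_group(112, 64) gives 2 and wavefronts_per_group(64, 64) gives 1. That would be consistent with the problem appearing only once barriers actually have to synchronize more than one wavefront.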