Memory corruption when addressing global memory past element 1442560?

Discussion created by emuller on Oct 30, 2009
Latest reply on Nov 2, 2009 by MicahVillmow

I am running the attached kernel with

local_size = 64

global_size = 64*20 (to use all 20 wavefronts on the 4870)

Inside the kernel, a loop writes one line of the output buffer per iteration. Each line is 64*20 = 1280 uint4 elements wide.

The loop repeats "iter" times.

Up to iter==1127, everything looks fine on the output. For iter>=1128, it breaks: the results are complete garbage, perhaps due to memory corruption. On my GTX 260 using the NVIDIA stack, iter can be a factor of 80 higher with no problems, and I'm nowhere near running out of memory. Is this a user error, a bug, or a hardware limitation?
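For reference, here is a rough sketch in plain C of the arithmetic behind the number in the title. The helper names last_index and rngs_bytes are made up for illustration and are not part of the attached code. The sizes come from the launch parameters above, and the sketch assumes a uint4 occupies 16 bytes (four 32-bit values).

```c
#include <assert.h>
#include <stdint.h>

/* Launch parameters from the post: 64 work-items per group, 20 groups.
   BYTES_PER_UINT4 assumes uint4 = 4 x 32-bit unsigned integers. */
enum { LOCAL_SIZE = 64, NUM_GROUPS = 20, BYTES_PER_UINT4 = 16 };

/* Hypothetical helper: highest element index the kernel writes, i.e.
   get_global_id(0) + i*get_global_size(0) with both terms at their maximum. */
static uint64_t last_index(uint64_t iter)
{
    uint64_t global_size = (uint64_t)LOCAL_SIZE * NUM_GROUPS;
    assert(iter > 0);
    return (global_size - 1) + (iter - 1) * global_size;
}

/* Hypothetical helper: minimum size in bytes of the rngs buffer. */
static uint64_t rngs_bytes(uint64_t iter)
{
    return (last_index(iter) + 1) * BYTES_PER_UINT4;
}
```

With iter = 1127 the last element written is 1442559, so the buffer holds 1442560 uint4 elements, or 23080960 bytes (about 22 MiB). It may be worth comparing that figure against what the device reports for CL_DEVICE_MAX_MEM_ALLOC_SIZE, although nothing here shows that this limit is the cause.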



__kernel void GenerateOut(uint iter,
                          __global uint4 *seed,
                          __local uint4 *shmem,
                          __global uint4 *rngs)
{
    uint4 rngRegs[REG_COUNT];

    LoadState(seed, rngRegs, shmem);

    for (uint i = 0; i < iter; i++) {
        rngs[get_global_id(0) + i*get_global_size(0)] = Generate(rngRegs, shmem);
    }

    SaveState(shmem, seed);
}