Memset kernel does not work as expected

Discussion created by timchist on Jun 24, 2010
Latest reply on Jun 24, 2010 by timchist
Simple memset kernel on 4850 does not work for unsigned char type

I have several kernels, each of which fills a memory region with zeros. memset0_128 is the most efficient one, but the memory block size varies, so sometimes memset0_8 has to be used instead.

However, when I check the memory after calling memset0_8, I find that some bytes are not zero. For example, when memset0_8 is called on a 36864-byte (192x192) region with a local size of (256, 1, 1), output byte #7445 always has a non-zero value (although the actual value differs from run to run). Moreover, if I run the kernel on larger memory blocks, byte #7445 is still the first non-zero output byte.

memset0_16 is more stable, but it also fails sometimes. For example, on a 307200-byte (512x600) array it returns an incorrect value in word #1522.

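It seems that 32-bit write transactions are used internally when smaller elements are written, so work items that write different bytes of the same 32-bit word can overwrite each other's results, which makes the operation not thread-safe.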
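To make the suspicion concrete, here is a minimal host-side C model of what I think may be happening. It is only a sketch under the assumption that a byte store is carried out as a read-modify-write of the enclosing 32-bit word; I have not confirmed that the hardware or compiler actually does this. The names merge_byte, simulate_lost_update and simulate_serialized are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (an assumption, not confirmed hardware behaviour):
   a byte store replaces one byte inside a copy of the enclosing 32-bit word,
   and the whole word is then written back. */
static uint32_t merge_byte(uint32_t word, unsigned byte_index, uint8_t value)
{
    assert(byte_index < 4);
    unsigned shift = 8u * byte_index;
    return (word & ~((uint32_t)0xFFu << shift)) | ((uint32_t)value << shift);
}

/* Work items A and B zero bytes 0 and 1 of the same word.
   Both read the word before either of them writes it back. */
uint32_t simulate_lost_update(uint32_t initial)
{
    uint32_t memory = initial;
    uint32_t seen_by_a = memory;            /* A reads the word */
    uint32_t seen_by_b = memory;            /* B reads the same, still unmodified word */
    memory = merge_byte(seen_by_a, 0, 0);   /* A writes back: byte 0 is now zero */
    memory = merge_byte(seen_by_b, 1, 0);   /* B writes back its stale copy: byte 0 is restored */
    return memory;
}

/* Same two stores, but B reads only after A has written. */
uint32_t simulate_serialized(uint32_t initial)
{
    uint32_t memory = initial;
    memory = merge_byte(memory, 0, 0);
    memory = merge_byte(memory, 1, 0);
    return memory;
}
```

In this model, simulate_lost_update(0xAABBCCDD) leaves the old value 0xDD in byte 0, while simulate_serialized clears both bytes. That would match what I see: a byte that keeps whatever was in memory before, which changes from run to run.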

I use a 27" iMac with an ATI 4850 Mobility card.

__kernel void memset0_128(__global uint4* dst)
{
    dst[get_global_id(0)] = to_uint4(0);
}

__kernel void memset0_32(__global unsigned int* dst)
{
    dst[get_global_id(0)] = 0;
}

__kernel void memset0_16(__global unsigned short* dst)
{
    dst[get_global_id(0)] = 0;
}

__kernel void memset0_8(__global unsigned char* dst)
{
    dst[get_global_id(0)] = 0;
}
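One workaround I am considering is to avoid the narrow kernels altogether: cover as much of the region as possible with memset0_128 and deal with the remaining bytes some other way (for example, by writing zeros from the host with clEnqueueWriteBuffer). Below is a rough host-side C sketch of the size calculation. The struct and function names are hypothetical, and it assumes the region starts on a 16-byte boundary.

```c
#include <stddef.h>

/* Hypothetical helper type: how a region of size_bytes is divided between
   the 128-bit kernel and a leftover tail. */
typedef struct {
    size_t vec_items;    /* number of uint4 elements, i.e. work items for memset0_128 */
    size_t tail_offset;  /* byte offset at which the uncovered tail begins */
    size_t tail_bytes;   /* 0..15 bytes that memset0_128 does not cover */
} memset_split;

/* Assumes the region starts on a 16-byte boundary. */
memset_split split_for_memset(size_t size_bytes)
{
    memset_split s;
    s.vec_items = size_bytes / 16;              /* each uint4 covers 16 bytes */
    s.tail_offset = s.vec_items * 16;
    s.tail_bytes = size_bytes - s.tail_offset;
    return s;
}
```

For the two sizes mentioned above the tail is empty: 36864 bytes is 2304 uint4 elements and 307200 bytes is 19200, so memset0_128 alone would cover both. A size that is not a multiple of 16 still needs the tail handled separately.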