Pointers don't persist across kernel calls

Discussion created by rick.weber on Nov 16, 2010
Latest reply on Nov 17, 2010 by cjang

I know passing cl_mem objects to kernels is the prescribed way of doing things, but it has an annoying side effect: you can't offset the base pointer when you pass it to a kernel, as you can in CUDA. That is, you have to pass the cl_mem handle to clSetKernelArg() rather than something like &addr[27]. This is a nuisance in some applications, because you then have to add extra offset parameters to your kernels if you want to do things like operate on submatrices. (OpenCL 1.1 does let you create sub-buffers with clCreateSubBuffer(), but only one level deep: you can't create a sub-buffer of a sub-buffer.)
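For comparison, the offset-parameter workaround can be sketched in plain C, with a loop standing in for the NDRange so it runs on a CPU. This is a hypothetical illustration, not code from the thread: the names `scale_submatrix_kernel` and `launch_scale_submatrix` and the `offset`/`ld` parameters are assumptions. The point is only that the kernel receives the whole buffer plus an integer offset instead of an offset pointer like &addr[27].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel body, emulated on the CPU. Instead of receiving a
 * pointer to the submatrix (such as &addr[27]), it receives the base of the
 * whole buffer, the element offset of the submatrix's first element, and the
 * leading dimension `ld` (row stride of the full row-major matrix). */
void scale_submatrix_kernel(float *base, size_t offset, size_t ld,
                            size_t row, size_t col, float alpha)
{
    float *sub = base + offset;   /* rebuild the "offset pointer" locally */
    sub[row * ld + col] *= alpha;
}

/* Stands in for a 2-D NDRange of rows x cols work-items. */
void launch_scale_submatrix(float *base, size_t offset, size_t ld,
                            size_t rows, size_t cols, float alpha)
{
    assert(cols <= ld);  /* each submatrix row must fit inside a full row */
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            scale_submatrix_kernel(base, offset, ld, r, c, alpha);
}
```

For example, with a 4x4 row-major matrix `m` holding 0..15, `launch_scale_submatrix(m, 1 * 4 + 1, 4, 2, 2, 10.0f)` scales only the 2x2 block starting at row 1, column 1 (elements 5, 6, 9 and 10). In real OpenCL the extra `offset` and `ld` values would be set with clSetKernelArg() alongside the cl_mem handle.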

One workaround we came up with: each time we allocate a cl_mem, we launch a kernel that reads the buffer's actual address on the GPU and stores it in an unsigned integer. For example:

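typedef unsigned int ptr;  // assumes 32-bit device addresses

// Stores the device address of `data` as an integer in *myPtr.
__kernel void stripPtr(__global void* data, __global ptr* myPtr)
{
    *myPtr = (ptr)data;
}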

I checked the pointer size on the GPU and it was 4 bytes, so this should work. I then read the value of myPtr back from its cl_mem object and pass it as a plain integer to subsequent kernels, which use it like this:

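__kernel void writePtr(ptr data)
{
    // Reinterpret the stored integer as a pointer into global memory.
    __global float* realPtr = (__global float*)data;

    // Each work-item writes its own global ID, so we expect {0, 1, 2, ...}.
    realPtr[get_global_id(0)] = get_global_id(0);
}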

However, this didn't work: when we copied the buffer back from the GPU, it did not contain {0, 1, 2, ...}.

Is there some virtual memory remapping going on that stops this from working? And does this imply that no pointer stays valid on the GPU beyond a single kernel call? If so, then in a linked list (or any other pointer-based structure) stored in a buffer, the links between nodes would be invalidated once the kernel that built it finished.
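Whatever the answer is for a particular implementation, a common way to make linked structures independent of where a buffer lands is to store links as indices (or offsets) relative to the start of the buffer rather than as absolute addresses, and to resolve each link as base + index inside every kernel. The CPU-side C sketch below illustrates the idea under stated assumptions: the `Node` layout, the `NIL` sentinel and the function names are hypothetical, and copying the array with memcpy stands in for the buffer appearing at a different address in a later kernel call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NIL (-1)  /* hypothetical "no next node" sentinel */

/* A list node whose link is an index into the same array, not an address. */
typedef struct {
    float value;
    int next;     /* index of the next node in the buffer, or NIL */
} Node;

/* Builds the list 0 -> 1 -> ... -> n-1 in `nodes`, storing the values
 * 0..n-1, and returns the index of the head (NIL for an empty list). */
int build_list(Node *nodes, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        nodes[i].value = (float)i;
        nodes[i].next = (i + 1 < n) ? i + 1 : NIL;
    }
    return n > 0 ? 0 : NIL;
}

/* Walks the list from `head`, resolving each link as base + index, and
 * returns the sum of the values. Works for whatever address `base` has. */
float sum_list(const Node *base, int head)
{
    float sum = 0.0f;
    for (int i = head; i != NIL; i = base[i].next)
        sum += base[i].value;
    return sum;
}

/* Copies the nodes into a different array (standing in for the buffer being
 * placed at a new address) and traverses the copy. An absolute pointer
 * copied this way would still point into the old array. */
float sum_after_relocation(const Node *src, Node *dst, int n, int head)
{
    memcpy(dst, src, (size_t)n * sizeof(Node));
    return sum_list(dst, head);
}
```

In an OpenCL kernel, the same pattern would take the __global node buffer as an argument and keep only indices inside it, so the links stay meaningful no matter which address the buffer has in a given kernel call.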