Archives Discussions

rick_weber
Adept II

Pointers don't persist across kernel calls

I know passing cl_mem objects to kernels is the prescribed way of doing things, but it has the annoying side effect that you can't offset the base pointer when passing it to a kernel, like you can in CUDA. E.g. you have to pass the cl_mem handle to clSetKernelArg() instead of something like &addr[27]. This is pretty annoying in some applications, because you then have to add extra offset parameters to kernels if you want to operate on submatrices (though in OpenCL 1.1 you can create sub-buffers, as long as you don't create a sub-buffer of a sub-buffer).

One solution we came up with: each time you allocate a cl_mem, call a kernel that takes the buffer's actual device address and shoves it into an unsigned integer. E.g.:

typedef unsigned int ptr;

__kernel void stripPtr(__global void* data, __global ptr* myPtr)
{
    *myPtr = (ptr)data;
}

I checked the pointer size for the GPU and saw it was 4 bytes, so this should work. I then fetch the value of myPtr from its cl_mem object. Subsequent kernels used the pointer as follows:

__kernel void writePtr(ptr data)
{
    __global float* realPtr = (__global float*)data;
    realPtr[get_global_id(0)] = get_global_id(0);
}

However, we found that this method didn't work: when we copied the cl_mem object back from the GPU, we didn't see {0, 1, 2, ...}.

Are there virtual memory tricks or mappings going on that stop this from working? Would this imply that no pointer persists on the GPU beyond a kernel call? That would mean that if I built a linked list on the device, the links between nodes would be invalidated once the kernel ended.

7 Replies
cjang
Journeyman III

Another possibility is that all of the pointer aliasing has confused the compiler. Even if this kind of pointer munging is possible in principle, the compiler's generated output may not preserve it. The boundary between compiler and driver may also not be so clear-cut; I imagine a lot of code augmentation is required to support the language semantics.


MicahVillmow

rick.weber,
This does not work because no pointer is bound to the writePtr kernel. Since no pointer is bound, the address is invalid and the write is discarded. We have internal tests in which the pointer does work across kernel calls, so the technique itself does work.

Originally posted by: MicahVillmow rick.weber, This does not work because there is no pointer being bound to the writePtr kernel. Since no pointer is being bound, the address is invalid and the write is discarded. We have internal tests that allow the pointer to work across kernel calls, so it does work.


Could you give us some hints of how writePtr could be modified in order to work as expected?


Pass in data as a pointer and not a scalar. That will make sure the memory is bound to the kernel correctly, and it should then work.
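To make that concrete, here is an untested sketch of how writePtr might look after this fix (the offset parameter is an illustrative addition, not part of the original code). The buffer is bound with clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf), so the runtime knows the memory must be resident, and any view offset travels as a separate argument:

```
__kernel void writePtr(__global float* data, uint offset)
{
    data[offset + get_global_id(0)] = get_global_id(0);
}
```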

I see this as very significant good news. It implies that memory can be managed directly by applications, or by a VM built on top of the OpenCL runtime. We can still use OpenCL as the JIT and to schedule kernels while keeping more direct control over memory - i.e. eagerly allocate up front at startup and then run our own heap manager.

Seriously, this is cool. Thank you Rick Weber for asking the question and thank you Micah Villmow!


But on some other platform this may not work. For example, if you run a kernel with one buffer, another implementation may then swap that buffer out of GPU memory, so your saved address becomes invalid.

cjang
Journeyman III

Of course... it's never guaranteed to work. However, I like having the option to test for and detect whether this optimization is available. Programming idioms always happen.

My experience is that kernel designs and tuning must be matched to devices and platforms. It's not a big step to view the OpenCL runtime platform itself in the same way.
