Just a quick question: are pointers between global memory buffers permitted?
e.g.:
__kernel void TraceRays(__global const float* buf1, __global const float* buf2)
{
    float value;
    ptrdiff_t OffsetToBuffer2 = buf2 - buf1;
    // access buf2 data - illegal?
    value = buf1[ OffsetToBuffer2 ];
}
The code works; however, I read in an NVIDIA OpenCL guide that pointers between two buffers are not permitted by the OpenCL spec. The thing is, this code at least works on my ATI 5870 and is massively quicker than reading from an array of buffers (i.e. buf[bufferOffset + dataoffset] is faster than buf[bufferselectflag][dataoffset]). I imagine this would probably not present any issues on an x86 platform, but GPU memory space is probably another thing. Would this code cause compatibility problems on other hardware?
Thanks. That was a quick reply!
That helps. The reason I'm using two buffers is that I'm exceeding the max allocation size and require access to one big buffer. I can't logically split the problem over two separate address spaces (in fact I am doing that in a way with this approach, but using some arithmetic to make it map linearly like one buffer), and images won't do the job.
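Here is a minimal sketch of the "arithmetic to make it map linearly like one buffer". It is written as plain C rather than OpenCL C so it can be tested on the host. The helper names `split_index`/`join_index` and the parameter `chunk_elems` are hypothetical. The sketch assumes every buffer except the last holds exactly `chunk_elems` elements. This is the portable segmented approach, not the pointer-difference trick from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (buffer, offset) pair addressing one element. */
typedef struct {
    size_t chunk;   /* which buffer: 0, 1, 2, ... */
    size_t offset;  /* element index within that buffer */
} ChunkAddr;

/* Map a linear element index (e.g. a position in the streamed file) onto
 * the buffer that holds it, assuming equal-sized chunks of chunk_elems. */
static ChunkAddr split_index(size_t linear, size_t chunk_elems)
{
    assert(chunk_elems > 0);
    ChunkAddr a;
    a.chunk  = linear / chunk_elems;
    a.offset = linear % chunk_elems;
    return a;
}

/* Inverse mapping: recover the linear index from a (buffer, offset) pair. */
static size_t join_index(ChunkAddr a, size_t chunk_elems)
{
    return a.chunk * chunk_elems + a.offset;
}
```

If `chunk_elems` is a power of two, the division and modulo reduce to a shift and a mask. Whether that closes the gap with the single-base approach on a GPU would need measuring.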
Thanks
Just to confirm I read it here: http://developer.download.nvidia.com/OpenCL/NVIDIA_OpenCL_JumpStart_Guide.pdf
It says "...To do this on OpenCL, pointers must be converted to be relative to the buffer base pointer and only refer to data within the buffer itself (no pointers between OpenCL buffers are allowed). "
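The guide's advice can be sketched in plain C. Links stored inside the data are indices relative to the buffer base instead of absolute pointers, so the data stays valid wherever the driver places the buffer. The `Node` layout (eight child slots, as in an octree), `NO_CHILD` and `descend` are hypothetical, and the root is assumed to sit at index 0. The traversal is a loop rather than recursion because OpenCL C kernels do not support recursion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "no child in this octant". */
#define NO_CHILD UINT32_MAX

/* Hypothetical octree node: each child is stored as an index relative to the
 * start of the node array (the buffer base), never as an absolute pointer. */
typedef struct {
    float    value;
    uint32_t child[8];   /* index of the child in each octant, or NO_CHILD */
} Node;

/* Walk down from the root (assumed to be at index 0) following `path`,
 * one octant number per level; stops early at a missing child and returns
 * the value of the deepest node reached. Iterative, since OpenCL C kernels
 * cannot recurse. */
static float descend(const Node *base, const unsigned char *path, size_t depth)
{
    assert(base != NULL);
    uint32_t idx = 0;
    for (size_t d = 0; d < depth; ++d) {
        uint32_t next = base[idx].child[path[d] & 7u];
        if (next == NO_CHILD)
            break;
        idx = next;  /* only ever an offset from base, so relocation is harmless */
    }
    return base[idx].value;
}
```

Copying the node array somewhere else simulates the driver placing the buffer at a different address on another launch. Every link stays valid after the copy, because nothing in the data depends on where the buffer actually lives.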
Just to confirm, this can be done even between buffers allocated with separate clCreateBuffer() calls?
Originally posted by: antzrhere ... reading from an array of buffers (i.e. buf[bufferOffset + dataoffset] is faster than buf[bufferselectflag][dataoffset]) ...
Sorry, but is there such a thing as an array of buffers?
No, not as arguments passed to kernels, but you can construct an array of pointers that address different buffer arguments within the kernel code itself.
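Below is a host-side C imitation of that pattern, assuming four buffers. Each buffer arrives as its own parameter, because kernel arguments cannot be pointers to pointers. The function then builds a private array of those pointers and indexes it as `bufs[sel][off]`. The name `read_element` and the fixed count of four are hypothetical. In an actual kernel the parameters and the array's element type would be `__global const float *`.

```c
#include <assert.h>
#include <stddef.h>

/* One parameter per buffer, then a local array of pointers built inside
 * the function, mirroring buf[bufferselectflag][dataoffset]. */
static float read_element(const float *b0, const float *b1,
                          const float *b2, const float *b3,
                          size_t chunk_elems, size_t linear)
{
    const float *bufs[4] = { b0, b1, b2, b3 };  /* private array of pointers */
    assert(chunk_elems > 0);
    size_t sel = linear / chunk_elems;           /* which buffer */
    size_t off = linear % chunk_elems;           /* offset inside it */
    assert(sel < 4);
    return bufs[sel][off];
}
```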
My use of language in the previous post was a bit ambiguous.
Essentially this is what I'm doing, but using a pointer offset and a single base instead of an array of pointers (faster). It works well enough and I'm glad it's legal. The extra code I needed to calculate the pointer offset to the required buffer cost about 10% in performance, but being able to address all device memory from one base address is worth the hit.
The code above IS useless, but it has nothing to do with my project, it was just to explain my question.
The usefulness of it is that I can address all my GPU memory as if it were a single buffer. The data is an octree, and I traverse up and down it in no predefined order. I couldn't do this easily with two separate buffers, as I need a single linearly addressable space that matches the file it is streamed from. Of course I could change the system, but this seems the easiest way.
And no, I don't pre-calculate the pointer offsets; that would certainly be catastrophic. I calculate them on the fly at the beginning of my kernel code (for every work item).
Ahh I see it now. Since you recalculate every time, and C allows negative indices, it should work.
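Here is a small C illustration of the negative-index point. `p[k]` means `*(p + k)`, so a negative `k` is fine as long as the resulting pointer stays inside the same array object. Standard C leaves pointer arithmetic and subtraction across separately allocated objects undefined, which is the rule the NVIDIA guide is echoing. `read_relative` is a hypothetical helper.

```c
#include <stddef.h>

/* p[k] is *(p + k): a negative k is well defined only while p + k
 * still points into the same array that p points into. */
static int read_relative(const int *p, ptrdiff_t k)
{
    return p[k];
}
```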
Originally posted by: MicahVillmow Nope, from section 6.8.a "Arguments to kernel functions in a program cannot be declared as a pointer to a pointer(s)."
That's what I was thinking of.
Originally posted by: antzrhere No, not as arguments passed to kernels, but you can construct an array of pointers that address different buffer arguments within the kernel code itself.
If I understand it correctly, in this case you have to pass at least as many arguments as you have buffers, right? Meaning, if you have thousands of buffers you can hit the maximum argument count.
Yes, you're correct. On top of that, the kernel would spend a lot of time just calculating the offset before anything else happens, so it would be terribly inefficient. However, all I needed was 4 buffers (3 buffer offsets), as this is enough to address all GPU memory (the maximum single buffer that can be allocated is 1/4 of total memory). 2D images aren't subject to a max memory size restriction (as far as I'm aware), but they're not quite as flexible to read from as general buffers.
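The buffer count above is a ceiling division. `buffers_needed` is a hypothetical helper. On a real host the inputs would come from `clGetDeviceInfo` with `CL_DEVICE_GLOBAL_MEM_SIZE` and `CL_DEVICE_MAX_MEM_ALLOC_SIZE`. When the maximum allocation is exactly a quarter of global memory, it returns 4: one base buffer plus the 3 offsets mentioned.

```c
#include <assert.h>
#include <stdint.h>

/* Number of buffers of at most max_alloc bytes needed to cover total bytes,
 * rounded up; written without (total + max_alloc - 1) to avoid overflow. */
static uint64_t buffers_needed(uint64_t total, uint64_t max_alloc)
{
    assert(max_alloc > 0);
    return total / max_alloc + (total % max_alloc != 0);
}
```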
I find it sad that OpenCL has a limit on the max memory that can be allocated in one go. I realise it might suit some GPU memory systems, but there are GPU vendors out there that actually ignore this limit in their OpenCL implementation (wrongly, of course), and certainly most CPU implementations wouldn't have any issue with unrestricted allocation size. Maybe a future OpenCL spec will remove this restriction when things take off a bit?
I hope sometime soon we will be able to pass an array of buffers more efficiently.
Regarding the max memory allocation, the problem is not the spec but the vendors. The spec defines a minimum value for the device's max alloc size; the problem is that most of the time vendors stick with this minimum value. I hope things will evolve so that we can allocate as much memory as the device's global mem size, and also that the device's global mem size will match the physical memory available (which is not always the case).
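For reference, the OpenCL 1.x full-profile specification sets a floor on what a device may report for `CL_DEVICE_MAX_MEM_ALLOC_SIZE`: max(1/4 of `CL_DEVICE_GLOBAL_MEM_SIZE`, 128 MiB). Below is a sketch of that formula in C, with a hypothetical helper name. Embedded-profile devices have a smaller floor, so check the specification version you target.

```c
#include <stdint.h>

/* Smallest CL_DEVICE_MAX_MEM_ALLOC_SIZE a full-profile OpenCL 1.x device
 * may report: max(global_mem_size / 4, 128 MiB). */
static uint64_t min_max_alloc(uint64_t global_mem_size)
{
    const uint64_t floor_bytes = 128ull * 1024 * 1024;
    uint64_t quarter = global_mem_size / 4;
    return quarter > floor_bytes ? quarter : floor_bytes;
}
```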
While your code is completely legal, I don't expect it to do anything useful. You're computing the pointer difference of two arbitrarily allocated buffers and using that as an index.
Also, if you call this kernel multiple times, even with the same arguments, the actual pointers buf1 and buf2 can change if the driver decides to map the memory addresses to the buffers differently. So you can't make any assumptions about what the values of the pointers actually are.