Archives Discussions

antzrhere · ‎08-25-2011

Just a quick question: are pointers between global memory buffers permitted?

i.e.

__kernel void TraceRays(__global const float* buf1, __global const float* buf2)
{
    float value;

    // distance, in elements, from the start of buf1 to the start of buf2
    ptrdiff_t OffsetToBuffer2 = buf2 - buf1;

    // access buf2 data through buf1 - illegal?
    value = buf1[OffsetToBuffer2];
}

The code works; however, I read in an nVidia OpenCL guide that pointers between two buffers are not permitted according to the OpenCL spec. The thing is, this code at least works on my ATI 5870 and is massively quicker than reading from an array of buffers (i.e. buf[bufferOffset + dataoffset] is faster than buf[bufferselectflag][dataoffset]). I imagine this would not present any issues on an x86 platform, but GPU memory space is probably another matter. Would this code cause incompatibility problems on other hardware?

MicahVillmow · ‎08-25-2011

"nVidia OpenCL guide that pointers between two buffers is not permitted according to the OpenCL spec" <-- this is incorrect, as nothing in the OpenCL spec has this restriction. The restrictions on pointers are specified in section 6.8.a of the OpenCL spec.

That being said, doing pointer diffs will cause a performance regression compared to not doing so. For the code to work, we must make sure that buf1 and buf2 exist in the same hardware address space, instead of optimizing each of them into its own address space.

antzrhere · ‎08-25-2011

Thanks. That was a quick reply!

That helps. The reason I'm using two buffers is that I'm exceeding the max allocation size and require access to one big buffer. I can't logically split the problem over two separate address spaces (in fact I am doing that in a way with this approach, but using some arithmetic to make it map linearly like one buffer), and images won't do the job.
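A minimal sketch, in plain C, of the kind of index arithmetic described above: a linear element index is split into a buffer number and an offset inside that buffer. The names `chunk_ref`, `locate`, and `elems_per_buffer` are hypothetical, and the sketch assumes every buffer holds the same number of elements; the poster's actual arithmetic is not shown in the thread.

```c
#include <stddef.h>

/* Hypothetical result type: which buffer, and where inside it. */
typedef struct {
    size_t buffer;  /* index of the buffer that holds the element */
    size_t offset;  /* element offset inside that buffer */
} chunk_ref;

/* Map a linear element index onto equally sized buffers.
   Assumes elems_per_buffer is non-zero. */
chunk_ref locate(size_t linear_index, size_t elems_per_buffer)
{
    chunk_ref r;
    r.buffer = linear_index / elems_per_buffer;
    r.offset = linear_index % elems_per_buffer;
    return r;
}
```

With a power-of-two `elems_per_buffer`, the division and modulo reduce to a shift and a mask, which may matter when this runs for every work item.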

Thanks

antzrhere · ‎08-25-2011

Just to confirm I read it here: http://developer.download.nvidia.com/OpenCL/NVIDIA_OpenCL_JumpStart_Guide.pdf

It says "...To do this on OpenCL, pointers must be converted to be relative to the buffer base pointer and only refer to data within the buffer itself (no pointers between OpenCL buffers are allowed). "

Just to confirm, this can be done even between buffers allocated with separate clCreateBuffer() calls?

MicahVillmow · ‎08-25-2011

Yes, the only restriction in OpenCL is on pointers between address spaces, not on pointers within the same address space. It looks like the NVidia guide is wrong here. The OpenCL spec does not put restrictions on this case, so the rules in OpenCL are the same as they are in the C99 specification.

laobrasuca · ‎08-25-2011

Originally posted by: antzrhere ... reading from an array of buffers (i.e. buf[bufferOffset + dataoffset] is faster than buf[bufferselectflag][dataoffset]) ...

Sorry, but is there such a thing as an array of buffers?

MicahVillmow · ‎08-25-2011

Nope, from section 6.8.a:
"Arguments to kernel functions in a program cannot be declared as a pointer to a pointer(s)."

antzrhere · ‎08-25-2011

No, not as arguments passed to kernels, but you can construct an array of pointers that address different buffer arguments within the kernel code itself.

My use of language in the previous post was a bit ambiguous.

Essentially this is what I'm doing, but using a pointer offset and a single base instead of an array of pointers (faster). It works well enough and I'm glad it's legal. The extra code I needed to calculate the pointer offset to the required buffer incurred a performance decrease of ~10%, but being able to address all device memory from one base address is worth the hit.
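The array-of-pointers variant mentioned in this post could look roughly like the sketch below, written in plain C so it can be compiled on its own. The function `read_from` and its parameters are hypothetical; in OpenCL C the body would sit inside a kernel and each pointer would be declared `__global const float *`.

```c
#include <stddef.h>

/* Hypothetical helper: each buffer arrives as its own argument, because a
   kernel argument cannot be a pointer to a pointer. */
float read_from(const float *buf0, const float *buf1,
                const float *buf2, const float *buf3,
                size_t which, size_t offset)
{
    /* Table of base pointers built inside the function body. */
    const float *bufs[4] = { buf0, buf1, buf2, buf3 };

    /* Select the buffer first, then index into it. */
    return bufs[which][offset];
}
```

The single-base approach described in the post replaces the table lookup with one addition on a precomputed offset.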

antzrhere · ‎08-25-2011

The code above IS useless, but it has nothing to do with my project, it was just to explain my question.

The usefulness of it is that I can address all my GPU memory as if it were a single buffer. The data is an octree and I traverse up and down it in no predefined order. I couldn't do this easily with two separate buffers, as I need a single linearly addressable space that matches the file it is streamed from. Of course I could change the system, but this seems the easiest way.

And no, I don't pre-calculate the pointer offsets; that would certainly be catastrophic. I calculate them on the fly at the beginning of my kernel code (for every work item).

rick_weber · ‎08-25-2011

Ahh I see it now. Since you recalculate every time, and C allows negative indices, it should work.
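A small C sketch of the negative-index point: `base[index]` means `*(base + index)`, so a negative `index` reads an element that precedes `base`. The helper `at` is hypothetical, and the example stays inside a single array, which is the case plain C defines; it does not demonstrate indexing across separate OpenCL buffers.

```c
#include <stddef.h>

/* Hypothetical helper: index relative to an arbitrary base pointer.
   The index is signed, so it may point before the base. */
float at(const float *base, ptrdiff_t index)
{
    return base[index];
}
```

For example, with `float data[4] = {10, 20, 30, 40}`, the call `at(&data[2], -1)` reads `data[1]`.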

laobrasuca · ‎08-29-2011

Originally posted by: MicahVillmow Nope, from section 6.8.a "Arguments to kernel functions in a program cannot be declared as a pointer to a pointer(s)."

that's what I was thinking about.

Originally posted by: antzrhere No, not as arguments passed to kernels, but you can construct an array of pointers that address different buffers arguments within the kernel code itself.

If I understand it correctly, in this case you have to pass at least as many arguments as you have buffers, don't you? Meaning that if you have thousands of buffers, you can hit the maximum argument count.

antzrhere · ‎08-30-2011

Yes, you're correct. On top of that, the kernel would spend a lot of time just calculating the offset before anything else happens, so it would be terribly inefficient. However, all I needed was 4 buffers (3 buffer offsets), as this is enough to address all GPU memory (the largest single buffer that can be allocated is a quarter of total memory). 2D images aren't subject to a max memory size restriction (as far as I'm aware), but they're not quite as flexible to read from as general buffers.

I find it sad that OpenCL has a limit on the max memory that can be allocated in one go. I realise it might suit some GPU memory systems, but there are GPU vendors out there that actually ignore this limit in their OpenCL implementation (wrongly, of course), and certainly most CPU implementations wouldn't have any issue with unrestricted allocation size. Maybe a future OpenCL spec will remove this restriction when things take off a bit?

laobrasuca · ‎08-30-2011

I hope sometime soon we will be able to pass an array of buffers more efficiently.

Regarding the max memory allocation, the problem is not the spec but the vendors. The spec defines a minimum value for the device max alloc size; the problem is that most of the time the vendors stick with this minimum value. I hope things will evolve so that we will be able to allocate as much memory as the device global mem size, and also that the device global mem size will match the physical memory available (which is not always the case).
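As a rough illustration of that minimum, the sketch below assumes the OpenCL 1.1 wording for CL_DEVICE_MAX_MEM_ALLOC_SIZE, where the minimum value is max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128*1024*1024). The function names `min_max_alloc_size` and `buffers_needed` are hypothetical; on a real device the actual limit should be queried with clGetDeviceInfo rather than computed.

```c
#include <stdint.h>

/* Smallest max-alloc size a device may report, assuming the OpenCL 1.1
   rule: max(global_mem_size / 4, 128 MiB). */
uint64_t min_max_alloc_size(uint64_t global_mem_size)
{
    const uint64_t floor_bytes = 128ull * 1024 * 1024;
    uint64_t quarter = global_mem_size / 4;
    return quarter > floor_bytes ? quarter : floor_bytes;
}

/* Number of buffers needed to cover total_bytes when each allocation is
   capped at max_alloc bytes (rounds up). Assumes max_alloc is non-zero. */
uint64_t buffers_needed(uint64_t total_bytes, uint64_t max_alloc)
{
    return (total_bytes + max_alloc - 1) / max_alloc;
}
```

For a device with 1 GiB of global memory that reports only the minimum, this gives a 256 MiB cap and four buffers to cover all of memory, which matches the four-buffer figure mentioned earlier in the thread.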

rick_weber · ‎08-25-2011

While your code is completely legal, I don't expect it to do anything useful. You're computing the pointer difference of two arbitrarily allocated buffers and using that as an index.

Also, if you call this kernel multiple times, even with the same arguments, the actual pointers buf1 and buf2 can change if the driver decides to map the memory addresses to the buffers differently. So you can't make any assumptions about what the values of the pointers actually are.

pointers between buffers - illegal or OK?