Hi!
I want to use subbuffers to split work between devices.
The concept seems to fit my needs, but somehow it's not working as expected.
I implemented an app to test subbuffer usage, but it didn't work as I wanted.
I have a book called OpenCL Programming Guide, written by OpenCL authorities, and I found a sample implementation in it demonstrating subbuffers, and the results are the same.
Could somebody check this sample source code and tell me what is wrong with it?
I would expect, and so would the authors, that at the end, when the buffer is read back, the calculation has been split between the devices and the result looks as if one device did the whole calculation.
However, that's not what happens. The subbuffer results don't appear in the parent buffer.
The source code can be downloaded from bgaster/opencl-book-samples on GitHub, chapter 7. NUM_BUFFER_ELEMENTS needs to be changed to 128 because of the memory alignment.
Thanks!
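The alignment remark in the post above can be checked with a little arithmetic. The following is a minimal sketch in plain C, not taken from the book sample: region_t is a local stand-in for OpenCL's cl_buffer_region, make_region and origin_is_aligned are hypothetical helper names, and the alignment value passed in is assumed to come from querying CL_DEVICE_MEM_BASE_ADDR_ALIGN (which the spec reports in bits) on the actual device.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for OpenCL's cl_buffer_region (origin and size, both in bytes).
   Declared locally so this sketch compiles without the OpenCL headers. */
typedef struct {
    size_t origin;
    size_t size;
} region_t;

/* Region for device `index` when every device gets `elems_per_device` ints. */
static region_t make_region(size_t index, size_t elems_per_device)
{
    region_t r;
    r.size = elems_per_device * sizeof(int);
    r.origin = index * r.size;
    return r;
}

/* CL_DEVICE_MEM_BASE_ADDR_ALIGN is reported in bits; the origin is in bytes.
   Returns 1 if the origin is a multiple of the alignment, 0 otherwise. */
static int origin_is_aligned(size_t origin_bytes, unsigned align_bits)
{
    size_t align_bytes;
    assert(align_bits >= 8 && align_bits % 8 == 0);
    align_bytes = align_bits / 8;
    return origin_bytes % align_bytes == 0;
}
```

For example, assuming a 4-byte int and a device that reports 4096 bits (512 bytes), make_region(1, 128) starts at byte 512 and passes the check, while a smaller count such as make_region(1, 16) starts at byte 64 and does not. Other devices may report a different alignment, so the value should be queried rather than hard-coded.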
Hmmm. I have used subbuffers many times, and with the exception of some wrong offsets, everything worked OK.
I think I see your problem. You haven't specified what type the buffer is (char, float, ?). As a result, the OpenCL compiler doesn't know what buffer[id] is, and guesses wrongly.
Try passing (__global char *buffer) or similar to the kernel.
Thank you for your answer!
I corrected the kernel declaration to __kernel void square(__global int* buffer), but nothing changed.
zoli0726 wrote:
Thank you for your answer!
I corrected the kernel declaration to __kernel void square(__global int* buffer), but nothing changed.
Yeah. I guess the compiler defaults to int...
What do you get vs expect?
Subbuffers are at the core of all OpenCL calculations and they work correctly...
OK. As I read it, the purpose of subbuffers is to allow different parts of a buffer to be updated independently.
I have a buffer, and I create two subbuffers from it, splitting it into two parts. I update the first half with the first device in the context, and the second half with the second device.
Then, when I read the parent buffer back, I expect to see both modifications. When I use two GPUs, that's not happening. When I use GPU+CPU, the CPU's half is incomplete
(there are some valid results, but not all of them).
OK. I think I see your problem. When creating the subbuffers and passing them as args to each kernel, you shouldn't use offset ids. Each kernel sees a new buffer counting from 0.
I haven't used subbuffers myself; I always use the whole buffer and work with offsets. That's faster and uses fewer resources. Therefore I don't know if the original buffers[0] will be updated, or if you have to read each subbuffer separately for the results. I suspect that the latter is true, since the spec says that "it creates a new buffer" for a subbuffer, and a copy back would need a synchronization flag...
PS: You should really use clReleaseEvent or you will get memory leaks...
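The indexing point in the reply above can be modeled in plain C. This is only a sketch of the index arithmetic, not OpenCL code: square_from_zero and square_in_halves are hypothetical names, the loop variable stands in for get_global_id(0), and a pointer into the parent array stands in for a subbuffer. It says nothing about when writes to a subbuffer become visible in the parent buffer across devices, which is the part the thread leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C model of the kernel body: `buffer` points at the start of the
   (sub)buffer the work-items were given, `count` is the global work size.
   Indexing starts at 0, with no extra offset added. */
static void square_from_zero(int *buffer, size_t count)
{
    size_t id;
    assert(buffer != NULL);
    for (id = 0; id < count; ++id)   /* id plays the role of get_global_id(0) */
        buffer[id] = buffer[id] * buffer[id];
}

/* Model of two devices, each squaring its own half of `parent`.
   The offset is applied once, when the "subbuffer" pointer is formed. */
static void square_in_halves(int *parent, size_t total)
{
    size_t half = total / 2;
    square_from_zero(parent, half);                /* "device 0": first half  */
    square_from_zero(parent + half, total - half); /* "device 1": second half */
}
```

Since the offset is already baked into the second pointer, adding it again inside the kernel body would index past the end of that half, which is the mistake the reply warns about.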