
centershocksb12
Journeyman III

Application does not scale when using cl::Buffer-Object

Hello,

I have the following problem:

My application does not scale across multiple GPUs. It is always slightly slower on more GPUs than on fewer.

I found out that a cl::Buffer object is causing this. I use the buffer as follows:

First I create an ordinary array with malloc() that holds 20 elements (they are filled later):

int* pOverlap_region = (int*) malloc(20 * sizeof(int)); // 20 ints (80 bytes with a 4-byte int)
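The hard-coded 80 in this snippet only equals 20 ints where sizeof(int) is 4. A minimal sketch of deriving the byte count from the element count instead; kOverlapElems, intArrayBytes and makeOverlapRegion are hypothetical names, and using std::vector in place of malloc() is an assumption about what fits the surrounding code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: number of ints in the overlap region (20 in the post).
constexpr std::size_t kOverlapElems = 20;

// Byte size of an int array with n elements. For n == 20 this is 80 only
// on platforms where sizeof(int) == 4.
constexpr std::size_t intArrayBytes(std::size_t n) { return n * sizeof(int); }

// Hypothetical helper: allocates a zeroed host-side overlap region of n ints.
// std::vector frees its memory automatically, unlike malloc().
std::vector<int> makeOverlapRegion(std::size_t n = kOverlapElems) {
    assert(n > 0 && "overlap region must not be empty");
    return std::vector<int>(n, 0);
}
```

After filling it, region.data() and intArrayBytes(region.size()) would take the place of pOverlap_region and 80 in the buffer constructor call.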

After it is filled, I create the cl::Buffer object:

cl::Buffer overlap_region(
            this->context.getOpenCLContext(), CL_MEM_COPY_HOST_PTR, 80, pOverlap_region, &err);

this->context.getOpenCLContext() returns the context.

Then it is set as an argument for the kernel:

err |= kernel.setArg(3, overlap_region);

If this buffer is created but *not* set as a kernel argument, the application scales across multiple GPUs.

Does anybody know why it behaves like this?

Thanks for your replies

 

7 Replies
nou
Exemplar

Try adding CL_MEM_READ_ONLY to the flags if you only read from this buffer. Otherwise the runtime may synchronize this buffer across multiple GPUs, which leads to serialization of the work.


Hey nou,

thanks for the fast reply.

I added the CL_MEM_READ_ONLY flag, but there is no change in behavior...


Do you use any other buffers in the kernel?


Yes, there are two more buffers. They are set up like this:

err = context.getCommandQueue(i).enqueueWriteBuffer(
        *devicePtrs, CL_FALSE, 0, sizePerDevice,
        (void*)(((char*)hostPtr) + offset));

i is the device ID given by the context, and hostPtr is a pointer to the data.

 

Both buffers are set up as shown above.

Do you think this has something to do with "my" buffer?


The question is whether these other buffers are used per GPU or shared across all GPUs. By the way, is that buffer shared by all GPUs?


The other buffers are split, and n/d elements are passed to each device (n being the number of input elements, d the number of devices).
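The split described above can be written down explicitly. Plain integer division n/d leaves n % d elements unassigned when n is not a multiple of d, so a minimal sketch (assuming a contiguous 1-D input; Chunk, splitEvenly, byteOffset and byteSize are hypothetical names, not part of the post or of the OpenCL API) hands out the remainder one element at a time and derives the byte values that play the roles of offset and sizePerDevice in the enqueueWriteBuffer call:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: the slice of the input assigned to one device.
struct Chunk {
    std::size_t first;  // index of the first element
    std::size_t count;  // number of elements
};

// Splits n elements over d devices. When n is not divisible by d, the first
// n % d devices get one extra element, so every element is assigned once.
std::vector<Chunk> splitEvenly(std::size_t n, std::size_t d) {
    assert(d > 0);
    std::vector<Chunk> chunks;
    chunks.reserve(d);
    const std::size_t base = n / d;
    const std::size_t extra = n % d;
    std::size_t first = 0;
    for (std::size_t i = 0; i < d; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        chunks.push_back({first, count});
        first += count;
    }
    return chunks;
}

// Byte offset into the host array and byte size of a chunk, for elements of
// elemSize bytes (the "offset" and "sizePerDevice" of the write call).
std::size_t byteOffset(const Chunk& c, std::size_t elemSize) { return c.first * elemSize; }
std::size_t byteSize(const Chunk& c, std::size_t elemSize) { return c.count * elemSize; }
```

For example, 10 elements on 3 devices give chunks of 4, 3 and 3 elements starting at indices 0, 4 and 7, so sizePerDevice is no longer the same for every device.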

"My" buffer holds different data for every GPU and is passed to every GPU (it stores the adjacent data that resides on another GPU).
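If every GPU reads a different overlap region, one possible way to keep the runtime from synchronizing a single shared buffer between devices (the effect nou describes) is to build the halo separately for each device and back each one with its own cl::Buffer that is only used with that device's command queue. A minimal host-side sketch, assuming a 1-D layout where each device needs up to h neighbouring elements on each side of its chunk; gatherHalo and h are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: collects the overlap ("halo") data for the chunk
// [first, first + count) of a 1-D array, i.e. up to h elements just before
// the chunk followed by up to h elements just after it. These elements
// belong to neighbouring devices' chunks; at the array ends fewer exist.
std::vector<int> gatherHalo(const std::vector<int>& data,
                            std::size_t first, std::size_t count, std::size_t h) {
    assert(first + count <= data.size());
    const int* p = data.data();
    std::vector<int> halo;
    // Tail of the left neighbour: [max(first - h, 0), first)
    const std::size_t leftBegin = first >= h ? first - h : 0;
    halo.insert(halo.end(), p + leftBegin, p + first);
    // Head of the right neighbour: [first + count, min(first + count + h, size))
    const std::size_t rightBegin = first + count;
    const std::size_t rightEnd = std::min(rightBegin + h, data.size());
    halo.insert(halo.end(), p + rightBegin, p + rightEnd);
    return halo;
}
```

Calling gatherHalo once per device (with that device's first and count) yields d separate host vectors. Creating one buffer from each, instead of passing one buffer to every GPU, is an idea worth measuring rather than a guaranteed fix.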


Are you sending the same buffer to all the GPUs, or are you sending corresponding sub-buffers to each GPU?
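For the sub-buffer approach (OpenCL 1.1 and later), each device would get a region of one parent buffer through clCreateSubBuffer with CL_BUFFER_CREATE_TYPE_REGION (cl::Buffer::createSubBuffer in the C++ bindings). The region origin should be a multiple of CL_DEVICE_MEM_BASE_ADDR_ALIGN, which the device reports in bits; otherwise the call can fail with CL_MISALIGNED_SUB_BUFFER_OFFSET. A minimal sketch of computing such regions on the host, assuming the alignment has already been queried and converted to bytes; Region and alignedRegions are hypothetical names, and Region only mirrors the origin/size fields of cl_buffer_region so the sketch compiles without OpenCL headers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the origin and size fields (in bytes) of OpenCL's cl_buffer_region.
struct Region {
    std::size_t origin;
    std::size_t size;
};

// Splits totalBytes into at most d contiguous, non-empty regions whose
// origins are all multiples of alignBytes. Every region except possibly the
// last has the same size "step"; fewer than d regions are returned when the
// buffer is too small to give every device an aligned slice.
std::vector<Region> alignedRegions(std::size_t totalBytes, std::size_t d,
                                   std::size_t alignBytes) {
    assert(d > 0 && alignBytes > 0);
    const std::size_t ideal = (totalBytes + d - 1) / d;  // ceil(totalBytes / d)
    // Round the per-device size up to the alignment, so that every origin
    // (a multiple of step) is aligned as well.
    const std::size_t step = (ideal + alignBytes - 1) / alignBytes * alignBytes;
    std::vector<Region> regions;
    for (std::size_t origin = 0; origin < totalBytes && regions.size() < d; origin += step) {
        regions.push_back({origin, std::min(step, totalBytes - origin)});
    }
    return regions;
}
```

With 1000 bytes, 3 devices and a 128-byte alignment this gives regions starting at 0, 384 and 768 with sizes 384, 384 and 232. Each region would then be passed as the buffer_create_info of one sub-buffer.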

It would help if you could post a test case and your system information: CPU, GPU, SDK, driver, OS.

Thanks
