    Application does not scale when using cl::Buffer-Object



      I have the following problem:

      My application does not scale across multiple GPUs. It is always slightly slower with more GPUs than with fewer.

      I found out that a cl::Buffer object is causing this. I use the buffer as follows:

      First I create an ordinary array of 20 elements with malloc() (the elements are filled in later):

      int* pOverlap_region = (int*) malloc(80);
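
      As a side note, the 80 bytes above are just 20 ints of 4 bytes each. A minimal sketch of deriving the byte count from the element count instead of hard-coding it, assuming the kernel expects 32-bit integers (as cl_int is); the names kOverlapElements, overlap_region_bytes and make_overlap_region are made up for illustration and are not part of my application:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constant: the array in the post holds 20 elements.
constexpr std::size_t kOverlapElements = 20;

// Byte size of a region holding `count` 32-bit integers.
constexpr std::size_t overlap_region_bytes(std::size_t count) {
    return count * sizeof(std::int32_t);
}

// Zero-initialised host-side storage; its data() pointer can be passed
// wherever the malloc'd pointer was used, and it frees itself on scope exit.
inline std::vector<std::int32_t> make_overlap_region(std::size_t count) {
    return std::vector<std::int32_t>(count, 0);
}
```

      With this, the size passed to the buffer would be overlap_region_bytes(kOverlapElements) and the host pointer would be the vector's data(). This only tidies up the allocation; I would not expect it to change the scaling behaviour.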

      After it is filled, I create the buffer object:

      cl::Buffer overlap_region = cl::Buffer(
                  this->context.getOpenCLContext(), CL_MEM_COPY_HOST_PTR, 80, pOverlap_region, &err);

      this->context.getOpenCLContext() returns the context.

      Then it is set as an argument for the kernel:

      err |= kernel.setArg(3, (cl::Buffer) overlap_region);

      If the buffer is created but *not* set as a kernel argument, the application scales across multiple GPUs.

      Does anybody know why the behaviour is like this?

      Thanks for your replies