
OpenCL clSetKernelArg performance issue

Question asked by ninazero on Nov 5, 2015
Latest reply on Jan 21, 2016 by



I'm working on a real-time ray tracer with OpenCL.

I have a structure that describes the camera: position, orientation, and field of view.
Since the camera moves around, I send it to the GPU every frame.
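
For illustration only, since the actual definition of Computedcamera is not shown in this post: a minimal C sketch of what a matching host-side layout might look like, assuming the kernel side stores the vectors as float3. OpenCL gives 3-component vectors the size and alignment of 4-component ones (16 bytes for float3), so the host struct has to pad the same way for sizeof(Computedcamera) to agree on both sides. The field names and the float3_padded type are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device-side float3: padded and aligned
   to 16 bytes, like a 4-component float vector. */
typedef struct {
    _Alignas(16) float s[4];   /* s[3] is padding */
} float3_padded;

/* Hypothetical camera layout; the real Computedcamera may differ. */
typedef struct {
    float3_padded position;    /* offset 0  */
    float3_padded forward;     /* offset 16 */
    float3_padded up;          /* offset 32 */
    float         fov;         /* offset 48, field of view */
} Computedcamera;

/* Catch layout drift at compile time. */
static_assert(sizeof(float3_padded) == 16, "float3 must occupy 16 bytes");

/* The size the host would pass to clSetKernelArg or clEnqueueWriteBuffer. */
static size_t camera_size(void) {
    return sizeof(Computedcamera);
}
```

Under these assumptions the struct is 64 bytes rather than 52, because the trailing float is padded out to the 16-byte alignment of the vectors.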


And I'm doing it like this:


Computedcamera    cm;

// some code here

clSetKernelArg(_kernel, 0, sizeof(Computedcamera), &cm);


and my kernel looks like this:


kernel void raytracer(Computedcamera const *camera /*, other arguments */) { /* ... */ }


This method gives the best performance, 48 FPS on my test scene, but it doesn't work on devices from other vendors, such as Intel or Nvidia.


If I change my kernel declaration to this (removing the * to pass the argument by value):
kernel void raytracer(Computedcamera const camera /*, other arguments */) { /* ... */ }

It now works on my CPU (Intel), but performance on my GPU drops to 27 FPS.


So I tried passing this argument through a buffer:


_camera_mem = clCreateBuffer(_context, CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY, sizeof(Computedcamera), 0, &error);

each frame:

clEnqueueWriteBuffer(_queue, _camera_mem, CL_TRUE, 0, sizeof(Computedcamera), (void *)&cm, 0, 0, 0);

clSetKernelArg(_kernel, 0, sizeof(cl_mem), &_camera_mem);

This last method works on all the devices I tested (AMD GPU, Intel CPU), but performance on my GPU is around 43 FPS (5 FPS less than the first method).
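
To put the FPS figures above in terms of time per frame, here is a small C sketch that converts frame rates to milliseconds and computes the extra cost of each slower method. The helper names frame_ms and extra_ms are made up for this example; it only restates the measured numbers and says nothing about where the time actually goes.

```c
#include <assert.h>

/* Convert a frame rate (frames per second) to a frame time in milliseconds. */
static double frame_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per frame when dropping from base_fps to slower_fps. */
static double extra_ms(double base_fps, double slower_fps) {
    return frame_ms(slower_fps) - frame_ms(base_fps);
}
```

With the numbers above, extra_ms(48, 43) is about 2.4 ms per frame for the buffer method, and extra_ms(48, 27) is about 16.2 ms per frame for the pass-by-value kernel.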


I don't understand why the first method is faster than the others, or why it works at all!


My config:

Windows 10 64-bit, AMD APP SDK 3.0, i7 920, R9 Nano.