OpenCL clSetKernelArg performance issue

Question asked by ninazero on Nov 5, 2015
Latest reply on Jan 21, 2016 by cgrant78@netzero.com

Hello,

I'm working on a real-time ray tracer with OpenCL.

I have a structure that describes the camera: its position, orientation, and field of view.
Since the camera moves, I send it to the GPU every frame.

I do it like this:

Computedcamera    cm;

// some code here

clSetKernelArg(_kernel, 0, sizeof(Computedcamera), &cm);
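When a struct is passed by value with clSetKernelArg, as in the call above, the runtime copies sizeof(Computedcamera) bytes from the host, so the host definition has to match the OpenCL C definition byte for byte. The post does not show the members of Computedcamera; the sketch below assumes hypothetical ones (a position, three orientation vectors and a field of view) declared as float3 in the kernel. In OpenCL C a float3 has the size and alignment of a float4 (16 bytes), which is easy to get wrong on the host, so this mirror struct stores each vector as four floats and checks the layout at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed OpenCL C definition (hypothetical members):
 *   typedef struct {
 *       float3 position;   // offset  0, 16 bytes (float3 is sized like float4)
 *       float3 forward;    // offset 16
 *       float3 up;         // offset 32
 *       float3 right;      // offset 48
 *       float  fov;        // offset 64
 *   } Computedcamera;      // size 80: rounded up to float3's 16-byte alignment
 */

/* Host-side mirror: each float3 becomes 4 floats (x, y, z, unused lane). */
typedef struct {
    float position[4];
    float forward[4];
    float up[4];
    float right[4];
    float fov;
    float tail_pad[3]; /* brings sizeof up to the device-side 80 bytes */
} Computedcamera;

/* Compile-time checks that the host layout matches the assumed device layout. */
_Static_assert(sizeof(Computedcamera) == 80, "size must match the OpenCL C struct");
_Static_assert(offsetof(Computedcamera, forward) == 16, "forward offset mismatch");
_Static_assert(offsetof(Computedcamera, fov) == 64, "fov offset mismatch");

/* Hypothetical helper: store a 3-component vector and zero the padding lane. */
static void set_vec3(float dst[4], float x, float y, float z)
{
    dst[0] = x;
    dst[1] = y;
    dst[2] = z;
    dst[3] = 0.0f;
}
```

The OpenCL host headers also provide cl_float3 (defined as cl_float4) for the same purpose. With either choice, zeroing the fourth lane keeps the padding bytes deterministic, and the existing clSetKernelArg call then copies exactly the layout the kernel expects.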

And my kernel looks like this:

kernel void raytracer(Computedcamera const *camera /*, other arguments */) { /* ... */ }

This method gives the best performance, 48 FPS on my test scene, but it doesn't work on devices from other vendors, such as Intel or Nvidia.
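The pointer declaration of raytracer has no address-space qualifier. The OpenCL C specification requires a pointer argument of a kernel to point into the global, local, or constant address space, so that declaration is not portable: another vendor's compiler may reject it, and copying the struct bytes with clSetKernelArg into a parameter declared as a pointer is not behavior the specification defines. Two spec-conforming alternatives are a struct passed by value and a pointer into constant memory backed by a buffer. The sketch below keeps both as kernel source strings, the form they take before being handed to clCreateProgramWithSource; the kernel names, the selector function and the elided arguments are hypothetical, and the Computedcamera typedef is assumed to be prepended to the program source.

```c
#include <assert.h>
#include <string.h>

/* Variant 1: the struct is passed by value.
 * Host side: clSetKernelArg(kernel, 0, sizeof(Computedcamera), &cm);   */
static const char *const kSrcByValue =
    "kernel void raytracer_by_value(const Computedcamera camera /*, ... */)\n"
    "{\n"
    "    /* members are read as camera.fov, camera.position, ... */\n"
    "}\n";

/* Variant 2: a pointer into the constant address space, backed by a buffer.
 * Host side: clSetKernelArg(kernel, 0, sizeof(cl_mem), &camera_mem);     */
static const char *const kSrcConstantPtr =
    "kernel void raytracer_constant(constant Computedcamera *camera /*, ... */)\n"
    "{\n"
    "    /* members are read as camera->fov, camera->position, ... */\n"
    "}\n";

/* Hypothetical selector: return the kernel source that matches how argument 0
 * will be set on the host (raw struct bytes vs. a cl_mem handle). */
static const char *camera_kernel_source(int use_buffer)
{
    return use_buffer ? kSrcConstantPtr : kSrcByValue;
}
```

Whichever variant is used, argument 0 must be set to match it: raw struct bytes for the by-value kernel, a cl_mem handle for the constant-pointer one. On many GPUs, data in the constant address space may be served from a dedicated cache, which suits a small struct that every work-item reads; whether that recovers the speed of the first method is device-specific and would need measuring.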

If I change my kernel declaration to this (removing the * to pass the argument by value):
kernel void raytracer(Computedcamera const camera /*, other arguments */) { /* ... */ }

It now works on my CPU (Intel), but the performance on my GPU drops to 27 FPS.

So I tried passing this argument through a buffer:

Initialisation:

_camera_mem = clCreateBuffer(_context, CL_MEM_READ_ONLY | CL_MEM_HOST_WRITE_ONLY, sizeof(Computedcamera), 0, &error);

Each frame:

clEnqueueWriteBuffer(_queue, _camera_mem, CL_TRUE, 0, sizeof(Computedcamera), (void *)&cm, 0, 0, 0);

clSetKernelArg(_kernel, 0, sizeof(cl_mem), &_camera_mem);


This last method works on all devices (AMD GPU, Intel CPU), but the performance on my GPU is around 43 FPS, 5 FPS less than with the first method.
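Because frame time is the reciprocal of frame rate, t = 1000 / FPS milliseconds, the FPS figures above are easier to compare as per-frame costs. A small helper (function names are hypothetical) makes the arithmetic explicit:

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
static double frame_ms(double fps)
{
    return 1000.0 / fps;
}

/* Extra milliseconds per frame when a change drops the rate
 * from fps_fast to fps_slow. */
static double frame_cost_ms(double fps_fast, double fps_slow)
{
    return frame_ms(fps_slow) - frame_ms(fps_fast);
}
```

With the numbers from this post, frame_ms(48) is about 20.8 ms, frame_cost_ms(48, 43) is about 2.4 ms for the buffer method, and frame_cost_ms(48, 27) is about 16.2 ms per frame for the by-value kernel.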

I don't understand why the first method is faster than the others, or why it works at all.

My config:

Win10 64-bit, AMD APP SDK 3.0, i7 920, R9 Nano.