Archives Discussions

thanasio · 07-08-2014

Hi,

I am coming to OpenCL from a CUDA background, and I was hoping to find a fast thread-communication function similar to CUDA's __shfl() shuffle intrinsic. I came across the work_group_reduce() function available in OpenCL, and I was wondering how it performs when used to reduce values held in registers.

Cheers,

T

sudarshan · 07-10-2014

Hi,

work_group_reduce is part of OpenCL 2.0, and AMD's OpenCL 2.0 stack has not been released yet. OpenCL 2.0 will also have support for shuffle.

Do you mean performance compared to a hand-coded kernel?

thanasio · 07-10-2014

By performance, I mean: does this use any memory buffers for the reduction (i.e. local or global memory), or just private memory?

Regarding shuffle, do you mean that OpenCL 2.0 will have an instruction similar to CUDA's __shfl() (an exchange of private-memory values within a wavefront), or that it will support OpenCL's shuffle(), which looks like a convenience function for reordering the elements of a vector?

Cheers,

T

dipak · 11-07-2014

The work_group_reduce functions take built-in data types (e.g. half, int, uint, long, ulong, float or double) as input. They receive the input argument by value, not as a pointer. So you can pass any value read through a global or local memory pointer, as follows.

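    // reduce_min_example is a placeholder kernel name.
    kernel void reduce_min_example(global const int *inData, global int *outData)
    {
        uint gid = get_global_id(0);
        ...
        // The function takes the loaded value, not the pointer.
        outData[gid] = work_group_reduce_min(inData[gid]);
    }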

In the above example, "inData" or "outData" could instead be a local memory pointer.
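
To make it clearer what the kernel above stores in outData, here is a minimal sketch in plain C that models the semantics sequentially on the host. It is not OpenCL code and says nothing about how an implementation performs the reduction or which memory it uses. The names model_work_group_reduce_min and model_kernel, and the sizes GLOBAL_SIZE and LOCAL_SIZE, are assumptions made up for this illustration. The sketch also assumes, as I understand the OpenCL 2.0 work-group functions, that every work-item in a group receives the same reduced value.

```c
/* Hypothetical sizes, chosen only for this illustration. */
#define GLOBAL_SIZE 8   /* total number of work-items */
#define LOCAL_SIZE  4   /* work-items per work-group  */

/* Sequential model of work_group_reduce_min for one work-group:
   returns the minimum of the values contributed by its work-items. */
static int model_work_group_reduce_min(const int *values, int count)
{
    int result = values[0];
    for (int i = 1; i < count; ++i) {
        if (values[i] < result) {
            result = values[i];
        }
    }
    return result;
}

/* Model of the kernel body: each work-item passes inData[gid] by value,
   and each work-item stores the minimum of its own work-group. */
static void model_kernel(const int *inData, int *outData)
{
    for (int group = 0; group < GLOBAL_SIZE / LOCAL_SIZE; ++group) {
        const int base = group * LOCAL_SIZE;
        const int group_min =
            model_work_group_reduce_min(inData + base, LOCAL_SIZE);
        for (int lid = 0; lid < LOCAL_SIZE; ++lid) {
            /* Same reduced value is written by every work-item in the group. */
            outData[base + lid] = group_min;
        }
    }
}
```

With the input 7, 3, 9, 5, 4, 8, 2, 6 and a work-group size of 4, this model writes 3 to the first four elements of outData and 2 to the last four. In other words, the reduction is per work-group, not over the whole global range.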


intra thread communication with opencl