I am interested in writing big arrays to global memory from kernels. Is there any way I can accelerate this process (e.g. by using local memory)?
That's a rather vague question... Make sure you do 128-bit writes, with consecutive work items writing to consecutive, 16-byte-aligned addresses. That should help you achieve peak bandwidth.
More than that would depend on what you're doing.
Thank you for your reply.
My kernel consists of 3 parts: reading data, computing, and writing to global memory. The last step takes half of the total time. The problem maps very well to OpenCL, so I can write to global memory any way I want. How should it be done correctly (through local memory, directly from private memory, and so on)?
Oh, well, try to arrange it as 128-bit writes straight from registers, then. Preferably from a vector register:
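float4 stuffinhere = somethingorother;  // four results held in one vector register
// The cast keeps the __global address space; the offset counts float4 elements, not floats.
((__global float4*)outputPointer)[offsetFromOutputPointer] = stuffinhere;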
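To make that a bit more concrete, here is a rough sketch of a whole kernel built around the same idea. It is only an illustration under assumptions: the kernel name write_results, the parameter names, and the "multiply by two" computation are all made up, and it declares the buffers as float4 pointers instead of casting, which is just one possible way to set it up. The block at the top is a host-side shim (with a hypothetical fake_gid variable) that exists only so the file can also be compiled and checked as plain C; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
/* Host-side shim: only used when this file is compiled as plain C for checking.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this whole part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
typedef struct { float x, y, z, w; } float4;   /* stand-in for the built-in vector type */
static size_t fake_gid;                         /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_gid; }
#endif

/* Each work item keeps four results in one vector register and stores them
   with a single 128-bit write. Work item i writes float4 element i, so
   neighbouring work items write neighbouring 16-byte slots. */
__kernel void write_results(__global const float4 *input,
                            __global float4 *output)
{
    size_t gid = get_global_id(0);

    float4 in = input[gid];          /* one 128-bit read */

    float4 stuffinhere;              /* placeholder computation, done in registers */
    stuffinhere.x = in.x * 2.0f;
    stuffinhere.y = in.y * 2.0f;
    stuffinhere.z = in.z * 2.0f;
    stuffinhere.w = in.w * 2.0f;

    output[gid] = stuffinhere;       /* one 128-bit write, no local memory involved */
}
```

With this layout the global work size would be the number of floats divided by four, and there is no staging through local memory: the data goes from registers to global memory in one step.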
Thanks a lot, LeeHowes