Archives Discussions

Hill_Groove · ‎05-10-2010

any hints

Hello,

I am interested in writing big arrays to global memory from kernels. Is there any way I can accelerate this process (e.g. by using local memory)?

LeeHowes · ‎05-10-2010

That's a rather vague question... Try to make each work item do a 128-bit write, with consecutive work items writing to consecutive, aligned addresses. That should help you achieve peak bandwidth.

More than that would depend on what you're doing.

Hill_Groove · ‎05-11-2010

LeeHowes,

Thank you for your reply.

My kernel consists of 3 parts: reading the data, computing, and writing to global memory. The last step takes half of the total time. The problem is well suited to OpenCL, so I can write to global memory any way I want. How should it be done correctly (through local memory, directly from private memory, and so on)?

LeeHowes · ‎05-11-2010

Oh, well try to arrange it as 128-bit writes from registers, then. Preferably a vector register:

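float4 stuffinhere = somethingorother;  // result held in a private vector register

((__global float4 *)outputPointer)[offsetFromOutputPointer] = stuffinhere;  // one 128-bit store; the offset counts float4 elements, and the cast keeps the __global address space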
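
For context, here is a minimal sketch of how that store might look inside a complete kernel. The kernel name write_results, the inputPointer argument and the doubling step are hypothetical stand-ins for the real read and compute phases. It also assumes outputPointer is 16-byte aligned, which should hold when it points at the start of a buffer. The block at the top is only a host-side shim (my assumption, not part of OpenCL) so the same source can be smoke-tested with an ordinary C compiler; an OpenCL compiler skips it.

```c
/* Host-side shim, for illustration only: an OpenCL compiler defines
   __OPENCL_VERSION__ and never sees this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
typedef struct { float x, y, z, w; } float4;
static size_t fake_global_id;   /* set by a host test loop */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_global_id; }
#endif

/* Hypothetical kernel: read, compute in registers, then store the
   result with a single 128-bit write per work item. */
__kernel void write_results(__global const float4 *inputPointer,
                            __global float *outputPointer)
{
    size_t gid = get_global_id(0);

    /* Read and compute in a private vector register. */
    float4 stuffinhere = inputPointer[gid];
    stuffinhere.x = stuffinhere.x * 2.0f;
    stuffinhere.y = stuffinhere.y * 2.0f;
    stuffinhere.z = stuffinhere.z * 2.0f;
    stuffinhere.w = stuffinhere.w * 2.0f;

    /* Work item gid writes float4 slot gid, so neighbouring work items
       write neighbouring 16-byte chunks of global memory. */
    ((__global float4 *)outputPointer)[gid] = stuffinhere;
}
```

Using the global ID directly as the float4 index is what keeps the writes lined up from one work item to the next. If each work item has more than one float4 to write, the index would need to be chosen so that neighbouring work items still land on neighbouring slots.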

Hill_Groove · ‎05-13-2010

Thanks a lot, LeeHowes

Fast Writes To Global Memory