LeeHowes,
Thank you for your reply.
My kernel consists of three parts: reading the data, computing, and writing the results to global memory. The last step takes half of the total time. The problem maps very well to OpenCL, so I am free to arrange the writes to global memory any way I want. How should this be done correctly (staging through local memory, writing directly from private memory, and so on)?