Both AMD and Intel publish documents on OpenCL CPU optimization, and they are easy to find.
IMHO, on a CPU, thread scheduling is quite costly, so it is better to have fewer threads, each doing more work. The CPU also has a good amount of cache, which you can take advantage of.
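As a rough sketch of "fewer threads, each doing more work", the function below is the body of a coarsened kernel written as plain C so it can be run on the host. The name scale_chunk and its parameters are made up for illustration. In an actual OpenCL C kernel, gid and num_items would come from get_global_id(0) and get_global_size(0). Each work-item takes one contiguous chunk, which is an assumption chosen to suit CPU caches.

```c
#include <stddef.h>

/* Hypothetical kernel body (not from any vendor guide): each work-item
 * scales one contiguous chunk of the array instead of a single element.
 * In OpenCL C, gid = get_global_id(0) and num_items = get_global_size(0). */
static void scale_chunk(float *data, size_t n, float factor,
                        size_t gid, size_t num_items)
{
    size_t chunk, begin, end, i;

    if (num_items == 0)
        return;

    chunk = (n + num_items - 1) / num_items;  /* ceil(n / num_items) */
    begin = gid * chunk;
    end   = begin + chunk;

    /* Clamp so that trailing work-items do not run past the array. */
    if (begin > n) begin = n;
    if (end > n)   end = n;

    /* Contiguous accesses within one work-item keep the cache lines hot. */
    for (i = begin; i < end; ++i)
        data[i] *= factor;
}
```

With this layout you would launch roughly one work-item per hardware thread rather than one per element, and each work-item walks its own block of memory sequentially.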
I guess some heavily fetch-bound or synchronization-heavy kernels may actually run better on the CPU.
Originally posted by: akhal Hello
I am trying different optimizations for OpenCL code, tuning them for both CPUs and GPUs. Please tell me what the possible optimizations are for GPUs (such as local memory) and what good optimization techniques are specific to CPUs (such as vectorization). Are there any techniques specific to a particular architecture?
For GPU optimizations, please refer to chapter 4 of the programming guide.
Things to consider for CPU optimizations:
1. Try to avoid barriers.
2. Try to use vector types (float4, for example).
3. The number of work-groups should be a multiple of the number of CPU cores to get maximum utilization.
4. Work-group creation has overhead, so try to avoid work-groups with a small number of work-items.
5. Images are emulated on the CPU - you may not get the expected performance.
6. Create memory objects with CL_MEM_USE_HOST_PTR. The CPU device shares memory with the host, so this can avoid an extra copy.
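Points 3 and 4 can be combined when choosing launch sizes. The helper below is only a sketch in plain C: pick_launch_dims and launch_dims are hypothetical names, not part of the OpenCL API. It assumes you have already queried the number of cores (for example CL_DEVICE_MAX_COMPUTE_UNITS) and the maximum work-group size (CL_DEVICE_MAX_WORK_GROUP_SIZE) through clGetDeviceInfo. It uses as few work-groups as possible while keeping their count a multiple of the core count.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of the OpenCL API. */
typedef struct {
    size_t num_groups;   /* always a multiple of num_cores */
    size_t local_size;   /* work-items per work-group */
    size_t global_size;  /* num_groups * local_size, may exceed n */
} launch_dims;

static launch_dims pick_launch_dims(size_t n, size_t num_cores,
                                    size_t max_local)
{
    launch_dims d;
    size_t min_groups, groups_per_core;

    if (num_cores == 0) num_cores = 1;
    if (max_local == 0) max_local = 1;

    /* Fewest groups that keep each group within the device limit. */
    min_groups = (n + max_local - 1) / max_local;

    /* Round up to a whole number of groups per core (at least one). */
    groups_per_core = (min_groups + num_cores - 1) / num_cores;
    if (groups_per_core == 0) groups_per_core = 1;

    d.num_groups = num_cores * groups_per_core;

    /* Large groups: spread the n items evenly over the groups. */
    d.local_size = (n + d.num_groups - 1) / d.num_groups;
    if (d.local_size == 0) d.local_size = 1;

    d.global_size = d.num_groups * d.local_size;
    return d;
}
```

Because global_size can be padded past n, the kernel would need a bounds check such as `if (get_global_id(0) >= n) return;`. The local_size and global_size values are what you would pass to clEnqueueNDRangeKernel.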