Originally posted by: akhal
Hello,
I am trying different optimizations for OpenCL code, tuning it for both CPUs and GPUs. What optimizations are possible on GPUs (such as using local memory), and which techniques are specific to CPUs (such as vectorization)? Are there any techniques specific to a particular architecture?
For GPU optimizations, please refer to Chapter 4 of the programming guide.
Things to consider in CPU optimizations:
1. Avoid barriers where possible.
2. Use vector types (float4, for example).
3. Make the number of work-groups a multiple of the number of CPU cores to get maximum utilization.
4. Creating a work-group has overhead, so avoid work-groups with only a few work-items.
5. Images are emulated on the CPU, so you may not get the performance you expect.
6. Create memory objects with CL_MEM_USE_HOST_PTR, so the CPU device can work on your host allocation rather than on a copy.
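
To make points 3 and 4 concrete, here is a rough sketch in plain C of how you might pick the sizes for a 1-D NDRange on a CPU device. The struct cpu_ndrange and the function plan_cpu_ndrange are hypothetical names made up for this post, not part of OpenCL. I am assuming you already queried the core count and the work-group size limit yourself, for example with clGetDeviceInfo using CL_DEVICE_MAX_COMPUTE_UNITS and CL_DEVICE_MAX_WORK_GROUP_SIZE. The idea is to start with a few large work-groups (a multiple of the core count) and only add more groups, still in multiples of the core count, when a group would exceed the device limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a 1-D launch configuration for a CPU device. */
typedef struct {
    size_t local_size;   /* work-items per work-group */
    size_t num_groups;   /* always a multiple of num_cores */
    size_t global_size;  /* local_size * num_groups, covers total_items */
} cpu_ndrange;

/* Integer division rounded up; b must be non-zero. */
static size_t ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/*
 * Hypothetical planner (not an OpenCL API).
 *   total_items     - number of elements the kernel has to process
 *   num_cores       - e.g. the value of CL_DEVICE_MAX_COMPUTE_UNITS
 *   groups_per_core - how many work-groups each core should get (1 is a start)
 *   max_local_size  - e.g. the value of CL_DEVICE_MAX_WORK_GROUP_SIZE
 */
static cpu_ndrange plan_cpu_ndrange(size_t total_items, size_t num_cores,
                                    size_t groups_per_core,
                                    size_t max_local_size)
{
    cpu_ndrange r;

    /* Treat zero arguments as 1 so the divisions below are safe. */
    if (num_cores == 0) num_cores = 1;
    if (groups_per_core == 0) groups_per_core = 1;
    if (max_local_size == 0) max_local_size = 1;

    /* Start with few, large work-groups: a multiple of the core count. */
    r.num_groups = num_cores * groups_per_core;
    r.local_size = ceil_div(total_items, r.num_groups);
    if (r.local_size == 0) r.local_size = 1;

    /* If a group would be too large, cap it and add groups instead,
       rounding the group count up to the next multiple of num_cores. */
    if (r.local_size > max_local_size) {
        size_t needed;
        r.local_size = max_local_size;
        needed = ceil_div(total_items, r.local_size);
        r.num_groups = ceil_div(needed, num_cores) * num_cores;
    }

    r.global_size = r.local_size * r.num_groups;
    assert(r.global_size >= total_items);
    assert(r.num_groups % num_cores == 0);
    return r;
}

/*
 * Usage sketch:
 *   cpu_ndrange r = plan_cpu_ndrange(n, cores, 1, max_wg);
 *   size_t global = r.global_size, local = r.local_size;
 *   pass &global and &local to clEnqueueNDRangeKernel
 */
```

Note that global_size can be larger than total_items because of the rounding, so the kernel would need a guard such as returning early when get_global_id(0) is past the real element count. Whether one group per core or several per core works better depends on how evenly the work is balanced, so treat groups_per_core as something to measure rather than a fixed rule.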