I have two questions regarding ATI's implementation of OpenCL and its impact on raytracing.
I was going through the OpenCL programming guide and spotted one point where ATI hardware differs significantly from NVIDIA. As I understand it, each stream core is actually a 4-wide SIMD unit, and there are fewer cores overall.
This would mean that ATI hardware is more SIMD-like than NVIDIA, where the stream cores are not SIMD and there are more of them.
It would also mean a larger penalty for divergent warps and for non-vector instructions on ATI. The warp (wavefront) size points the same way: it is 64 on ATI and 32 on NVIDIA.
For raytracing, where the calculations are highly divergent, this would mean a significant slowdown on ATI.
Is there some way to optimize for this on ATI hardware?
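To make the concern concrete, here is a rough model in plain C (my own simplification, not OpenCL code and not measured data). It assumes that lanes in a warp/wavefront run in lockstep, so every lane occupies an issue slot until the slowest ray in its group finishes. The names `simd_utilization`, `steps` and `width` are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of lane slots that do useful work when rays are packed into
 * lockstep groups of `width` lanes. steps[i] is the number of traversal
 * steps ray i needs. Assumption: a group runs as long as its slowest ray,
 * and finished lanes simply idle until then. */
double simd_utilization(const int *steps, size_t n, size_t width)
{
    assert(width > 0);
    long useful = 0;  /* lane slots that actually advanced a ray */
    long issued = 0;  /* lane slots occupied, including idle ones */

    for (size_t base = 0; base < n; base += width) {
        size_t end = (base + width < n) ? base + width : n;
        int longest = 0;
        for (size_t i = base; i < end; ++i) {
            useful += steps[i];
            if (steps[i] > longest)
                longest = steps[i];
        }
        /* The whole group is held for `longest` steps. */
        issued += (long)longest * (long)width;
    }
    return issued ? (double)useful / (double)issued : 1.0;
}
```

In this model, take 64 rays where 63 need 2 steps and one needs 8. With groups of 32 the utilization is about 0.42, and with groups of 64 it drops to about 0.26, because the one slow ray now holds back twice as many lanes.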
NVIDIA hardware has the limitation that each work group stays scheduled on the same streaming multiprocessor until all of its work is done. This calls for some sort of producer/consumer queue, implemented with atomic operations, to ensure that all warps (wavefronts) within a work group stay active for the entire life of the work group. This is detailed in the paper:
"Understanding the Efficiency of Ray Traversal on GPUs"
Does this also apply to ATI cards?
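For reference, this is roughly the queue pattern I have in mind, sketched as host-side C11 rather than as an actual kernel. It is a simplified model under my own assumptions, not the paper's code: `ray_queue`, `claim_rays` and `run_persistent_worker` are hypothetical names, and in an OpenCL kernel the counter would live in global memory and be bumped with an atomic add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared queue: index of the next unclaimed ray. */
typedef struct {
    atomic_size_t next;
    size_t total;
} ray_queue;

/* Atomically claim up to `batch` rays. Returns how many were claimed
 * (0 when the queue is drained) and stores the first index in *first. */
size_t claim_rays(ray_queue *q, size_t batch, size_t *first)
{
    assert(batch > 0);
    size_t start = atomic_fetch_add(&q->next, batch);
    if (start >= q->total)
        return 0;
    *first = start;
    size_t left = q->total - start;
    return left < batch ? left : batch;
}

/* One persistent worker: instead of exiting after a fixed slice of rays,
 * it keeps fetching batches until no work is left. traced[i] counts how
 * often ray i was processed (a stand-in for the real tracing work). */
size_t run_persistent_worker(ray_queue *q, size_t batch,
                             unsigned char *traced)
{
    size_t done = 0, first = 0, n;
    while ((n = claim_rays(q, batch, &first)) > 0) {
        for (size_t i = 0; i < n; ++i)
            traced[first + i] += 1;
        done += n;
    }
    return done;
}
```

The idea is that you launch only as many workers as the hardware can keep resident, and each one loops on `claim_rays` with a batch the size of a warp or wavefront (32 or 64), so that a worker that finishes early picks up new rays instead of idling.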
Looking forward to your responses,