Hello.
In the CUDA docs I've come across the term "warp". As I've understood it, appropriate warp usage also provides computing acceleration. Should this technique be taken into account for both ATI and NVIDIA video cards? [I couldn't find this term in the ATI specifications.]
Wavefront is the ATI equivalent of a warp. For best performance the work-group size should be set to a multiple of the wavefront size: on high-end cards it is 64, on mid-range cards 32, and on low-end cards only 16.
Thank you for the useful replies.
Which of the parameters that CLInfo dumps tells you the wavefront size?
kbrafford
As I've understood it, it depends on the video card chipset. As nou said, the size is 64 on high-end cards, 32 on mid-range, and 16 on low-end. For the RV670 it's 64; for the GTX 285 it's 32.
And another question. If the wavefront size is 64 and the work-group size is 256, then the work-group will be processed as four wavefronts. A wavefront of 64 threads is processed by a stream processor from beginning to end, the second wavefront is processed by another, and so on. Where is __local memory physically stored? Is local memory a software feature?
Originally posted by: Hill_Groove
And another question. If the wavefront size is 64 and the work-group size is 256, then the work-group will be processed as four wavefronts.
yes
A wavefront of 64 threads is processed by a stream processor from beginning to end. The second wavefront is processed by another, and so on.
No. All four wavefronts are processed on the same SIMD core; the whole work-group is assigned to one SIMD core. The first work-group goes to the first SIMD core, the second to the second, and so on.
Where is __local memory physically stored? Is local memory a software feature?
It's a tricky question. On the 4xxx series, __local memory is really __global memory (ATI thinks it's too much work to optimize the compiler to use the 48xx LDS, although it's possible). On the 5xxx series, __local is the LDS, so it is located in the SIMD core.
hazeman
Thank you for the explanation.