7 Replies Latest reply on Dec 8, 2009 10:38 AM by Hill_Groove

    NVidia and ATI

    Hill_Groove
      Warps

      Hello.

      In the CUDA docs I've come across the term "warp". As I understand it, using warps appropriately also speeds up computation. Should this technique be taken into account for both ATI and NVidia video cards? [I couldn't find this term in the ATI specifications.]

        • NVidia and ATI
          omkaranathan

          Wavefront is the equivalent of warp.

          • NVidia and ATI
            nou

            And for best performance, the work-group size should be set to a multiple of the wavefront size. On high-end cards it is 64, on mid-range cards 32, and on low-end cards only 16.
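The sizing rule above can be sketched as a small helper. This is only an illustration, not part of any vendor SDK: the function name `round_up_to_wavefront` and the requested size of 100 are made up for the example, and the wavefront sizes 64, 32 and 16 are the ones quoted in this thread.

```python
def round_up_to_wavefront(requested_size: int, wavefront_size: int = 64) -> int:
    """Return the smallest multiple of wavefront_size that is >= requested_size.

    Hypothetical helper: wavefront_size is assumed to be known already
    (64, 32 or 16 depending on the card, per the reply above).
    """
    if requested_size <= 0 or wavefront_size <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division, then scale back up to a full multiple.
    full_wavefronts = (requested_size + wavefront_size - 1) // wavefront_size
    return full_wavefronts * wavefront_size


if __name__ == "__main__":
    # Pad a requested work-group size of 100 for each wavefront size.
    for wavefront_size in (64, 32, 16):
        padded = round_up_to_wavefront(100, wavefront_size)
        print(f"wavefront {wavefront_size}: work-group size {padded}")
```

A kernel launched with a padded size would normally also need a bounds check so the extra work-items do nothing.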

              • NVidia and ATI
                Hill_Groove

                Thank you for the useful replies.

                • NVidia and ATI
                  kbrafford

                  Which parameter in the CLInfo dump tells you the wavefront size?

                    • NVidia and ATI
                      Hill_Groove

                      kbrafford

                      As I understand it, it depends on the video card chipset. As nou said, it is 64 on high-end cards, 32 on mid-range cards, and only 16 on low-end ones. For the RV670 it's 64; for the GTX 285 it's 32.

                       

                      And another question. If the wavefront size is 64 and the work-group size is 256, then the work-group will be processed as four wavefronts. A wavefront of 64 threads is processed by one stream processor from beginning to end. The second wavefront is processed by another one, and so on. Where is __local memory physically stored? Is local memory a software feature?

                        • NVidia and ATI
                          hazeman

                           

                          Originally posted by: Hill_Groove

                          And another question. If the wavefront size is 64 and the work-group size is 256, then the work-group will be processed as four wavefronts.



                          Yes.

                           

                          A wavefront of 64 threads is processed by one stream processor from beginning to end. The second wavefront is processed by another one, and so on.


                          No. All four wavefronts are processed on the same SIMD core. The whole work-group is assigned to one SIMD core. The first work-group goes to the first SIMD core, the second to the second, and so on.
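The mapping just described can be sketched in a few lines. This is a toy model under stated assumptions, not how the hardware scheduler is actually implemented: the function names are invented, the count of 4 SIMD cores is an arbitrary example value, and the wrap-around back to the first core once every core has a work-group is an assumption of the sketch (the reply only says "and so on").

```python
def wavefronts_per_group(group_size: int, wavefront_size: int) -> int:
    """Number of wavefronts needed to cover one work-group (ceiling division)."""
    return (group_size + wavefront_size - 1) // wavefront_size


def assign_groups_to_simd_cores(num_groups: int, num_simd_cores: int) -> dict:
    """Map each work-group index to a SIMD core index.

    Assumption: groups are handed out in order and wrap around
    (round-robin). A whole work-group stays on one core, so all of
    its wavefronts share that core.
    """
    return {group: group % num_simd_cores for group in range(num_groups)}


if __name__ == "__main__":
    # Values from the thread: work-group of 256, wavefront of 64.
    print("wavefronts per work-group:", wavefronts_per_group(256, 64))

    # 6 work-groups on a hypothetical device with 4 SIMD cores.
    for group, core in assign_groups_to_simd_cores(6, 4).items():
        print(f"work-group {group} -> SIMD core {core}")
```

The point of the model is that the unit of assignment is the work-group, not the wavefront.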

                           

                          Where is __local memory physically stored? Is local memory a software feature?


                          It's a tricky question. On the 4xxx series, __local memory is really __global memory (ATI thinks it's too much work to optimize the compiler to use the 48xx LDS, although it's possible). On the 5xxx series, __local is the LDS, so it's located in the SIMD core.