3 Replies Latest reply on Jul 17, 2011 4:44 PM by LeeHowes

    How do vector types increase throughput on a GPU?

    krrishnarraj

      I am new to OpenCL; I am used to CUDA and NVIDIA GPUs.

      (Excuse me for using CUDA terms here.)

      My understanding was that a warp (32 threads) is issued to the 8 SPs of an SM, with 4 threads going to each SP.

      I was going through the online examples provided by AMD: http://developer.amd.com/documentation/articles/Pages/OpenCL-Optimization-Case-Study_7.aspx

      It says that using vector types in OpenCL increases throughput on the GPU. Does that mean one thread now goes to one SP instead of four threads?

      Can someone explain how this improves performance at the hardware level?

      Thanks

        • nou

          AMD GPUs use a VLIW4/VLIW5 architecture, where the unit processing one work-item can execute up to 4 or 5 operations at once, packed into a single VLIW instruction. So when you add two float4 vectors, the addition takes one instruction, whereas an NVIDIA GPU must execute four.

          Also, reading a float4 vector from memory is more efficient than reading four separate float values.
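To make this concrete outside OpenCL, here is a minimal plain-C sketch (not OpenCL C, and not AMD's sample code). The `float4` struct, `float4_add`, and `load_float4` are hypothetical stand-ins for OpenCL's built-in `float4` type and its `vload4` function. The comments describe how a VLIW compiler could schedule the four independent additions; whether they are actually packed into one bundle depends on the compiler and the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plain-C stand-in for OpenCL's built-in float4 type:
   four floats, 16 bytes (128 bits) in total. */
typedef struct {
    float x, y, z, w;
} float4;

_Static_assert(sizeof(float4) == 16, "float4 should be 128 bits");

/* Component-wise add. The four additions do not depend on each other,
   so a VLIW compiler can place them in the slots of a single instruction
   bundle; a scalar SIMT core issues them as four separate instructions. */
static float4 float4_add(float4 a, float4 b) {
    float4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Reads element i of a packed float4 array as one 16-byte copy rather
   than four separate 4-byte reads (loosely analogous to vload4). */
static float4 load_float4(const float *p, size_t i) {
    assert(p != NULL);
    float4 v;
    memcpy(&v, p + 4 * i, sizeof v);
    return v;
}
```

In a real OpenCL kernel you would simply write `a + b` on two `float4` values and let the AMD compiler do the packing; the struct version only spells out the four independent operations that make the packing possible.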

            • krrishnarraj

              Thanks for the info. So using vectors is a must for full utilization.

              Sadly, you can't use them everywhere. That's why they are dropping VLIW4 in the next-generation GCN architecture.

                • LeeHowes

                  Not a must. Loop unrolling achieves the same thing. At that level it is a VLIW architecture, not a vector one (like NVIDIA's, it is a vector architecture at the larger scale, of course), so the aim is to increase ILP. Vectors do that by creating four independent operations at a time instead of one, and loop unrolling would too.

                  What vectors also add is the ability to issue 128-bit memory reads in each lane of the SIMD unit. This helps the memory system reach peak throughput, because 16 lanes each issuing a 128-bit read is the unit of data the memory system is designed to stream from DRAM.
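A minimal plain-C sketch of the loop-unrolling alternative, assuming a simple dot product as the workload. The functions `dot_scalar` and `dot_unrolled4` and the constants at the end are hypothetical illustrations, not AMD sample code. The sketch only shows where the independent operations come from; whether a given compiler actually packs them into VLIW slots depends on the compiler and target. The final constants spell out the 16-lanes-times-128-bits arithmetic from the reply.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar loop: each multiply-add depends on the previous one through
   `sum`, so there is little ILP available to fill a VLIW bundle. */
static float dot_scalar(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled by 4 with four independent accumulators: the four
   multiply-adds in each iteration do not depend on each other, which is
   the same kind of ILP a float4 operation exposes. */
static float dot_unrolled4(const float *a, const float *b, size_t n) {
    assert(n % 4 == 0); /* simplifying assumption for this sketch */
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

/* Memory-side arithmetic from the reply: 16 SIMD lanes, each reading
   128 bits (16 bytes, one float4), move 16 * 16 = 256 bytes per read. */
enum { SIMD_LANES = 16, BYTES_PER_LANE = 128 / 8 };
static const size_t BYTES_PER_SIMD_READ = SIMD_LANES * BYTES_PER_LANE;
```

Using four accumulators changes the order of the floating-point additions, so for general inputs the unrolled result can differ from the scalar one in the last bits. That reordering is exactly what frees the four multiply-adds from depending on each other.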