5 Replies Latest reply on Sep 22, 2009 3:22 PM by luxert

    A question of StreamWrite latency

    hphung

      Hello,

      I have noticed something strange while developing an ATI Stream application.

      As one might expect, the latency of StreamWrite depends on the class of graphics card.

      The strange thing is that, in my experiments, the StreamWrite latency is longer on the higher-end card.

      For example, using the 1.4 SDK, writing a grey-level HD (1920x1080) image takes 5.8 ms on a Radeon 4890 but only 4.8 ms on a Radeon 3450. (The results are long-term averages.)

      Can anyone explain the reason behind this observation? Is it a driver issue, or is it just due to the different architectures of the two cards?
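
To put those latencies in perspective, they can be converted to an effective transfer rate. This is only a minimal sketch: it assumes the grey-level image uses one byte per pixel (the post does not state the pixel format) and that 1 MB = 10^6 bytes. The function name is illustrative and not part of the Stream SDK.

```python
def effective_rate_mb_s(width, height, bytes_per_pixel, latency_ms):
    """Return the effective transfer rate in MB/s (1 MB = 10**6 bytes)."""
    size_bytes = width * height * bytes_per_pixel
    return size_bytes / (latency_ms / 1000.0) / 1e6


# Latencies are the ones reported in the post; 1 byte per pixel is an assumption.
for card, latency_ms in [("Radeon 4890", 5.8), ("Radeon 3450", 4.8)]:
    rate = effective_rate_mb_s(1920, 1080, 1, latency_ms)
    print(f"{card}: {rate:.1f} MB/s")
```

Under these assumptions both cards land at a few hundred MB/s, which suggests the measured time includes more than the raw bus transfer.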

       

        • A question of StreamWrite latency
          Raistmer
          I see higher latencies for HD4870 versus HD2600 (different hosts though).
            • A question of StreamWrite latency
              gaurav.garg

              Data transfer performance over PCIe usually depends heavily on the host configuration, especially the chipset. What results do you see with PCIeSpeedTest?

                • A question of StreamWrite latency
                  luxert

                  I have the same problem with StreamWrite.

                  The 3850's memory is GDDR3, 256-bit, 1800 MHz, and

                  the 4890's is GDDR5, 256-bit, 3900 MHz,

                  but

                  the 3850 is faster than the 4890.

                  Why is that?

                    • A question of StreamWrite latency
                      gaurav.garg

                      StreamRead and StreamWrite transfer data across PCIe, so the GPU's internal memory interface has nothing to do with this performance.
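
A rough back-of-the-envelope comparison illustrates the point. The sketch below uses the memory figures quoted earlier in the thread (256-bit bus, 1800 MHz and 3900 MHz effective rates) and the standard theoretical per-direction rates of a x16 PCIe link. Which PCIe generation the hosts in this thread use is not stated, so both 1.x and 2.0 are shown purely as assumptions; the function names are illustrative.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_rate_mhz):
    """Peak on-board memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return bus_width_bits / 8 * effective_rate_mhz * 1e6 / 1e9


def pcie_bandwidth_gb_s(gigatransfers_per_s, lanes=16):
    """Theoretical per-direction PCIe bandwidth in GB/s with 8b/10b encoding."""
    return gigatransfers_per_s * lanes * (8 / 10) / 8


print(f"3850 memory : {memory_bandwidth_gb_s(256, 1800):.1f} GB/s")
print(f"4890 memory : {memory_bandwidth_gb_s(256, 3900):.1f} GB/s")
# PCIe generation is an assumption; the thread does not state it.
print(f"PCIe 1.x x16: {pcie_bandwidth_gb_s(2.5):.1f} GB/s")
print(f"PCIe 2.0 x16: {pcie_bandwidth_gb_s(5.0):.1f} GB/s")
```

Either way, the bus is many times slower than on-board memory, so a faster memory interface cannot speed up a host-to-GPU copy.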

                        • A question of StreamWrite latency
                          luxert

                           

                          My system is

                          CPU : Intel Core2 Quad Q6600(2.4GHz)

                          RAM : DDR2 2 GB

                          OS : Windows XP Pro SP3

                          SDK : Stream 1.4.0

                           

                          I tried PCIeSpeedTest.

                          Results for the Radeon 3850 (256-bit GDDR3, 512 MB):

                           

                          ===> Testing device 0 <===
                          Device type: RV670
                          Max resource 2D width/height: 8192/8192
                          Total GPU memory size: 512 MB
                          Total CPU cached space size: 64 MB
                          Total CPU uncached space size: 512 MB
                          GPU engine clock: 669 MHz
                          GPU memory clock: 700 MHz
                          Number of timing loops: 100
                          [        16 bytes] CPU->GPU= 573.868 KB/sec, GPU->CPU= 595.604 KB/sec
                          [        32 bytes] CPU->GPU=   1.172 MB/sec, GPU->CPU=   1.180 MB/sec
                          [        64 bytes] CPU->GPU=   2.350 MB/sec, GPU->CPU=   2.768 MB/sec
                          [       128 bytes] CPU->GPU=   5.714 MB/sec, GPU->CPU=   5.680 MB/sec
                          [       256 bytes] CPU->GPU=  10.999 MB/sec, GPU->CPU=  10.797 MB/sec
                          [       512 bytes] CPU->GPU=  22.506 MB/sec, GPU->CPU=  22.664 MB/sec
                          [      1024 bytes] CPU->GPU=  45.854 MB/sec, GPU->CPU=  39.639 MB/sec
                          [      2048 bytes] CPU->GPU=  84.010 MB/sec, GPU->CPU=  88.393 MB/sec
                          [      4096 bytes] CPU->GPU= 179.108 MB/sec, GPU->CPU= 177.485 MB/sec
                          [      8192 bytes] CPU->GPU= 346.526 MB/sec, GPU->CPU= 357.558 MB/sec
                          [     16384 bytes] CPU->GPU= 656.935 MB/sec, GPU->CPU= 668.644 MB/sec
                          [     32768 bytes] CPU->GPU=   1.411 GB/sec, GPU->CPU=   1.405 GB/sec
                          [     65536 bytes] CPU->GPU=   2.302 GB/sec, GPU->CPU=   1.825 GB/sec
                          [    131072 bytes] CPU->GPU=   2.454 GB/sec, GPU->CPU=   1.915 GB/sec
                          [    262144 bytes] CPU->GPU=   2.534 GB/sec, GPU->CPU=   1.964 GB/sec
                          [    524288 bytes] CPU->GPU=   2.578 GB/sec, GPU->CPU=   1.986 GB/sec
                          [   1048576 bytes] CPU->GPU=   2.599 GB/sec, GPU->CPU=   1.998 GB/sec
                          [   2097152 bytes] CPU->GPU=   2.614 GB/sec, GPU->CPU=   2.005 GB/sec
                          [   4194304 bytes] CPU->GPU=   2.621 GB/sec, GPU->CPU=   2.008 GB/sec
                          [   8388608 bytes] CPU->GPU=   2.624 GB/sec, GPU->CPU=   2.011 GB/sec
                          [  16777216 bytes] CPU->GPU=   2.627 GB/sec, GPU->CPU=   2.011 GB/sec
                          [  33554432 bytes] CPU->GPU=   2.624 GB/sec, GPU->CPU=   2.021 GB/sec
                          [  67108864 bytes] CPU->GPU=   2.626 GB/sec, GPU->CPU=   2.022 GB/sec
                          [ 134217728 bytes] CPU->GPU=   2.627 GB/sec, GPU->CPU=   2.022 GB/sec
                          [ 268435456 bytes] CPU->GPU=   2.628 GB/sec, GPU->CPU=   2.023 GB/sec
                          calResAllocLocal2D() returned an error when trying to allocate 536870912 bytes!
                          calResAllocRemote2D() returned an error when trying to allocate 536870912 bytes (uncached)!
                          Peak CPU->GPU Bandwidth =   2.628 GB/sec [data size = 268435456 bytes]
                          Peak GPU->CPU Bandwidth =   2.023 GB/sec [data size = 268435456 bytes]

                           

                          Results for the Radeon 4890 (256-bit GDDR5, 1 GB):

                           

                          ===> Testing device 0 <===
                          Device type: RV770
                          Max resource 2D width/height: 8192/8192
                          Total GPU memory size: 1024 MB
                          Total CPU cached space size: 64 MB
                          Total CPU uncached space size: 128 MB
                          GPU engine clock: 900 MHz
                          GPU memory clock: 975 MHz
                          Number of timing loops: 100
                          [        16 bytes] CPU->GPU= 733.050 KB/sec, GPU->CPU= 560.169 KB/sec
                          [        32 bytes] CPU->GPU= 952.406 KB/sec, GPU->CPU= 804.848 KB/sec
                          [        64 bytes] CPU->GPU=   1.472 MB/sec, GPU->CPU=   1.416 MB/sec
                          [       128 bytes] CPU->GPU=   2.617 MB/sec, GPU->CPU=   2.312 MB/sec
                          [       256 bytes] CPU->GPU=  12.373 MB/sec, GPU->CPU=  11.966 MB/sec
                          [       512 bytes] CPU->GPU=  24.790 MB/sec, GPU->CPU=  31.049 MB/sec
                          [      1024 bytes] CPU->GPU=  60.860 MB/sec, GPU->CPU=  54.729 MB/sec
                          [      2048 bytes] CPU->GPU= 114.138 MB/sec, GPU->CPU= 100.031 MB/sec
                          [      4096 bytes] CPU->GPU= 245.733 MB/sec, GPU->CPU= 258.360 MB/sec
                          [      8192 bytes] CPU->GPU= 526.369 MB/sec, GPU->CPU= 536.562 MB/sec
                          [     16384 bytes] CPU->GPU= 943.138 MB/sec, GPU->CPU= 734.267 MB/sec
                          [     32768 bytes] CPU->GPU=   1.667 GB/sec, GPU->CPU= 776.676 MB/sec
                          [     65536 bytes] CPU->GPU=   2.219 GB/sec, GPU->CPU= 791.132 MB/sec
                          [    131072 bytes] CPU->GPU=   2.475 GB/sec, GPU->CPU= 801.243 MB/sec
                          [    262144 bytes] CPU->GPU=   2.547 GB/sec, GPU->CPU= 805.337 MB/sec
                          [    524288 bytes] CPU->GPU=   2.576 GB/sec, GPU->CPU= 806.916 MB/sec
                          [   1048576 bytes] CPU->GPU=   2.608 GB/sec, GPU->CPU= 808.295 MB/sec
                          [   2097152 bytes] CPU->GPU=   2.619 GB/sec, GPU->CPU= 808.892 MB/sec
                          [   4194304 bytes] CPU->GPU=   2.623 GB/sec, GPU->CPU= 809.165 MB/sec
                          [   8388608 bytes] CPU->GPU=   2.626 GB/sec, GPU->CPU= 809.305 MB/sec
                          [  16777216 bytes] CPU->GPU=   2.626 GB/sec, GPU->CPU= 809.318 MB/sec
                          [  33554432 bytes] CPU->GPU=   2.626 GB/sec, GPU->CPU= 809.408 MB/sec
                          [  67108864 bytes] CPU->GPU=   2.628 GB/sec, GPU->CPU= 809.432 MB/sec
                          [ 134217728 bytes] CPU->GPU=   2.629 GB/sec, GPU->CPU= 809.446 MB/sec
                          [ 134217728 bytes] CPU->GPU=   1.314 GB/sec, GPU->CPU= 404.726 MB/sec
                          Peak CPU->GPU Bandwidth =   2.629 GB/sec [data size = 134217728 bytes]
                          Peak GPU->CPU Bandwidth = 809.446 MB/sec [data size = 134217728 bytes]

                           

                          That result is strange.

                          The 4890's GPU->CPU speed is slower than the 3850's.

                          Please help me.
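
One way to compare the two runs is to parse the result lines and look at the ratio per direction. The sketch below assumes the line format shown in the output above. The helper names are illustrative, and the KB/MB/GB prefixes are treated as powers of 1000, which is an assumption because the tool's convention is not stated (it does not affect the ratios).

```python
import re

LINE_RE = re.compile(
    r"\[\s*(\d+) bytes\]\s*CPU->GPU=\s*([\d.]+) (KB|MB|GB)/sec,"
    r"\s*GPU->CPU=\s*([\d.]+) (KB|MB|GB)/sec"
)
SCALE = {"KB": 1e3, "MB": 1e6, "GB": 1e9}  # assumed decimal prefixes


def parse_line(line):
    """Return (size_bytes, cpu_to_gpu_B_per_s, gpu_to_cpu_B_per_s), or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    size, up, up_unit, down, down_unit = m.groups()
    return int(size), float(up) * SCALE[up_unit], float(down) * SCALE[down_unit]


# The 2097152-byte rows copied from the two runs above.
rv670 = parse_line("[   2097152 bytes] CPU->GPU=   2.614 GB/sec, GPU->CPU=   2.005 GB/sec")
rv770 = parse_line("[   2097152 bytes] CPU->GPU=   2.619 GB/sec, GPU->CPU= 808.892 MB/sec")

print(f"CPU->GPU ratio (RV770/RV670): {rv770[1] / rv670[1]:.2f}")
print(f"GPU->CPU ratio (RV770/RV670): {rv770[2] / rv670[2]:.2f}")
```

At this transfer size the CPU->GPU (write) rates of the two cards are nearly identical, while the GPU->CPU (read) rate on the 4890 is roughly 0.4 times that of the 3850, so the gap is in the read-back direction only.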