37 Replies Latest reply on Apr 8, 2010 10:09 AM by jhoffmann

    ATI Stream Power Toy - PCIeSpeedTest

    michael.chu

      We've posted an "ATI Stream Power Toy" page:

      http://developer.amd.com/GPU/ATISTREAMPOWERTOY/Pages/default.aspx

      The first such "toy" on this page is a PCIeSpeedTest utility that tries to measure the PCIe bandwidth on your system. We've found this useful when trying to see what the PCIe IO bandwidth is in a system and how it might affect overall performance.

      This is a synthetic benchmark, so your actual real-world performance may vary. Its intent, though, is to extract as much bandwidth as possible out of the PCIe link in each direction.

      This is offered as an unsupported, but hopefully semi-useful tool for all of you. :-) Suggestions for other "toys" you'd like to see can be sent to streamdeveloper@amd.com. No promises, but we'll see what we can do.

      If you run this and want to share your peak results, we'd be interested in seeing them on this thread! Post your peak results in each direction along with your motherboard, CPU and GPU configurations.

      Michael.

        • ATI Stream Power Toy - PCIeSpeedTest
          rahulgarg

          On Ubuntu 8.10 64-bit, it gave some compilation errors.

          Adding #include <cstdlib> in PCIeSpeedTest_random.cpp fixes this.

          After fixing and running ./PCIeSpeedTest  on a phenom 9550+780v chipset + Radeon 4870 + 1066 MHz ddr2 I get:

           

          Peak CPU->GPU Bandwidth =   5.160 GB/sec [data size = 134217728 bytes]
          Peak GPU->CPU Bandwidth =   4.415 GB/sec [data size = 4194304 bytes]

            • ATI Stream Power Toy - PCIeSpeedTest
              Firestrider

              Great little tool!

              I have a problem though. It locks up my computer and crashes my display driver when the test reaches 67108864 bytes. I'm not too concerned since I'm running Windows 7 x64 beta build 7000 with the WDDM 1.1 beta display driver but I would just like to relay the information.

              My hardware is Phenom X4 9550, Radeon HD 4850, Gigabyte GA-MA790X-UD4, 4x1GB DDR2-800 RAM.

              Would it make sense to make a similar tool for the HyperTransport bus?

                • ATI Stream Power Toy - PCIeSpeedTest
                  jross

                  On Ubuntu 8.10 64-bit, ASUS P6T Deluxe, Intel i7 920, Corsair XMS3 DDR3 1600, ATI Radeon HD 4850 (512 MB)

                  Peak CPU->GPU Bandwidth =   4.978 GB/sec [datasize = 134217728 bytes]

                  Peak GPU->CPU Bandwidth =  2.185 GB/sec [datasize = 65536 bytes]

                  The GPU->CPU Bandwidth is a little disappointing and it gets worse as the data size increases, gradually decreasing to 1.27 GB/sec at 134217728 bytes.

                  Great little tool. Thanks.

                    • ATI Stream Power Toy - PCIeSpeedTest
                      michael.chu

                      To add my own data here... :-)

                      On my desk, I've got an MSI K9A2 Platinum (790FX chipset), with an AMD Phenom 9850 quad core with 4GB of memory and a FireStream 9250 plugged in. Running with an early version of Catalyst 9.3 on Linux SLES 10 SP2, I'm seeing:

                      Peak CPU->GPU Bandwidth = 5.532 GB/sec [data size = 536870912 bytes]
                      Peak GPU->CPU Bandwidth = 5.992 GB/sec [data size = 8388608 bytes]

                      This is one of those motherboards where if you plug cards into slots 1 and 3, you get full x16 gen2 performance but if you plug cards into 2 and 4 as well, it drops to x8 gen2.

                      I don't have the time to run it right now, but I've seen it behave well with 2 and 4 cards in there as well (with 4 cards, it scales down pretty well in my past experiments).

                      Michael.

                      • ATI Stream Power Toy - PCIeSpeedTest
                        michael.chu

                        Hi jross,

                        Our FireStream product manager and I have done some experiments on various systems as well as with some OEMs (running this exact PCIeSpeedTest) and we have seen similar peaks and dropoffs on Intel systems. Not at all sure why it does that on Intel systems but not with the 790s.

                        May be an artifact of the test, but I can't imagine where... if I had the time, or if anyone here has the time, I wanted to try the same test using user pinned memory in CAL.

                        Michael.

                          • ATI Stream Power Toy - PCIeSpeedTest
                            zpdixon

                            It is interesting to see these numbers so close to what the theory predicts. PCI-E 2.0 x16 gives 8 GB/s of max theoretical bandwidth per direction. However, the most important factor determining the practical bandwidth is the Max_Payload_Size setting (between 128 bytes and 4096 bytes) negotiated between endpoints (cards) and root ports (on motherboard chipsets). 128 bytes, the default value, allows using 60% of the max theoretical bandwidth, 256 bytes 70%, ... and 4096 bytes almost 100%.

                            It is very common for PCI-E cards to support a Max_Payload_Size of 256, 512, or 1024 bytes. Unfortunately even as of 2009 the vast majority of chipsets only support 128 or 256 bytes.

                            This explains why most of you measure, with that tool, a practical bandwidth of roughly between 8 GB/s * 60% = 4.8 GB/s and 8 GB/s * 70% = 5.6 GB/s.

                            If you want to see the Max_Payload_Size value on your system, under Linux/BSD/Solaris, run "lspci -vv".
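
The arithmetic in the post above can be sketched in a few lines of Python. The raw figure follows from the PCIe 1.x/2.0 signalling rates (2.5 and 5.0 GT/s per lane with 8b/10b encoding). The 60% and 70% efficiency factors are simply the ones quoted in this post, not values measured or derived here, and the function names are made up for illustration.

```python
# Rough estimate of usable PCIe bandwidth per direction.
# Assumption: the efficiency factors are the ones quoted in this thread
# (128-byte payload -> ~60%, 256-byte payload -> ~70%), not derived here.

QUOTED_EFFICIENCY = {128: 0.60, 256: 0.70}


def raw_bandwidth_gb_s(gt_per_s: float, lanes: int) -> float:
    """Raw per-direction bandwidth in GB/s for PCIe 1.x/2.0 (8b/10b encoding)."""
    bits_per_s = gt_per_s * 1e9 * lanes * 8 / 10  # 8 data bits per 10 line bits
    return bits_per_s / 8 / 1e9                   # bits -> bytes -> GB


def usable_bandwidth_gb_s(gt_per_s: float, lanes: int, max_payload: int) -> float:
    """Raw bandwidth scaled by the efficiency quoted for this payload size."""
    return raw_bandwidth_gb_s(gt_per_s, lanes) * QUOTED_EFFICIENCY[max_payload]


if __name__ == "__main__":
    for gen, rate in (("gen1", 2.5), ("gen2", 5.0)):
        for payload in (128, 256):
            est = usable_bandwidth_gb_s(rate, 16, payload)
            print(f"{gen} x16, Max_Payload_Size {payload}: ~{est:.1f} GB/s")
```

Keeping the efficiency table separate from the link arithmetic makes it easy to swap in other factors if the quoted ones turn out not to match a given chipset.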

                            • ATI Stream Power Toy - PCIeSpeedTest
                              rahulgarg

                              Pinned memory: Modified the file and replaced calResAllocRemote2D call with a custom call to calResCreate2D.

                              calResCreate2D returned an error when trying to allocate 16777216 bytes !
                              Peak CPU->GPU Bandwidth =   4.877 GB/sec [data size = 2097152 bytes]
                              Peak GPU->CPU Bandwidth =   3.957 GB/sec [data size = 2097152 bytes]

                          • ATI Stream Power Toy - PCIeSpeedTest
                            michael.chu

                            Hi Firestrider, yeah, I've noticed that my system will kind of get a bit sluggish on the larger transfers. Not quite sure why (haven't had a chance to investigate it yet).

                            In this benchmark, I essentially stack about 100 calMemCopy() requests from/to uncached CAL memory resources on the CPU side to GPU memory resources. I wait for the very last CALevent to be done.
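
The timing pattern described above (queue a batch of copies, wait for the last one, divide bytes moved by elapsed time) might look roughly like this. It is only a sketch: `copy_fn` stands in for calMemCopy(), and the demo uses a plain host-memory copy, so the number it prints says nothing about PCIe. All names are hypothetical.

```python
import time


def measure_bandwidth(copy_fn, size_bytes: int, loops: int = 100) -> float:
    """Return bytes/second for `loops` back-to-back copies of `size_bytes`.

    copy_fn is a placeholder for whatever performs one transfer; in the real
    tool that would be a queued calMemCopy(), which is not modelled here.
    """
    start = time.perf_counter()
    for _ in range(loops):
        copy_fn()
    # The real benchmark waits on the last event here; our stand-in copies
    # are synchronous, so there is nothing left to wait for.
    elapsed = time.perf_counter() - start
    return size_bytes * loops / elapsed if elapsed > 0 else float("inf")


def host_copy_demo(size_bytes: int) -> float:
    """Demo using a host-to-host copy as the stand-in transfer."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)

    def copy_once():
        dst[:] = src  # plain memory copy, NOT a PCIe transfer

    return measure_bandwidth(copy_once, size_bytes)


if __name__ == "__main__":
    size = 1 << 20
    while size <= 1 << 24:  # 1 MiB .. 16 MiB
        print(f"[{size:>10} bytes] {host_copy_demo(size) / 1e9:.3f} GB/sec")
        size *= 2
```

Timing the whole batch rather than each copy keeps per-call overhead and timer granularity from dominating the result at small sizes.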

                            For the HT test, not sure if you can control things at the application level close enough to measure that performance. In my "prior life" at an in-socket FPGA accelerator company, it was much easier to do that test because we had specific calls that were sending and receiving data across the HT bus to the accelerator.

                        • ATI Stream Power Toy - PCIeSpeedTest
                          rahulgarg

                          I modified the test slightly to test for CPU cacheable resources. Instead of passing flag 0 to calResAllocRemote2D, I passed the flag CAL_RESALLOC_CACHEABLE. On my system, cacheable remote RAM is restricted to 60MB, and here are the peak results:

                          Peak CPU->GPU Bandwidth =   5.069 GB/sec [data size = 16777216 bytes]
                          Peak GPU->CPU Bandwidth =   3.745 GB/sec [data size = 1048576 bytes]

                          (System details posted in earlier reply)

                          • ATI Stream Power Toy - PCIeSpeedTest
                            FangQ

                            My home computer wasn't built for stream computing; I only use it to learn stream programming and prototype code, so the outputs are not as impressive as others'.

                            My MOBO is an EVGA GeForce7050 (610i), with an Intel Q6700 quad core, 3G of DDR2 memory and a Radeon HD 4650 (512M). The PCIeSpeedTest output is below:

                            calResAllocLocal2D() returned an error when trying to allocate 268435456 bytes!
                            Peak CPU->GPU Bandwidth =   2.620 GB/sec [data size = 67108864 bytes]
                            Peak GPU->CPU Bandwidth =   3.160 GB/sec [data size = 33554432 bytes]

                              • ATI Stream Power Toy - PCIeSpeedTest
                                bayoumi

                                I am more interested in smaller data size (real life communications).

                                Scientific Linux 5.2 64b, Phenom 9550, 8GB DDR2-800, 790X chipset, MSI K9A2-CF motherboard, single HD4870/ 1GB

                                PCIeSpeedTest
                                Devices found: 1

                                ===> Testing device 0 <===
                                Device type: RV770
                                Max resource 2D width/height: 8192/8192
                                Total GPU memory size: 1024 MB
                                Total CPU cached space size: 60 MB
                                Total CPU uncached space size: 1984 MB
                                GPU engine clock: 0 MHz
                                GPU memory clock: 0 MHz
                                Number of timing loops: 100
                                [        16 bytes] CPU->GPU= 533.333 KB/sec, GPU->CPU 400.000 KB/sec
                                [        32 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU   1.067 MB/sec
                                [        64 bytes] CPU->GPU=   1.067 MB/sec, GPU->CPU   2.133 MB/sec
                                [       128 bytes] CPU->GPU=   3.200 MB/sec, GPU->CPU   4.267 MB/sec
                                [       256 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU   6.400 MB/sec
                                [       512 bytes] CPU->GPU=  12.800 MB/sec, GPU->CPU  17.067 MB/sec
                                [      1024 bytes] CPU->GPU=  14.629 MB/sec, GPU->CPU  34.133 MB/sec
                                [      2048 bytes] CPU->GPU=  51.200 MB/sec, GPU->CPU  68.267 MB/sec
                                [      4096 bytes] CPU->GPU=  51.200 MB/sec, GPU->CPU 102.400 MB/sec
                                [      8192 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU 273.067 MB/sec
                                [     16384 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU 273.067 MB/sec
                                [     32768 bytes] CPU->GPU= 819.200 MB/sec, GPU->CPU   1.092 GB/sec
                                [     65536 bytes] CPU->GPU= 936.229 MB/sec, GPU->CPU   2.185 GB/sec
                                [    131072 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU   1.456 GB/sec
                                [    262144 bytes] CPU->GPU=   4.369 GB/sec, GPU->CPU   3.277 GB/sec
                                [    524288 bytes] CPU->GPU=   4.766 GB/sec, GPU->CPU   3.495 GB/sec
                                [   1048576 bytes] CPU->GPU=   4.993 GB/sec, GPU->CPU   3.383 GB/sec
                                [   2097152 bytes] CPU->GPU=   4.993 GB/sec, GPU->CPU   3.495 GB/sec
                                [   4194304 bytes] CPU->GPU=   5.115 GB/sec, GPU->CPU   3.525 GB/sec
                                [   8388608 bytes] CPU->GPU=   5.146 GB/sec, GPU->CPU   3.539 GB/sec
                                [  16777216 bytes] CPU->GPU=   5.162 GB/sec, GPU->CPU   3.466 GB/sec
                                [  33554432 bytes] CPU->GPU=   5.178 GB/sec, GPU->CPU   3.477 GB/sec
                                [  67108864 bytes] CPU->GPU=   5.186 GB/sec, GPU->CPU   3.470 GB/sec
                                [ 134217728 bytes] CPU->GPU=   5.188 GB/sec, GPU->CPU   3.457 GB/sec
                                [ 268435456 bytes] CPU->GPU=   5.191 GB/sec, GPU->CPU   3.458 GB/sec
                                [ 536870912 bytes] CPU->GPU=   5.191 GB/sec, GPU->CPU   3.459 GB/sec
                                calResAllocLocal2D() returned an error when trying to allocate 1073741824 bytes!
                                Peak CPU->GPU Bandwidth =   5.191 GB/sec [data size = 268435456 bytes]
                                Peak GPU->CPU Bandwidth =   3.539 GB/sec [data size = 8388608 bytes]
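
For small transfers, the bandwidth figures in a listing like the one above are easier to read as time per copy. The sketch below parses result lines in the format shown and converts them. It assumes KB/MB/GB are decimal units, which is a guess based on the 16-byte rows: 100 loops of 16 bytes is 1600 bytes, and 1600 bytes over 3 ms and 4 ms gives exactly 533.333 and 400.000 KB/sec. That also suggests a timer resolution of about a millisecond, so the smallest rows are probably heavily quantized. The function names are hypothetical.

```python
import re

UNIT = {"KB": 1e3, "MB": 1e6, "GB": 1e9}  # assumed decimal units

LINE_RE = re.compile(
    r"\[\s*(\d+) bytes\]\s*CPU->GPU=\s*([\d.]+) (KB|MB|GB)/sec,"
    r"\s*GPU->CPU\s*([\d.]+) (KB|MB|GB)/sec"
)


def parse_line(line: str):
    """Return (size_bytes, cpu_to_gpu_Bps, gpu_to_cpu_Bps) or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    size = int(m.group(1))
    up = float(m.group(2)) * UNIT[m.group(3)]
    down = float(m.group(4)) * UNIT[m.group(5)]
    return size, up, down


def microseconds_per_copy(size_bytes: int, bytes_per_s: float) -> float:
    """Amortized time for one copy of size_bytes at the reported rate."""
    return size_bytes / bytes_per_s * 1e6


if __name__ == "__main__":
    sample = [
        "[        16 bytes] CPU->GPU= 533.333 KB/sec, GPU->CPU 400.000 KB/sec",
        "[      8192 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU 273.067 MB/sec",
        "[   1048576 bytes] CPU->GPU=   4.993 GB/sec, GPU->CPU   3.383 GB/sec",
    ]
    for line in sample:
        size, up, down = parse_line(line)
        print(f"{size:>8} B: {microseconds_per_copy(size, up):7.1f} us up, "
              f"{microseconds_per_copy(size, down):7.1f} us down")
```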

                                  • ATI Stream Power Toy - PCIeSpeedTest
                                    ryta1203

                                    The test won't complete, I get a "driver stopped responding" issue. I'm running Vista x64, couldn't find a VPU recover in CCC. Any ideas?

                                     

                                      • ATI Stream Power Toy - PCIeSpeedTest
                                        Firestrider

                                         

                                        Originally posted by: ryta1203 The test won't complete, I get a "driver stopped responding" issue. I'm running Vista x64, couldn't find a VPU recover in CCC. Any ideas?

                                         

                                        Yeah, a lot of people are having display driver crashes in Vista/7 but from the download page it looks like only Linux 64-bit and XP 32-bit are officially supported.

                                          • ATI Stream Power Toy - PCIeSpeedTest
                                            ryta1203

                                             

                                            Originally posted by: Firestrider
                                            Originally posted by: ryta1203 The test won't complete, I get a "driver stopped responding" issue. I'm running Vista x64, couldn't find a VPU recover in CCC. Any ideas?

                                             

                                            Yeah, a lot of people are having display driver crashes in Vista/7 but from the download page it looks like only Linux 64-bit and XP 32-bit are officially supported.

                                            You know silly me didn't even look at the OSes supported. Makes sense now... kinda.... although I'm unclear how useful a tool this is for Windows users, since most of us are using Vista/7.

                                            Re-looking at the FAQ I noticed where it mentioned the fix, the TdrLevel registry entry. I added it and the test completed.

                                            My results were ~4.5GB/s CPU->GPU and ~4.8GB/s GPU->CPU for the 1st GPU.

                                            My results were ~4GB/s CPU->GPU and ~5.1GB/s GPU->CPU for the 2nd GPU.

                                            I have almost the same setup as Michael:

                                            MSI K9A2 Plat

                                            Phenom 9850, 2.7GHz

                                            4GB 1066 DDR2 OCZ Plat

                                            Two 4850's in CFX with 512MB each.

                                            Vista Business x64.

                                              • ATI Stream Power Toy - PCIeSpeedTest
                                                prunedtree

                                                On Debian 5.0 64-bit, ASUS P6T, Intel i7 920, 12GB DDR3 1333, two HD4870X2

                                                Peak CPU->GPU Bandwidth =   4.793 GB/sec [data size = 16777216 bytes]
                                                Peak GPU->CPU Bandwidth =   2.185 GB/sec [data size = 65536 bytes]

                                                (same results for the 4 vpus)

                                                The CPU->GPU bandwidth is what the theory predicts (hwinfo --pci reports 128 byte max payload)
                                                However, the GPU->CPU bandwidth is quite low, and it gets worse as data size increases, as jross reported on similar hardware (but with a single vpu).

                                                Nice tool, thanks.

                                                  • ATI Stream Power Toy - PCIeSpeedTest
                                                    bayoumi

                                                    On XP x64, Phenom 9500 2.2GHz, 790FX Chipset, 8GB DDR2-667 ECC, Gigabyte MA790FX-DQ6, Sapphire HD4887-X2/2GB, I get identical results on both device 0 and device 1 of the GPU, as follows (the GPU clock readings surprise me):

                                                    Devices found: 2

                                                    ===> Testing device 0 <===
                                                    Device type: RV770
                                                    Max resource 2D width/height: 8192/8192
                                                    Total GPU memory size: 1024 MB
                                                    Total CPU cached space size: 64 MB
                                                    Total CPU uncached space size: 2048 MB
                                                    GPU engine clock: 507 MHz
                                                    GPU memory clock: 500 MHz
                                                    Number of timing loops: 100
                                                    [        16 bytes] CPU->GPU= 586.570 KB/sec, GPU->CPU 680.845 KB/sec
                                                    [        32 bytes] CPU->GPU=   1.339 MB/sec, GPU->CPU   1.368 MB/sec
                                                    [        64 bytes] CPU->GPU=   2.728 MB/sec, GPU->CPU   2.451 MB/sec
                                                    [       128 bytes] CPU->GPU=   4.921 MB/sec, GPU->CPU   4.933 MB/sec
                                                    [       256 bytes] CPU->GPU=   9.726 MB/sec, GPU->CPU   9.714 MB/sec
                                                    [       512 bytes] CPU->GPU=  19.165 MB/sec, GPU->CPU  19.597 MB/sec
                                                    [      1024 bytes] CPU->GPU=  38.903 MB/sec, GPU->CPU  38.891 MB/sec
                                                    [      2048 bytes] CPU->GPU=  77.005 MB/sec, GPU->CPU  78.088 MB/sec
                                                    [      4096 bytes] CPU->GPU= 153.511 MB/sec, GPU->CPU 156.293 MB/sec
                                                    [      8192 bytes] CPU->GPU= 309.386 MB/sec, GPU->CPU 314.058 MB/sec
                                                    [     16384 bytes] CPU->GPU= 582.049 MB/sec, GPU->CPU 605.172 MB/sec
                                                    [     32768 bytes] CPU->GPU=   1.201 GB/sec, GPU->CPU   1.236 GB/sec
                                                    [     65536 bytes] CPU->GPU=   2.269 GB/sec, GPU->CPU   2.360 GB/sec
                                                    [    131072 bytes] CPU->GPU=   3.058 GB/sec, GPU->CPU   3.620 GB/sec
                                                    [    262144 bytes] CPU->GPU=   3.853 GB/sec, GPU->CPU   3.975 GB/sec
                                                    [    524288 bytes] CPU->GPU=   4.473 GB/sec, GPU->CPU   4.501 GB/sec
                                                    [   1048576 bytes] CPU->GPU=   4.772 GB/sec, GPU->CPU   4.788 GB/sec
                                                    [   2097152 bytes] CPU->GPU=   4.992 GB/sec, GPU->CPU   5.121 GB/sec
                                                    [   4194304 bytes] CPU->GPU=   5.103 GB/sec, GPU->CPU   5.345 GB/sec
                                                    [   8388608 bytes] CPU->GPU=   5.159 GB/sec, GPU->CPU   5.481 GB/sec
                                                    [  16777216 bytes] CPU->GPU=   5.191 GB/sec, GPU->CPU   4.849 GB/sec
                                                    [  33554432 bytes] CPU->GPU=   5.197 GB/sec, GPU->CPU   4.843 GB/sec
                                                    [  67108864 bytes] CPU->GPU=   5.202 GB/sec, GPU->CPU   4.862 GB/sec
                                                    [ 134217728 bytes] CPU->GPU=   5.208 GB/sec, GPU->CPU   4.817 GB/sec
                                                    [ 268435456 bytes] CPU->GPU=   5.209 GB/sec, GPU->CPU   4.816 GB/sec
                                                    [ 536870912 bytes] CPU->GPU=   5.208 GB/sec, GPU->CPU   4.825 GB/sec
                                                    Peak CPU->GPU Bandwidth =   5.209 GB/sec [data size = 268435456 bytes]
                                                    Peak GPU->CPU Bandwidth =   5.481 GB/sec [data size = 8388608 bytes]

                                                    ===> Testing device 1 <===
                                                    Device type: RV770
                                                    Max resource 2D width/height: 8192/8192
                                                    Total GPU memory size: 1024 MB
                                                    Total CPU cached space size: 64 MB
                                                    Total CPU uncached space size: 2048 MB
                                                    GPU engine clock: 507 MHz
                                                    GPU memory clock: 500 MHz
                                                    .....the same results

                                                • ATI Stream Power Toy - PCIeSpeedTest
                                                  attilagenc

                                                   

                                                  Originally posted by: ryta1203

                                                  You know silly me didn't even look at the OSes supported. Makes sense now... kinda.... although I'm unclear how useful a tool this is for Windows users, since most of us are using Vista/7.

                                                  Re-looking at the FAQ I noticed where it mentioned the fix, the TdrLevel registry entry. I added it and the test completed.

                                                  Which FAQ are you referring to ryta? I could not find any FAQ for PCIeSpeedTest?

                                                    • ATI Stream Power Toy - PCIeSpeedTest
                                                      ryta1203

                                                       

                                                      Originally posted by: attilagenc
                                                      Originally posted by: ryta1203

                                                      You know silly me didn't even look at the OSes supported. Makes sense now... kinda.... although I'm unclear how useful a tool this is for Windows users, since most of us are using Vista/7.

                                                      Re-looking at the FAQ I noticed where it mentioned the fix, the TdrLevel registry entry. I added it and the test completed.

                                                      Which FAQ are you referring to ryta? I could not find any FAQ for PCIeSpeedTest?

                                                      CAL FAQ that comes with documentation.

                                            • ATI Stream Power Toy - PCIeSpeedTest
                                              the729

                                              Hi everyone,

                                              It seems there is some problem with my box. The test result is:

                                              Peak CPU->GPU Bandwidth =   2.793 GB/sec [data size = 536870912 bytes]
                                              Peak GPU->CPU Bandwidth =   2.994 GB/sec [data size = 536870912 bytes]

                                              My hardware is:

                                              Gigabyte 790X-DS4, Phenom 9550, DDR2 800 2G*2, Sapphire HD 4870 1G

                                              I am running Kubuntu 8.10 64bit. The following is lspci -vv output, which I think is problematic, since it says "[58] Express (v2) Legacy Endpoint" and "LnkSta:    Speed 2.5GT/s,".

                                              Is there something wrong with the software or hardware?
                                              Could someone using Ubuntu x64 please post their lspci -vv output?

                                              =================================================

                                              01:00.0 VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4870]
                                                  Subsystem: PC Partner Limited Device e850
                                                  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                                  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                                  Latency: 0, Cache Line Size: 4 bytes
                                                  Interrupt: pin A routed to IRQ 2300
                                                  Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
                                                  Region 2: Memory at fdee0000 (64-bit, non-prefetchable) [size=64K]
                                                  Region 4: I/O ports at de00 [size=256]
                                                  [virtual] Expansion ROM at fde00000 [disabled] [size=128K]
                                                  Capabilities: [50] Power Management version 3
                                                      Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                                                      Status: D0 PME-Enable- DSel=0 DScale=0 PME-
                                                  Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                                                      DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                                                          ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                                                      DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                                          RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                                                          MaxPayload 128 bytes, MaxReadReq 128 bytes
                                                      DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                                                      LnkCap:    Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
                                                          ClockPM- Suprise- LLActRep- BwNot-
                                                      LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                                                          ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                                                      LnkSta:    Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                                  Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                                                      Address: 00000000fee0f00c  Data: 4191
                                                  Capabilities: [100] Vendor Specific Information <?>
                                                  Kernel driver in use: fglrx_pci
                                                  Kernel modules: fglrx
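
The fields that matter in a dump like this (link speed and width, advertised versus negotiated, and the payload sizes) can be pulled out with a few regular expressions. A sketch follows. It assumes the lspci -vv layout shown above, which may differ between lspci versions, and the function name is hypothetical.

```python
import re


def pcie_link_summary(lspci_text: str) -> dict:
    """Extract link and payload fields from `lspci -vv` text for one device."""
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+)GT/s, Width x(\d+)", lspci_text)
    sta = re.search(r"LnkSta:.*?Speed ([\d.]+)GT/s, Width x(\d+)", lspci_text)
    payloads = [int(p) for p in re.findall(r"MaxPayload (\d+) bytes", lspci_text)]
    return {
        "cap_speed_gt_s": float(cap.group(1)) if cap else None,
        "cap_width": int(cap.group(2)) if cap else None,
        "sta_speed_gt_s": float(sta.group(1)) if sta else None,
        "sta_width": int(sta.group(2)) if sta else None,
        # First MaxPayload is the DevCap (supported) value, second the
        # DevCtl (currently configured) value in the layout assumed here.
        "payload_supported": payloads[0] if payloads else None,
        "payload_configured": payloads[1] if len(payloads) > 1 else None,
    }


if __name__ == "__main__":
    sample = """
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        LnkCap:    Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
        LnkSta:    Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    """
    for key, value in pcie_link_summary(sample).items():
        print(f"{key}: {value}")
```

In the dump above, both LnkCap and LnkSta report 2.5GT/s at x16, and both MaxPayload entries read 128 bytes.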

                                                  • ATI Stream Power Toy - PCIeSpeedTest
                                                    zpdixon

                                                     

                                                    Originally posted by: the729

                                                    I am running Kubuntu 8.10 64bit. The following is lspci -vv output, which I think is problematic, since it says "[58] Express (v2) Legacy Endpoint" and "LnkSta:    Speed 2.5GT/s,".

                                                     

                                                    Your mobo supports PCI-E 1.0 only (2.5GT/s). So assuming a Max_Payload_Size of 256 bytes (I guess) you should see 4.0 GB/s * 70% = 2.8GB/s of throughput... which is exactly what you measure.

                                                      • ATI Stream Power Toy - PCIeSpeedTest
                                                        the729

                                                         

                                                        Originally posted by: zpdixon

                                                         

                                                        Your mobo supports PCI-E 1.0 only (2.5GT/s).

                                                         

                                                        The mobo is GA-790X-DS4. It supports PCIe 2.0, according to the page :

                                                        http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2695

                                                        Does that mean this is something to do with the driver or software?

                                                        Is there a quick way to check those PCIe parameters in Windows (not by running this power toy)?

                                                          • ATI Stream Power Toy - PCIeSpeedTest
                                                            zpdixon

                                                             

                                                            Originally posted by: the729

                                                             

                                                            The mobo is GA-790X-DS4. It supports PCIe 2.0, according to the page :

                                                             

                                                            http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2695

                                                             

                                                            Does that mean this is something to do with the driver or software?

                                                             

                                                            Is there a quick way to check those PCIe parameters in Windows (not by running this power toy)?

                                                             

                                                            Well, the board may support 5.0GT/s, but the link negotiated only 2.5GT/s with your Sapphire 4870. That could be because the Sapphire card only supports 2.5GT/s (not sure...), or because of EMI, random hardware problems, etc.

                                                              • ATI Stream Power Toy - PCIeSpeedTest
                                                                johnnyb

                                                                Hi,

                                                                I have interesting results:

                                                                PCIeSpeedTest

                                                                2.5GiB/s CPU->GPU

                                                                785MiB/s GPU->CPU

                                                                GPU->CPU is on par with CPU->GPU up to 16384 bytes; beyond that point CPU->GPU gets a big boost. When the test reaches 256 MiB, it either crashes or shows a huge performance drop (437 MiB/s CPU->GPU, 668 MiB/s GPU->CPU).

                                                                System: Intel Q6600@2.4GHz, 8GiB, HD4870 512MiB, Vista Business 64bit, Catalyst 9.3, ATi Stream 1.4.

                                                                  • ATI Stream Power Toy - PCIeSpeedTest
                                                                    bjm

                                                                    Hi,

                                                                    Thanks for the tool!

                                                                    It caused VPU Recover to kick in most of the time on my machine, but when it ran all the way through I got peaks of:

                                                                    5.54GB/sec @ 2097152 bytes CPU->GPU

                                                                    1.09GB/sec @ 134217728 bytes GPU->CPU

                                                                    Both directions perform similarly until they reach ~1GB/sec; after that GPU->CPU stops increasing.

                                                                    System: HD4890 1GB, Phenom II 720 @ 2.8, GA-MA790XT-UD4P, 4GB DDR3, Catalyst 9.4, XP SP3 32bit

                                                                      • ATI Stream Power Toy - PCIeSpeedTest
                                                                        bjm

                                                                         

                                                                        I have now tested under Windows Server 2008 R2 64-bit; GPU->CPU has improved compared with Windows XP:

                                                                        XP SP3 32-bit:

                                                                        5.54GB/sec @ 2097152 bytes CPU->GPU

                                                                        1.09GB/sec @ 134217728 bytes GPU->CPU

                                                                         

                                                                        2008r2 64-bit:

                                                                        5.33GB/sec @ 268435456 bytes CPU->GPU

                                                                        5.52GB/sec @ 8388608 bytes GPU->CPU

                                                                         

                                                                        Same system used in both:

                                                                         HD4890 1GB, Phenom II 720 @ 2.8, GA-MA790XT-UD4P, 4GB DDR3



                                                                          • ATI Stream Power Toy - PCIeSpeedTest
                                                                            wuttz

                                                                            http://i664.photobucket.com/albums/vv4/wuttzi/CCIMG700.png

                                                                             

                                                                            mine crashes.

                                                                             

                                                                            system specs

                                                                            phenom ii 955be, gigabyte ma785gmt-ud2h, ocz3 platinum 2x2gb ddr3-1333 cas6, ati radeon 4870x2, wd caviar green 500gb/32mb, pc power & cooling 750w, win7x64 professional

                                                                              • http://developer.amd.com/GPU/ATISTREAMPOWERTOY/Pages/default.aspx
                                                                                MrSandman

                                                                                Hi all

                                                                                Win xp sp3

                                                                                Motherboard gigabyte ga-ma790fxt-ud5p 

                                                                                Processor AMD Phenom II X3 720BE @ 3712 MHz

                                                                                4GB RAM ddr3 1666

                                                                                 

                                                                                 

                                                                                ===> Testing device 0 <===

                                                                                Device type: RV730

                                                                                Max resource 2D width/height: 8192/8192

                                                                                Total GPU memory size: 512 MB

                                                                                Total CPU cached space size: 64 MB

                                                                                Total CPU uncached space size: 512 MB

                                                                                GPU engine clock: 800 MHz

                                                                                GPU memory clock: 1125 MHz

                                                                                Number of timing loops: 100

                                                                                [        16 bytes] CPU->GPU= 697.646 KB/sec, GPU->CPU= 715.282 KB/sec

                                                                                [        32 bytes] CPU->GPU=   1.434 MB/sec, GPU->CPU=   2.003 MB/sec

                                                                                [        64 bytes] CPU->GPU=   2.300 MB/sec, GPU->CPU=   2.352 MB/sec

                                                                                [       128 bytes] CPU->GPU=   4.604 MB/sec, GPU->CPU=   4.614 MB/sec

                                                                                [       256 bytes] CPU->GPU=   9.126 MB/sec, GPU->CPU=   8.963 MB/sec

                                                                                [       512 bytes] CPU->GPU=  17.729 MB/sec, GPU->CPU=  18.378 MB/sec

                                                                                [      1024 bytes] CPU->GPU=  35.719 MB/sec, GPU->CPU=  37.482 MB/sec

                                                                                [      2048 bytes] CPU->GPU=  70.117 MB/sec, GPU->CPU=  71.767 MB/sec

                                                                                [      4096 bytes] CPU->GPU= 144.484 MB/sec, GPU->CPU= 141.988 MB/sec

                                                                                [      8192 bytes] CPU->GPU= 279.116 MB/sec, GPU->CPU= 266.716 MB/sec

                                                                                [     16384 bytes] CPU->GPU= 583.427 MB/sec, GPU->CPU= 544.476 MB/sec

                                                                                [     32768 bytes] CPU->GPU=   1.009 GB/sec, GPU->CPU=   1.206 GB/sec

                                                                                [     65536 bytes] CPU->GPU=   1.154 GB/sec, GPU->CPU=   1.345 GB/sec

                                                                                [    131072 bytes] CPU->GPU=   1.269 GB/sec, GPU->CPU=   1.457 GB/sec

                                                                                [    262144 bytes] CPU->GPU=   1.324 GB/sec, GPU->CPU=   1.509 GB/sec

                                                                                [    524288 bytes] CPU->GPU=   1.352 GB/sec, GPU->CPU=   1.544 GB/sec

                                                                                [   1048576 bytes] CPU->GPU=   1.367 GB/sec, GPU->CPU=   1.556 GB/sec

                                                                                [   2097152 bytes] CPU->GPU=   1.375 GB/sec, GPU->CPU=   1.546 GB/sec

                                                                                [   4194304 bytes] CPU->GPU=   1.378 GB/sec, GPU->CPU=   1.530 GB/sec

                                                                                [   8388608 bytes] CPU->GPU=   1.380 GB/sec, GPU->CPU=   1.525 GB/sec

                                                                                [  16777216 bytes] CPU->GPU=   1.381 GB/sec, GPU->CPU=   1.519 GB/sec

                                                                                [  33554432 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.531 GB/sec

                                                                                [  67108864 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.528 GB/sec

                                                                                [ 134217728 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.528 GB/sec

                                                                                [ 268435456 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.526 GB/sec

                                                                                calResAllocLocal2D() returned an error when trying to allocate 536870912 bytes!

                                                                                calResAllocRemote2D() returned an error when trying to allocate 536870912 bytes (uncached)!

                                                                                Peak CPU->GPU Bandwidth =   1.382 GB/sec [data size = 268435456 bytes]

                                                                                Peak GPU->CPU Bandwidth =   1.556 GB/sec [data size = 1048576 bytes]



                                                                                  • http://developer.amd.com/GPU/ATISTREAMPOWERTOY/Pages/default.aspx
                                                                                    jhoffmann

                                                                                    Hi,

                                                                                    I also have performance problems with an ATI HD5870, Core i7 950, Asus P6T SE (X58) and 12 GB RAM:

                                                                                    ===> Testing device 0 <===
                                                                                    Device type: Unknown
                                                                                    Max resource 2D width/height: 16384/16384
                                                                                    Total GPU memory size: 1024 MB
                                                                                    Total CPU cached space size: 508 MB
                                                                                    Total CPU uncached space size: 1279 MB
                                                                                    GPU engine clock: 900 MHz
                                                                                    GPU memory clock: 1300 MHz
                                                                                    Number of timing loops: 100
                                                                                    [        16 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU=   1.600 MB/sec
                                                                                    [        32 bytes] CPU->GPU= 457.143 KB/sec, GPU->CPU=   3.200 MB/sec
                                                                                    [        64 bytes] CPU->GPU=   1.600 MB/sec, GPU->CPU=   2.133 MB/sec
                                                                                    [       128 bytes] CPU->GPU=   3.200 MB/sec, GPU->CPU=   4.267 MB/sec
                                                                                    [       256 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU=   8.533 MB/sec
                                                                                    [       512 bytes] CPU->GPU=  12.800 MB/sec, GPU->CPU=   7.314 MB/sec
                                                                                    [      1024 bytes] CPU->GPU=  34.133 MB/sec, GPU->CPU=  34.133 MB/sec
                                                                                    [      2048 bytes] CPU->GPU=  68.267 MB/sec, GPU->CPU=  68.267 MB/sec
                                                                                    [      4096 bytes] CPU->GPU= 136.533 MB/sec, GPU->CPU= 204.800 MB/sec
                                                                                    [      8192 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU= 273.067 MB/sec
                                                                                    [     16384 bytes] CPU->GPU= 546.133 MB/sec, GPU->CPU= 546.133 MB/sec
                                                                                    [     32768 bytes] CPU->GPU=   1.092 GB/sec, GPU->CPU= 655.360 MB/sec
                                                                                    [     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU= 595.782 MB/sec
                                                                                    [    131072 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU= 504.123 MB/sec
                                                                                    [    262144 bytes] CPU->GPU=   3.745 GB/sec, GPU->CPU= 468.114 MB/sec
                                                                                    [    524288 bytes] CPU->GPU=   4.033 GB/sec, GPU->CPU= 468.114 MB/sec
                                                                                    [   1048576 bytes] CPU->GPU=   4.194 GB/sec, GPU->CPU= 457.893 MB/sec
                                                                                    [   2097152 bytes] CPU->GPU=   4.194 GB/sec, GPU->CPU= 449.069 MB/sec
                                                                                    [   4194304 bytes] CPU->GPU=   4.280 GB/sec, GPU->CPU= 443.373 MB/sec
                                                                                    [   8388608 bytes] CPU->GPU=   4.215 GB/sec, GPU->CPU= 441.273 MB/sec
                                                                                    [  16777216 bytes] CPU->GPU=   4.226 GB/sec, GPU->CPU= 442.437 MB/sec
                                                                                    [  33554432 bytes] CPU->GPU=   4.067 GB/sec, GPU->CPU= 450.395 MB/sec
                                                                                    [  67108864 bytes] CPU->GPU=   4.067 GB/sec, GPU->CPU= 461.420 MB/sec
                                                                                    [ 134217728 bytes] CPU->GPU=   4.091 GB/sec, GPU->CPU= 480.207 MB/sec
                                                                                    [ 268435456 bytes] CPU->GPU=   4.123 GB/sec, GPU->CPU= 492.841 MB/sec
                                                                                    calResAllocLocal2D() returned an error when trying to allocate 536870912 bytes!
                                                                                    Peak CPU->GPU Bandwidth =   4.280 GB/sec [data size = 4194304 bytes]
                                                                                    Peak GPU->CPU Bandwidth = 655.360 MB/sec [data size = 32768 bytes]

                                                                                     

                                                                                    Also look here: http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=130923&enterthread=y

                                                                          • ATI Stream Power Toy - PCIeSpeedTest
                                                                            Lancer786

                                                                            Thank you very much for this one, really helpful for me.

                                                                            Regards