45 Replies Latest reply on Feb 17, 2011 9:34 PM by darkradeon

    PCIe Performance Problem with HD5870

    jhoffmann
      Catalyst 10.3, Linux 2.6.31.12 (x86_64)

      Hi guys,

      we have measured a PCIe performance drop for CPU->GPU transfers and, even more severely, for GPU->CPU transfers. The drop shows up in ATI's PCIeSpeedTest PowerToy (CAL), in NVIDIA's oclBandwidthTest (OpenCL), and in our own benchmark (OpenCL). See below.

      We think this is a driver bug, because the hardware link is negotiated correctly at PCIe x16, 5 GT/s (checked with lspci -vv).
      Does anyone have an idea how we can fix this?

      Regards
      Joern Hoffmann
      University of Leipzig
      Computer Engineering Group


      Hardware: 20 PCs, each with an HD5870, a Core i7 950 and 12GB DDR on an Asus P6T SE board.
      Software: OpenSuse 11.2, Linux 2.6.31.12, glibc-2.1, Xorg 7.4-35.3, Xserver 1.6.5
      Driver  : fglrx 8.712 (10.3), also tested: 8.712.3.1 (10.3 OGL4 preview)


      Measure (1): PCIe SpeedTest v0.2 on HD5870
      ------------
      Peak CPU->GPU Bandwidth =   4.324 GB/sec [data size = 4194304 bytes]
      Peak GPU->CPU Bandwidth = 655.360 MB/sec [data size = 32768 bytes]

      -> Arghhh, peak at 650 MB/sec!


      Measure (2a): oclBandwidthTest on HD5870
      -------------
       Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
         Transfer Size (Bytes)    Bandwidth(MB/s)
         33554432            1503.7

       Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
         Transfer Size (Bytes)    Bandwidth(MB/s)
         33554432            1042.7

       Device to Device Bandwidth, 1 Device(s)
         Transfer Size (Bytes)    Bandwidth(MB/s)
         33554432            106887.6


      Measure (2b): oclBandwidthTest on NVIDIA 9800GT
      -------------
       Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
         Transfer Size (Bytes)    Bandwidth(MB/s)
         33554432            2280.9

       Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
         Transfer Size (Bytes)    Bandwidth(MB/s)
         33554432            1723.5

       Device to Device Bandwidth, 1 Device(s)
         Transfer Size (Bytes)    Bandwidth(MB/s)
         33554432            49929.0


      Measure (3a): transfer of 8192 floats (32 KB) on HD5870
      -------------
      OpenCL buffer transfer time
        submission-to-start  : 440529 ns
        execution time       :  29420 ns

      Measure (3b): transfer of 8192 floats (32 KB) on NVIDIA 9800GT
      -------------
      OpenCL buffer transfer time
        submission-to-start  :  44608 ns
        execution time       :  15712 ns
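The Measure (3) timings can be turned into bandwidth figures. The Python sketch below assumes that "execution time" is the pure transfer time and that "submission-to-start" is the delay before the transfer begins (for example, the difference between the CL_PROFILING_COMMAND_SUBMIT and CL_PROFILING_COMMAND_START profiling timestamps of an OpenCL event); the post does not say exactly how the values were taken, and the helper names are hypothetical.

```python
# Hypothetical helper names; the timing values are the ones quoted in Measure (3).
TRANSFER_BYTES = 8192 * 4  # 8192 single-precision floats = 32768 bytes


def mb_per_s(num_bytes, nanoseconds):
    """Convert bytes moved in the given number of nanoseconds to MB/s (1 MB = 10**6 bytes)."""
    return num_bytes / nanoseconds * 1e3  # bytes/ns equals GB/s, so * 1000 gives MB/s


timings_ns = {
    # card: (submission-to-start, execution time), both in nanoseconds
    "HD5870": (440529, 29420),
    "9800GT": (44608, 15712),
}

results = {}
for card, (submit_to_start_ns, exec_ns) in timings_ns.items():
    transfer_only = mb_per_s(TRANSFER_BYTES, exec_ns)
    end_to_end = mb_per_s(TRANSFER_BYTES, submit_to_start_ns + exec_ns)
    results[card] = (transfer_only, end_to_end)
    print(f"{card}: {transfer_only:7.1f} MB/s transfer only, "
          f"{end_to_end:6.1f} MB/s incl. submission delay")
```

Under these assumptions the 32 KB copy itself runs at roughly 1.1 GB/s on the HD5870 versus roughly 2.1 GB/s on the 9800GT, but the ~440 µs submission-to-start delay pulls the HD5870's end-to-end rate down to roughly 70 MB/s, against roughly 540 MB/s on the 9800GT.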


      lspci -vv:
      ----------
      00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 13) (prog-if 00 [Normal decode])
          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
          Latency: 0, Cache Line Size: 256 bytes
          Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
          I/O behind bridge: 0000b000-0000bfff
          Memory behind bridge: fbb00000-fbbfffff
          Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
          Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
          BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
              PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
          Capabilities: [40] Subsystem: ASUSTeK Computer Inc. Device 836b
          Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
              Address: fee002b8  Data: 0000
              Masking: 00000003  Pending: 00000000
          Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
              DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                  ExtTag+ RBE+ FLReset-
              DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                  RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                  MaxPayload 128 bytes, MaxReadReq 128 bytes
              DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
              LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                  ClockPM- Surprise+ LLActRep+ BwNot+
              LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
              LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
              SltCap:    AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
                  Slot #  2, PowerLimit 75.000000; Interlock- NoCompl-
              SltCtl:    Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                  Control: AttnInd Off, PwrInd Off, Power- Interlock-
              SltSta:    Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                  Changed: MRL- PresDet+ LinkState+
              RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
              RootCap: CRSVisible-
              RootSta: PME ReqID 0000, PMEStatus- PMEPending-
              DevCap2: Completion Timeout: Range BCD, TimeoutDis+ ARIFwd+
              DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- ARIFwd-
              LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                   Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                   Compliance De-emphasis: -6dB
              LnkSta2: Current De-emphasis Level: -6dB
          Capabilities: [e0] Power Management version 3
              Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
              Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
          Capabilities: [100] Advanced Error Reporting
              UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
              UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
              UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
              CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
              CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
              AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
          Capabilities: [150] Access Control Services
              ACSCap:    SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
              ACSCtl:    SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
          Capabilities: [160] Vendor Specific Information <?>
          Kernel driver in use: pcieport-driver


      02:00.0 VGA compatible controller: ATI Technologies Inc Device 6898 (prog-if 00 [VGA controller])
          Subsystem: ATI Technologies Inc Device 0b00
          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
          Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
          Latency: 0, Cache Line Size: 256 bytes
          Interrupt: pin A routed to IRQ 59
          Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
          Region 2: Memory at fbbc0000 (64-bit, non-prefetchable) [size=128K]
          Region 4: I/O ports at b000 [size=256]
          Expansion ROM at fbba0000 [disabled] [size=128K]
          Capabilities: [50] Power Management version 3
              Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
              Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
          Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
              DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                  ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
              DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                  RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                  MaxPayload 128 bytes, MaxReadReq 512 bytes
              DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
              LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
                  ClockPM- Surprise- LLActRep- BwNot-
              LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
              LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
              DevCap2: Completion Timeout: Not Supported, TimeoutDis-
              DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
              LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                   Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                   Compliance De-emphasis: -6dB
              LnkSta2: Current De-emphasis Level: -6dB
          Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
              Address: 00000000fee00498  Data: 0000
          Capabilities: [100] Vendor Specific Information <?>
          Capabilities: [150] Advanced Error Reporting
              UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
              UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
              UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
              CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
              CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
              AERCap:    First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
          Kernel driver in use: fglrx_pci
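The link check mentioned at the top (negotiated speed and width versus what the port supports) can be scripted. A minimal Python sketch, assuming it is given the text of a single device's `lspci -vv` block (for example from `lspci -vv -s 02:00.0`, usually run as root so the capability list is readable); the function names are hypothetical. A matching link only rules out link-training problems: as in this thread, a link can report 5GT/s x16 while transfers still run far below that.

```python
import re

# Matches "LnkCap:  Port #0, Speed 5GT/s, Width x16, ..." and
# "LnkSta:  Speed 5GT/s, Width x16, ..."; newer lspci versions may append
# "(ok)" after the speed, which [^,]* skips. "LnkSta2:" is not matched.
_LINK_RE = re.compile(r"\b(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s[^,]*, Width x(\d+)")


def parse_link(device_block):
    """Map 'LnkCap'/'LnkSta' to (speed in GT/s, lane count) for one device's lspci -vv text."""
    links = {}
    for line in device_block.splitlines():
        m = _LINK_RE.search(line)
        if m:
            links[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return links


def link_at_full_speed(device_block):
    """True if the negotiated link (LnkSta) equals the advertised capability (LnkCap)."""
    links = parse_link(device_block)
    return "LnkCap" in links and links.get("LnkSta") == links["LnkCap"]


# Excerpt of the 02:00.0 (HD5870) block above.
sample = """\
        LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
        LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        LnkSta2: Current De-emphasis Level: -6dB
"""
print(parse_link(sample))          # {'LnkCap': (5.0, 16), 'LnkSta': (5.0, 16)}
print(link_at_full_speed(sample))  # True
```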

      ===> Testing device 0 <===
      Device type: Unknown
      Max resource 2D width/height: 16384/16384
      Total GPU memory size: 1024 MB
      Total CPU cached space size: 508 MB
      Total CPU uncached space size: 1279 MB
      GPU engine clock: 0 MHz
      GPU memory clock: 0 MHz
      Number of timing loops: 100
      [        16 bytes] CPU->GPU=   1.600 MB/sec, GPU->CPU= 800.000 KB/sec
      [        32 bytes] CPU->GPU=   1.600 MB/sec, GPU->CPU= 533.333 KB/sec
      [        64 bytes] CPU->GPU=   2.133 MB/sec, GPU->CPU=   2.133 MB/sec
      [       128 bytes] CPU->GPU=   4.267 MB/sec, GPU->CPU=   4.267 MB/sec
      [       256 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU=   8.533 MB/sec
      [       512 bytes] CPU->GPU=  17.067 MB/sec, GPU->CPU=  25.600 MB/sec
      [      1024 bytes] CPU->GPU=  17.067 MB/sec, GPU->CPU=  34.133 MB/sec
      [      2048 bytes] CPU->GPU=  68.267 MB/sec, GPU->CPU=  68.267 MB/sec
      [      4096 bytes] CPU->GPU= 136.533 MB/sec, GPU->CPU= 204.800 MB/sec
      [      8192 bytes] CPU->GPU= 409.600 MB/sec, GPU->CPU= 273.067 MB/sec
      [     16384 bytes] CPU->GPU= 546.133 MB/sec, GPU->CPU= 546.133 MB/sec
      [     32768 bytes] CPU->GPU=   1.092 GB/sec, GPU->CPU= 655.360 MB/sec
      [     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU= 595.782 MB/sec
      [    131072 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU= 524.288 MB/sec
      [    262144 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU= 485.452 MB/sec
      [    524288 bytes] CPU->GPU=   3.745 GB/sec, GPU->CPU= 472.332 MB/sec
      [   1048576 bytes] CPU->GPU=   4.194 GB/sec, GPU->CPU= 459.902 MB/sec
      [   2097152 bytes] CPU->GPU=   4.280 GB/sec, GPU->CPU= 449.069 MB/sec
      [   4194304 bytes] CPU->GPU=   4.324 GB/sec, GPU->CPU= 442.904 MB/sec
      [   8388608 bytes] CPU->GPU=   4.280 GB/sec, GPU->CPU= 438.964 MB/sec
      [  16777216 bytes] CPU->GPU=   4.258 GB/sec, GPU->CPU= 437.476 MB/sec
      [  33554432 bytes] CPU->GPU=   4.052 GB/sec, GPU->CPU= 443.607 MB/sec
      [  67108864 bytes] CPU->GPU=   4.090 GB/sec, GPU->CPU= 452.826 MB/sec
      [ 134217728 bytes] CPU->GPU=   4.108 GB/sec, GPU->CPU= 468.212 MB/sec
      [ 268435456 bytes] CPU->GPU=   4.136 GB/sec, GPU->CPU= 492.307 MB/sec
      [ 536870912 bytes] CPU->GPU=   4.211 GB/sec, GPU->CPU= 496.065 MB/sec
      calResAllocLocal2D() returned an error when trying to allocate 1073741824 bytes!
      Peak CPU->GPU Bandwidth =   4.324 GB/sec [data size = 4194304 bytes]
      Peak GPU->CPU Bandwidth = 655.360 MB/sec [data size = 32768 bytes]
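The "Peak" lines that PCIeSpeedTest prints pick the single best transfer size, which for GPU->CPU in this output is the 32 KB row; for large blocks the readback rate sits closer to 440-500 MB/s. A Python sketch that parses the per-size rows and reports both the peak and the mean over large transfers. It assumes the tool uses decimal unit prefixes; the helper names and the 1 MiB threshold are arbitrary choices for illustration.

```python
import re
from statistics import mean

# One "[ N bytes] CPU->GPU= X unit/sec, GPU->CPU= Y unit/sec" record. finditer()
# handles both one-record-per-line output and records run together on one line.
ROW_RE = re.compile(
    r"\[\s*(\d+) bytes\]\s*CPU->GPU=\s*([\d.]+) (KB|MB|GB)/sec,"
    r"\s*GPU->CPU=\s*([\d.]+) (KB|MB|GB)/sec"
)
# Assumption: decimal prefixes, i.e. 1 GB/sec = 1000 MB/sec = 10**6 KB/sec.
TO_MB = {"KB": 1e-3, "MB": 1.0, "GB": 1e3}


def parse_speedtest(text):
    """Return [(size_bytes, cpu_to_gpu_mb_s, gpu_to_cpu_mb_s), ...] in output order."""
    rows = []
    for m in ROW_RE.finditer(text):
        up = float(m.group(2)) * TO_MB[m.group(3)]
        down = float(m.group(4)) * TO_MB[m.group(5)]
        rows.append((int(m.group(1)), up, down))
    return rows


def summarize(rows, min_size=1 << 20):
    """Peak row per direction, plus mean rates over transfers of at least min_size bytes."""
    large = [r for r in rows if r[0] >= min_size]  # must be non-empty for mean()
    return {
        "peak_up": max(rows, key=lambda r: r[1]),
        "peak_down": max(rows, key=lambda r: r[2]),
        "sustained_up": mean(r[1] for r in large),
        "sustained_down": mean(r[2] for r in large),
    }


# A few rows copied from the HD5870 output above.
sample = """
[     32768 bytes] CPU->GPU=   1.092 GB/sec, GPU->CPU= 655.360 MB/sec
[   1048576 bytes] CPU->GPU=   4.194 GB/sec, GPU->CPU= 459.902 MB/sec
[ 536870912 bytes] CPU->GPU=   4.211 GB/sec, GPU->CPU= 496.065 MB/sec
"""
summary = summarize(parse_speedtest(sample))
peak_size, _, peak_down = summary["peak_down"]
print(f"GPU->CPU peak: {peak_down:.1f} MB/s at {peak_size} bytes")
print(f"GPU->CPU mean over >= 1 MiB: {summary['sustained_down']:.1f} MB/s")
```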

        • PCIe Performance Problem with HD5870
          xero

          Hi Joern,

          I got similar PCIe speed test results. GPU->CPU transfers are very slow.

          (CPU: intel i5 750,  MB: intel P55, GPU: HD5870, OS: Linux 2.6.18 i386, Driver: 10.2)

          Have you made any progress on this?

            • PCIe Performance Problem with HD5870
              jhoffmann

               xero,

              yes, we made some (negative) progress. We have now tested the PCIe transfer rates under Windows 7 x86_64 with PCIe SpeedTest v0.2 and with SiSoft Sandra. Unfortunately, the results are the same.

              CPU-> GPU is about 4GB/s
              GPU-> CPU is about 450 MB/s

              We also checked the machine with several other benchmarks (3DMark, SiSoft Sandra, Unigine Heaven). Results such as host memory bandwidth, GPU performance and CPU speed look fine. Only the PCIe transfer rates are poor; GPU-to-CPU in particular reaches only around 6% of its theoretical maximum.
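For reference, the theoretical figure behind the "around 6%" estimate: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b line coding, i.e. 500 MB/s per lane and direction, so an x16 link tops out at 8 GB/s per direction before packet overhead. A small Python check using the rounded numbers quoted above; the helper name is hypothetical.

```python
def pcie_8b10b_peak_mb_s(lanes, gt_per_s=5.0):
    """Per-direction raw bandwidth in MB/s for PCIe 1.x/2.0 links (8b/10b line coding).

    Protocol overhead (TLP/DLLP headers, flow control) is ignored, so real
    transfers always land somewhat below this number.
    """
    data_bits_per_s = gt_per_s * 1e9 * lanes * 8 / 10  # 8 payload bits per 10 line bits
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> MB


peak = pcie_8b10b_peak_mb_s(16)  # 8000.0 MB/s for x16 at 5 GT/s
# Rounded Windows 7 figures for the HD5870 on the X58 board, from the post above.
for direction, measured in [("CPU->GPU", 4000.0), ("GPU->CPU", 450.0)]:
    print(f"{direction}: {measured:.0f} MB/s = {100 * measured / peak:.1f}% of {peak:.0f} MB/s")
```

With these inputs the readback works out to about 5.6% of the link's raw capacity, while the upload reaches about half of it.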

              To be explicit: we use the latest drivers from ATI (10.3) and Intel (chipset autoinst 911), and we have flashed the BIOS to the latest version (v808).

              Furthermore, I experimented with the BIOS settings, e.g. disabled the sleep states, configured the memory manually, set the QPI interface manually, etc. No change at all...

              So we now suspect a chipset bug in the Intel X58 northbridge rather than an ATI driver bug. I've read about similar transfer-related issues with the X58 chipset on a German site: http://www.planet3dnow.de/vbulletin/showthread.php?t=364174

              We will settle this on Monday, when we replace the HD5870 with a GTX 270... Stay tuned...

              Joern

                • PCIe Performance Problem with HD5870
                  xero

                  Hi Joern,

                  Thanks for the information.

                  I tried an HD4870 on the P55 mainboard. The result is just as slow as with the 5870.

                  I also installed the 5870 on a P45 mainboard. There, both CPU->GPU and GPU->CPU reach ~5GB/s.

                    • PCIe Performance Problem with HD5870
                      jhoffmann

                      Hi xero,

                      I can confirm good results with an HD5850 on a P43 board / Core 2 Quad Q6700. The peak transfer bandwidth reaches:

                      CPU -> GPU : ~5 GB/s
                      GPU -> CPU : ~6 GB/s

                      So two explanations remain: either there is an ATI driver issue related to the X58, or the problem is inside the X58 chip itself.

                      I'm curious about the test with the NVIDIA card on Monday, 04-12-2010...

                      Thank you...

                        • PCIe Performance Problem with HD5870
                          Tzupy

                          Hi,

                          I am also interested in high GPU -> CPU bandwidth, for off-screen rendering of large images.

                          There seems to be an issue with the X58 chipset and ATI cards, especially the new 5850 / 5870 ones, severely limiting the readback bandwidth.

                          For comparison, my 1GB 4850 on an i7-920 with 6 GB DDR3-1066C7 and Vista64 HP gets about 1.2 GB/s maximum readback and only 950 MB/s for larger blocks, tested with PCIe Speed Test (with glReadPixels I get lower values).

                          Even with the latest v4.0 beta drivers the problem hasn't been solved for Radeons, but it *may* have been solved for FirePro cards. It's about using hardware DMA; there seems to be a problem getting it to work on X58 systems. Of course, if the problem was solved for FirePro cards, there shouldn't be any *technical* reason for it to remain unsolved for Radeons.

                          I raised this issue in the OpenGL forums; you can have a look at this thread: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=275081#Post275081

                            • PCIe Performance Problem with HD5870
                              huafeihua116

                              Thanks for sharing this info.

                                • PCIe Performance Problem with HD5870
                                  jhoffmann

                                  Hi xero, Tzupy,

                                  we have now tested the system under Windows 7 x86_64 on the X58 board with an NVIDIA GTX 275. The problems are gone.

                                  Under SiSoft Sandra (using OpenCL), the GTX reaches:

                                  CPU -> GPU : 5.57 GB/s
                                  GPU -> CPU : 5.27 GB/s

                                  With the HD5870, measured with Sandra (using OpenCL, Stream and DirectCompute) and PCIe SpeedTest (ATI Stream), we get:

                                  CPU-> GPU : ~ 4GB/s
                                  GPU-> CPU : ~ 450 MB/s

                                  So we can now say that we hit a driver bug related to the X58, because the problem doesn't show up on a P45 chipset or on an AMD mainboard.

                                  Maybe the problem is a regression, since there was already a problem with X58 mainboards in 11/2008; see the ATI Catalyst X58 hotfix: http://ht4u.net/news/2655_neue_geforce-treiber_von_nvidia_und_ati-catalyst-hotfix_fuer_x58/

                                  Now it's time for ATI to react.

                                  Joern

                                    • PCIe Performance Problem with HD5870
                                      xero

                                      Thanks, Joern.

                                      Have you sent a message to AMD yet?

                                        • PCIe Performance Problem with HD5870
                                          jhoffmann

                                          Hi xero,

                                          no, because I don't know how. Is there a bug tracking system or a hotline?

                                            • PCIe Performance Problem with HD5870
                                              Tzupy

                                              Hi,

                                              You could report this by sending a PM to an AMD employee, but I believe they already know about it, considering that a post about low readback speed on X58 was made a year ago.

                                              The problem doesn't seem to be limited to Intel X58 / P55 chipsets; it also affects some AMD-only configurations, though to a lesser extent. Lower readback probably results from various chipset / BIOS / driver / OS interactions that prevent PCIe handshaking from delivering the highest possible readback bandwidth.

                                              So I guess the X58 with a 5850 / 5870 is the worst-case scenario. And I wouldn't bet on it being fixed soon for Radeons. After all, AMD wants to sell the new 1,500+ euro FirePro 8800.

                                              My 4850, when mounted in my backup computer (an X2 5050e at 2.6 GHz with 4GB DDR2-800C5 on a 785G mobo running XP), gets about 3.24 GB/s upload (4.4 GB/s on X58) and 2.31 GB/s readback for large blocks in PCIe Speed Test. With a 4670 I get 3.28 GB/s upload and 2.87 GB/s readback.

                            • PCIe Performance Problem with HD5870
                              MicahVillmow
                              To report any issues to AMD, please send an email to streamdeveloper@amd.com. Please do not send private messages to AMD employees.
                                • PCIe Performance Problem with HD5870
                                  jhoffmann

                                  Hi,

                                  we've reported the problem to AMD and, via the PC manufacturer, to Intel as well. Maybe there will be a fix in the next Catalyst release.

                                  Otherwise the card is quite useless for us...

                                  joern

                                    • PCIe Performance Problem with HD5870
                                      odlan

                                      I have a similar problem: poor readback performance from GPU -> CPU.

                                      Ubuntu 9.10 64bit - ATI driver 10.3 - Intel i7 975 Extreme - Sapphire ATI HD 5970 OC - Motherboard EVGA 4-Way SLI - 6 GB RAM Corsair cmg6gx3m3a2000c8

                                      PCIeSpeedTest_v0.2/PCIeSpeedTest' -tdf pcietest1
                                      Devices found: 2

                                      ===> Testing device 0 <===
                                      Device type: Unknown
                                      Max resource 2D width/height: 16384/16384
                                      Total GPU memory size: 1024 MB
                                      Total CPU cached space size: 508 MB
                                      Total CPU uncached space size: 1279 MB
                                      GPU engine clock: 1000 MHz
                                      GPU memory clock: 1500 MHz
                                      Number of timing loops: 100
                                      [        16 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU= 800.000 KB/sec
                                      [        32 bytes] CPU->GPU=   1.600 MB/sec, GPU->CPU=   3.200 MB/sec
                                      [        64 bytes] CPU->GPU= 914.286 KB/sec, GPU->CPU=   1.280 MB/sec
                                      [       128 bytes] CPU->GPU=   4.267 MB/sec, GPU->CPU=   6.400 MB/sec
                                      [       256 bytes] CPU->GPU=  12.800 MB/sec, GPU->CPU=  12.800 MB/sec
                                      [       512 bytes] CPU->GPU=  25.600 MB/sec, GPU->CPU=  25.600 MB/sec
                                      [      1024 bytes] CPU->GPU=  51.200 MB/sec, GPU->CPU=  51.200 MB/sec
                                      [      2048 bytes] CPU->GPU= 102.400 MB/sec, GPU->CPU=  34.133 MB/sec
                                      [      4096 bytes] CPU->GPU= 204.800 MB/sec, GPU->CPU= 204.800 MB/sec
                                      [      8192 bytes] CPU->GPU= 409.600 MB/sec, GPU->CPU= 409.600 MB/sec
                                      [     16384 bytes] CPU->GPU= 819.200 MB/sec, GPU->CPU= 819.200 MB/sec
                                      [     32768 bytes] CPU->GPU=   1.638 GB/sec, GPU->CPU=   1.638 GB/sec
                                      [     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU=   1.638 GB/sec
                                      [    131072 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU= 569.878 MB/sec
                                      [    262144 bytes] CPU->GPU=   3.745 GB/sec, GPU->CPU= 689.853 MB/sec
                                      [    524288 bytes] CPU->GPU=   4.033 GB/sec, GPU->CPU= 873.813 MB/sec
                                      [   1048576 bytes] CPU->GPU=   4.559 GB/sec, GPU->CPU= 852.501 MB/sec
                                      [   2097152 bytes] CPU->GPU=   4.766 GB/sec, GPU->CPU= 803.507 MB/sec
                                      [   4194304 bytes] CPU->GPU=   4.821 GB/sec, GPU->CPU= 824.028 MB/sec
                                      [   8388608 bytes] CPU->GPU=   4.906 GB/sec, GPU->CPU= 820.803 MB/sec
                                      [  16777216 bytes] CPU->GPU=   4.949 GB/sec, GPU->CPU= 819.200 MB/sec
                                      [  33554432 bytes] CPU->GPU=   4.964 GB/sec, GPU->CPU= 815.418 MB/sec
                                      [  67108864 bytes] CPU->GPU=   4.975 GB/sec, GPU->CPU= 815.517 MB/sec
                                      [ 134217728 bytes] CPU->GPU=   4.977 GB/sec, GPU->CPU= 812.161 MB/sec
                                      [ 268435456 bytes] CPU->GPU=   4.980 GB/sec, GPU->CPU= 810.004 MB/sec
                                      [ 536870912 bytes] CPU->GPU=   4.981 GB/sec, GPU->CPU= 810.787 MB/sec
                                      calResAllocLocal2D() returned an error when trying to allocate 1073741824 bytes!
                                      Peak CPU->GPU Bandwidth =   4.981 GB/sec [data size = 536870912 bytes]
                                      Peak GPU->CPU Bandwidth =   1.638 GB/sec [data size = 32768 bytes]

                                      ===> Testing device 1 <===
                                      Device type: Unknown
                                      Max resource 2D width/height: 16384/16384
                                      Total GPU memory size: 1024 MB
                                      Total CPU cached space size: 508 MB
                                      Total CPU uncached space size: 1279 MB
                                      GPU engine clock: 1000 MHz
                                      GPU memory clock: 1500 MHz
                                      Number of timing loops: 100
                                      [        16 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU= 800.000 KB/sec
                                      [        32 bytes] CPU->GPU=   1.067 MB/sec, GPU->CPU= 457.143 KB/sec
                                      [        64 bytes] CPU->GPU=   2.133 MB/sec, GPU->CPU=   3.200 MB/sec
                                      [       128 bytes] CPU->GPU= 984.615 KB/sec, GPU->CPU=   6.400 MB/sec
                                      [       256 bytes] CPU->GPU=   3.657 MB/sec, GPU->CPU=   8.533 MB/sec
                                      [       512 bytes] CPU->GPU=  25.600 MB/sec, GPU->CPU=  17.067 MB/sec
                                      [      1024 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU=  51.200 MB/sec
                                      [      2048 bytes] CPU->GPU= 102.400 MB/sec, GPU->CPU= 102.400 MB/sec
                                      [      4096 bytes] CPU->GPU= 204.800 MB/sec, GPU->CPU= 136.533 MB/sec
                                      [      8192 bytes] CPU->GPU= 409.600 MB/sec, GPU->CPU= 273.067 MB/sec
                                      [     16384 bytes] CPU->GPU= 136.533 MB/sec, GPU->CPU= 819.200 MB/sec
                                      [     32768 bytes] CPU->GPU=   1.638 GB/sec, GPU->CPU=   1.092 GB/sec
                                      [     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU=   1.638 GB/sec
                                      [    131072 bytes] CPU->GPU=   2.621 GB/sec, GPU->CPU= 624.152 MB/sec
                                      [    262144 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU= 689.853 MB/sec
                                      [    524288 bytes] CPU->GPU=   4.369 GB/sec, GPU->CPU= 873.813 MB/sec
                                      [   1048576 bytes] CPU->GPU=   4.559 GB/sec, GPU->CPU= 832.203 MB/sec
                                      [   2097152 bytes] CPU->GPU=   4.766 GB/sec, GPU->CPU= 809.711 MB/sec
                                      [   4194304 bytes] CPU->GPU=   4.821 GB/sec, GPU->CPU= 820.803 MB/sec
                                      [   8388608 bytes] CPU->GPU=   4.906 GB/sec, GPU->CPU= 820.001 MB/sec
                                      [  16777216 bytes] CPU->GPU=   4.964 GB/sec, GPU->CPU= 821.205 MB/sec
                                      [  33554432 bytes] CPU->GPU=   4.971 GB/sec, GPU->CPU= 816.807 MB/sec
                                      [  67108864 bytes] CPU->GPU=   4.956 GB/sec, GPU->CPU= 818.600 MB/sec
                                      [ 134217728 bytes] CPU->GPU=   4.969 GB/sec, GPU->CPU= 818.850 MB/sec
                                      [ 268435456 bytes] CPU->GPU=   4.973 GB/sec, GPU->CPU= 817.578 MB/sec
                                      [ 536870912 bytes] CPU->GPU=   4.978 GB/sec, GPU->CPU= 819.038 MB/sec
                                      calResAllocLocal2D() returned an error when trying to allocate 1073741824 bytes!
                                      Peak CPU->GPU Bandwidth =   4.978 GB/sec [data size = 536870912 bytes]
                                      Peak GPU->CPU Bandwidth =   1.638 GB/sec [data size = 65536 bytes]

                                          • PCIe Performance Problem with HD5870
                                            charliex

                                            I'm getting the same thing with the 5870/5970 and a FirePro on an Asus P6x58.

                                              • PCIe Performance Problem with HD5870
                                                jhoffmann

                                                Hi charliex,

                                                to be more specific: does the PCIe transfer-rate problem also occur on an HD 58xx-series-based FirePro?

                                                And if you haven't already, please report the bug to AMD. Maybe the issue will get a higher priority when they see that not only their consumer line is affected...

                                                joern

                                                  • PCIe Performance Problem with HD5870
                                                    jhoffmann

                                                    Hi,

                                                    just an update: the problem persists with the Catalyst 10.4 driver (tested under Linux).

                                                    joern

                                                      • PCIe Performance Problem with HD5870
                                                        abab

                                                        A similar CPU-GPU perf problem here:

                                                        ASUS P7P55D-E Premium; i7 860, 8GB 1333 Kingston RAM; A single Sapphire 5850 in the 1st PCIe 2.0 x16 slot (the 2nd x16 slot is unoccupied); Win7 x64 Ult.

                                                        The GPU is slightly overclocked, but I verified that the problem also exists at the factory clocks.

                                                        Tested with 10.2, 10.3 and the latest 10.4.

                                                        I am working on a GPGPU app and this little problem rains on the whole thing. I filed a service request and had a one-way lively discussion with the support ex machina.

                                                        Unless this gets resolved soon, I may have to go the Fermi route.

                                                        Alex.

                                                        Devices found: 1

                                                        ===> Testing device 0 <===
                                                        Device type: Unknown
                                                        Max resource 2D width/height: 16384/16384
                                                        Total GPU memory size: 1024 MB
                                                        Total CPU cached space size: 1467 MB
                                                        Total CPU uncached space size: 1467 MB
                                                        GPU engine clock: 765 MHz
                                                        GPU memory clock: 1115 MHz
                                                        Number of timing loops: 100

                                                        [        16 bytes] CPU->GPU= 101.163 KB/sec, GPU->CPU= 540.209 KB/sec
                                                        [        32 bytes] CPU->GPU=   1.117 MB/sec, GPU->CPU=   1.125 MB/sec
                                                        [        64 bytes] CPU->GPU=   2.232 MB/sec, GPU->CPU=   2.111 MB/sec
                                                        [       128 bytes] CPU->GPU=   4.314 MB/sec, GPU->CPU=   4.607 MB/sec
                                                        [       256 bytes] CPU->GPU=   2.671 MB/sec, GPU->CPU=   9.165 MB/sec
                                                        [       512 bytes] CPU->GPU=  16.947 MB/sec, GPU->CPU=  16.111 MB/sec
                                                        [      1024 bytes] CPU->GPU=  36.957 MB/sec, GPU->CPU=  36.759 MB/sec
                                                        [      2048 bytes] CPU->GPU=  73.875 MB/sec, GPU->CPU=  69.532 MB/sec
                                                        [      4096 bytes] CPU->GPU= 137.162 MB/sec, GPU->CPU= 148.082 MB/sec
                                                        [      8192 bytes] CPU->GPU= 294.147 MB/sec, GPU->CPU= 295.385 MB/sec
                                                        [     16384 bytes] CPU->GPU= 589.839 MB/sec, GPU->CPU= 487.206 MB/sec
                                                        [     32768 bytes] CPU->GPU=   1.176 GB/sec, GPU->CPU= 669.146 MB/sec
                                                        [     65536 bytes] CPU->GPU=   2.174 GB/sec, GPU->CPU= 676.652 MB/sec
                                                        [    131072 bytes] CPU->GPU=   3.016 GB/sec, GPU->CPU= 577.624 MB/sec
                                                        [    262144 bytes] CPU->GPU=   3.514 GB/sec, GPU->CPU= 540.532 MB/sec
                                                        [    524288 bytes] CPU->GPU=   3.795 GB/sec, GPU->CPU= 581.861 MB/sec
                                                        [   1048576 bytes] CPU->GPU=   3.817 GB/sec, GPU->CPU= 559.868 MB/sec
                                                        [   2097152 bytes] CPU->GPU=   4.085 GB/sec, GPU->CPU= 545.592 MB/sec
                                                        [   4194304 bytes] CPU->GPU=   4.344 GB/sec, GPU->CPU= 544.993 MB/sec
                                                        [   8388608 bytes] CPU->GPU=   4.107 GB/sec, GPU->CPU= 549.369 MB/sec
                                                        [  16777216 bytes] CPU->GPU=   4.314 GB/sec, GPU->CPU= 535.173 MB/sec
                                                        [  33554432 bytes] CPU->GPU=   4.332 GB/sec, GPU->CPU= 540.510 MB/sec
                                                        [  67108864 bytes] CPU->GPU=   4.358 GB/sec, 


                                                          • PCIe Performance Problem with HD5870
                                                            jhoffmann

                                                            Hello abab,

                                                            This is interesting, because you found that not only the enthusiast X58 is affected, but also Intel's recent mainstream chipsets in combination with ATI graphics cards.

                                                            To sum up: FireStream and Radeon cards on current Intel boards suffer from this issue with the driver on both Linux and Windows.

                                                            My problem now is that I have 20 HD5870 cards, bought primarily to run our GPGPU computations. This issue renders them useless.

                                                            In the next few days I will talk to our hardware dealer and try to find a solution. Maybe we will be forced to go the Fermi way too... :-/

                                                            joern


                                                              • PCIe Performance Problem with HD5870
                                                                mux85

                                                                Any news about this issue? I just bought an HD5850 and I am doing an OpenCL project for university. I hope this won't be a big problem.

                                                                  • PCIe Performance Problem with HD5870
                                                                    omkaranathan

                                                                    The developers are looking into the issue. Also, it only occurs on certain Intel boards.

                                                                      • PCIe Performance Problem with HD5870
                                                                        mux85

                                                                        Originally posted by: omkaranathan
                                                                        The issue is being looked into by developers, also the issue exists only for certain Intel boards.

                                                                        I've bought a Gigabyte GA-X58A-UD3R (LGA 1366) motherboard. Will I be affected by this issue?

                                                                        Thanks

                                                                          • PCIe Performance Problem with HD5870
                                                                            xero

                                                                            Originally posted by: mux85
                                                                            I've bought a Gigabyte GA-X58A-UD3R 1366 motherboard. will I be affected by this issue?
                                                                            thanks

                                                                            For Intel motherboards with an IOH chip (such as X58/P55), I guess so. :(

                                                                              • PCIe Performance Problem with HD5870
                                                                                jhoffmann

                                                                                I've read in another forum (AnandTech, I think) that the P55, Q55, etc. are also affected. We have not tested it, but it is very likely.


                                                                                  • PCIe Performance Problem with HD5870
                                                                                    jhoffmann

                                                                                    Hi,

                                                                                    There is no improvement with Catalyst 10.4 or 10.5; I can't measure any difference with PCIe SpeedTest 0.2. Hopefully this GPGPU show-stopper is still on the developers' radar.

                                                                                    joern

                                                                                      • PCIe Performance Problem with HD5870
                                                                                        abab

                                                                                        Well, in my case, ATI's advice to be patient has run its course. I am no stranger to software/hardware development and can appreciate its technical difficulties, if only ATI/AMD would provide some visibility into the problem and the effort to resolve it.

                                                                                        Without that, I can't wait forever, so I am off to Fermi land for now.

                                                                                        Will be back - maybe.

                                                                                        Alex.


                                                                                          • PCIe Performance Problem with HD5870
                                                                                            Tzupy

                                                                                            Still no improvement for me (i7-920, HD 4850 1GB, Vista 64-bit) with the 10.6 drivers.

                                                                                            But an OpenGL bug I reported some time ago has been fixed...

                                                                                              • PCIe Performance Problem with HD5870
                                                                                                jhoffmann

                                                                                                Hi Tzupy and all others,

                                                                                                there is also no improvement for me with 10.6.

                                                                                                We are currently in contact with Intel and ASUS.

                                                                                                Intel says there is a speed adjustment problem with the card, related to the fact that the card supports PCI Express 2.1 while the board only supports PCIe 2.0. Hmm. As far as I know, Intel boards in particular, and all other mainboards in general, only support PCIe 2.0. But the explanation itself sounds plausible. From the beginning we suspected a handshake problem between the host and device interfaces, because the transfer speeds seem to be limited to x4 or so.

                                                                                                ASUS (Germany), on the other hand, confirmed the problem. They tested the P6T SE with ATI's PCIe SpeedTest and got the same results. In addition, they tried a game benchmark and supposedly :-> didn't find any flaws. They asked us to send them a "real-world" application that suffers from the issue. I'll send them one in the next few days.

                                                                                                The game benchmark ASUS mentioned is exactly why this flaw isn't a priority for the ATI developers: it plays no role for their major customers, the gamers and board vendors. But they also promote their cards as "GPGPUs" and should act like professionals.

                                                                                                As an example of how other vendors care about their customers, let me describe a recent event. I spotted a serious bug in the NVIDIA OpenCL compiler (see the code below). NVIDIA has a website dedicated to professional developers where you can file bugs. Three hours after I did so, an employee asked me for a demo program, instructions for running it, and the output of a bug-report script. One working day later the flaw was fixed and the fix was merged into the compiler. That was today. The fixed compiler will ship with the next driver release in the coming days.

                                                                                                Regards,

                                                                                                Jörn


                                                                                                OpenCL kernel code:

                                                                                                char c = -1;
                                                                                                float f;
                                                                                                double d;
                                                                                                f = c;
                                                                                                d = c;
                                                                                                // result: "f" or "d" is not "-1" but "255"
                                                                                                // also wrong:
                                                                                                // f = (signed) c;
                                                                                                // f = (signed char) c;
                                                                                                // f = (int) c;
                                                                                                // f = (signed int) c;
                                                                                                // ...

                                                                                                  • PCIe Performance Problem with HD5870
                                                                                                    jhoffmann

                                                                                                    Hi all,

                                                                                                    There is good news for all AMD/ATI customers using HD 5xxx cards on an X58 board for GPGPU under Linux: the new Catalyst 10.7 fixes the PCIe performance issue. With this driver we measure the maximum achievable (real-world) interface bandwidth in both directions:

                                                                                                    [        16 bytes] CPU->GPU= 320.000 KB/sec, GPU->CPU= 200.000 KB/sec
                                                                                                    [        32 bytes] CPU->GPU= 640.000 KB/sec, GPU->CPU= 640.000 KB/sec
                                                                                                    [        64 bytes] CPU->GPU=   1.280 MB/sec, GPU->CPU=   2.133 MB/sec
                                                                                                    [       128 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU=   2.560 MB/sec
                                                                                                    [       256 bytes] CPU->GPU=   6.400 MB/sec, GPU->CPU=   3.200 MB/sec
                                                                                                    [       512 bytes] CPU->GPU=  17.067 MB/sec, GPU->CPU=  25.600 MB/sec
                                                                                                    [      1024 bytes] CPU->GPU=  34.133 MB/sec, GPU->CPU=  51.200 MB/sec
                                                                                                    [      2048 bytes] CPU->GPU=  68.267 MB/sec, GPU->CPU=  68.267 MB/sec
                                                                                                    [      4096 bytes] CPU->GPU= 136.533 MB/sec, GPU->CPU= 204.800 MB/sec
                                                                                                    [      8192 bytes] CPU->GPU=  91.022 MB/sec, GPU->CPU= 409.600 MB/sec
                                                                                                    [     16384 bytes] CPU->GPU= 546.133 MB/sec, GPU->CPU= 819.200 MB/sec
                                                                                                    [     32768 bytes] CPU->GPU=   1.638 GB/sec, GPU->CPU=   1.092 GB/sec
                                                                                                    [     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU=   2.185 GB/sec
                                                                                                    [    131072 bytes] CPU->GPU=   1.192 GB/sec, GPU->CPU=   3.277 GB/sec
                                                                                                    [    262144 bytes] CPU->GPU=   5.243 GB/sec, GPU->CPU=   5.243 GB/sec
                                                                                                    [    524288 bytes] CPU->GPU=   5.825 GB/sec, GPU->CPU=   4.766 GB/sec
                                                                                                    [   1048576 bytes] CPU->GPU=   6.168 GB/sec, GPU->CPU=   4.993 GB/sec
                                                                                                    [   2097152 bytes] CPU->GPU=   5.992 GB/sec, GPU->CPU=   6.554 GB/sec
                                                                                                    [   4194304 bytes] CPU->GPU=   6.079 GB/sec, GPU->CPU=   6.658 GB/sec
                                                                                                    [   8388608 bytes] CPU->GPU=   6.123 GB/sec, GPU->CPU=   6.658 GB/sec
                                                                                                    [  16777216 bytes] CPU->GPU=   6.123 GB/sec, GPU->CPU=   6.684 GB/sec
                                                                                                    [  33554432 bytes] CPU->GPU=   5.483 GB/sec, GPU->CPU=   6.711 GB/sec
                                                                                                    [  67108864 bytes] CPU->GPU=   5.297 GB/sec, GPU->CPU=   6.704 GB/sec
                                                                                                    [ 134217728 bytes] CPU->GPU=   5.320 GB/sec, GPU->CPU=   6.704 GB/sec
                                                                                                    [ 268435456 bytes] CPU->GPU=   5.313 GB/sec, GPU->CPU=   6.706 GB/sec
                                                                                                    [ 536870912 bytes] CPU->GPU=   5.092 GB/sec, GPU->CPU=   6.687 GB/sec
                                                                                                    calResAllocLocal2D() returned an error when trying to allocate 1073741824 bytes!
                                                                                                    Peak CPU->GPU Bandwidth =   6.168 GB/sec [data size = 1048576 bytes]
                                                                                                    Peak GPU->CPU Bandwidth =   6.711 GB/sec [data size = 33554432 bytes]

                                                                                                    Thank you AMD/ATI developers.

                                                                                                    Joern

                                                                • reply
                                                                  mensjeans

                                                                  I installed an HD4870 on the P55 mainboard, and the result is as slow as with the 5870. I also installed the 5870 on a P45 mainboard, where CPU->GPU and GPU->CPU transfers reach ~5 GB/s. Thank you very much for sharing!