
ATI Stream Power Toy - PCIeSpeedTest

We've posted an "ATI Stream Power Toy" page:

http://developer.amd.com/GPU/ATISTREAMPOWERTOY/Pages/default.aspx

The first such "toy" on this page is a PCIeSpeedTest utility that tries to measure the PCIe bandwidth on your system. We've found it useful for seeing what the PCIe I/O bandwidth of a system is and how it might affect overall performance.

This is a synthetic benchmark, so your actual real-world performance may vary. Its intent, however, is to extract as much bandwidth as possible out of the PCIe link in each direction.

This is offered as an unsupported, but hopefully semi-useful tool for all of you. 🙂 Suggestions for other "toys" you'd like to see can be sent to streamdeveloper@amd.com. No promises, but we'll see what we can do.

If you run this and want to share your peak results, we'd be interested in seeing them on this thread! Post your peak results in each direction along with your motherboard, CPU and GPU configurations.

Michael.

rahulgarg

On Ubuntu 8.10 64-bit, it gave some compilation errors.

Adding #include <cstdlib> in PCIeSpeedTest_random.cpp fixes this.

After fixing that and running ./PCIeSpeedTest on a Phenom 9550 + 780V chipset + Radeon 4870 + 1066 MHz DDR2 system, I get:

Peak CPU->GPU Bandwidth =   5.160 GB/sec [data size = 134217728 bytes]
Peak GPU->CPU Bandwidth =   4.415 GB/sec [data size = 4194304 bytes]


Great little tool!

I have a problem though. It locks up my computer and crashes my display driver when the test reaches 67108864 bytes. I'm not too concerned since I'm running Windows 7 x64 beta build 7000 with the WDDM 1.1 beta display driver but I would just like to relay the information.

My hardware is Phenom X4 9550, Radeon HD 4850, Gigabyte GA-MA790X-UD4, 4x1GB DDR2-800 RAM.

Would a similar tool for the HyperTransport bus be worthwhile?


On Ubuntu 8.10 64-bit, ASUS P6T Deluxe, Intel i7 920, Corsair XMS3 DDR3 1600, ATI Radeon HD 4850 (512 MB)

Peak CPU->GPU Bandwidth =   4.978 GB/sec [datasize = 134217728 bytes]

Peak GPU->CPU Bandwidth =  2.185 GB/sec [datasize = 65536 bytes]

The GPU->CPU bandwidth is a little disappointing, and it gets worse as the data size increases, gradually dropping to 1.27 GB/sec at 134217728 bytes.

Great little tool. Thanks.


To add my own data here... 🙂

On my desk, I've got an MSI K9A2 Platinum (790FX chipset), with an AMD Phenom 9850 quad core with 4GB of memory and a FireStream 9250 plugged in. Running with an early version of Catalyst 9.3 on Linux SLES 10 SP2, I'm seeing:

Peak CPU->GPU Bandwidth = 5.532 GB/sec [data size = 536870912 bytes]
Peak GPU->CPU Bandwidth = 5.992 GB/sec [data size = 8388608 bytes]

This is one of those motherboards where, if you plug cards into slots 1 and 3, you get full x16 Gen2 performance, but if you also plug cards into slots 2 and 4, it drops to x8 Gen2.

I don't have time to run it right now, but I've seen it behave well with 2 and 4 cards installed as well (with 4 cards, it scaled down pretty well in my past experiments).

Michael.


Hi michael.chu@amd.com,

Just out of curiosity, which cards did you test in the x8 configuration on your MSI K9A2 Platinum? FireStreams, or desktop Radeons? 4870 X2?

Thanks in advance


Hi igimenez,

If I remember correctly, I've tried it with both FireStreams and Radeons. It was mainly whatever was lying around my desk or in the lab that was convenient. 🙂


Hi,

Can somebody explain my results?

Peak CPU->GPU Bandwidth =   5.047 GB/sec [data size = 4194304 bytes]
Peak GPU->CPU Bandwidth = 776.359 MB/sec [data size = 4194304 bytes]

I've tried older drivers with the same result.

It's an MSI K9A2 Platinum, AMD Phenom II 940 BE, 4GB, Sapphire HD4870 512MB, ATI Stream 1.4, Catalyst 9.7.


My test on XP 32-bit, 4GB RAM, X38, Q6600, 2x 4850 512MB:

With CrossFire off: CPU->GPU 3.3 GB/s and GPU->CPU 3.99 GB/s at large data sizes

With CrossFire on: CPU->GPU 3.3 GB/s, and GPU->CPU 6.5 GB/s for GPU 0 and 6.7 GB/s for GPU 1

Why do my results differ so much from others'? I wouldn't expect an Intel chipset to be this odd.


Hi jross,

Our FireStream product manager and I have run experiments on various systems, including with some OEMs (running this exact PCIeSpeedTest), and we have seen similar peaks and dropoffs on Intel systems. I'm not at all sure why it happens there but not on the 790s.

It may be an artifact of the test, but I can't imagine where... If I had the time, or if anyone here has the time, I'd like to try the same test using user pinned memory in CAL.

Michael.


It is interesting to see these numbers so close to what theory predicts. PCI-E 2.0 x16 gives 8 GB/s of maximum theoretical bandwidth per direction. However, the most important factor determining the practical bandwidth is the Max_Payload_Size setting (between 128 and 4096 bytes) negotiated between endpoints (cards) and root ports (on motherboard chipsets). 128 bytes, the default value, allows using 60% of the maximum theoretical bandwidth, 256 bytes 70%, ... and 4096 bytes almost 100%.

It is very common for PCI-E cards to support a Max_Payload_Size of 256, 512, or 1024 bytes. Unfortunately even as of 2009 the vast majority of chipsets only support 128 or 256 bytes.

This explains why most of you measure a practical usable bandwidth with that tool of roughly between 8 GB/s * 60% = 4.8 GB/s and 8 GB/s * 70% = 5.6 GB/s.
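The 8 GB/s figure follows from the link parameters: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, so each lane carries 0.5 GB/s of data per direction. Below is a minimal Python sketch of that arithmetic combined with the 60% and 70% efficiency figures quoted above. Those factors are taken from this post as given, not derived from packet overheads, and the function names are made up for illustration.

```python
# Hypothetical helpers illustrating the bandwidth arithmetic in the post above.
# Assumes 8b/10b line encoding (PCIe 1.x and 2.0): 8 data bits per 10 transferred bits.

def pcie_raw_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical per-direction data bandwidth in GB/s (decimal)."""
    data_gbits = gt_per_s * lanes * 8 / 10  # remove 8b/10b encoding overhead
    return data_gbits / 8                   # gigabits -> gigabytes


# Efficiency factors as quoted in the post (the poster's rule of thumb).
QUOTED_EFFICIENCY = {128: 0.60, 256: 0.70}  # Max_Payload_Size (bytes) -> fraction


def expected_bandwidth(gt_per_s: float, lanes: int, max_payload: int) -> float:
    """Practical GB/s estimate for a given Max_Payload_Size, using the quoted factors."""
    return pcie_raw_gbytes_per_s(gt_per_s, lanes) * QUOTED_EFFICIENCY[max_payload]


if __name__ == "__main__":
    print(pcie_raw_gbytes_per_s(5.0, 16))     # PCIe 2.0 x16
    print(expected_bandwidth(5.0, 16, 128))
    print(expected_bandwidth(5.0, 16, 256))
```

For a PCIe 2.0 x16 link this gives 8 GB/s raw and the 4.8 to 5.6 GB/s range mentioned in the post. A PCIe 1.x link (2.5 GT/s) halves every figure.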

If you want to see the Max_Payload_Size value on your system, under Linux/BSD/Solaris, run "lspci -vv".
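In `lspci -vv` output, the relevant fields are `MaxPayload` on the `DevCap:` line (what the device supports) and on the line following `DevCtl:` (what is actually configured). Capability details are only shown when lspci runs as root; otherwise it prints `<access denied>`. Here is a small Python sketch that pulls both values out of one device's `lspci -vv` text, for example from `lspci -vv -s 01:00.0`. It assumes the layout shown in the lspci output posted later in this thread, and `max_payload_sizes` is a made-up name.

```python
import re

# "MaxPayload <n> bytes" appears under both DevCap (supported) and DevCtl (in use).
_PAYLOAD_RE = re.compile(r"MaxPayload (\d+) bytes")


def max_payload_sizes(lspci_text: str) -> dict:
    """Return {'supported': n, 'configured': n} for a single device's lspci -vv text."""
    result = {}
    section = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("DevCap:"):
            section = "supported"
        elif stripped.startswith("DevCtl:"):
            section = "configured"
        elif stripped.startswith(("DevSta:", "LnkCap:")):
            section = None  # left the device capability/control block
        match = _PAYLOAD_RE.search(stripped)
        if match and section and section not in result:
            result[section] = int(match.group(1))
    return result
```

Feed it the captured text of a single device (for instance, read from a file you saved the lspci output to). If `supported` is larger than `configured`, the payload size was most likely limited by the other end of the link, such as the chipset's root port.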


Pinned memory: I modified the file and replaced the calResAllocRemote2D call with a custom call to calResCreate2D.

calResCreate2D returned an error when trying to allocate 16777216 bytes !
Peak CPU->GPU Bandwidth =   4.877 GB/sec [data size = 2097152 bytes]
Peak GPU->CPU Bandwidth =   3.957 GB/sec [data size = 2097152 bytes]


Hi Firestrider, yeah, I've noticed that my system gets a bit sluggish on the larger transfers too. I'm not quite sure why (I haven't had a chance to investigate it yet).

In this benchmark, I essentially queue about 100 calMemCopy() requests between uncached CAL memory resources on the CPU side and GPU memory resources, then wait for the very last CALevent to complete.
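Here is a rough host-side sketch of the timing scheme described above: queue a batch of transfers, stop the clock only after the last one completes, and divide total bytes by elapsed time. Copies between two Python bytearrays stand in for calMemCopy(), so this measures host memory bandwidth, not PCIe. `time_host_copies` and the other names are hypothetical. The formatter uses 1000-based units, which appears consistent with the numbers posted in this thread. For example, 8192 bytes x 100 loops in 3 ms gives exactly the 273.067 MB/sec that shows up in some of the result tables.

```python
import time


def bandwidth_bytes_per_s(size_bytes: int, loops: int, elapsed_s: float) -> float:
    """Total bytes moved by `loops` transfers of `size_bytes`, divided by wall time."""
    return size_bytes * loops / elapsed_s


def format_rate(bytes_per_s: float) -> str:
    """Format with decimal (1000-based) units, e.g. '273.067 MB/sec'."""
    for unit, scale in (("GB", 1e9), ("MB", 1e6), ("KB", 1e3)):
        if bytes_per_s >= scale:
            return f"{bytes_per_s / scale:.3f} {unit}/sec"
    return f"{bytes_per_s:.3f} B/sec"


def time_host_copies(size_bytes: int, loops: int = 100) -> float:
    """Host-only stand-in for the benchmark loop (no GPU involved).

    Issues `loops` copies back to back and reads the clock only after the
    last one finishes, analogous to waiting on the final event.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    start = time.perf_counter()
    for _ in range(loops):
        dst[:] = src  # hypothetical stand-in for one calMemCopy()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return bandwidth_bytes_per_s(size_bytes, loops, elapsed)


if __name__ == "__main__":
    for size in (4096, 1 << 20, 8 << 20):
        print(f"[{size:10d} bytes] host copy = {format_rate(time_host_copies(size))}")
```

Timing the whole batch, rather than each copy, amortizes per-call overhead. With a coarse timer it also explains why small transfers show quantized rates.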

For the HT test, I'm not sure you can control things closely enough at the application level to measure that performance. In my "prior life" at an in-socket FPGA accelerator company, that test was much easier because we had specific calls that sent and received data across the HT bus to the accelerator.

rahulgarg

I modified the test slightly to test CPU-cacheable resources. Instead of passing flag 0 to calResAllocRemote2D, I passed CAL_RESALLOC_CACHEABLE. On my system, cacheable remote RAM is restricted to 60MB, and here are the peak results:

Peak CPU->GPU Bandwidth =   5.069 GB/sec [data size = 16777216 bytes]
Peak GPU->CPU Bandwidth =   3.745 GB/sec [data size = 1048576 bytes]

(System details posted in earlier reply)

FangQ

My home computer wasn't built for stream computing; I only use it to learn stream programming and prototype code, so the results are not as impressive as others'.

My motherboard is an EVGA GeForce 7050 (610i), with an Intel Q6700 quad core, 3GB DDR2 memory and a Radeon HD 4650 (512MB). The PCIeSpeedTest output is below:

calResAllocLocal2D() returned an error when trying to allocate 268435456 bytes!
Peak CPU->GPU Bandwidth =   2.620 GB/sec [data size = 67108864 bytes]
Peak GPU->CPU Bandwidth =   3.160 GB/sec [data size = 33554432 bytes]


I am more interested in smaller data sizes (real-life communication).

Scientific Linux 5.2 64b, Phenom 9550, 8GB DDR2-800, 790X chipset, MSI K9A2-CF motherboard, single HD4870/ 1GB

PCIeSpeedTest
Devices found: 1

===> Testing device 0 <===
Device type: RV770
Max resource 2D width/height: 8192/8192
Total GPU memory size: 1024 MB
Total CPU cached space size: 60 MB
Total CPU uncached space size: 1984 MB
GPU engine clock: 0 MHz
GPU memory clock: 0 MHz
Number of timing loops: 100
[        16 bytes] CPU->GPU= 533.333 KB/sec, GPU->CPU 400.000 KB/sec
[        32 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU   1.067 MB/sec
[        64 bytes] CPU->GPU=   1.067 MB/sec, GPU->CPU   2.133 MB/sec
[       128 bytes] CPU->GPU=   3.200 MB/sec, GPU->CPU   4.267 MB/sec
[       256 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU   6.400 MB/sec
[       512 bytes] CPU->GPU=  12.800 MB/sec, GPU->CPU  17.067 MB/sec
[      1024 bytes] CPU->GPU=  14.629 MB/sec, GPU->CPU  34.133 MB/sec
[      2048 bytes] CPU->GPU=  51.200 MB/sec, GPU->CPU  68.267 MB/sec
[      4096 bytes] CPU->GPU=  51.200 MB/sec, GPU->CPU 102.400 MB/sec
[      8192 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU 273.067 MB/sec
[     16384 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU 273.067 MB/sec
[     32768 bytes] CPU->GPU= 819.200 MB/sec, GPU->CPU   1.092 GB/sec
[     65536 bytes] CPU->GPU= 936.229 MB/sec, GPU->CPU   2.185 GB/sec
[    131072 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU   1.456 GB/sec
[    262144 bytes] CPU->GPU=   4.369 GB/sec, GPU->CPU   3.277 GB/sec
[    524288 bytes] CPU->GPU=   4.766 GB/sec, GPU->CPU   3.495 GB/sec
[   1048576 bytes] CPU->GPU=   4.993 GB/sec, GPU->CPU   3.383 GB/sec
[   2097152 bytes] CPU->GPU=   4.993 GB/sec, GPU->CPU   3.495 GB/sec
[   4194304 bytes] CPU->GPU=   5.115 GB/sec, GPU->CPU   3.525 GB/sec
[   8388608 bytes] CPU->GPU=   5.146 GB/sec, GPU->CPU   3.539 GB/sec
[  16777216 bytes] CPU->GPU=   5.162 GB/sec, GPU->CPU   3.466 GB/sec
[  33554432 bytes] CPU->GPU=   5.178 GB/sec, GPU->CPU   3.477 GB/sec
[  67108864 bytes] CPU->GPU=   5.186 GB/sec, GPU->CPU   3.470 GB/sec
[ 134217728 bytes] CPU->GPU=   5.188 GB/sec, GPU->CPU   3.457 GB/sec
[ 268435456 bytes] CPU->GPU=   5.191 GB/sec, GPU->CPU   3.458 GB/sec
[ 536870912 bytes] CPU->GPU=   5.191 GB/sec, GPU->CPU   3.459 GB/sec
calResAllocLocal2D() returned an error when trying to allocate 1073741824 bytes!
Peak CPU->GPU Bandwidth =   5.191 GB/sec [data size = 268435456 bytes]
Peak GPU->CPU Bandwidth =   3.539 GB/sec [data size = 8388608 bytes]
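To compare runs, or to see where GPU->CPU starts falling off at larger sizes as some posters report, you can parse the per-size lines instead of reading only the Peak lines. This Python sketch assumes the line format shown above and 1000-based units. Some builds print `GPU->CPU=` with an equals sign and some without, so the pattern accepts both. `parse_results`, `peaks` and `falloff` are made-up names.

```python
import re

# Matches lines like:
# "[   8388608 bytes] CPU->GPU=   5.146 GB/sec, GPU->CPU   3.539 GB/sec"
_LINE_RE = re.compile(
    r"\[\s*(\d+) bytes\]\s*CPU->GPU=?\s*([\d.]+) (KB|MB|GB)/sec,"
    r"\s*GPU->CPU=?\s*([\d.]+) (KB|MB|GB)/sec"
)
_SCALE = {"KB": 1e3, "MB": 1e6, "GB": 1e9}  # assumed decimal units


def parse_results(text: str) -> list:
    """Return [(size_bytes, cpu_to_gpu_Bps, gpu_to_cpu_Bps), ...] in file order."""
    rows = []
    for m in _LINE_RE.finditer(text):
        size = int(m.group(1))
        up = float(m.group(2)) * _SCALE[m.group(3)]
        down = float(m.group(4)) * _SCALE[m.group(5)]
        rows.append((size, up, down))
    return rows


def peaks(rows: list) -> tuple:
    """Return ((size, Bps) for best CPU->GPU, (size, Bps) for best GPU->CPU)."""
    best_up = max(rows, key=lambda r: r[1])
    best_down = max(rows, key=lambda r: r[2])
    return (best_up[0], best_up[1]), (best_down[0], best_down[2])


def falloff(rows: list) -> float:
    """GPU->CPU rate at the largest size divided by its peak (1.0 = no drop-off)."""
    largest = max(rows, key=lambda r: r[0])
    return largest[2] / max(r[2] for r in rows)
```

On the run above, `peaks` lands on the same sizes as the tool's own Peak lines (268435456 and 8388608 bytes). `max` keeps the first of equal values, just as the printed peak does.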


The test won't complete; I get a "driver stopped responding" error. I'm running Vista x64 and couldn't find a VPU Recover setting in CCC. Any ideas?


Originally posted by: ryta1203

The test won't complete, I get a "driver stopped responding" issue. I'm running Vista x64, couldn't find a VPU recover in CCC. Any ideas?

Yeah, a lot of people are having display driver crashes on Vista/7, but from the download page it looks like only Linux 64-bit and XP 32-bit are officially supported.


Originally posted by: Firestrider

Originally posted by: ryta1203

The test won't complete, I get a "driver stopped responding" issue. I'm running Vista x64, couldn't find a VPU recover in CCC. Any ideas?

Yeah, a lot of people are having display driver crashes in Vista/7 but from the download page it looks like only Linux 64-bit and XP 32-bit are officially supported.

Silly me, I didn't even look at the supported OSes. Makes sense now... kinda... although I'm unclear how useful this tool is for Windows users, since most of us are using Vista/7.

Looking at the FAQ again, I noticed it mentions the fix: the TdrLevel registry entry. I added it and the test completed.

My results were ~4.5GB/s CPU->GPU and ~4.8GB/s GPU->CPU for the 1st GPU.

My results were ~4GB/s CPU->GPU and ~5.1GB/s GPU->CPU for the 2nd GPU.

I have almost the same setup as Michael:

MSI K9A2 Plat

Phenom 9850, 2.7GHz

4GB 1066 DDR2 OCZ Plat

Two 4850s in CrossFireX with 512MB each.

Vista Business x64.


On Debian 5.0 64-bit, ASUS P6T, Intel i7 920, 12GB DDR3 1333, two HD4870X2

Peak CPU->GPU Bandwidth =   4.793 GB/sec [data size = 16777216 bytes]
Peak GPU->CPU Bandwidth =   2.185 GB/sec [data size = 65536 bytes]

(same results for the 4 vpus)

The CPU->GPU bandwidth is what theory predicts (hwinfo --pci reports a 128-byte max payload).
However, the GPU->CPU bandwidth is quite low, and it gets worse as the data size increases, as jross reported on similar hardware (but with a single VPU).

Nice tool, thanks.


On XP x64, Phenom 9500 2.2GHz, 790FX chipset, 8GB DDR2-667 ECC, Gigabyte MA790FX-DQ6, Sapphire HD4887-X2/2GB, I get identical results on both device 0 and device 1 of the card, as follows (the GPU clock readings surprise me):

Devices found: 2

===> Testing device 0 <===
Device type: RV770
Max resource 2D width/height: 8192/8192
Total GPU memory size: 1024 MB
Total CPU cached space size: 64 MB
Total CPU uncached space size: 2048 MB
GPU engine clock: 507 MHz
GPU memory clock: 500 MHz
Number of timing loops: 100
[        16 bytes] CPU->GPU= 586.570 KB/sec, GPU->CPU 680.845 KB/sec
[        32 bytes] CPU->GPU=   1.339 MB/sec, GPU->CPU   1.368 MB/sec
[        64 bytes] CPU->GPU=   2.728 MB/sec, GPU->CPU   2.451 MB/sec
[       128 bytes] CPU->GPU=   4.921 MB/sec, GPU->CPU   4.933 MB/sec
[       256 bytes] CPU->GPU=   9.726 MB/sec, GPU->CPU   9.714 MB/sec
[       512 bytes] CPU->GPU=  19.165 MB/sec, GPU->CPU  19.597 MB/sec
[      1024 bytes] CPU->GPU=  38.903 MB/sec, GPU->CPU  38.891 MB/sec
[      2048 bytes] CPU->GPU=  77.005 MB/sec, GPU->CPU  78.088 MB/sec
[      4096 bytes] CPU->GPU= 153.511 MB/sec, GPU->CPU 156.293 MB/sec
[      8192 bytes] CPU->GPU= 309.386 MB/sec, GPU->CPU 314.058 MB/sec
[     16384 bytes] CPU->GPU= 582.049 MB/sec, GPU->CPU 605.172 MB/sec
[     32768 bytes] CPU->GPU=   1.201 GB/sec, GPU->CPU   1.236 GB/sec
[     65536 bytes] CPU->GPU=   2.269 GB/sec, GPU->CPU   2.360 GB/sec
[    131072 bytes] CPU->GPU=   3.058 GB/sec, GPU->CPU   3.620 GB/sec
[    262144 bytes] CPU->GPU=   3.853 GB/sec, GPU->CPU   3.975 GB/sec
[    524288 bytes] CPU->GPU=   4.473 GB/sec, GPU->CPU   4.501 GB/sec
[   1048576 bytes] CPU->GPU=   4.772 GB/sec, GPU->CPU   4.788 GB/sec
[   2097152 bytes] CPU->GPU=   4.992 GB/sec, GPU->CPU   5.121 GB/sec
[   4194304 bytes] CPU->GPU=   5.103 GB/sec, GPU->CPU   5.345 GB/sec
[   8388608 bytes] CPU->GPU=   5.159 GB/sec, GPU->CPU   5.481 GB/sec
[  16777216 bytes] CPU->GPU=   5.191 GB/sec, GPU->CPU   4.849 GB/sec
[  33554432 bytes] CPU->GPU=   5.197 GB/sec, GPU->CPU   4.843 GB/sec
[  67108864 bytes] CPU->GPU=   5.202 GB/sec, GPU->CPU   4.862 GB/sec
[ 134217728 bytes] CPU->GPU=   5.208 GB/sec, GPU->CPU   4.817 GB/sec
[ 268435456 bytes] CPU->GPU=   5.209 GB/sec, GPU->CPU   4.816 GB/sec
[ 536870912 bytes] CPU->GPU=   5.208 GB/sec, GPU->CPU   4.825 GB/sec
Peak CPU->GPU Bandwidth =   5.209 GB/sec [data size = 268435456 bytes]
Peak GPU->CPU Bandwidth =   5.481 GB/sec [data size = 8388608 bytes]

===> Testing device 1 <===
Device type: RV770
Max resource 2D width/height: 8192/8192
Total GPU memory size: 1024 MB
Total CPU cached space size: 64 MB
Total CPU uncached space size: 2048 MB
GPU engine clock: 507 MHz
GPU memory clock: 500 MHz
... (same results as device 0)


Originally posted by: ryta1203

You know silly me didn't even look at the OSes supported. Makes sense now... kinda.... although I'm unclear how useful a tool this is for Windows users, since most of us are using Vista/7.

Re-looking at the FAQ I noticed where it mentioned the fix, the TdrLevel registry entry. I added it and the test completed.

Which FAQ are you referring to, ryta? I couldn't find any FAQ for PCIeSpeedTest.


Originally posted by: attilagenc
Originally posted by: ryta1203

You know silly me didn't even look at the OSes supported. Makes sense now... kinda.... although I'm unclear how useful a tool this is for Windows users, since most of us are using Vista/7.

Re-looking at the FAQ I noticed where it mentioned the fix, the TdrLevel registry entry. I added it and the test completed.

Which FAQ are you referring to ryta? I could not find any FAQ for PCIeSpeedTest?

The CAL FAQ that comes with the documentation.


Not sure if this is any help, but on my 4870X2 it seems that GPU0's GPU->CPU transfer is bottlenecked somehow, and GPU1's GPU->CPU bandwidth suddenly drops after 8388608 bytes.

the729

Hi everyone,

There seems to be a problem with my box. The test results are:

Peak CPU->GPU Bandwidth =   2.793 GB/sec [data size = 536870912 bytes]
Peak GPU->CPU Bandwidth =   2.994 GB/sec [data size = 536870912 bytes]

My hardware is:

Gigabyte 790X-DS4, Phenom 9550, DDR2 800 2G*2, Sapphire HD 4870 1G

I am running Kubuntu 8.10 64-bit. Below is the lspci -vv output, which I think is problematic, since it says "[58] Express (v2) Legacy Endpoint" and "LnkSta:    Speed 2.5GT/s,".

Is there something wrong with the software or the hardware?
Could those of you running Ubuntu x64 please post your lspci -vv output?

=================================================

01:00.0 VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4870]
    Subsystem: PC Partner Limited Device e850
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 4 bytes
    Interrupt: pin A routed to IRQ 2300
    Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at fdee0000 (64-bit, non-prefetchable) [size=64K]
    Region 4: I/O ports at de00 [size=256]
    [virtual] Expansion ROM at fde00000 [disabled] [size=128K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
            ClockPM- Suprise- LLActRep- BwNot-
        LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
        Address: 00000000fee0f00c  Data: 4191
    Capabilities: [100] Vendor Specific Information <?>
    Kernel driver in use: fglrx_pci
    Kernel modules: fglrx


Originally posted by: the729

I am running Kubuntu 8.10 64bit. The following is lspci -vv output, which I think is problematic, since it says "[58] Express (v2) Legacy Endpoint" and "LnkSta:    Speed 2.5GT/s,".

Your mobo supports PCI-E 1.0 only (2.5GT/s). So, assuming a Max_Payload_Size of 256 bytes (I guess), you should see 4.0 GB/s * 70% = 2.8 GB/s of throughput... which is exactly what you measure.
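The same reasoning can be run in reverse: given a measured peak, which link configurations fit the 60 to 70% efficiency band quoted earlier in the thread? Here is a Python sketch. The band is that rule of thumb rather than anything measured, 8b/10b encoding (PCIe 1.x/2.0) is assumed, and the function names are made up.

```python
from itertools import product


def raw_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    # 8b/10b encoding: 8 data bits per 10 transferred bits; then bits -> bytes.
    return gt_per_s * lanes * 8 / 10 / 8


def consistent_links(measured_gbytes_per_s: float,
                     low_eff: float = 0.60, high_eff: float = 0.70,
                     speeds=(2.5, 5.0), widths=(4, 8, 16)) -> list:
    """List (GT/s, lanes) pairs whose quoted-efficiency band contains the measurement."""
    matches = []
    for gt, lanes in product(speeds, widths):
        raw = raw_gbytes_per_s(gt, lanes)
        if raw * low_eff <= measured_gbytes_per_s <= raw * high_eff:
            matches.append((gt, lanes))
    return matches
```

For the729's 2.793 GB/sec, this returns both Gen1 x16 and Gen2 x8, which have the same raw bandwidth. The `LnkSta` line in the lspci output above is what tells them apart.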


Originally posted by: zpdixon

Your mobo supports PCI-E 1.0 only (2.5GT/s).

The mobo is a GA-790X-DS4. It supports PCIe 2.0, according to this page:

http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2695

Does that mean this has something to do with the driver or software?

Is there a quick way to check those PCIe parameters in Windows (other than running this power toy)?


Originally posted by: the729

The mobo is GA-790X-DS4. It supports PCIe 2.0, according to the page :

http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2695

Does that mean this is something to do with the driver or software?

Is there a quick way to check those PCIe parameters in Windows (not by running this power toy)?

Well, it may support 5.0GT/s, but it negotiated only 2.5GT/s with your Sapphire 4870. That could be because your Sapphire only supports 2.5GT/s (not sure...), or because of EMI or random hardware problems, etc.
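You can read off `lspci -vv` whether a link trained below what the device advertises: compare the speed and width on the `LnkCap:` line with those on `LnkSta:`. Here is a Python sketch for one device's output. The names are made up, and it only compares the card against itself. The slot's root port has its own `LnkCap` in its own lspci entry, which would also need checking.

```python
import re

# e.g. "LnkCap:    Port #0, Speed 2.5GT/s, Width x16, ..."
#      "LnkSta:    Speed 2.5GT/s, Width x16, ..."
_LINK_RE = re.compile(r"\b(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def link_status(lspci_text: str) -> dict:
    """Return {'LnkCap': (GT/s, lanes), 'LnkSta': (GT/s, lanes)} for one device."""
    found = {}
    for m in _LINK_RE.finditer(lspci_text):
        found.setdefault(m.group(1), (float(m.group(2)), int(m.group(3))))
    return found


def is_downtrained(lspci_text: str) -> bool:
    """True if the negotiated link is slower or narrower than the device advertises."""
    status = link_status(lspci_text)
    cap_speed, cap_width = status["LnkCap"]
    sta_speed, sta_width = status["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width
```

On the729's output, both lines read 2.5GT/s x16, so this reports no downtraining. Note that the card's own `LnkCap` line there already lists only 2.5GT/s.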


Hi,

I have interesting results:

PCIeSpeedTest

2.5GiB/s CPU->GPU

785MiB/s GPU->CPU

The two directions are on par until 16384 bytes; after that point, CPU->GPU gets a big boost. When the test reaches 256 MiB, it either crashes or shows a huge performance drop (437 MiB/s CPU->GPU, 668 MiB/s GPU->CPU).

System: Intel Q6600@2.4GHz, 8GiB, HD4870 512MiB, Vista Business 64bit, Catalyst 9.3, ATi Stream 1.4.


Hi,

Thanks for the tool!

It caused VPU Recover to kick in most of the time on my machine, but when it ran all the way through, I got peaks of:

5.54GB/sec @ 2097152 bytes CPU->GPU

1.09GB/sec @ 134217728 bytes GPU->CPU

Both directions perform similarly until they reach ~1GB/sec; then GPU->CPU stops increasing.

System: HD4890 1GB, Phenom II 720 @ 2.8, GA-MA790XT-UD4P, 4GB DDR3, Catalyst 9.4, XP SP3 32bit


I have now tested on Windows Server 2008 R2 64-bit; GPU->CPU has improved compared to XP:

XP SP3 32-bit:

5.54GB/sec @ 2097152 bytes CPU->GPU

1.09GB/sec @ 134217728 bytes GPU->CPU

2008 R2 64-bit:

5.33GB/sec @ 268435456 bytes CPU->GPU

5.52GB/sec @ 8388608 bytes GPU->CPU

Same system used in both:

HD4890 1GB, Phenom II 720 @ 2.8, GA-MA790XT-UD4P, 4GB DDR3
wuttz

http://i664.photobucket.com/albums/vv4/wuttzi/CCIMG700.png

Mine crashes.

System specs:

Phenom II 955 BE, Gigabyte MA785GMT-UD2H, OCZ3 Platinum 2x2GB DDR3-1333 CAS6, ATI Radeon 4870 X2, WD Caviar Green 500GB/32MB, PC Power & Cooling 750W, Windows 7 x64 Professional


Hi all,

Win XP SP3

Motherboard: Gigabyte GA-MA790FXT-UD5P

Processor: AMD Phenom II X3 720 BE @ 3712 MHz

4GB RAM DDR3 1666

===> Testing device 0 <===
Device type: RV730
Max resource 2D width/height: 8192/8192
Total GPU memory size: 512 MB
Total CPU cached space size: 64 MB
Total CPU uncached space size: 512 MB
GPU engine clock: 800 MHz
GPU memory clock: 1125 MHz
Number of timing loops: 100
[        16 bytes] CPU->GPU= 697.646 KB/sec, GPU->CPU= 715.282 KB/sec
[        32 bytes] CPU->GPU=   1.434 MB/sec, GPU->CPU=   2.003 MB/sec
[        64 bytes] CPU->GPU=   2.300 MB/sec, GPU->CPU=   2.352 MB/sec
[       128 bytes] CPU->GPU=   4.604 MB/sec, GPU->CPU=   4.614 MB/sec
[       256 bytes] CPU->GPU=   9.126 MB/sec, GPU->CPU=   8.963 MB/sec
[       512 bytes] CPU->GPU=  17.729 MB/sec, GPU->CPU=  18.378 MB/sec
[      1024 bytes] CPU->GPU=  35.719 MB/sec, GPU->CPU=  37.482 MB/sec
[      2048 bytes] CPU->GPU=  70.117 MB/sec, GPU->CPU=  71.767 MB/sec
[      4096 bytes] CPU->GPU= 144.484 MB/sec, GPU->CPU= 141.988 MB/sec
[      8192 bytes] CPU->GPU= 279.116 MB/sec, GPU->CPU= 266.716 MB/sec
[     16384 bytes] CPU->GPU= 583.427 MB/sec, GPU->CPU= 544.476 MB/sec
[     32768 bytes] CPU->GPU=   1.009 GB/sec, GPU->CPU=   1.206 GB/sec
[     65536 bytes] CPU->GPU=   1.154 GB/sec, GPU->CPU=   1.345 GB/sec
[    131072 bytes] CPU->GPU=   1.269 GB/sec, GPU->CPU=   1.457 GB/sec
[    262144 bytes] CPU->GPU=   1.324 GB/sec, GPU->CPU=   1.509 GB/sec
[    524288 bytes] CPU->GPU=   1.352 GB/sec, GPU->CPU=   1.544 GB/sec
[   1048576 bytes] CPU->GPU=   1.367 GB/sec, GPU->CPU=   1.556 GB/sec
[   2097152 bytes] CPU->GPU=   1.375 GB/sec, GPU->CPU=   1.546 GB/sec
[   4194304 bytes] CPU->GPU=   1.378 GB/sec, GPU->CPU=   1.530 GB/sec
[   8388608 bytes] CPU->GPU=   1.380 GB/sec, GPU->CPU=   1.525 GB/sec
[  16777216 bytes] CPU->GPU=   1.381 GB/sec, GPU->CPU=   1.519 GB/sec
[  33554432 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.531 GB/sec
[  67108864 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.528 GB/sec
[ 134217728 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.528 GB/sec
[ 268435456 bytes] CPU->GPU=   1.382 GB/sec, GPU->CPU=   1.526 GB/sec
calResAllocLocal2D() returned an error when trying to allocate 536870912 bytes!
calResAllocRemote2D() returned an error when trying to allocate 536870912 bytes (uncached)!
Peak CPU->GPU Bandwidth =   1.382 GB/sec [data size = 268435456 bytes]
Peak GPU->CPU Bandwidth =   1.556 GB/sec [data size = 1048576 bytes]


Hi,

I also have performance problems with an ATI HD5870, Core i7 950, Asus P6T SE (X58) and 12GB RAM:

===> Testing device 0 <===
Device type: Unknown
Max resource 2D width/height: 16384/16384
Total GPU memory size: 1024 MB
Total CPU cached space size: 508 MB
Total CPU uncached space size: 1279 MB
GPU engine clock: 900 MHz
GPU memory clock: 1300 MHz
Number of timing loops: 100
[        16 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU=   1.600 MB/sec
[        32 bytes] CPU->GPU= 457.143 KB/sec, GPU->CPU=   3.200 MB/sec
[        64 bytes] CPU->GPU=   1.600 MB/sec, GPU->CPU=   2.133 MB/sec
[       128 bytes] CPU->GPU=   3.200 MB/sec, GPU->CPU=   4.267 MB/sec
[       256 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU=   8.533 MB/sec
[       512 bytes] CPU->GPU=  12.800 MB/sec, GPU->CPU=   7.314 MB/sec
[      1024 bytes] CPU->GPU=  34.133 MB/sec, GPU->CPU=  34.133 MB/sec
[      2048 bytes] CPU->GPU=  68.267 MB/sec, GPU->CPU=  68.267 MB/sec
[      4096 bytes] CPU->GPU= 136.533 MB/sec, GPU->CPU= 204.800 MB/sec
[      8192 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU= 273.067 MB/sec
[     16384 bytes] CPU->GPU= 546.133 MB/sec, GPU->CPU= 546.133 MB/sec
[     32768 bytes] CPU->GPU=   1.092 GB/sec, GPU->CPU= 655.360 MB/sec
[     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU= 595.782 MB/sec
[    131072 bytes] CPU->GPU=   3.277 GB/sec, GPU->CPU= 504.123 MB/sec
[    262144 bytes] CPU->GPU=   3.745 GB/sec, GPU->CPU= 468.114 MB/sec
[    524288 bytes] CPU->GPU=   4.033 GB/sec, GPU->CPU= 468.114 MB/sec
[   1048576 bytes] CPU->GPU=   4.194 GB/sec, GPU->CPU= 457.893 MB/sec
[   2097152 bytes] CPU->GPU=   4.194 GB/sec, GPU->CPU= 449.069 MB/sec
[   4194304 bytes] CPU->GPU=   4.280 GB/sec, GPU->CPU= 443.373 MB/sec
[   8388608 bytes] CPU->GPU=   4.215 GB/sec, GPU->CPU= 441.273 MB/sec
[  16777216 bytes] CPU->GPU=   4.226 GB/sec, GPU->CPU= 442.437 MB/sec
[  33554432 bytes] CPU->GPU=   4.067 GB/sec, GPU->CPU= 450.395 MB/sec
[  67108864 bytes] CPU->GPU=   4.067 GB/sec, GPU->CPU= 461.420 MB/sec
[ 134217728 bytes] CPU->GPU=   4.091 GB/sec, GPU->CPU= 480.207 MB/sec
[ 268435456 bytes] CPU->GPU=   4.123 GB/sec, GPU->CPU= 492.841 MB/sec
calResAllocLocal2D() returned an error when trying to allocate 536870912 bytes!
Peak CPU->GPU Bandwidth =   4.280 GB/sec [data size = 4194304 bytes]
Peak GPU->CPU Bandwidth = 655.360 MB/sec [data size = 32768 bytes]

Also look here: http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=130923&enterthread=y


Thank you very much for this one; it's really helpful.

Regards
