PCIe performance becomes important when running multiple high-end cards like the new 7970, which runs at almost 5 teraflops when overclocked.
While trying to optimize a kernel, I discovered that my PCIe bus is limited to 1.6 GB/s for both reads and writes, where it should be about 5-6 GB/s in a v2.0 x16 slot. I've tried several GPUs, one at a time, in every slot and always get the same numbers. I've also updated the mainboard BIOS and drivers and the AMD drivers, and tried every BIOS configuration: the whole works.
I get identical numbers from programs like PCIeBandwidth, PCIspeedtest (v0.2), and my own code using all the suggested methods for fast-path transfers from the AMD APP OpenCL Programming Guide (Dec 2011), which is a good read. The numbers I get are:
PCIe x4 slot: transfer rate = 1.40 GB/s (read and write, one card)
PCIe x16 slot at x8: transfer rate = 1.65 GB/s (read and write, requires two cards)
PCIe x16 slot at x16: transfer rate = 1.65 GB/s (read and write, one card)
Note that the 1.40 GB/s rate for the x4 slot is in line with expectations: extrapolated to x16, it would be 5.6 GB/s. The x16 slots are faster, but not by much. According to GPU-z, the x16 slot is running in x16 PCIe v2.0 mode.
PCIe problems can come from a combination of factors, but I doubt this is a pure hardware problem for two reasons. First, I've tried 6870, 6970, and 7970 GPUs. Second, I'm using a new top-end mainboard designed specifically for high PCIe performance with 3-way Crossfire (ASRock Z68 Extreme7). It has 5 PCIe slots: one PCIe v3.0 slot, 3 slots for multiple GPUs, configured as (x16, x16, 0) or (x16, x8, x8), and a dedicated x4 slot. It also uses PLX PEX8608 and NF200 chips to increase the number of PCIe lanes.
I'm currently using the new 12.3 drivers (dated Feb 16, 2012).
I've worked with PCIe before and know how complex these problems can be. Any help or feedback is greatly appreciated, particularly recent measurements of PCIe bus performance. Help from the AMD devgurus is also welcome, of course.
I will add anything useful that I learn in this thread.
Crossfire is not connected or selected.
GPU-z says the cards are running at x16 v2.0 when in x16 slots.
GPU-z reports the 7970s as x16 v3.0 cards running at x16 v1.1. When the GPU is loaded, the link switches to x16 v2.0, so this does not explain the low PCIe performance.