
PCIe transfer bandwidth for multi-GPU

Question asked by willsong on Jan 20, 2014
Latest reply on Jan 29, 2014 by Meteorhead

Hi,

 

We are currently testing what kind of bandwidth we can achieve in OpenCL from a multi-GPU setup.  Our setup is four Radeon HD 7990 cards on a dual-CPU motherboard, running SLES 11 SP2 with the AMD Catalyst 13.4 (beta) Linux driver.

 

Through some testing, we have determined the following:

 

  • The OpenCL runtime identifies 8 devices (IDs 0 to 7), since each HD 7990 is a dual-GPU card
  • Device IDs 0 - 3 are "attached" to CPU 0
  • Device IDs 4 - 7 are "attached" to CPU 1 (the sketch after this list shows how we mapped device IDs to PCIe buses and CPUs)
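
For context, this is roughly how we did the mapping (a simplified sketch, not our exact code; it assumes the cl_amd_device_attribute_query extension and its CL_DEVICE_TOPOLOGY_AMD query are exposed by the driver and headers). We read each device's PCIe bus number and cross-reference it with lspci and the NUMA topology under /sys:

// Simplified sketch: list GPU devices and their PCIe bus numbers so the
// OpenCL device IDs can be matched against lspci / NUMA information.
// Assumes CL/cl_ext.h provides CL_DEVICE_TOPOLOGY_AMD and cl_device_topology_amd.
#include <CL/cl.h>
#include <CL/cl_ext.h>
#include <cstdio>
#include <vector>

int main()
{
    cl_platform_id plat;
    clGetPlatformIDs(1, &plat, NULL);          // assume the first platform is the AMD platform

    cl_uint ndev = 0;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 0, NULL, &ndev);
    std::vector<cl_device_id> devs(ndev);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, ndev, devs.data(), NULL);

    for (cl_uint i = 0; i < ndev; ++i) {
        char name[256] = {0};
        clGetDeviceInfo(devs[i], CL_DEVICE_NAME, sizeof(name), name, NULL);

        cl_device_topology_amd topo = {};
        clGetDeviceInfo(devs[i], CL_DEVICE_TOPOLOGY_AMD, sizeof(topo), &topo, NULL);

        // topo.pcie.bus is cross-referenced with lspci output to find which
        // physical card, and hence which CPU/NUMA node, the device belongs to.
        std::printf("device %u: %s  PCIe bus 0x%02x\n",
                    i, name, (unsigned)(unsigned char)topo.pcie.bus);
    }
    return 0;
}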

 

Our test simply transfers data from host memory to device memory.  We use a single context for all devices and a separate command queue for each device.  Each command queue is handled by a separate host thread, i.e. the data is transferred to all devices concurrently.
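
Here is a minimal sketch of the kind of test we run (simplified, not our exact code; the buffer size, iteration count, and wall-clock timing below are placeholder choices): one worker thread per device, each with its own command queue, issuing blocking clEnqueueWriteBuffer calls against a shared context:

// Simplified sketch of our concurrent host->device transfer test.
// One shared context, one command queue and one host thread per device,
// blocking writes timed with wall-clock time.
#include <CL/cl.h>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

static const size_t kBytes = 256 << 20;   // 256 MiB per transfer (placeholder)
static const int    kIters = 20;          // repetitions per device (placeholder)

static void run_device(cl_context ctx, cl_device_id dev, int id)
{
    cl_int err = CL_SUCCESS;
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY, kBytes, NULL, &err);
    std::vector<char> host(kBytes, 1);

    // Warm-up transfer so first-touch allocation does not skew the timing.
    clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, kBytes, host.data(), 0, NULL, NULL);

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kIters; ++i)
        clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, kBytes, host.data(), 0, NULL, NULL);
    clFinish(q);
    auto t1 = std::chrono::steady_clock::now();

    double sec  = std::chrono::duration<double>(t1 - t0).count();
    double gbps = (double)kBytes * kIters / sec / 1e9;
    std::printf("device %d: %.2f GB/s\n", id, gbps);

    clReleaseMemObject(buf);
    clReleaseCommandQueue(q);
}

int main()
{
    cl_platform_id plat;
    clGetPlatformIDs(1, &plat, NULL);

    cl_uint ndev = 0;
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 0, NULL, &ndev);
    std::vector<cl_device_id> devs(ndev);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, ndev, devs.data(), NULL);

    // Single context shared by all devices, separate queue per device.
    cl_context ctx = clCreateContext(NULL, ndev, devs.data(), NULL, NULL, NULL);

    // One host thread per device so the transfers overlap in time.
    std::vector<std::thread> threads;
    for (cl_uint i = 0; i < ndev; ++i)
        threads.emplace_back(run_device, ctx, devs[i], (int)i);
    for (auto& t : threads) t.join();

    clReleaseContext(ctx);
    return 0;
}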

 

Our tests show the following results:

 

  • Running the test on device IDs 0 and 7 only (i.e. devices attached to different CPUs) results in around 9.8 GB/s bandwidth on each device
    • We consider this a reasonable value, since the BufferBandwidth test in the AMD samples reports similar figures
  • Running the test on device IDs 0 and 1 only (i.e. the same physical card, sharing one PCIe slot) results in around 6.0 GB/s bandwidth on each device
    • We think this is probably reasonable, as the two GPUs on one card contend for the same PCIe slot (is this a correct assumption?)
  • Running the test on device IDs 0 and 3 only (i.e. attached to the same CPU, but on two different physical cards) results in around 6.3 GB/s bandwidth on each device
    • Increasing the number of devices (e.g. running the test on device IDs 0, 1 and 3) results in even lower bandwidth
    • Since these GPUs do not share a PCIe slot, we expected near-full bandwidth from each device

 

We have the following questions:

 

  1. Is our assumption correct that a dual-GPU card delivers roughly half the transfer bandwidth to each of its two devices when both are used concurrently?
  2. Are our test results expected, i.e. GPUs attached to different CPUs achieve full bandwidth, but GPUs attached to the same CPU get only about half?  Is this a hardware (motherboard) limitation?

 

Any advice/comments would be very much appreciated.

Thanks!
