
Archives Discussions

Arakageeta
Journeyman III

Opteron 8-GPU systems?

Do Opteron systems w/ 8 double-wide GPUs exist?

I'm a grad student looking to purchase a many-GPU system for our research lab.  I'd like to maximize both CPU core and GPU count.  The best I can find that maximizes both of these is a Tyan Intel-based system (http://www.tyan.com/product_SKU_spec.aspx?ProductType=BB&pid=412&SKU=600000188), which supports 12 CPU cores and 8 double-wide GPUs.  Is there any sort of equivalent (or better) in the AMD world?  The closest I can find is a SuperMicro system that can sport 4 double-wide GPUs.  I've read that a company called Aprius had planned to produce an 8-way GPU system, but it looks like the product may have been cancelled.


I would much prefer an Opteron-based system.  I believe its memory/cache hierarchy may provide more deterministic program execution (something that is important in our research).


Any recommendations?  Please tell me these systems exist!

0 Likes
21 Replies

ATI drivers don't support more than 4 GPUs within a single system. End of story.

0 Likes

Ignore this message. The forum was giving an error when I replied.

0 Likes

ATI drivers don't support more than 4 GPUs within a single system. End of story.


This is not true. We recently installed a cluster with 8 GPUs (FireStream 9250) per node at a customer site.

We are using two PCIe expansion systems (each containing 4 GPUs) to attach a total of 8 GPUs to one node.

0 Likes

What is the effect of using a PCIe expansion box?  I presume it limits the available memory/communication bandwidth?

0 Likes

Originally posted by: gaurav.garg

This is not true. We recently installed a cluster with 8 GPUs (FireStream 9250) per node at a customer site.

We are using two PCIe expansion systems (each containing 4 GPUs) to attach a total of 8 GPUs to one node.

Gaurav, did this actually work?  Does running FindNumDevices return 8?  Can you give details of the Linux OS/kernel, ATI driver version, and other configuration details?

0 Likes

Yes, Gaurav, can you provide more details? If the situation has finally changed and more than 4 GPUs are supported by the drivers, it's definitely good news.

... Though support for the 5970 is still very much in question...

0 Likes

The configuration was as follows:

OS: Red Hat 5.3 64-bit
ATI driver: Catalyst 10.4
CPU: Two quad-core AMD Opteron processors per node
Chipset: NVIDIA nForce Professional 3600 and 3050
RAM: 32 GB

0 Likes

Gaurav, did this actually work?  Does running FindNumDevices return 8?  Can you give details of the Linux OS/kernel, ATI driver version, and other configuration details?


Yes, both FindNumDevices and OpenCL reported 8 GPUs. We also ran an OpenCL program that used all 8 GPUs.
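
For anyone who wants to run the same sanity check, here is a minimal sketch of an equivalent OpenCL device count (this is not the program mentioned above, just an assumed stand-in; the file name in the build comment is hypothetical):

/* Minimal OpenCL GPU-count check (hypothetical example).
 * Build with e.g.: gcc count_gpus.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS) {
        fprintf(stderr, "clGetPlatformIDs failed\n");
        return 1;
    }

    for (cl_uint p = 0; p < num_platforms; ++p) {
        cl_uint num_gpus = 0;
        /* Ask only for GPU devices; an 8-GPU node should report 8 here. */
        cl_int err = clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                                    0, NULL, &num_gpus);
        if (err == CL_SUCCESS)
            printf("Platform %u: %u GPU device(s)\n", p, num_gpus);
        else
            printf("Platform %u: no GPU devices (err %d)\n", p, err);
    }
    return 0;
}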

0 Likes

Hi Gaurav,

Can you give us a hint about which PCIe expansion card/box you use and which interface you use to hook the expansion card up to the system? And, as others asked: do you lose bandwidth in the expansion card?

Thanks,

Roto



0 Likes


The PCIe host adapter card is PCIe 2.0 x16. The PCIe lanes are dynamically assigned to each GPU in the expansion system, so you get full bandwidth as long as you are using a single GPU per expansion system. When multiple GPUs are used, the bandwidth is divided among them.

0 Likes

@rotor
I'm only aware of a single vendor for an expansion box: One Stop Systems

Additionally, just because the host adapter card is PCIe 2.0 x16, that doesn't mean the motherboard supports it.  With the NVIDIA nForce Professional 3600 and 3050 chipset, you will have two slots at PCIe 1.0 x16, or half the bandwidth per adapter.  The card is backwards compatible.  Just sayin'.

@gaurav.garg
Was there a particular bootup process? X server configuration?  Special device permissions? Runlevel?

0 Likes

Thanks Jross,

I really like the OSS 2U GPU/SSD server from One Stop. For their expansion system, if they hook up 4 GPUs over only one PCIe x16 link, the bandwidth will theoretically be reduced by a factor of 4 when all 4 cards transfer data at once. It's also worth mentioning the latency of the long communication path between the host and the expansion box at initialization.

Back to the NVIDIA chipset: if I use that chipset, does it mean I have to design a custom mainboard to support it? As far as I know, consumer-level workstation motherboards currently have at most 3 PCIe x16 slots, which can handle only up to 3 GPUs.

Thanks,

Roto

 

0 Likes

@gaurav.garg
Was there a particular bootup process? X server configuration?  Special device permissions? Runlevel?


No, it was a normal bootup without any hacks on our side. We didn't configure the X server manually; it was configured by aticonfig. We used the default runlevel, 6.

We initially installed ATI Catalyst 10.2 and faced the same issues with 8 GPUs that other users have posted about on this forum. But Catalyst 10.4 installed smoothly, without any hacks on our side.

0 Likes

Originally posted by: gaurav.garg The PCIe host adapter card is PCIe 2.0 x16. The PCIe lanes are dynamically assigned to each GPU in the expansion system, so you get full bandwidth as long as you are using a single GPU per expansion system. When multiple GPUs are used, the bandwidth is divided among them.

 

Thanks, Gaurav, for the information.

So you used a PCIe expansion box to hold the multiple GPUs and then hooked the box up through a PCIe host adapter? I think this is a good solution, but for applications that need high bandwidth it would not be enough to make us happy.

Roto

0 Likes

@rotor
PCIe would split the 16 lanes by the number of GPUs.  Additionally, since it's PCIe 1.0, the bandwidth is halved.  So each of the four boards would have 1/8th the bandwidth of a dedicated PCIe 2.0 x16 slot.  That may or may not be a problem depending on your application.  Look into gamer-grade motherboards instead of workstation-class boards for lots of PCIe slots.  Unfortunately, most of the information I've seen, and my personal experience, suggests it's not very simple to build a functional system with more than 4 ATI GPUs.  There seem to be a lot of driver/kernel/BIOS hacks required to make it work and there's no single recipe out there.  Only a few people claim it works.  Details of those systems are very slim, and for the most part, I agree with empty_knapsacks' first comment.  This situation may change in the future, however.
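
To put rough numbers on that 1/8th claim, here is a small back-of-the-envelope sketch. It assumes the usual theoretical per-direction figures of about 0.25 GB/s per PCIe 1.0 lane and 0.5 GB/s per PCIe 2.0 lane; real transfer rates will be lower.

/* Back-of-the-envelope PCIe bandwidth estimate (theoretical peak,
 * per direction; actual achievable rates are lower). */
#include <stdio.h>

int main(void)
{
    const double gen1_per_lane = 0.25; /* GB/s per PCIe 1.0 lane */
    const double gen2_per_lane = 0.50; /* GB/s per PCIe 2.0 lane */
    const int lanes = 16;
    const int gpus_sharing_link = 4;   /* 4 GPUs behind one expansion box link */

    double dedicated_gen2 = gen2_per_lane * lanes;                      /* ~8 GB/s */
    double shared_gen1    = gen1_per_lane * lanes / gpus_sharing_link;  /* ~1 GB/s */

    printf("Dedicated PCIe 2.0 x16 slot : %.1f GB/s\n", dedicated_gen2);
    printf("Per GPU, 4 GPUs on x16 gen1 : %.1f GB/s (%.0fx less)\n",
           shared_gen1, dedicated_gen2 / shared_gen1);
    return 0;
}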

0 Likes

Originally posted by: jross @rotor There seem to be a lot of driver/kernel/BIOS hacks required to make it work and there's no single recipe out there.  Only a few people claim it works.  Details of those systems are very slim, and for the most part, I agree with empty_knapsacks' first comment.  This situation may change in the future, however.


This URL has some details on the driver/kernel/BIOS hacks:

http://fastra2.ua.ac.be/?page_id=214

I'm hoping that a motherboard with a 64-bit EFI BIOS would resolve this by offering an option to allocate PCIe address space above the 4 GB limit during startup.

It's possible that the I/O port space issue might need to be addressed by GPU designs that require less of that space.

0 Likes


I was looking into the possibilities of building dense GPU clusters, and Cubix solutions seemed like a good choice (though most likely not the cheapest). Their 4U expansion unit supports 16 double-wide GPUs with 8 connector cards. Given a host machine with an MSI Big Bang Marshall, or some other motherboard with 8 x16 slots (at whatever speed they run, most likely x8 when all are used), one could build very dense systems with very few host machines.

These would be ideal for applications where CPU calculations are minimal. It would really rock if AMD implemented a unified virtual address space as in CUDA 4.0, where the VRAM of GPUs can easily be shared across devices and data copies can be done without host intervention.
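
For reference, here is a minimal sketch of the CUDA 4.0 feature being referred to (peer-to-peer copies between two GPUs via the CUDA runtime API). It only illustrates the NVIDIA side of the comparison, not anything available in the AMD stack discussed in this thread; the device indices and buffer size are arbitrary assumptions.

/* CUDA 4.0-style peer-to-peer copy sketch (illustration only).
 * Compile with nvcc; assumes at least two CUDA-capable GPUs. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 1 << 20;        /* 1 MB test buffer */
    int can_access = 0;
    void *buf0 = NULL, *buf1 = NULL;

    /* Check whether GPU 0 can directly access GPU 1's memory. */
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    printf("GPU 0 -> GPU 1 peer access: %s\n", can_access ? "yes" : "no");

    cudaSetDevice(0);
    if (can_access)
        cudaDeviceEnablePeerAccess(1, 0);  /* enable direct access to GPU 1 */
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    /* Copy GPU 1 -> GPU 0; with peer access enabled this goes directly
     * over PCIe without staging through host memory. */
    cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}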

0 Likes
alxvry
Journeyman III

Arakageeta,

Were you able to find any 4-GPU Opteron systems?  From what I understand, Tyan now has Opteron platforms with this capability:  http://www.tyan.com/product_SKU_spec.aspx?ProductType=MB&pid=687&SKU=600000213

0 Likes