21 Replies Latest reply on Mar 23, 2011 9:14 AM by Meteorhead

    Opteron 8-GPU systems?

    Arakageeta
      Do Opteron systems w/ 8 double-wide GPUs exist?

      I'm a grad student looking to purchase a many-GPU system for our research lab.  I'd like to maximize both CPU core and GPU count.  The best I can find that maximizes both of these is a Tyan Intel-based system (http://www.tyan.com/product_SKU_spec.aspx?ProductType=BB&pid=412&SKU=600000188), which supports 12 CPU cores and 8 double-wide GPUs.  Is there any sort of equivalent (or better) in the AMD world?  The best I can find is a SuperMicro system that can sport 4 double-wide GPUs.  I've read that a company called Aprius had planned to produce an 8-way GPU system, but it looks like the product may have been cancelled.


      I would much prefer an Opteron-based system.  I believe its memory/cache hierarchy may provide more deterministic program execution (something that is important in our research).


      Any recommendations?  Please tell me these systems exist!

        • Opteron 8-GPU systems?
          empty_knapsack

          ATI drivers don't support more than 4 GPUs within a single system. End of story.

              • Opteron 8-GPU systems?
                gaurav.garg

                 

                Originally posted by: empty_knapsack ATI drivers don't support more than 4 GPUs within a single system. End of story.


                This is not true. We have recently installed an 8-GPU (Firestream 9250) per node cluster on a customer site.

                We are using 2 PCIe expansion systems (each containing 4 GPUs) to add a total of 8 GPUs to one node.

                  • Opteron 8-GPU systems?
                    Arakageeta

                    What is the effect of using a PCIe expansion box?  I presume it limits the memory/communication bandwidth available?

                    • Opteron 8-GPU systems?
                      jross

                       

                      Originally posted by: gaurav.garg

                       

                      This is not true. We have recently installed an 8-GPU (Firestream 9250) per node cluster on a customer site.

                       

                      We are using 2 PCIe expansion systems (each containing 4 GPUs) to add a total of 8 GPUs to one node.

                       

                      Gaurav, did this actually work?  Running FindNumDevices returns 8?  Can you give details of the Linux OS/kernel, ATI driver version, and other configuration details?

                        • Opteron 8-GPU systems?
                          empty_knapsack

                          Yes, Gaurav, can you provide more details? If the situation has finally changed and 4+ GPUs are supported by the drivers, that's definitely good news.

                          ... Though support for the 5970 is still a big question...

                          • Opteron 8-GPU systems?
                            gaurav.garg

                             

                            Originally posted by: jross Gaurav, did this actually work?  Running FindNumDevices returns 8?  Can you give details of the Linux OS/kernel, ATI driver version, and other configuration details?


                            Yes, FindNumDevices as well as OpenCL showed 8 GPUs. We also ran an OpenCL program that used all 8 GPUs.

                              • Opteron 8-GPU systems?
                                rotor

                                 

                                Hi Gaurav,

                                Can you give us a hint as to which PCIe expansion card/box you use and which interface you use to hook up the expansion card to the system? And as others asked: do you lose bandwidth in the expansion card?

                                Thanks,

                                Roto



                                    • Opteron 8-GPU systems?
                                      gaurav.garg

                                      The PCIe host adapter card is PCIe 2.0 x16. The PCIe lanes are dynamically assigned to each GPU in the expansion system. So you get full bandwidth as long as you are using a single GPU per expansion system, but the bandwidth is divided when multiple GPUs are used.

                                        • Opteron 8-GPU systems?
                                          jross

                                          @rotor
                                          I'm only aware of a single vendor for an expansion box: One Stop Systems

                                          Additionally, just because the host adapter card is PCIe 2.0 x16, it doesn't mean the motherboard supports it.  With the NVIDIA nForce Professional 3600 and 3050 chipsets, you will have two slots at PCIe 1.0 x16, or half the bandwidth per adapter.  The card is backwards compatible.  Just sayin'

                                          @gaurav.garg
                                          Was there a particular bootup process? X server configuration?  Special device permissions? Runlevel?

                                            • Opteron 8-GPU systems?
                                              rotor

                                              Thanks Jross,

                                              I really like One Stop's OSS 2U GPU/SSD server. For their expansion system, if they hook up 4 GPUs over only one PCIe x16 link, the bandwidth per card will theoretically drop by a factor of 4 when all 4 cards transfer data at once. It's also worth mentioning the delay of the long communication path between the host and the expansion box at initialization.

                                              Back to the NVIDIA chipset: if I use that chipset, does it mean that I have to design a custom mainboard for it? As far as I know, consumer-level workstation motherboards currently have at most 3 PCIe x16 slots, which can handle only 3 GPUs.

                                              Thanks,

                                              Roto

                                               

                                              • Opteron 8-GPU systems?
                                                gaurav.garg

                                                 

                                                Originally posted by: jross
                                                @gaurav.garg
                                                Was there a particular bootup process? X server configuration?  Special device permissions? Runlevel?


                                                No, it was a normal bootup without any hacks on our side. We didn't configure the X server manually; it was configured by aticonfig. We used the default runlevel, 6.

                                                We initially installed ATI Catalyst 10.2 and faced issues with 8 GPUs similar to those other users have posted on this forum. Catalyst 10.4, however, installed smoothly without any hacks on our side.

                                              • Opteron 8-GPU systems?
                                                rotor

                                                 

                                                Originally posted by: gaurav.garg The PCIe host adapter card is PCIe 2.0 x16. The PCIe lanes are dynamically assigned to each GPU in the expansion system. So you get full bandwidth as long as you are using a single GPU per expansion system, but the bandwidth is divided when multiple GPUs are used.

                                                 

                                                Thanks, Gaurav, for the information.

                                                So you used a PCIe expansion box to handle the multiple GPUs and then hooked up the box through a PCIe host adapter? I think this is a good solution, but if we aim at high-bandwidth applications, it would not be enough to make us happy.

                                                Roto

                                                  • Opteron 8-GPU systems?
                                                    jross

                                                    @rotor
                                                    PCIe would split the 16 lanes by the number of GPUs.  Additionally, since it's PCIe 1.0, the bandwidth is halved.  So each of the four boards would have 1/8th the bandwidth of a dedicated PCIe 2.0 x16 slot.  That may or may not be a problem depending on your application.  Look into gamer-grade motherboards instead of workstation-class for lots of PCIe slots.  Unfortunately, most of the information I've seen, along with my personal experience, suggests it's not very simple to build a functional system with more than 4 ATI GPUs.  There seem to be a lot of driver/kernel/BIOS hacks required to make it work and there's no single recipe out there.  Only a few people claim it works.  Details of those systems are very slim, and for the most part, I agree with empty_knapsack's first comment.  This situation may change in the future, however.
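The 1/8th figure above can be checked with a rough sketch (an estimate, not a measurement). It assumes the commonly quoted usable rates of about 250 MB/s per lane per direction for PCIe 1.x and 500 MB/s for PCIe 2.0, and it assumes the uplink is shared evenly among the GPUs transferring at the same time. The function names are hypothetical, and a real expansion box may arbitrate its lanes differently.

```python
# Approximate usable bandwidth per lane, per direction, in MB/s
# (2.5 GT/s and 5 GT/s signalling with 8b/10b encoding).
PER_LANE_MB_S = {"1.0": 250, "2.0": 500}


def link_bandwidth(gen, lanes):
    """Peak one-direction bandwidth of a PCIe link in MB/s."""
    return PER_LANE_MB_S[gen] * lanes


def per_gpu_bandwidth(gen, lanes, active_gpus):
    """Even share of one uplink among GPUs transferring at once (assumption)."""
    if active_gpus < 1:
        raise ValueError("active_gpus must be at least 1")
    return link_bandwidth(gen, lanes) / active_gpus


dedicated = per_gpu_bandwidth("2.0", 16, 1)  # one GPU in its own 2.0 x16 slot
shared = per_gpu_bandwidth("1.0", 16, 4)     # four GPUs behind a 1.0 x16 uplink

print(f"dedicated PCIe 2.0 x16: {dedicated:.0f} MB/s")
print(f"4 GPUs on PCIe 1.0 x16: {shared:.0f} MB/s each")
print(f"ratio: {shared / dedicated}")
```

With a single active GPU per expansion box the share is the full link, which matches the "full bandwidth as long as you are using a single GPU" remark earlier in the thread.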

                                                      • Opteron 8-GPU systems?
                                                        moozoo

                                                         

                                                        Originally posted by: jross @rotor There seem to be a lot of driver/kernel/BIOS hacks required to make it work and there's no single recipe out there.  Only a few people claim it works.  Details of those systems are very slim, and for the most part, I agree with empty_knapsack's first comment.  This situation may change in the future, however.


                                                        This URL has some details on the driver/kernel/BIOS hacks:

                                                        http://fastra2.ua.ac.be/?page_id=214

                                                        I'm hoping that a motherboard with a 64-bit EFI BIOS would resolve this by having an option to allocate PCIe address space above the 4 GB limit during startup.

                                                        It's possible that the I/O port space issue might need to be addressed by GPU designs that require less space.
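To illustrate why allocating PCIe address space above 4 GB would help, here is a back-of-the-envelope budget check for the memory-mapped side of the problem. Every number in it is an assumption chosen for illustration, not taken from the thread or from any specific card: a hypothetical GPU requesting a 256 MiB aperture plus 1 MiB of register space, and a hypothetical 1536 MiB window left for device memory below 4 GiB. Real sizes vary by card, chipset, and BIOS, and the sketch ignores alignment, other devices, and the separate I/O port space issue.

```python
def mmio_demand_mib(num_gpus, bars_mib):
    """Total memory-mapped space (MiB) requested by num_gpus identical GPUs."""
    return num_gpus * sum(bars_mib)


def fits_in_window(num_gpus, bars_mib, window_mib):
    """True if the combined request fits in the window below 4 GiB."""
    return mmio_demand_mib(num_gpus, bars_mib) <= window_mib


# Hypothetical values, for illustration only.
ASSUMED_BARS_MIB = (256, 1)   # aperture + register space per GPU
ASSUMED_WINDOW_MIB = 1536     # space a 32-bit BIOS might leave below 4 GiB

for n in (4, 8):
    need = mmio_demand_mib(n, ASSUMED_BARS_MIB)
    ok = fits_in_window(n, ASSUMED_BARS_MIB, ASSUMED_WINDOW_MIB)
    print(f"{n} GPUs need {need} MiB, fits below 4 GiB: {ok}")
```

Under these assumed numbers four GPUs fit and eight do not; a firmware that can place device memory above 4 GiB removes that ceiling.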

                                                         

                                                              • Opteron 8-GPU systems?
                                                                Meteorhead

                                                                I was looking into the possibilities for building dense GPU clusters, and Cubix solutions seemed like a good choice (but most likely not the cheapest). The 4U extension supports 16 double-wide GPUs with 8 connector cards. Given a host machine with an MSI Big Bang Marshall or some other motherboard with 8 x16 slots (at whatever speed they run, most likely x8 when all are used), one could build very dense systems with very few host machines.

                                                                These would be ideal for applications where CPU calculations are minimal. It would really rock if AMD implemented a virtual memory space like the one in CUDA 4.0, where the VRAM of GPUs can be easily shared across devices and data copies can be done without host intervention.

                                              • Opteron 8-GPU systems?
                                                alxvry

                                                Arakageeta,

                                                Were you able to find any 4-GPU Opteron systems?  From what I understand, Tyan now has Opteron platforms with this capability:  http://www.tyan.com/product_SKU_spec.aspx?ProductType=MB&pid=687&SKU=600000213