I have been trying to find a decent GPU rack solution for over a year now, and I have to say it is very difficult with AMD cards.
Supermicro does have some decent solutions, which Tasp has mentioned, however each of them has a weak point. These faults are not Supermicro-specific: every vendor I checked (Supermicro, International Computer Concepts, One Stop Systems, Mellanox) has some drawback that makes it too hard to build truly dense clusters with their products. Here are some of the possibilities I came across:
1U: Supermicro offers 1U solutions that house a half-width dual Intel Xeon motherboard with two full-profile, double-width PCIe slots and one low-profile, single-width slot. This is the densest system I encountered (2 graphics cards per 1U; with dual-GPU cards, 4 GPUs/1U). The extra low-profile slot can be used for InfiniBand, or can house a low-profile AMD GPU for rendering the desktop (to act as the default adapter for the X server; it is still an issue that a desktop has to be rendered for the drivers to load). However, having 2 server processors and heaps of RAM capacity for only 2 cards (4 GPUs) is a waste of funds. One server processor could easily serve 6-12 GPU cores, given that the machine is used for GPU-heavy computing.
1U + 1U: I thought of the same thing Tasp has mentioned: a setup similar to most Tesla clusters, using InfiniBand to connect GPUs that are not inside the same 1U housing as the host machine. One of our German partners has systems like this: two half-width motherboards in 1U, and 4 Tesla cards in another 1U rackmount. That is again 4 cards / 2U, the same density, but even more costly, since not only two processors are needed, but two motherboards as well.
3U: This is our current setup: a 3U rackmount with 3 HD 5970 cards inside. Supermicro has some similar solutions with 4 double-width GPUs inside a 3U rackmount. This is not as dense as the prior solutions, and in my experience it does not work. When serious cards (such as the 5970) are installed side by side, as in a normal tower case, they fry each other. Cards take in air from the back and from the fan side, which almost touches the screaming-hot backplate of the neighbouring card. Under sustained load, the middle card overheats and shuts down. This is not an AMD issue; other computing groups in our center that have 4 NVIDIA cards installed similarly complain about the same problem. The 1U installation geometry is better suited for cooling.
3U + cooling: I had the idea of keeping the 3U housing but water-cooling the cards. Water cooling for GPUs does not require double-width slots; most coolers have their water tubes routed upward, not towards the neighbouring card. Unfortunately, dual-GPU cards have so many video outputs that they still need a double-width slot. (Oh, bugger...) Even if I installed a decent water cooling system (say, in another 1U), the cluster would not become any denser; in fact it would become less dense because of the cooling. The cards would no longer fry each other, but yet again costs increase by a lot. Water cooling is not cheap, especially at larger scales.
3U + 2*1U: This would seem to be the most flexible solution, however I cannot seem to find all the necessary components to build such a system. In this setup I would have a 3U rackmount for the host machine with a decent motherboard (ASUS P6T7 WS Supercomputer, or MSI BigBang Marshal) and hook up as many GPUs as possible via InfiniBand or PCIe expansions. The MSI board uses a LucidLogix chip, but I'm 90% sure one could turn the chip off to get 8 PCIe x8 slots. That way one could have 8*2 = 16 GPUs in 3+2*1 = 5U. It is not as dense as the 1U solution, but it should be cheaper and has the advantage of controlling more GPUs per node, so scaling should fall off more slowly than in other, less "chunky" clusters. This solution would be great, but I cannot find all the components needed to set up something as stable as the Tesla 1U expansion chassis. I'm a programmer, and I do not intend to fabricate 1U racks that hold 4 cards properly and then go hunting for power supplies that fit the housing. I'm surprised nobody (Supermicro or the like) offers similar solutions.
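To make the density comparison between the options above concrete, here is a quick back-of-the-envelope sketch in Python. The figures are the ones quoted in the options above (dual-GPU cards such as the 5970 counted as two GPUs); nothing else is assumed.

```python
# GPU density (GPU cores per rack unit) for the rack options discussed above.
# A dual-GPU card (e.g. HD 5970) counts as 2 GPUs; Tesla cards count as 1.
configs = [
    ("1U (2 dual-GPU cards)",            4, 1),
    ("1U + 1U (hosts + 4 Tesla cards)",  4, 2),
    ("3U (3x HD 5970)",                  6, 3),
    ("3U + 2*1U (8 slots, dual-GPU)",   16, 5),
]

for name, gpus, units in configs:
    print(f"{name:35s} {gpus:3d} GPUs / {units}U = {gpus / units:.1f} GPU/U")
```

The 1U option wins on raw density (4.0 GPU/U), but the 3U + 2*1U option (3.2 GPU/U) trades a little density for many more GPUs per host node.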
I am aware that I have only mentioned Intel-chipset motherboards, but that is only because I am unable to find capable AMD motherboards. They all lack PCIe slots: at most they have 1, rarely 2. And even the ones that have two are XL-ATX or similarly sized boards, while on the "blue side of the force" there are half-width motherboards with 2 PCIe x16 connectors. I do not see how I could even begin to consider AMD motherboards for GPU clusters. I was kind of hoping that with AMD waving "The future is Fusion" flag, they would start to offer products that show this tendency of bringing GPU clusters closer to reality.
Finally, I would like to point to John Fruehe's post commenting on the Fusion approach to server computing. He states that Fusion is not (just) about bringing APUs into HPC clusters (although most would hail that achievement); it is not merely about putting the two processing elements on the same die, but about bringing the idea of heterogeneous computing into everyday clusters. I fail to see this tendency from AMD.
If anybody has any constructive comments, please feel free to enlighten me (and all other forumers).