This is how they're run in a data center:
If they are throttling, high-CFM fans and better case airflow would be worth looking into.
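A quick way to check whether they are actually throttling (just a sketch, assuming the amdgpu driver is loaded and the card shows up as card0; the hwmon index can differ per system):
watch -n1 "cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input /sys/class/drm/card0/device/pp_dpm_sclk"
temp1_input is in millidegrees C, and pp_dpm_sclk marks the active clock state with a *; if that marked state keeps dropping while the temperature sits near the limit, the card is throttling.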
For AI/ML deep learning, 10G isn't enough; you want at least 40G or QDR InfiniBand with RDMA.
I did consider the Threadripper platform, but our current platform gives us x8/x8/x8/x8 and we can build 3~4 systems for the cost of one TR system.
Back to this thread - I was able to get in touch with AMD support last night and found the kernel and Xorg hard requirements:
kernel 4.10.0-33 and Xorg 1.19.3,
which can be installed on Ubuntu 16.04.3 with:
sudo apt install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04
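After a reboot it's worth confirming the HWE stack actually took before installing the driver (nothing AMD-specific here):
uname -r
apt-cache policy xserver-xorg-core-hwe-16.04
The first should report a 4.10 kernel and the second should show an installed 1.19.3-based xserver package.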
I'll try amdgpu-pro with ROCm and legacy OpenCL after a clean install.
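Once the driver is in, clinfo (from the clinfo package) should be a quick sanity check for whether an OpenCL platform actually comes up:
sudo apt install clinfo
clinfo | grep -iE "platform name|device name"
If the card is working you should see an AMD platform with the GPU listed as a device; an empty list means the runtime isn't talking to the card.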
Yep, I had to use Xorg 1.19.3, and I did try kernel 4.10 (and 4.13 and 4.4): no dice. I simply lack the CPU/motherboard support for ROCm, and AMD's drivers don't offer legacy OpenCL for this card; as far as I can tell it is ROCm-only.
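For anyone else hitting the same wall: the CPU/motherboard requirement is the PCIe 3.0 atomics support ROCm needs between the GPU and the root complex. A reasonably recent lspci can show whether a device advertises it (the 03:00.0 below is just an example bus address, and older pciutils versions don't print this field at all, so an empty result isn't conclusive):
lspci | grep -i vga
sudo lspci -vvv -s 03:00.0 | grep -i atomicops
Ideally you'd see AtomicOpsCap on both the GPU and the root port it hangs off.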
Let me know how your test goes!
Well, I am just a guy building a system for his personal experiments (I work as a sysadmin/DBA mostly), so no, I don't plan to build a big AI/ML cluster anytime soon... it would seriously complicate my choices and probably increase my costs (and, for now, this is just a hobby).
Seriously? 3~4 systems for the price of 1 TR??? I had considered the Intel path, but it would save me like 30% or so while giving me like 50% less performance overall (on the CPU side of things)... What did you use to get PCIe x8/x8/x8/x8 for $500~$700 (including CPU(s), motherboard, RAM, case/chassis, fans, and power supply(ies))???
Now, back to cooling: this card doesn't have an opening on the short side that would let air from the case fans reach the card's fins... I am no expert, but how does it get enough air? The air path would have to run through the gap between one card and the other (both hot), into the blower, and from the blower along the rest of the card. Unless those case fans create enough pressure inside the chassis to force air through the small gap between the cards at high speed (or maybe through the 4 small threaded holes?), I have my doubts the cards will run cool; those are 300W beasts, after all.
I took a look at other cards, including Nvidia's Quadro and Titan Xp, and they do have fins exposed on the short side where those fans would blow directly; the same goes for the Tesla P100 (and similar), except those, being server-only cards, don't have a fan, just the fins for the chassis fans to blow over. I couldn't find an example of a server-only card from AMD: the WX5100, WX7100 and WX9100 all have fans, which makes me believe they are more workstation cards.
Oh, found them from AMD (and the MI25 looks nice):
https://instinct.radeon.com/en/product/mi/radeon-instinct-mi25/ (well, this is yet to be released, it seems)
These also do not have a fan, and the power connector is on the short side of the card.
In the picture posted previously, those are Vega Frontier Editions.
I'm not an expert on airflow, but there is a lid on this server and it becomes a wind tunnel with air moving from front to back. Blower cards can pull air from that small gap and use it for cooling. It may not be ideal, but with proper case fans it is enough.
A smoke test would be the best way to visualize this. You could also hold a piece of string by the intake and watch how it gets pulled in. Putting my hand near the fan intake, I don't feel a breeze directly over the center, but there is a pull at the edges, almost parallel to the card.
The Radeon Instinct MI25 cards are likely better cooled, given the open back and a design meant to take advantage of the server "wind tunnel", like you mentioned.
I haven't had a chance to retest bare Linux, as someone came up with a neat solution: running the cards in a virtualized instance with GPU passthrough and using them that way. This may be the best option until verified Linux drivers are released, unless you have time to spend debugging.
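For reference, the host side of that passthrough setup roughly looks like this on Ubuntu (a sketch only: the 1002:xxxx,1002:yyyy IDs are placeholders, take the real vendor:device pairs for the GPU and its audio function from lspci -nn, and use amd_iommu=on instead on an AMD host):
# /etc/default/grub - enable the IOMMU on the host
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
# /etc/modprobe.d/vfio.conf - hand the card to vfio-pci instead of amdgpu
options vfio-pci ids=1002:xxxx,1002:yyyy
softdep amdgpu pre: vfio-pci
# apply and reboot
sudo update-grub && sudo update-initramfs -u
The guest then sees the card as a regular PCI device, so the amdgpu-pro/ROCm install happens inside the VM, not on the host.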