I'd like to know which software stack people should use for HSA. I've been following the development at HSA Foundation · GitHub.
Now I hear about HSAKMT (AMD Does First Release Of HSAKMT Library As Part Of Open-Source HSA - Phoronix) and I'm not sure where that fits in the stack.
Now that HSA 1.0 compliant hardware is out and HSA software seems to be maturing, does AMD plan to release a set of software that will work together so that people who want to use HSA can get all they need from a single place? I have tried various HSA software stacks (GCC, Numba, C++AMP), and each one requires getting different pieces of software from different places and tweaking/copying/moving files to get them to work, and the documentation is not always up to date.
It seems that every software package is either planning a release this fall or has released a version during the past couple of months. It would be nice to be able to download a set of *.deb files from a single place or set up the HSA software by following a single document. A "Getting started with HSA" guide or something like that would be helpful. I see that there is a docker container for HSA but it is 8 months old. Maybe it's time for an updated version?
Quick answer from phone, will check back when I get to a real PC.
The libhsakmt component has always been around, usually in the HSA-Drivers-Linux-AMD folder.
Anyways, all you need is the contents of the drivers and runtime folders. I believe the runtime instructions assume you installed the drivers first.
I see that libhsakmt has always been part of HSA-Drivers-Linux-AMD. I suppose what was released must be for the upstream kernel, as opposed to the kernel from github/hsafoundation. I would still like to see more information about the HSA software stack's current status and roadmap. I know that the github release is not compatible with the upstream kernel. Does AMD plan to sync up the github release with the upstream release soon, once all the components for the upstream kernel are released?
This is a bit off topic. But, during the financial analyst day, it was officially stated that AMD will release support for Caffe and Torch7 this year (I think the target was Q3 but not sure). Is that still going to happen? I see a pull request for Caffe OpenCL from AMD. But I could not find anything related to Torch7. Also, how about support for Theano?
Right... upstream thunk (libhsakmt) works with upstream kernel.
The github kernel releases include some code that wouldn't be accepted upstream in its current form (eg local memory support for Kaveri, which allows pinning from userspace, and system/local memory support for dGPU) so there'll have to be a delta until we can replace that code with something upstream-acceptable. The core APU code, which uses IOMMUv2 for GPU access to unpinned system memory, has no such problem.
I'm not sure what the timing is for specific framework support -- on the weekend I should have a chance to catch up on exactly what we announced at SC15
Even if the "github kernel" is not quite upstream ready, it would be great if it was updated and rebased on a more current version; 4.0 is getting old. Telling users to downgrade their kernel is often a no-go, especially when they use modern features that had many bugs fixed in more recent versions. (And it would be great if it was distributed as a kernel source git tree, not the source dump you currently have.) Perhaps even more importantly, the finalizer still needs to be open sourced so that it can be packaged for more distributions. You know, not everybody is running Ubuntu.
Does AMD plan to release an updated KFD driver (HSAFoundation/HSA-Drivers-Linux-AMD: This repository contains binary images for AMD's HSA Linux kern... ) soon? I'd like to try the GPUOpen compute stuff with a Fiji dGPU and I don't think the current driver supports that, since it will need newer AMDGPU stuff, right?
Also, for people who have an Intel platform with Fiji, which kernel/driver should they use? Kernel driver information is missing from GPUOpen.com, and the Multicoreware git repository (hcBLAS Documentaton — HCBLAS documentation ) mentions the Boltzmann Early Release Driver/HSA driver but does not have a direct link.
Yes, new code has been uploaded and we're writing the README as we speak. New location will be https://github.com/RadeonOpenCompute/ with repos for KFD, thunk and runtime. The repos are still private at the moment but will be public today.
The thunk and KFD folders are both source trees with commit history - KFD is based on kernel 4.1 at the moment - and we added a single commit to the kernel repo adding binary packages for both thunk and kernel. You'll need this new code for Intel/Fiji.
I see the kernel driver release. What I find surprising is that there is only 1 driver for both the AMD HSA platform and the Intel Haswell platform, and only 1 set of setup instructions. When you install the kernel on a supported Intel platform with an AMD GPU, do you get /dev/kfd? I always thought that AMDKFD was specific to AMD's HSA implementation. I did not expect to see that on an Intel platform.
Is AMD going to include Intel + AMD dGPU as an HSA compliant platform? Is that what HSA+ is referring to?
Also, if you install Fiji on Kaveri, can you use both iGPU and dGPU? Is Fiji used by default?
Yes, the /dev/kfd device appears whenever you have a supported configuration with an AMD GPU, whether it be APU or dGPU (or a server full of dGPUs). The KFD code is specific to AMD GPUs but we don't care about x86 CPU details (although KFD won't support compute operations on the Intel iGPU).
I forget the exact terminology, but there are two models in the HSA standards and IIRC the dGPU configuration should support the lower one (or at least what the lower one is evolving towards). Not sure about that though, let me get back to you.
The topology code should expose both Kaveri and Fiji dGPUs but initial testing focus has been on Intel CPU plus one or more Fiji dGPUs.
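For anyone who wants to check this on their own machine, here is a minimal Python sketch that looks for the device node and lists the topology nodes. The sysfs path /sys/class/kfd/kfd/topology/nodes is where amdkfd exposes its topology as far as I know, but it isn't stated in this thread, so treat it as an assumption; kfd_status is just a made-up helper name.

```python
import os

def kfd_status(dev_path="/dev/kfd",
               topology_root="/sys/class/kfd/kfd/topology/nodes"):
    """Report whether the KFD device node exists and list topology node names.

    Both default paths are assumptions about a typical amdkfd setup.
    """
    present = os.path.exists(dev_path)
    nodes = []
    if os.path.isdir(topology_root):
        # Each entry is one topology node (a CPU, an APU, or a dGPU).
        nodes = sorted(os.listdir(topology_root))
    return {"kfd_present": present, "topology_nodes": nodes}

status = kfd_status()
print("/dev/kfd present:", status["kfd_present"])
print("topology nodes:", status["topology_nodes"])
```

On a machine without a supported configuration this should simply report that the device is absent and print an empty node list.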
Thank you. I have one more question.
I thought that the main advantages of HSA were:
1. Provide flat memory addressing (a pointer is a pointer).
2. Provide Unified Memory Architecture to eliminate copying.
3. Reduce kernel launch overhead using AQL & user space queue.
4. A bunch of other stuff...
When using a dGPU, 1 & 2 won't apply. hcc seems to be solving those problems using array_view. My questions are:
1. How does kernel launch overhead compare to the old OpenCL method? Does it still make multiple system calls even when using hcc grid launch with a dGPU?
2. Will AMD's HPC APU bring back 1 & 2 when they are released in 2016/17?
Thank you for your answers and I'm excited to try out the new release!
I haven't spent much time looking at what HCC is doing, but for the rest of the stack :
#1 still applies - any allocation made through APIs (rather than OS-allocated memory) will have the same address for GPU and CPU, although with the current code device memory allocations are not CPU-accessible. Host (system memory) allocations are pointer-is-a-pointer.
#2 doesn't really apply, simply because a high end dGPU can use much more memory bandwidth than you can pass across a PCIE or PCIE-derived bus so one way or another (manual or automatic) you end up copying. That said, we have enough DMA and compute shader bandwidth on the high end dGPUs that copying can often be "pipelined out of sight" but that doesn't give you the nice simple programming model of an APU. Yet.
#3 still applies, no change -- on our hardware you get user queues and AQL. No system calls for a kernel launch, just write to user queue and then write to the doorbell, all in userspace.
#4 the other main thing you don't get with dGPU is cache coherence, although what we found was that most people would cheerfully trade off cache coherence in exchange for a bit more performance. At the moment GPUs can snoop CPU cache, even over the PCIE bus, but CPUs can't snoop GPU cache.
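To make #3 a bit more concrete, here is a toy Python model of the "write to user queue, then write to the doorbell" flow. It is only a sketch of the idea: the class name, fields, and packet contents are all made up for illustration, it is not the HSA runtime API, and a real AQL queue lives in memory shared with the GPU rather than in a Python list.

```python
class ToyUserQueue:
    """Toy model of a user-mode ring buffer with a doorbell (hypothetical, not the HSA API)."""

    def __init__(self, size):
        # Size must be a power of two so indices can wrap with a bit mask.
        assert size > 0 and size & (size - 1) == 0
        self.ring = [None] * size
        self.mask = size - 1
        self.write_index = 0   # advanced by the producer (the application)
        self.read_index = 0    # advanced by the consumer (standing in for the GPU)
        self.doorbell = -1     # index of the last packet the producer announced

    def submit(self, packet):
        """Producer side: write the packet, then ring the doorbell. No system call is modelled."""
        if self.write_index - self.read_index > self.mask:
            raise BufferError("queue full")
        slot = self.write_index
        self.ring[slot & self.mask] = packet
        self.write_index += 1
        self.doorbell = slot   # the "doorbell write": tells the consumer work is ready
        return slot

    def drain(self):
        """Consumer side: process every packet up to the doorbell value, in order."""
        done = []
        while self.read_index <= self.doorbell:
            done.append(self.ring[self.read_index & self.mask])
            self.read_index += 1
        return done


q = ToyUserQueue(4)
q.submit({"kernel": "vector_add", "grid": 1024})
q.submit({"kernel": "scale", "grid": 256})
print(q.drain())
```

The point of the model is that submit() only touches memory the application already owns, which is why a launch does not need to go through the kernel driver.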
Can't really talk about future products much, sorry.
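Going back to #2 for a moment, a rough back-of-envelope sketch of the bandwidth gap and of what pipelining buys you. The numbers are assumptions that are not from this thread: 512 GB/s is the published peak for Fiji's HBM, the bus is taken to be PCIe 3.0 x16 at theoretical peak, and the chunk timings are invented purely for illustration.

```python
def pcie_bandwidth_gbs(gt_per_s=8.0, lanes=16, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s (PCIe 3.0 x16 by default)."""
    return gt_per_s * lanes * encoding / 8  # divide by 8 bits per byte

def serial_time(n_chunks, copy_s, compute_s):
    """Copy a chunk, compute on it, then move to the next chunk."""
    return n_chunks * (copy_s + compute_s)

def pipelined_time(n_chunks, copy_s, compute_s):
    """Two-stage pipeline: the copy of chunk i+1 overlaps the compute of chunk i."""
    return copy_s + compute_s + (n_chunks - 1) * max(copy_s, compute_s)

HBM_GBS = 512.0  # assumed peak local memory bandwidth for Fiji
pcie = pcie_bandwidth_gbs()
print(f"PCIe 3.0 x16: {pcie:.2f} GB/s, local memory is about {HBM_GBS / pcie:.1f}x faster")
print("serial:", serial_time(8, 1.0, 1.0), "pipelined:", pipelined_time(8, 1.0, 1.0))
```

With copy and compute times roughly balanced, the pipelined total approaches half of the serial total as the number of chunks grows, which is what "pipelined out of sight" amounts to in this simple model.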