I was impressed with the aspirations of AMD's open platform for GPGPU computing and recently installed ROCm and HCC, but I have had problems with the installation and with compiling the HCC example applications.
Hardware: rocminfo correctly identifies the Intel Core i7-6700 CPU as Agent 1 and the gfx802 GPU as Agent 2, with ISA 1 named amdgcn-amd-amdhsa--gfx802. clinfo also shows correct information, identifying the AMD APP platform and a GPU-based ISA for Tonga PRO GL [FirePro W7100], though I note that OpenCL shows as v1.2 when the card actually seems to support up to v2.1. The BIOS is set to use the CPU-based displays and I have monitors on the DVI and HDMI outputs, with nothing connected to any of the graphics card display ports.
Software: Basic Ubuntu 16.04 with few additions - no other video software or drivers installed, only those that came with ROCm.
ROCm installation: Followed advice and ran sudo apt update, sudo apt dist-upgrade, sudo apt install libnuma-dev, then rebooted before getting rocm.gpg.key, adding rocm repository to /etc/apt/sources.list.d and installing rocm-dkms. Seemed to complete successfully and HelloWorld compiled and ran OK, after adding my username to video group.
HCC installation and build: cloned the hcc.git repository using git clone --recursive -b clang_tot_upgrade into tools folder and built from source using mkdir -p build; cd build, then cmake -DCMAKE_BUILD_TYPE=Release .., then make and make install. May have been some warnings or errors - I wasn't watching the screen all the time as it was quite a lengthy process but it at least completed without any error indication.
HCC-Example-Applications installation: Added /opt/rocm/bin to the path, downloaded the example files, created a build sub-directory and ran CXX=hcc cmake .. from there, then make. This produced a large number of warnings and errors and stopped with 'Makefile:83: recipe for target 'all' failed'. I then tried to compile each example singly in turn using hcc `hcc-config --cxxflags --ldflags` example_name.cpp.
HCC Examples common compilation issue: For all example applications, I got the same output: clang-7: warning: -amdgpu-target argument 'gfx802' is not recognized; using gfx803 instead [-Winvalid-command-line-argument] and then several lines of: 'auto' is not a recognized processor for this target (ignoring processor). This is my main concern right now: that the platform/compiler does not actually support Tonga GPUs and is defaulting to a Fiji device.
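One way to see which gfx name the runtime reports, for comparison with the name in the compiler warning, is to filter the rocminfo output. This is only a small sketch: list_gfx_targets is a hypothetical helper name, and it assumes the ISA names contain a token of the form gfxNNN, as in the amdgcn-amd-amdhsa--gfx802 string quoted above.

```shell
#!/bin/sh
# list_gfx_targets (hypothetical helper): read text on stdin and print
# each distinct gfxNNN token once, sorted.
list_gfx_targets() {
    grep -o 'gfx[0-9a-f][0-9a-f]*' | sort -u
}

# Only query the runtime if rocminfo is actually on the PATH.
if command -v rocminfo >/dev/null 2>&1; then
    rocminfo | list_gfx_targets
fi
```

If this prints gfx802 while the compiler substitutes gfx803, the mismatch is on the compiler side rather than in device detection.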
HCC Examples other compilation issues: For MD, errors included: a) at line 31:5 in MD.cpp, variable length array declaration not allowed at file scope: int dummy[DUMMY_NUM]; b) at line 173:48 in hc_short_vector.inl, no type named 'type' in 'hc::short_vector::__vector<float, 4>': typedef typename __vector<SCALAR_TYPE_SIZE>::type type; (with a further indication of a problem at MD.cpp line 474:15, linked with the instantiation of template class MD<float> mdf, and similarly later at line 540:20 with the instantiation of template class MD<double> mdd).
For FFT there were various (presumably minor) macro redefinition warnings on constants like M_E but also too many errors to complete compilation, mostly on forward definitions of function calls, probably linked to short_vector/instantiation of template class issue mentioned above in connection with MD.
For BitonicSort there were several warnings that am_copy is deprecated and that accelerator_view::copy should be used instead. The program did compile, but the link step failed with an undefined reference to symbol 'hsa_memory_free@@ROCR_1' and libhsa-runtime64.so.1: error adding symbols: DSO missing from command line.
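A "DSO missing from command line" error generally means the linker found the needed symbol in a shared library that was not named explicitly on the link line, so the usual remedy is to name that library. The sketch below only prints a candidate command so it can be inspected before running. link_line is a hypothetical helper, BitonicSort.cpp is an assumed source file name, and it assumes the library can be found as -lhsa-runtime64 (a -L option pointing at its directory may also be needed).

```shell
#!/bin/sh
# link_line (hypothetical helper): print a compile-and-link command that
# names the HSA runtime library explicitly. Nothing is executed here.
link_line() {
    src="$1"
    # The $(hcc-config ...) part is printed literally, for the shell that
    # eventually runs the command to expand.
    printf 'hcc $(hcc-config --cxxflags --ldflags) %s -lhsa-runtime64 -o %s\n' \
        "$src" "${src%.cpp}"
}

link_line BitonicSort.cpp
```

Placing -lhsa-runtime64 after the source file matters with some linkers, which resolve libraries in command-line order.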
HCC Examples possible runtime issues: Examples SPMV, SyncVsAsyncArrayCopy and ArrayBandwidth (apart from the common clang-7 target warning mentioned above) did all compile, link and generate an executable, which all ran but gave the indication ### HCC STATUS_CHECK Error: HSA_STATUS_ERROR (0x1000) at file:mcwamp_hsa.cpp line:3655 Aborted (core dumped).
There are obviously several types and levels of issue at play here. I am not a Linux expert, so there may be something simple that I did wrong or didn't do right, but I did try to follow the installation and usage instructions to the letter. It is disappointing that for a major platform like ROCm/HCC on a fairly vanilla target, I could not get one example to compile and run without some error or other.
Any suggestions/comments, please?
Have done a bit more digging and perhaps found a clue to the GPU issue in HCC2. I installed this and ran the cloc/vector_copy example. This produced the same compiler warnings about unrecognized target and ###HCC STATUS_CHECK error when it ran.
The makefile uses the cloc.sh script, which in turn calls mygpu to determine which gfx processor to use; if it cannot find one, it defaults to Fiji (gfx803). mygpu in turn looks for a kfdid in one of the /sys/devices/virtual/kfd/kfd/topology/nodes/*/properties files. There are 26 items of information in the Node 1 properties file, one of which gives the same device_id number as rocminfo shows for the Agent 1 Chip ID, but there is no kfdid parameter.
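The check described above can be reproduced by hand with a short script. This is a sketch rather than what mygpu itself does: prop_value and has_prop are hypothetical helper names, and it assumes each line of a properties file has the form "name value".

```shell
#!/bin/sh
# prop_value NAME FILE (hypothetical helper): print the value stored for
# NAME in a "name value" properties file, or nothing if NAME is absent.
prop_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# has_prop NAME FILE (hypothetical helper): succeed only if NAME appears
# as a field name in FILE.
has_prop() {
    awk -v key="$1" '$1 == key { found = 1 } END { exit !found }' "$2"
}

# Report device_id and the presence of kfdid for every topology node.
for f in /sys/devices/virtual/kfd/kfd/topology/nodes/*/properties; do
    [ -r "$f" ] || continue   # skip when the path does not exist
    printf '%s: device_id=%s' "$f" "$(prop_value device_id "$f")"
    if has_prop kfdid "$f"; then
        printf ' kfdid=%s\n' "$(prop_value kfdid "$f")"
    else
        printf ' (no kfdid field)\n'
    fi
done
```

On a system where every node prints "(no kfdid field)", any script that depends on that field would fall through to its default.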
I read that the ROCm HSA driver replaces the amdkfd driver but ROCm perhaps still relies on information provided by the previous driver.
So if this is indeed the issue, the question is can I put a valid kfdid value into the properties file manually (with a corresponding addition to the kfkid2code() function in mygpu as currently there are only entries for Vega and Fiji) without upsetting anything, or do I need to uninstall everything, install kfd driver then reinstall ROCm?
Maybe this will help a little, from GitHub: RadeonOpenCompute/ROCm (ROCm - Open Source Platform for HPC and Ultrascale GPU Computing).
Is this related to OpenGL or OpenCL type programming using Linux in a server environment, or on a workstation?
The hardware platform at the moment is a workstation PC format but probably will have a client/server structure at the application level. I may add another graphics card in the near future and upscale from that if I can get the software running properly. Ubuntu 16.04 does come in desktop and server editions but I hadn't thought that would be at issue at this stage in just getting the compiler to work correctly.
The platform notes on the GitHub page are quite general and at least for HCC2, the problem seems to be the same for all gfx80x range devices even Fiji, which the notes say is supported anyway.
It sounds like these two forums may be better places to ask your question: Devgurus and https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/amd-linux .
ROCm related support is provided on the GitHub site itself. Below are the links to report the issue: