Good morning. As seems to be the practice in this community, I would like to introduce myself before getting to the problem!
My name is Marco Giordano. I am a software engineer at Animal Logic in Sydney, currently working on high-performance software. I work in the VFX industry and have worked on blockbusters like Avengers: Age of Ultron (I created and optimized Ultron's face rig to run at acceptable speed in Maya, given the complex computation and crazy number of polygons). I have also worked in the game industry at Frontier, on Elite Dangerous and so on.
I am a performance freak: I love C++ and I like to optimize code, get into the disassembly, etc. I have worked with CUDA for 99% of my career, but I am loving the new AMD platform and built myself a Ryzen 1700 and Radeon RX 570 box to start doing my personal projects on it. Currently, I would like to port some of my CUDA code to OpenCL and optimize it for AMD hardware. I would like to do that with clang for a few reasons: first, I want to leverage clang's optimizations; second, an offline build will let me know exactly what code I get, so I can inspect the disassembly and do whatever is needed to optimize, knowing that this is the code that will actually run.
I have been digging through the web for information about the clang-OpenCL-amd platform. I am confused on some points and would love some help.
First of all, I found several resources on using clang to emit intermediate formats for different GPU platforms, like PTX or, in the case of OpenCL, SPIR-V or SPIR (which is based on LLVM IR), but the final compilation still happens at the driver level. My first question is: is there a way to get a binary file for my specific target, the RX 570, and use it directly in my program without any extra compilation happening? Or might that be a problem because clang can get out of sync with the drivers?
I also found out about http://gpuopen.com/gaming-product/radeon-gpu-analyzer-rga/, which sounds like it is able to generate the kind of binary I need, but it is mainly an analysis tool.
Ideally, I would love to be able to compile all my C++ and CL kernels end to end using clang (5.0) and see exactly what I get out of the kernel compilation. Should I just limit myself to looking at SPIR-V?
Or should I just compile the C++ normally with clang, try to do my optimization using RGA, and compile the kernel at runtime?
One thing I like about AMD's open approach is the open ISA, and I would love to be able to leverage it!
A couple of clarifications: I want to use OpenCL just for GPGPU, not CPU, and I am running CentOS 7 with clang 5.0 built from source.
I hope this was not too confusing!
I don't know much about clang-OpenCL-amd. But if you want to program AMD GPUs, ROCm & HIP is where the action is.
Moving forward, our GPU language foundation (OpenCL, HIP, and HCC) will support GCN inline assembly, assembly, and a disassembler. We have also been working on a new native code generator based on LLVM for our newer stack, removing the two-stage compilation we used in the past. See the User Guide for AMDGPU Backend (LLVM 5 documentation). Here is a link to the GCN ISA manual for the GFX8 device you're working with: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_GCN3_Instruction_Set_Architecture...
The first phase of all this work starts with the ROCm Software Platform, with the first instantiation via our open-source GPU computing driver; see "ROCm, a New Era in GPU Computing".
It also looks like you're confusing intermediate language formats (PTX, OpenCL SPIR-V, SPIR 1.2) with assembly. While these are lower level than OpenCL or CUDA language syntax, they do not give the level of control you have in assembly. They are all still machine independent (PTX is closer, like HSAIL, but it is still an abstraction over the base machine instructions and is designed for portability).
Hi, sorry for the late reply. It was a long weekend with Monday a bank holiday here in Australia, so I was out of town. Yes, I was reading the LLVM 5 page, but I was not sure whether it would actually spit out machine code at the end. That looks like exactly what I was looking for. I am really interested in the ROCm platform, but to get started I would rather not put too much meat on the fire: keep it small, then expand on the platform.
About the intermediate representation, sorry for explaining myself poorly. I know what an intermediate representation is; I have written some PTX by hand myself. What I meant to ask in my original question was: can I get the actual machine code that is going to run on the card, or do I have to settle for the intermediate representation?
I am currently waiting for a PCIe SSD for the Ryzen/Radeon box; as soon as I get it I will be able to start experimenting. Thank you for your help. Last question: what is considered the go-to Linux distribution when it comes to AMD? Ubuntu?
Regarding how to "get the actual machine code", this page might answer your questions.
It says "[hsaco] is a standard ELF file" ... "Hsaco stores the compiled GCN code in the .text section, it optionally contains debug information, ..." ... "the dissembler tool can disassemble hsaco files so you can see what is going on inside the kernel." ... "[use] the CLOC (CL Offline Compiler) tool to compile the CL kernel into the hsaco file"
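Since the page describes an hsaco as a standard ELF file with the compiled GCN code in the .text section, here is a rough C++ sketch of how one might locate that section in such a file. It is only an illustration under some assumptions: the file is a 64-bit little-endian ELF, the host is little-endian too, and the struct and function names (Elf64Header, Elf64Section, find_section, read_file) are made up for this example and are not part of ROCm or CLOC.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Minimal ELF64 layouts (same field order as Elf64_Ehdr / Elf64_Shdr).
struct Elf64Header {
    unsigned char ident[16];
    std::uint16_t type, machine;
    std::uint32_t version;
    std::uint64_t entry, phoff, shoff;
    std::uint32_t flags;
    std::uint16_t ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
};
struct Elf64Section {
    std::uint32_t name, type;
    std::uint64_t flags, addr, offset, size;
    std::uint32_t link, info;
    std::uint64_t addralign, entsize;
};
static_assert(sizeof(Elf64Header) == 64 && sizeof(Elf64Section) == 64,
              "unexpected struct padding");

struct SectionSpan { std::uint64_t offset, size; };  // byte range in the file

// Copies a plain struct out of the image; false if it would run out of bounds.
template <typename T>
bool read_at(const std::vector<unsigned char>& image, std::uint64_t offset, T& out) {
    if (offset > image.size() || image.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, image.data() + offset, sizeof(T));
    return true;
}

// Returns the file range of the named section, or nullopt if the image is not
// a 64-bit little-endian ELF or has no section with that name.
std::optional<SectionSpan> find_section(const std::vector<unsigned char>& image,
                                        const std::string& wanted) {
    Elf64Header eh;
    if (!read_at(image, 0, eh)) return std::nullopt;
    // Magic 0x7f 'E' 'L' 'F', class 2 = 64-bit, data 1 = little-endian.
    if (std::memcmp(eh.ident, "\x7f" "ELF", 4) != 0 || eh.ident[4] != 2 ||
        eh.ident[5] != 1) return std::nullopt;
    if (eh.shentsize != sizeof(Elf64Section) || eh.shstrndx >= eh.shnum ||
        eh.shoff > image.size()) return std::nullopt;

    // The section-name string table is itself a section.
    Elf64Section strtab;
    if (!read_at(image, eh.shoff + std::uint64_t(eh.shstrndx) * sizeof(Elf64Section),
                 strtab)) return std::nullopt;
    if (strtab.offset > image.size() || image.size() - strtab.offset < strtab.size)
        return std::nullopt;

    for (std::uint16_t i = 0; i < eh.shnum; ++i) {
        Elf64Section sh;
        if (!read_at(image, eh.shoff + std::uint64_t(i) * sizeof(Elf64Section), sh))
            return std::nullopt;
        if (sh.name >= strtab.size) continue;  // name index outside the table
        const char* begin =
            reinterpret_cast<const char*>(image.data() + strtab.offset + sh.name);
        const std::size_t max_len = static_cast<std::size_t>(strtab.size - sh.name);
        const void* nul = std::memchr(begin, 0, max_len);
        const std::size_t len =
            nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - begin) : max_len;
        if (std::string(begin, len) != wanted) continue;
        if (sh.offset > image.size() || image.size() - sh.offset < sh.size)
            return std::nullopt;  // section claims bytes the file does not have
        return SectionSpan{sh.offset, sh.size};
    }
    return std::nullopt;
}

// Reads a whole file into memory; returns an empty vector if it cannot be opened.
std::vector<unsigned char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Calling find_section(read_file("kernel.hsaco"), ".text") (the file name is just a placeholder) would give the offset and size of the bytes a disassembler would decode as GCN instructions. This only finds the raw bytes; turning them into readable ISA is still the job of the disassembler tool the page mentions.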