Good morning! As seems to be the practice in this community, I would like to introduce myself before getting to the problem.
My name is Marco Giordano. I am a software engineer at Animal Logic in Sydney, currently working on high-performance software. I work in the VFX industry and have worked on blockbusters such as Avengers: Age of Ultron (I created and optimized Ultron's face rig so that it ran at an acceptable speed in Maya, given the complex computation and the huge number of polygons). I have also worked in the game industry at Frontier, on Elite Dangerous among other titles.
I am a performance freak: I love C++, I like optimizing code, digging into the disassembly, and so on. I have spent 99% of my career working with CUDA, but I am loving the new AMD platform and built myself a Ryzen 1700 and Radeon RX 570 box to start my personal projects on it. I would now like to port some of my CUDA code to OpenCL and optimize it for AMD hardware. I would like to do that with Clang, for a few reasons: first, I want to leverage Clang's optimizations; second, an offline build would let me know exactly what code I get, inspect the disassembly and do whatever is needed to optimize it, knowing that this is the code that will actually run.
I have been searching the web for information about the Clang/OpenCL/AMD toolchain, and I am confused on some points, so I would love some help.
First of all, I found several resources on using Clang to emit intermediate formats for different GPU platforms, such as PTX or, in the case of OpenCL, SPIR-V or SPIR (which is based on LLVM IR). With these formats, the final compilation still happens at the driver level. My first question: is there a way to get an actual binary for my specific target (the RX 570) and use it directly in my program, without any extra compilation happening? Or would that be a problem because Clang can get out of sync with the drivers?
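For context, on the host side a precompiled device binary is handed to the OpenCL runtime through clCreateProgramWithBinary, and the OpenCL specification still requires a clBuildProgram call afterwards, even for binaries. The sketch below covers only the first, portable step: reading such a binary from disk into the pointer-plus-length form that call takes through its binaries and lengths arguments. The helper name read_binary_file is hypothetical, and the sketch deliberately does not touch the OpenCL API itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads an entire file (e.g. a precompiled kernel
   binary) into a heap buffer. Returns NULL on failure. On success,
   *out_size holds the number of bytes read and the caller must free()
   the buffer. The pointer/size pair is what would later be passed as one
   entry of the binaries/lengths arrays of clCreateProgramWithBinary. */
static unsigned char *read_binary_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return NULL; }
    if (fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* malloc(0) may return NULL, so always allocate at least one byte. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (!buf) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) { free(buf); return NULL; }

    *out_size = got;
    return buf;
}
```

Whether the driver then accepts such a binary unchanged depends on it matching the driver's expected format and version, which is exactly the sync concern raised above.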
I also found out about http://gpuopen.com/gaming-product/radeon-gpu-analyzer-rga/, which sounds like it can generate the kind of binary I need, but it is mainly an analysis tool.
Ideally, I would love to compile all my C++ and CL kernels end to end with Clang (5.0) and see exactly what I get out of the kernel compilation. Should I just limit myself to looking at SPIR-V?
Or should I just compile the C++ normally with Clang, do my optimization work using RGA, and compile the kernels at runtime?
One thing I like about AMD's open approach is the open ISA, and I would love to be able to leverage it!
A couple of clarifications: I want to use OpenCL only for GPGPU, not on the CPU, and I am running CentOS 7 with Clang 5.0 built from source.
I hope this was not too confusing!