As far as I know there is no GPU ISA assembler per se.
You can see the generated assembly by passing the -save-temps option to clBuildProgram. This can be educational, because it lets you see how the assembly changes with different approaches in the code.
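One way to compare two such dumps is a plain text diff. The following is only a minimal sketch using Python's standard difflib. The function name diff_isa and the two short listings are made up for illustration and are not real compiler output; in practice you would read the text from the files that -save-temps writes.

```python
import difflib


def diff_isa(before, after):
    """Return unified-diff lines between two ISA listings given as text."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before.isa",   # labels only; nothing is read from disk
        tofile="after.isa",
        lineterm="",
        n=0,                     # no context lines, show only what changed
    ))


# Hypothetical listings: a separate multiply and add versus a fused multiply-add.
BEFORE = "v_mul_f32 v0, v1, v2\nv_add_f32 v0, v0, v3\ns_endpgm"
AFTER = "v_mac_f32 v3, v1, v2\ns_endpgm"

for line in diff_isa(BEFORE, AFTER):
    print(line)
```

Lines starting with "-" exist only in the first dump and lines starting with "+" only in the second, so it is easy to spot which instructions a source change added or removed.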
There are 2 types of architectures: VLIW (Northern Islands, Evergreen) and GCN (Southern Islands, Sea Islands, GCN3).
The processor you mentioned (T40E) is a VLIW processor. It would be extremely hard to program by hand. I really like asm, but that's where I'd gain nothing at all over the OpenCL->llvm->amd_il compiler chain. In VLIW there are a few simple instructions that the compiler can handle well, but they must be structured in a very complicated way (because of the Very Long Instruction Word bundling, and also the clauses).
The other architecture is GCN. This is the new one; unfortunately your processor is older than that. It has a completely redesigned, cool instruction set with a lot more possibilities than VLIW. Because of this it is more difficult for an automatic compiler to handle, so that's where human thinking can be better than machine thinking. For example, a human can have precise control over register usage. With OpenCL you'll need a lot of black magic to force the compiler to go below 128 regs, which makes the kernel run almost 2x faster. Plus there are a few GCN features that aren't exposed in OpenCL yet, for example the global sync, which is a barrier across all the compute units of the GPU. This is essential for real-time, low-latency signal processing.
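To show why the 128-register mark matters, here is a rough occupancy calculation. It is a sketch under assumptions that are not stated in this post: the figures come from AMD's published GCN documentation (256 vector registers per SIMD lane, at most 10 resident wavefronts per SIMD, registers allocated in blocks of 4), and the function name waves_per_simd is made up for illustration.

```python
# Assumed GCN limits, taken from AMD's public documentation, not from this post.
MAX_WAVES_PER_SIMD = 10     # hardware cap on resident wavefronts per SIMD
VGPRS_PER_SIMD_LANE = 256   # vector registers available to each SIMD lane
VGPR_GRANULARITY = 4        # registers are handed out in blocks of 4


def waves_per_simd(vgprs_used):
    """Estimate how many wavefronts fit on one SIMD for a given VGPR count."""
    if not 1 <= vgprs_used <= VGPRS_PER_SIMD_LANE:
        raise ValueError("vgprs_used must be between 1 and 256")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_used // VGPR_GRANULARITY) * VGPR_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)


for regs in (24, 64, 84, 128, 129):
    print(regs, "VGPRs ->", waves_per_simd(regs), "wave(s) per SIMD")
```

Under these assumptions a kernel using up to 128 registers leaves room for two wavefronts per SIMD, while one more register drops it to a single wavefront, so there is nothing left to hide latency with. That lines up with the roughly 2x difference described above, though the real speedup depends on the kernel.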
To get a better understanding, I'd suggest reading the GCN ISA manual: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/AMD_GCN3_Instruction_Set_Architecture.pdf
(To be more compatible, compare it with the Southern Islands manual; that's what all GCN hardware will support.)
I see that you like cryptography, so let me show you my blog posts on implementing the GroestlCoin mining algorithm in GCN asm: https://realhet.wordpress.com/2015/01/04/implementing-groestl-hash-function-in-gcn-asm/
In this particular project OpenCL and asm go head to head: with the 'proper' hardware (somehow the compiler chooses different optimization methods on almost identical hardware, 280 vs 290) and compiler (basically the driver), the OpenCL version can run 2-3% faster. But with the wrong compiler, the asm version can be 2x faster (simply because the OpenCL kernel turned out to use more than 128 regs). There is still a 5% opportunity left in the asm version, but it would be really painful to do in asm; that's where OpenCL is more elegant with its arithmetic optimizer.
Thanks for your reply, your project is a very good starting point for me.