apfel
Adept I

How to use the GPU ISA?

Hi there,

I found a lot of questions on this topic and some answers, but not one of them was clear enough.

Under this link I can find instruction set information for AMD GPUs:

http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

So, how can I use this GPU instruction set?

Are there example GPU assembly programs anywhere?

Can I also use this ISA with the open-source Xorg driver and kernel module for Linux?

I'd like to use these things to get a better understanding, and later I'd like to start some computation projects, especially for crypto things. I plan to use T40E APUs.

I know that I could use OpenGL for this, but it sounds somehow stupid that there is no low-level way available.

It would be amazing to get some information on this topic.

BR

Simon

realhet
Miniboss

Hi,

There are two types of architectures: VLIW (Northern Islands, Evergreen) and GCN (Southern Islands, Sea Islands, GCN3).

The processor you mentioned (T40E) is a VLIW processor. It would be extremely hard to program by hand. I really like asm, but that's where I'd have no gain at all over the OpenCL->llvm->amd_il compiler chain. In VLIW there are a few simple instructions that the compiler can handle well, but they must be structured in a very complicated way (because of the Very Long Instruction Word design, and also the clauses).

The other architecture is GCN. This is the new one; unfortunately your processor is older than that. It has a completely redesigned, cool instruction set with a lot more possibilities than VLIW. Because of this it is more difficult for an automatic compiler to handle, so that's where human thinking can be better than machine thinking. For example, a human can have precise control over register usage. With OpenCL you'll need a lot of black magic to force the compiler to go below 128 regs, which makes the kernel run almost 2x faster. Plus there are a few GCN features that aren't exposed in OpenCL yet, for example the global sync, which is a barrier across all the compute units of the GPU; this is essential for real-time, low-latency signal processing.
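The 128-register threshold mentioned above can be illustrated with a little arithmetic. This is only a sketch: the figures it uses (256 vector registers per SIMD lane, at most 10 resident wavefronts per SIMD, allocation in blocks of 4) are assumptions taken from AMD's public GCN documentation, not from this thread, and `waves_per_simd` is a hypothetical helper name.

```python
# Assumed GCN figures (from AMD's public documentation, not from this thread).
VGPRS_PER_LANE = 256      # vector registers available to each SIMD lane
MAX_WAVES_PER_SIMD = 10   # hardware cap on resident wavefronts per SIMD
VGPR_GRANULARITY = 4      # registers are allocated in blocks of 4


def waves_per_simd(vgprs_used: int) -> int:
    """Return how many wavefronts fit on one SIMD for a given VGPR count."""
    if not 1 <= vgprs_used <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be between 1 and 256")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_used // VGPR_GRANULARITY) * VGPR_GRANULARITY
    # The register file is shared by all resident wavefronts on the SIMD.
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)


if __name__ == "__main__":
    for regs in (24, 64, 84, 128, 129, 256):
        print(f"{regs:3d} VGPRs -> {waves_per_simd(regs)} wave(s) per SIMD")
```

Under these assumptions a kernel using 128 registers leaves room for two wavefronts per SIMD, while one using 129 leaves room for only one. That is consistent with the near-2x difference described above, although the real gain depends on how much latency the extra wavefront can hide.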

To get a better understanding, I'd suggest reading the GCN ISA manual. -> http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/AMD_GCN3_Instruction_Set_Architecture...

(To be more compatible, compare it with the Southern Islands manual; that's what all GCN hardware will support.)

I see that you like cryptography, so let me show you my blog posts on implementing the GroestlCoin mining algorithm in GCN asm: https://realhet.wordpress.com/2015/01/04/implementing-groestl-hash-function-in-gcn-asm/

In this particular project OpenCL and asm are going head to head: with 'proper' hardware (somehow the compiler chooses different optimization methods on almost identical hardware (280 vs 290)) and compiler (basically the driver), the OpenCL version can run 2-3% faster. But when choosing the wrong compiler, the asm version can go 2x faster (simply because the OpenCL kernel turned out to use more than 128 regs). There is still a 5% opportunity left in the asm version, but it would be really painful to do in asm; that's where OpenCL is more elegant with its arithmetic optimizer.

jtrudeau
Staff

As far as I know there is no GPU ISA assembler per se.

You can see the generated assembly by passing the -save-temps option to clBuildProgram, which can be educational. You can also see how the assembly changes based on different approaches in the code.
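A minimal sketch of the -save-temps suggestion, assuming the third-party pyopencl package and an AMD OpenCL runtime are installed. The kernel `add_one` and the helpers `build_options` and `build_kernel` are made up for illustration; the only thing taken from the post is the `-save-temps` option itself, which is passed through to clBuildProgram.

```python
# Illustrative kernel; any OpenCL C source would do.
KERNEL_SRC = """
__kernel void add_one(__global uint *data) {
    size_t i = get_global_id(0);
    data[i] += 1;
}
"""


def build_options(save_temps: bool = True) -> list[str]:
    """Compiler options that end up in the clBuildProgram call."""
    return ["-save-temps"] if save_temps else []


def build_kernel(source: str = KERNEL_SRC):
    """Build the kernel so the runtime can dump its intermediate files."""
    # Imported lazily so build_options() works without pyopencl installed.
    import pyopencl as cl

    ctx = cl.create_some_context(interactive=False)
    # Program.build() forwards the option list to clBuildProgram.
    return cl.Program(ctx, source).build(options=build_options())
```

Calling `build_kernel()` on a machine with a suitable runtime should leave the intermediate files (including the ISA listing) on disk; where exactly they land, and whether other vendors' runtimes accept the option at all, is not something this sketch guarantees.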

Thanks for your reply; your project is a very good starting point for me.
