
realhet
Miniboss

Why is the GCN3 ISA so incompatible with the previous version?

Hi,

The s_buffer_load encoding grew from 32 to 64 bits, so it supports a bigger offset; that's reasonable.

But why rearrange the flag bits in the MUBUF encoding?

Also in VOP3a, only the clamp bit is repositioned.

And most interestingly VOP2 opcodes are changed as well. o.O (I haven't seen the other opcodes yet, but I can guess they changed as well.)

Does this incompatibility serve a higher purpose? A lower transistor count on the chip, for example?

For the assembler it is not that good, as it now has to support both the old and the new ISA. Including Evergreen, there are now 3 different ISAs to produce in total.

(Also, the disassembler still doesn't work on ISA-only ELF files, but I guess that is to protect kernel files by making them harder to reverse-engineer.)

On the hardware side, it's a great new gpu. I always find some new stuff, like now I've found a real time clock (s_memrealtime), and a special 16bit shl thingy.

1 Solution
matszpk
Adept III

Unfortunately, GCN1.2 (GCN 3rd generation) is not bit-for-bit compatible with older GCN architectures. Many (maybe almost all) opcodes have been changed, and the SMRD encoding has been replaced by SMEM (GCN1.2 can now both read and write scalar memory). These are only some of the changes between the GCN1.0/1.1 generations and the GCN1.2 architecture.

Nevertheless, a couple of months ago I wrote a working disassembler that supports GCN1.2. Now I am finishing an assembler that will support GCN1.2.

I don't know whether the new documentation is correct in most cases. While writing the disassembler I relied on reverse-engineering the built-in disassembler from the Catalyst drivers. I haven't verified my work 100%, because I don't have access to Fiji or Tonga GPUs.

However, maybe my work will be useful for people who want to learn about the new GCN architecture.

I am referring to my work: CLRadeonExtender. It contains a generator for OpenCL binaries (Catalyst and GalliumCompute), an assembler (still unfinished), a disassembler and other stuff.

I am using Linux (openSUSE), so my project has been tested on Linux systems.


0 Likes
27 Replies
maxdz8
Elite

realhet:

On the hardware side, it's a great new gpu. I always find some new stuff, like now I've found a real time clock (s_memrealtime), and a special 16bit shl thingy.

I wish this stuff could get exposed as extensions some day!

0 Likes

I think of it like "if the mountain won't come to Mohammad...": I'll extend my asm compiler to do high-level stuff. I already have that Pascal/C script parser, and I want to write simple functions that can be inlined into asm code.

function add(a, b:integer):integer;
begin
  result:=a+b;
end;

And it will be called something like this:

inline add, v[16], v[16], s[20]

So I have the input registers and a parsed function in the form of an expression tree. Based on these, I must produce this asm code:

v_add_b32 v[16], vcc, v[16], s[20]

The new compiler will have to handle different types and s/v regs properly. And testing will be a lot easier because the script can be tested on the cpu as well.
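The inlining step described above can be sketched roughly as follows. This is a minimal illustration, not realhet's implementation: the `Param`, `Add` and `expand_inline` names are hypothetical, only a single `a+b` node is handled, and the `v_add_b32` mnemonic is copied from the post rather than checked against an ISA manual.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str  # formal parameter name from the script function, e.g. "a"


@dataclass
class Add:
    left: object
    right: object


def expand_inline(tree, params, dest, args):
    """Bind formal parameters to registers and emit asm for a one-node tree."""
    binding = dict(zip(params, args))
    if (isinstance(tree, Add) and isinstance(tree.left, Param)
            and isinstance(tree.right, Param)):
        a = binding[tree.left.name]
        b = binding[tree.right.name]
        # Mnemonic and operand order are taken verbatim from the post.
        return [f"v_add_b32 {dest}, vcc, {a}, {b}"]
    raise NotImplementedError("this sketch only handles a single a+b node")


# inline add, v[16], v[16], s[20]
lines = expand_inline(Add(Param("a"), Param("b")), ["a", "b"],
                      "v[16]", ["v[16]", "s[20]"])
print(lines[0])
```

A real expander would also have to pick different encodings depending on whether each operand is an s or a v register, which is exactly the type handling the post mentions.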

realhet
Miniboss

Thx for the info!

I got the green light for a GCN3 project, and will be able to test on a Fury.

So I'm planning to compile your disassembler on Qt/win and use it as a command line tool. This will make things easier and I will report errors if any.

0 Likes
sunsetquest
Adept II

I also noticed that it changed. I was looking at "Differences Between GCN Generation 2 and 3 Devices" in the GCN3 manual and noticed the different formats. This is going to be a lot of work. For the ASM4GCN project, I'll probably just switch to the new format instead of keeping a dual system. The old version can be used for GCN 1.0 - 1.2 and the new version will be for version 3.

As for why AMD changed the format: I am guessing the new features and changes AMD envisioned no longer fit well within the bit structure they had. The new chip seems to add a lot more features than it takes away. For example, VOP3A now supports NEG on 4 items instead of 3 (so it needs 4 bits instead of 3). Also, VOP3A supports 664 instructions (10 bits), whereas before it only had to support 452 (9 bits). It is cool that AMD went so long without changing their format. I believe NVidia's SASS changes much more often.
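The bit-width argument is easy to check with a few lines. This is only an arithmetic sketch; the counts 452 and 664 are the figures quoted in the post and have not been verified against the ISA manuals, and `opcode_bits` is a hypothetical helper name.

```python
def opcode_bits(count):
    """Smallest field width (in bits) that can number `count` distinct opcodes."""
    return max(1, (count - 1).bit_length())


# Instruction counts as quoted in the post (old format, new format).
old_count, new_count = 452, 664
print(opcode_bits(old_count), opcode_bits(new_count))
```

Since a 9-bit field tops out at 512 opcodes, any count above that forces a 10th bit, which by itself is enough to shift the neighboring fields of the encoding.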

It would be nice if AMD made even more ISA documents available, like spreadsheets and stuff, so I don't have to pull everything out of the ISA manual all over again.

But then how can you switch between the 2 versions automatically based on the actual hardware?

I'll try to mimic the new GCN3 things on GCN12. For example, both will have byte_permute, but if the target is GCN12, I'll emulate it with a larger amount of generated machine code.

I think the untyped memory operations could be used on GCN12 as well for the memory operations. And somehow I should emulate SMEM on GCN12 too.

Lots of work

"It would be nice if AMD made even more ISA documents available, like spreadsheets and stuff, so I don't have to pull everything out of the ISA manual all over again."

Thanks to @matszpk, we can use his instruction tables. ->  http://clrx.nativeboinc.org/wiki/browser/CLRX/CLRadeonExtender/trunk/amdasm/GCNInstructions.cpp

0 Likes

Hi, realhet. I described the syntax modes for GCN instructions (in GCInternals.h). Some modes have been renamed, and one has been removed.

My instruction table comes from reverse-engineering the driver, and I will be verifying that table on real devices. The GCN3 reference guide is poor, incomplete, buggy documentation (many important omissions and a few errors). Hence my decision to look inside the driver to find the correct opcode table. Still, I have doubts about whether it is correct.

0 Likes

Now I've reached this point: #error "Other platforms than Linux is not unsupported"

haha

0 Likes

Another stupid grammar bug. Oh my god. I haven't tried to port this stuff to the Windows world; I apologize for that. It may be done in the next month or year. Thanks.

Lol, I've just noticed that double negation in the error message and I ROFL

0 Likes

What do you mean by 'driver reverse engineering'? Have you traced it with a debugger or something? o.O That would be insane. Or are you using an opensource Linux driver for reference? I saw you have complete info on the amd-ELF format as well, I guess it comes from that opensource Linux driver, right?

Now I have your disasm working on a helloWorld.cl, it was perfectly disassembled both on GCN12 & GCN3.

However, I gave it some other non-OpenCL code, and it failed at the first instruction: s_getpc_b64 s[20:21]

I have found a minor bug too: Disassembler::getDeviceType() fails when you specify raw code.

Anyways, it is great work, keep it up!

0 Likes

Ouch, I was misleading. I've now found the problem; the error can be triggered by this:

   s_branch label
   ...
   label:

So branching forward causes an access violation inside ISADisassembler::writeLocation()/binaryMapFind().

I just bypassed it and then the whole 20KB binary disassembled without errors.

Now I can start to get familiar with gcn3 changes...

Thanks again! Great work

0 Likes

Thank you for the bug report. About the reverse-engineering: I am using a debugger to trace CodeXLAnalyzer.

I am not disassembling any code of that utility or of the AMD driver, only tracing it (disassembling would simply be illegal).

I just insert asm code after the program is built, run it, and check the output binary. I wrote a simple gdb script to do this.

Can you explain how you invoke my disassembler code? Are you running it under Linux?

PS: I am debugging the AMD Catalyst driver.

0 Likes

TBH I've never tried CodeXL (I'm using Qt, not VS), but it is impressive how much information it contains, hehe.

The info you have on the ELF format is valuable. But I'll still stick to patching driver-generated kernels, because parameter passing changes practically every year, so I let the driver do it.

How I disassemble:

I downloaded your CLRX project, then used and modified a minimal set of files from it in order to run the raw disassembler under MSVC2013 win32. I compiled it to a compact .exe and call it from my IDE when needed.

There were a lot of platform-specific things to modify, but the most important files (the disassembler itself) are more or less unmodified.

This is the first time in 2-3 years that I've seen an actual disassembly of the code produced by my assembler, haha.

Now I'm making a hello-world kernel with my assembler, which I'll pass only to your disassembler to check it.

The new AMD GCN3 compiler uses untyped buffers, so I gotta do those as well, and hope that they will be compatible on GCN1 too.

0 Likes

I forgot one thing: below is the correct routine for calling the disassembler (for raw code):

// dump the decoded instructions together with their hex encoding
Flags disasmFlags = DISASM_DUMPCODE | DISASM_HEXCODE;
// code/codeSize: the raw GCN machine code; the listing goes to std::cout
Disassembler disasm(gpuDeviceType, codeSize, code, std::cout, disasmFlags);
disasm.disassemble();

Seemingly, you omitted the very important call to gcnDIsasm.beforeDisassemble, which finds and sets up labels. The routine listed above does everything that is needed.

You can also try compiling the clrxdisasm tool, which is the full disassembler.
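Why a label-finding pass has to run before any output is written can be shown with a small two-pass sketch. This is not CLRX code: `branch_target`, `disassemble` and the `.L<offset>` label names are hypothetical, every dword is treated as one 32-bit instruction, and the `0xBF82` pattern for s_branch with a target of "next instruction + SIMM16 dwords" is an assumption about the SOPP encoding.

```python
import struct

S_BRANCH_HI = 0xBF82  # assumed upper 16 bits of an s_branch word (SOPP encoding)


def branch_target(index, word):
    """Return the byte offset an s_branch at dword `index` jumps to, else None."""
    if word >> 16 != S_BRANCH_HI:
        return None
    simm16 = word & 0xFFFF
    if simm16 >= 0x8000:             # sign-extend the 16-bit immediate
        simm16 -= 0x10000
    return (index + 1 + simm16) * 4  # relative to the next instruction, in dwords


def disassemble(code):
    """Two passes: collect every branch target first, then print."""
    count = len(code) // 4
    words = struct.unpack(f"<{count}I", code[:count * 4])
    # Pass 1: forward targets are known before any line is written.
    targets = {branch_target(i, w) for i, w in enumerate(words)}
    labels = {t: f".L{t}" for t in targets
              if t is not None and 0 <= t < len(code)}
    # Pass 2: emit labels and instructions in address order.
    out = []
    for i, w in enumerate(words):
        if i * 4 in labels:
            out.append(f"{labels[i * 4]}:")
        target = branch_target(i, w)
        if target in labels:
            out.append(f"    s_branch {labels[target]}")
        else:
            out.append(f"    .int 0x{w:08x}")  # anything else is dumped raw
    return out


# A forward branch over one dword, as in the failing case reported above.
raw = struct.pack("<3I", 0xBF820001, 0xBF800000, 0xBF810000)
print("\n".join(disassemble(raw)))
```

With a single pass, the forward branch in the first dword would be printed before its target label exists, which is the situation that was reported as crashing when the label setup was skipped.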

0 Likes

void Disassembler::disassembleRawCode() calls beforeDisassemble.

But omg... I have 2200 instructions (in dwords), covering every combination of encoding and opcode, saved from the old AMD disassembler, and now I have to mix them seamlessly with the new GCN3 ones... Gotta think how to do it without shooting myself in the foot

Hello Asm Builder Group!  more conversation for you....

"But then how can you switch between the 2 versions automatically based on the actual hardware?"

I'm back and forth on this. If I don't keep the GCN12 format, then any new versions will only support GCN3 going forward. This will make the OpenCL patching much less problematic and the GCN asm easier/reliable/efficient/advanced as well. I would not need to emulate anything either (even though I think that is really cool) or use the new instructions on older hardware. The major drawback, of course, is that any programs will only work with the new version. Users will not really be able to run their code on one machine with GCN3 and then run the same code on a GCN12 GPU. They would need to use the old version for that.

"I'll try to mimic the new GCN3 things on GCN12. For example, both will have byte_permute, but if the target is GCN12, I'll emulate it with a larger amount of generated machine code." Really cool. With your variable support, I think that should now be easier. Just thinking out loud:

  • simple macros can maybe be used if it detects the older version.
  • the smem would be complicated
  • GCN12 kernels might need to use more registers than GCN3 kernels (since they will have more macros)
  • I guess there would need to be lots of emulation maybe. Just VOP3A  has 212 new instructions.
  • Lots of work (maybe)
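The "simple macros if it detects the older version" idea from the list above might look like this in outline. Everything here is hypothetical: the `NATIVE_OPS` table, the `lower` function and the target names are illustrative, and the fallback only emits placeholder comment lines because the thread does not give a real emulation sequence.

```python
# Hypothetical capability table: which targets have the instruction natively.
NATIVE_OPS = {
    "gcn3": {"byte_permute"},
    "gcn1": set(),
}


def byte_permute_fallback(dst, src0, src1, sel):
    # Placeholder only: a real expansion would be a longer sequence of
    # shift/mask/or instructions and would likely need temporary registers.
    return [
        f"; emulate byte_permute {dst}, {src0}, {src1}, {sel}",
        "; ... multi-instruction expansion goes here ...",
    ]


FALLBACKS = {"byte_permute": byte_permute_fallback}


def lower(op, args, target):
    """Return asm lines for `op` on `target`, expanding a macro when needed."""
    if op in NATIVE_OPS[target]:
        return [f"{op} {', '.join(args)}"]
    if op in FALLBACKS:
        return FALLBACKS[op](*args)
    raise ValueError(f"{op} is not available on {target} and has no fallback")


args = ["v0", "v1", "v2", "v3"]
print(lower("byte_permute", args, "gcn3"))
print(lower("byte_permute", args, "gcn1"))
```

The same source line produces one instruction on the newer target and several lines on the older one, which is where the extra code size and register pressure mentioned in the list would come from.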

"Thanks to @matszpk, we can use his instruction tables"

Thank you @matszpk for making this available. This will help greatly. I agree that the GCN guide is definitely buggy. I think I noticed an error in the new guide: many of the VOP3a diagrams (12-65) seem to show a 4-bit NEG, but the description (13-35), which is probably more accurate, indicates a 3-bit NEG. I'm sure it's a difficult task to get it all correct, though. I'm definitely glad AMD publishes this!!! NVidia does not. It is neat that you can pull that information out of the drivers. I wonder how you do that.

Correction on my "4-bit NEG" comment - I think it is actually still 3 bits.

0 Likes

realhet:

I'll try to mimic the new GCN3 things on GCN12. For example, both will have byte_permute, but if the target is GCN12, I'll emulate it with a larger amount of generated machine code.

Chiming in.

Isn't GCN 3 supposed to be GCN1.2? Was that supposed to be "new GCN3 things on GCN11"?

0 Likes

As far as I know, there are some new instructions (byte permute is a very good example).

But there are also new options for input operands:

- In GCN12 there was movRel to index a VReg with an SReg. On GCN3 you can encode this indexing operation into an ALU instruction, so you don't waste a cycle on SReg indexing.

- The other new thing is taking an input from a register of a neighboring thread. I think rules similar to those of the ds_swizzle instruction apply here. I'd bet the driver developers are using these goodies for OpenGL/Direct3D graphics. And I guess it is also accessible from HSA, as it has swizzle functionality.

These new encodings broke compatibility anyway, so technically it doesn't matter that the opcodes have changed as well.

0 Likes

Interesting... I thought 1.2 and Gen3 were different also but it appears they are the same.

Graphics Core Next - Wikipedia :

  • GCN 1st Generation "GCN 1.0" (Southern Islands, HD 7000/Rx 200 Series)
  • GCN 2nd Generation "GCN 1.1" (Sea Islands, HD 7790 and Rx 290/260 Series)
  • GCN 3rd Generation "GCN 1.2" (Volcanic Islands, R9 285)

AMD's CodeXL compiler options:

  • Graphics IP v6 - Cape Verde, Hainan, Oland, Pitcairn, Tahiti
  • Graphics IP v7 - Bonaire, Hawaii, Kalindi, Mullins, Spectre, Spooky
  • Graphics IP v8 - Carrizo, Fiji, Iceland, Tonga
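Put side by side, the two lists above give a simple lookup from chip name to ISA generation, which is what a multi-target assembler would need to pick an encoding. This is a sketch built only from the names quoted in this post; the pairing of Graphics IP v6/v7/v8 with GCN 1.0/1.1/1.2 follows the order of the lists, and `generation_of` is a hypothetical helper.

```python
# Chip names per Graphics IP version, as listed in the CodeXL options above.
CHIPS_BY_GFX_IP = {
    6: ["Cape Verde", "Hainan", "Oland", "Pitcairn", "Tahiti"],
    7: ["Bonaire", "Hawaii", "Kalindi", "Mullins", "Spectre", "Spooky"],
    8: ["Carrizo", "Fiji", "Iceland", "Tonga"],
}

# Generation names as listed in the Wikipedia summary above.
GENERATION_BY_GFX_IP = {6: "GCN 1.0", 7: "GCN 1.1", 8: "GCN 1.2"}


def generation_of(chip):
    """Map a chip name (case-insensitive) to its GCN generation label."""
    wanted = chip.strip().lower()
    for gfx_ip, chips in CHIPS_BY_GFX_IP.items():
        if wanted in (name.lower() for name in chips):
            return GENERATION_BY_GFX_IP[gfx_ip]
    raise KeyError(f"unknown chip: {chip}")


print(generation_of("Fiji"), generation_of("Tahiti"))
```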

ISA Manuals:

0 Likes
realhet
Miniboss

Now that I've spent 25 hours on studying GCN3 isa and making my compiler able to work well with the first 60 instructions, I'd say it is not a problem that it isn't compatible at all.

For example, VOP3a and VOP3b have been rearranged so that the clamp bit sits at the same position in both encodings. So once I understood them, these changes really made sense.

And I'm not even at DPP or SDWA yet. I can't even imagine what I can do with them, but I must learn them to be able to use them in the future. I'm only at the point where my compiler can generate code for GCN1 and GCN3 targets from the same source assembly. I've made only one change so far: the offset of s_buffer_* became a byte offset instead of a dword offset. To preserve compatibility, my compiler will ask for an ofsByte or an ofsDWord option, or else it will raise an error.
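The explicit-unit rule for s_buffer_* offsets could be sketched like this. The option names ofsByte and ofsDWord come from the post; the function name, the target strings and the alignment check are assumptions made for the illustration.

```python
def s_buffer_offset(value, unit, target):
    """Convert a source-level offset to the unit the target encoding expects.

    unit:   "ofsByte" or "ofsDWord" (must be given explicitly)
    target: "gcn1" encodes dword offsets, "gcn3" encodes byte offsets,
            as described in the post.
    """
    if unit == "ofsByte":
        byte_offset = value
    elif unit == "ofsDWord":
        byte_offset = value * 4
    else:
        raise ValueError("offset unit must be ofsByte or ofsDWord")

    if target == "gcn3":
        return byte_offset
    if target == "gcn1":
        if byte_offset % 4 != 0:
            raise ValueError("gcn1 can only encode dword-aligned offsets")
        return byte_offset // 4
    raise ValueError(f"unknown target: {target}")


print(s_buffer_offset(4, "ofsDWord", "gcn3"))  # value written into a GCN3 encoding
print(s_buffer_offset(16, "ofsByte", "gcn1"))  # value written into a GCN1 encoding
```

Refusing to guess the unit keeps old sources from silently reading the wrong address when they are rebuilt for the other target.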

Matszpk's disassembler helped me a lot. I made tons of silly errors in the instruction encoder, but the disasm told me whenever I made a mistake. I've imported 60 instructions so far from GCNInstructions.cpp and they were all correct! My plan is to automatically import all the GCN3 instructions from Matszpk's cpp, but mark them unsafe. Later, when I use one in a project and have proved that it does what it is intended to do, I can clear its unsafe flag.
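The "import everything, but mark it unsafe until proven" plan maps onto a small table with a per-entry flag. A minimal sketch, with hypothetical class and method names; the opcode value below is a placeholder and not a real encoding.

```python
from dataclasses import dataclass


@dataclass
class InstrEntry:
    mnemonic: str
    opcode: int
    unsafe: bool = True  # imported entries start out unverified


class InstrTable:
    def __init__(self):
        self._entries = {}

    def import_entry(self, mnemonic, opcode):
        """Add an automatically imported instruction, flagged as unsafe."""
        self._entries[mnemonic] = InstrEntry(mnemonic, opcode)

    def mark_verified(self, mnemonic):
        """Clear the unsafe flag once the instruction was proven on hardware."""
        self._entries[mnemonic].unsafe = False

    def lookup(self, mnemonic, allow_unsafe=False):
        """Return the entry, refusing unverified ones unless explicitly allowed."""
        entry = self._entries[mnemonic]
        if entry.unsafe and not allow_unsafe:
            raise ValueError(f"{mnemonic} is imported but not verified yet")
        return entry


table = InstrTable()
table.import_entry("s_memrealtime", 0x00)  # placeholder opcode, not the real one
table.mark_verified("s_memrealtime")
print(table.lookup("s_memrealtime").unsafe)
```

Making the assembler fail (or warn) on unverified entries turns the flag into a checklist of what still needs testing on a real GPU.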

I have a question for Matszpk too: would you allow me to include your disassembler in my IDE the next time I release it? It would be a separate .exe command-line tool, and my compiler would call it whenever it needs to disassemble. Also, when I use the OpenCL compiler, the 2 disassemblers can be used to compare your instruction tables with the official codes.

I see you are working hard on CLRX, btw. I wonder what plans you have. Is it ready for writing a coin miner right now? Are you planning to add high-level language features? I have a cool macro engine now, but high-level language support is becoming more important. Type handling and expressions can make life much easier, especially if we have thousands of asm instructions at hand ...

0 Likes

Are you going from mnemonics to ISA directly? Why not go through an intermediate step using Abstract Syntax Trees or something?

I have a "compiler thing" I used in the past to mangle odd sorts of shaders; among the various options there should be an "output AST in JSON format". It has support for class, some protection level, overrides, single inheritance and full multi-interface only inheritance.

As a last note: if we can agree on some common interface, I might* be willing to integrate it into my miner as well as port it to some *nix, assuming that's your interest. I can also provide some guidance in navigating the legacy miner (sgminer and such) code.

*: I might; as I cannot commit any effort right now, consider this purely a declaration of intent.

0 Likes

Yes, I confirm. Some parts of it are pure design, but it might provide good performance. The project was designed to be as independent as possible (it requires only the standard libraries) and to be a clean assembler without syntactic sugar or other extras (just a clean assembler).

0 Likes

This is it right now:

An asm compiler with some extra things: aliases, labels, register allocations/releases in blocks.

On top of this there are macros: multiple parameters, same name; define, assign, macro/endm, ifdef, and a 'for' loop.

And finally the highest level of abstraction, where I can generate inlined code using a script language.

Now I want to add an option to the assembler for inlining high-level script functions. My script has some cool stuff, btw: parallel operations on arrays and simple expression solving, for example. So I'd really like to use these goodies to generate safe asm code.

Type handling can't be done with macros, and a=b+c is also much more elegant than ADD(a,b,c).

Step1: Make a small script to do XYZcoin algo.
Step2: Mimic the header of that specific ocl kernel.
Step3: Inline the script function into the asm kernel.

It would be so easy.

And most importantly: later on it would be possible to mark dirty/constant data, so automatic optimizations could be made. I wouldn't do that manually.

0 Likes

I wrote something wrong about my project. Of course, it is going to be a clean assembler, but with GNU as syntax and its pseudo-ops (directives). Macros, expressions, symbols and repetitions (.rept, .irp) at the pseudo-op level are done in my project (and tested). Nevertheless, this is not a compiler (it is just an assembler). Almost all features of GNU as are implemented in my program.

This assembler will not do the following things:

optimize code, inline functions, treat expressions as statements (as high-level languages do), or other HL features.

Only pseudo-ops, symbols, kernel and program definitions, and ISA instructions.

0 Likes

Thanks, realhet. To state the goals of the project: it is going to be a clean assembler that supports the AMD Catalyst driver and the GalliumCompute platform. The assembler is compatible with GNU as syntax, including macro and repetition support. All of it is going to have a standalone binary generator (no proprietary stuff will be needed) for Catalyst OpenCL and the GalliumCompute platform.

0 Likes
matszpk
Adept III

Hi, realhet and other guys!

CLRadeonExtender 0.1 is available! Just download it from ClrxDownloads – CLRadeonExtender

Documentation is now available (of course, in an ugly state): ClrxToc – CLRadeonExtender

Currently, only Linux is supported. There are now two tools:

clrxasm - a complete GCN assembler with rich features (macros, symbols, repetitions, support for many kernels).

clrxdisasm - a complete GCN disassembler.