cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

relpats_eht
Journeyman III

Raw ISA bytecode location

Hello,

I've been doing some work with the output of the OpenCL compiler for a project and am currently looking for the raw ISA bytecode output within the resulting binary. I've deduced the output of the OpenCL compiler looks something like this (with most sections omitted for brevity):

OpenCL Elf

  • .amdil: Text IL
  • .rodata: text runtime information
  • .text: AMD runtime elf
    • .text: IL bytecode
    • .text: Data containing ISA bytecode

The trouble is, I don't trust that second .text section in the AMD runtime elf as being strictly ISA bytecode. It generally looks like three different data sections, separated by chunks of zeros (in the sample programs I am compiling). I was wondering if someone could give me a bit of insight into the structure of this section. The r600 ISA spec is public, so I could try to decompile it by hand to see where it starts making sense, but I'd like to avoid the tedium if at all possible.

Thank you.

0 Likes
1 Solution
46 Replies
realhet
Miniboss

I'm 100% sure that the ".text: AMD runtime elf" is the actual CAL elf image that runs. You can recompile it yourself using the ".amdil: Text IL" section and calClCompile().

(Just remove the TextIL (opencl, and llvm) section from the OpenCl elf if you want to run the ".text: AMD runtime elf")

0 Likes

I'm sorry, you misunderstand. The output of the compiler is an elf file. The first .text section of that elf file is a nested elf file (in a slightly custom AMD specific format). The second .text section of that nested elf file is what I am talking about here. I am almost positive that section contains the raw ISA bytecode, but I am also almost positive it contains additional binary data. I want just the raw ISA bytecode.

Is that more clear?

Thank you.

0 Likes

I know it's an elf inside an elf.

"I want just the raw ISA bytecode."

Then simply get that section you've already mentioned. (the second .text section in that nested .elf)

Additional binary data is:

- ATI option data (like constant buffer sizes, LDS size, Wavefrontsize, and many more parameters, it's in an ELF compatible format)

- around 1 kbytes of zeroes before the ISA code.

- then comes the actual microcode. (on VLIW it has two parts separated with 512(?)bytes zero padding: program flow - and the actual alu code)

The above is all marked in the elf's program_header. (program header in the .elf inside the opencl .elf)

0 Likes

Thank you realhet. I may be misunderstanding you, but I believe our information is conflicting. For reference, I am compiling for RV730.

Here is my complete analysis of the internal ELF file:

  • nameless section: 0 bytes
  • .shstrtab
  • Unlisted section.
    • Miscellaneous unknown data
    • Contains many "ATI CAL" lines (see below)
  • .text
    • IL bytecode
  • .data
    • 4736 bytes of 0's
  • .symtab
  • .strtab
  • Unlisted section.
    • Many lines of "ATI CAL"
      • Has the string "ATI CAL" followed by a few bytes of unknown data
    • followed by register settings
    • followed by unknown key value pairs in the format 0x8000000X value
      • Value is usually 0, except for 6 it is 20 and A it is 1.
    • More "ATI CAL" lines
    • More 256 bytes of 0's
  • .text
    • ISA bytecode?
    • 64 bytes of data
    • 192 bytes of 0
    • 308 bytes of data
    • 92 bytes of 0
    • 32 bytes of data
  • .data
    • 4736 bytes of 0
  • .symtab
  • .strtab

Obviously, this analysis has some holes in it, and knowing everything specifically would be ideal. Now, if I'm understanding you correctly, the second .text section of the internal elf is the ISA bytecode, but is broken up into two parts (by the 192 bytes of 0 padding). Is that correct? Is there perhaps some structure I could see which describes this data?

Thank you.

0 Likes

relpats_eht,

The easiest way to see the structure is to disassemble the text and see what the various sections are. You can use calclDisassemble for the calBinary or -save-temps when compilng to the binary with the OpenCL api.

0 Likes
drallan
Challenger

Yes, the second text section in the inner elf file contains the ISA binary code (instructions), word for word, and only the ISAcode. What might be confusing is that the inner elf has many other sections that point to various places, a second indexing. Some of these are "program" sections prg0, prg1, etc. But what you want is the second text section. The following is cryptic but might help as you have already gotten most of the way. It is the output dump from an assembler (144 bytes) showing the elf unpack and repack process. It is a bit long but probably clearer than trying to describe the process in words.

Other sections contain start up information used by the program like memory address's,  memory descriptors, and constant data. The CAL CAL CAL part contains setup info such as pointers, code lengths, number of registers used, etc.

I have seen bits and pieces of documentation  which leads me to believe some of this information may be available from AMD, possibly on request?

It would be nice to know where to find/how to get such info.

//---------------------------------------------------------------------------------

RUN OPENCL

ASSEMBLER OUTPUT:

asm:source 1721 bytes src

source errors: 0

code length:   144 bytes [36]      <---------- (GENERATED CODE)

first address: 00000000

last address:  0000008C

optim. insn:   0

savebino: 1 bins, size=[15996, 0, 0]

//---------------------------------------------------------------------------------

DUMP OF ELF UNPACK STARTS HERE

//---------------------------------------------------------------------------------

wr_bin: bin1 -> bin/isatest.bn0 size=15996

Elf:filesize =    3E7C (15996)

Elf:sh table @ 15716,      7 sections

Elf:ph table @     0,      0 programs

read nametab at 52 size =50

sh 0 type= 0 link(0) start      0 size      0 []

sh 1 type= 3 link(0) start     52 size     50 [.shstrtab]

sh 2 type= 3 link(0) start    102 size     66 [.strtab]

sh 3 type= 2 link(2) start    168 size     64 [.symtab]

sh 4 type= 1 link(0) start    232 size    391 [.rodata]

text(0) index = 5

sh 5 type= 1 link(0) start    623 size  15026 [.text]

sh 6 type= 1 link(0) start  15649 size     67 [.comment]

----------------------------------------------

enter the inner elf

----------------------------------------------

Elf:filesize =    3AB2 (15026)

Elf:sh table @   292,     10 sections

Elf:ph table @    52,      5 programs

ph 0 type=70000002 start    212 size     40 memsize      0

ph 1 type=00000004 start    692 size    532 memsize      0

ph 2 type=00000001 start   1224 size   7429 memsize   7429

ph 3 type=00000004 start   8653 size   1444 memsize      0

ph 4 type=00000001 start  10097 size   4929 memsize   4929

read nametab at 252 size =40

sh 0 type= 0 link(0) start      0 size      0 []

sh 1 type= 3 link(0) start    252 size     40 [.shstrtab]

text(0) index = 2

sh 2 type= 1 link(0) start   1224 size   2592 [.text]

sh 3 type= 1 link(0) start   3816 size   4736 [.data]

sh 4 type= 2 link(5) start   8552 size     80 [.symtab]

sh 5 type= 3 link(0) start   8632 size     21 [.strtab]

text(1) index = 6

sh 6 type= 1 link(0) start  10097 size     92 [.text]

sh 7 type= 1 link(0) start  10189 size   4736 [.data]

sh 8 type= 2 link(9) start  14925 size     80 [.symtab]

sh 9 type= 3 link(0) start  15005 size     21 [.strtab]

//---------------------------------------------------------------------------------

enter and work on inner elf

//---------------------------------------------------------------------------------

pass2 2 text sections

p2text0: offset=1224 size=2592

p2text1: offset=10097 size=92

FIX CODE section 6 .text[1]   <-----   (note: this is the second text section (of 0,1), inner elf)

codesize 92 -> 144 bytes              (original 92 byte section replaced with 144 byte new program)

FIX CODE(1) end

SEC(10) move phtab by 52

SEC(10) move shtab by 52               (start moving things around)

EXPAND program 4 by 52

EXPAND section 6 by 52

MOVE section 7 by 52

MOVE section 8 by 52

MOVE section 9 by 52

(WOW)p34sum 6373 --> 6425

wr pgr0  start    212[   212] size     40[    40]

wr pgr1  start    692[   692] size    532[   532]

wr pgr2  start   1224[  1224] size   7429[  7429]

user elements: 3

uav memr buffer: s[4,7] type 10

constant buffer: s[8,11] type 0

argument buffer: s[12,15] type 1

vgpr registers: 4 -> 10

sgpr registers: 19 -> 25

lds aux mvalue: 001009A0 -> 00000000

gds max memory: 8000 -> 0

lds max memory: 2000 -> 8000

------------------------------------------

  work done, now start reassembly, report what goes where

------------------------------------------

wr pgr3  start   8653[  8653] size   1444[  1444]

wr pgr4  start  10097[ 10097] size   4981[  4929]

wr sec0  start      0[     0] size      0[     0]

wr sec1  start    252[   252] size     40[    40]

wr sec2  start   1224[  1224] size   2592[  2592]

wr sec3  start   3816[  3816] size   4736[  4736]

symtab(4) at 8552

=strtab(5) size     21

symbol(0) ndx= 0 size=    0 info=0 val=    0 []

symbol(1) ndx=16 size=    0 info=0 val=    0 [uav9]

symbol(2) ndx=16 size=    0 info=0 val=    1 [uav10]

symbol(3) ndx=10 size=    0 info=0 val=    0 [cb1]

symbol(4) ndx=10 size=    0 info=0 val=    1 [cb0]

wr sec4  start   8552[  8552] size     80[    80]

wr sec5  start   8632[  8632] size     21[    21]

save data section(6)

wr sec6  start  10097[ 10097] size    144[    92]

wr sec7  start  10241[ 10189] size   4736[  4736]

symtab(8) at 14925

=strtab(9) size     21

symbol(0) ndx= 0 size=    0 info=0 val=    0 []

symbol(1) ndx=16 size=    0 info=0 val=    0 [uav9]

symbol(2) ndx=16 size=    0 info=0 val=    1 [uav10]

symbol(3) ndx=10 size=    0 info=0 val=    0 [cb1]

symbol(4) ndx=10 size=    0 info=0 val=    1 [cb0]

wr sec8  start  14977[ 14925] size     80[    80]

wr sec9  start  15057[ 15005] size     21[    21]

wr pent 0 at    212 offset      0                             (pent means elf  "program entry")

wr pent 1 at    692 offset      0                           

wr pent 2 at   1224 offset      0

wr pent 3 at   8653 offset      0

wr pent 4 at  10097 offset      0

wr sent 0 at      0 offset      0                            (sent means elf section entry)

wr sent 1 at    252 offset      0

wr sent 2 at   1224 offset      0

wr sent 3 at   3816 offset      0

wr sent 4 at   8552 offset      0

wr sent 5 at   8632 offset      0

wr sent 6 at  10097 offset      0

wr sent 7 at  10241 offset      0

wr sent 8 at  14977 offset      0

wr sent 9 at  15057 offset      0

----------------------------------------------

inner elf now repacked, need to resize outer elf

----------------------------------------------

p1text1: newsize=15026 oldsize=15078

SEC(7) move phtab by 52

SEC(7) move shtab by 52

EXPAND section 5 by 52

MOVE section 6 by 52

wr sec0  start      0[     0] size      0[     0]

wr sec1  start     52[    52] size     50[    50]

wr sec2  start    102[   102] size     66[    66]

symtab(3) at 168

=strtab(2) size     66

symbol(0) ndx= 0 size=    0 info=0 val=    0 []

symbol(1) ndx= 4 size=  359 info=1 val=    0 [__OpenCL_test_metadata]

symbol(2) ndx= 5 size=15026 info=2 val=    0 [__OpenCL_test_kernel]

found _kernel  symbol, section 5 size 15026 --> 15078

symbol(3) ndx= 4 size=   32 info=1 val=  359 [__OpenCL_test_header]

wr sec3  start    168[   168] size     64[    64]

rodata:lds memory: 2000 -> 8000

wr sec4  start    232[   232] size    391[   391]

save data section(5)

wr sec5  start    623[   623] size  15078[ 15026]

wr sec6  start  15701[ 15649] size     67[    67]

wr sent 0 at      0 offset      0

wr sent 1 at     52 offset      0

wr sent 2 at    102 offset      0

wr sent 3 at    168 offset      0

wr sent 4 at    232 offset      0

wr sent 5 at    623 offset      0

wr sent 6 at  15701 offset      0

==========================================================

/

Thanks for the additional information. What tools did you use to generate that output? It still seems rather odd to me for bytecode to have so many 0's in it. I should probably review the ISA spec some to see why.

0 Likes

The output is from an ISA binary re-loader I wrote, the debug information helped me figure out the file structure as I wrote it.

If you are using a public elf tool, it might not be so easy because the inner elf is a little non standard, but it should still work.

The code should not contain 0s, below, p2text1 is the second of the 2 text sections in the inner elf. It has only 92 bytes and

no zero (words) Every 32 bit entry is a hardware instruction. If you see a large pile of 0s, maybe you have not

at to the second text section? The first text section (inner elf) usually has a lot of 0 padding.

pass2 2 text sections

p2text0: offset=1224 size=2592

p2text1: offset=10097 size=92

FIX CODE section 6 .text[1]   <-----   (note: this is the second text section (of 0,1), inner elf)

codesize 92 -> 144 bytes               (original 92 byte section replaced with 144 byte new program)


0 Likes

No, it's definitely the second text section. Though, for me, the first text section is solid IL bytecode. No zeros, the closest thing to padding is the comments.

0 Likes

VLIW machine code usually DOES contain zeroes -> those are the padding between the program-flow and the alu instruction streams.

Here's an example: (section 6) http://x.pgy.hu/~worm/het/vliw_isa.JPG

(in contrast to this, GCN machine code doesn't contain any padding zeroes because it has only a single continuous instruction stream.)

And then: sec7 is that mistique 4736 bytes of zeroes

sec8 is the symbol table

sec9 is the string table for the symtab

Finally the  4th (0based) entry in the program header encapsulates sec6,sec7,sec8 and sec9, that's what is loaded up to the driver.

Also prog3 contains the ATI CAL key/value pairs. Those are a bit complex, first you have to understand the format of them: It's called "Note section" in the ELF format docs.

Thank you realhet. I've not heard the acronyms VLIW and GCN before, what do these mean?

Were I to try to load this code directly to the card, would I need to remove the padding or modify the code in any way?

Thank you.

0 Likes

0 Likes

Thank you again. I didn't know that documentation existed.

0 Likes

My research happens to require reloading of ISA microcode into a opencl ELF. The information provided by drallan's reply , with outputs of his reloader, is very useful to me. In particular, I need to know if a new binary (the ISA code in the second .text section of the nested ELF) is injected into an existing OpenCL ELF, what are other parts of the ELF file that need to be changed, besides the length adjustments of various sections. For example, the output of drallan's reloader include the following lines:

(WOW)p34sum 6373 --> 6425

wr pgr0  start    212[   212] size     40[    40]

wr pgr1  start    692[   692] size 532[   532]

wr pgr2  start   1224[  1224] size   7429[ 7429]

user elements: 3

uav memr buffer: s[4,7] type 10

constant buffer: s[8,11] type 0

argument buffer: s[12,15] type 1

vgpr registers: 4 -> 10

sgpr registers: 19 -> 25

lds aux mvalue: 001009A0 -> 00000000

gds max memory: 8000 -> 0

lds max memory: 2000 -> 8000

This fragment seems to suggest that some kernel program information, e.g., "uav memr buffer" and "vgpr registers", presumably stored in the NOTE segment, must be modified. The only documentation I can find is AMD's CAL Programming Guide, which gives the structure of the records in the NOTE segment but it does not explain how these program information are stored. Could AMD support staff or drallan shed some light on this?

Thank you.

0 Likes

Can you please give me a link for that  CAL Programming Guide? I can't find it on developer.amd.com o.0

Things in the note section I know about:

- user elements: initial configuration for resource constants.  In the example there are 2 resources: uav, constbuf, argument buf. For example: In the ISA code you can access the UAV with the registers s[4:7]. Resource Constants are similar to memory range descriptors in an x86 system. (More info in -> GCN ISA manual)

You basically don't have to modify these, just use it in the form as OpenCL compiled it. The only exception is the size information at a constant buffer declaration.

"which gives the structure of the records in the NOTE segment but it does not explain how these program information are stored." You mean, how the standard ELF note section is stored? -> Here it is http://www.skyfree.org/linux/references/ELF_Format.pdf  Search for "Note Section" in it.

Is there a note key list in that "CAL Programming Guide" you mentioned? I have some keys that I had hacked out of the cal image, but if there's a complete list, I'd be glad to see it.

Hi realhet,

I could not remember where I found the document. If you let me know how I can send you a copy, I would be happy to do so.

Thank you for your help. Sorry I did not explain very clearly what I need in my first message. I have read the ELF format document, the GCN architecture ppt, and the SI-GPU ISA document. The ELF format pdf specifies the general ELF format. The Appendix B of the CAL Programming Guide specifies how the ELF format is used in the CAL context, that includes, for examples, why there are two NOTE and LOAD segments (thus two .text sections) and what information is stored in the NOTE segment. However, it does not provide any information about where the kernel program info (like # of vgprs, sgprs, lds memory, etc.) are stored. It might be explained in drallan's attachment -- I will verify this in my program.

0 Likes

Hi realhet,

To answer the question in your message: Yes, the document includes the definition of the CALNoteHeader struct,

typedef struct {

Elf32_Word namesz; /* size of the name field. Must be 8 */

Elf32_Word descsz; /* size of the data payload */

Elf32_Word type; /* type of the payload */

char name[8]; /* note header string. Must be “ATI CAL” */

} CALNoteHeader;

and Type Identifiers:

Note Type Description

Inputs ELF_NOTE_ATI_INPUTS  Inputs used by the program.

Outputs ELF_NOTE_ATI_OUTPUTS  Outputs written by the program.

FloatConsts ELF_NOTE_ATI_FLOATCONSTS  Float constant data segment desc

IntConsts ELF_NOTE_ATI_INTCONSTS  Integer constant data segment desc

BoolConsts ELF_NOTE_ATI_BOOLCONSTS  Boolean constant data segment d

EarlyExit ELF_NOTE_ATI_EARLYEXIT  Program termination description.

GlobalBuffers ELF_NOTE_ATI_GLOBAL_BUFFERS  Global import/export buffer.

ConstantBuffers ELF_NOTE_ATI_CONSTANT_BUFFERS  Constant buffer usage mask.

SamplerMap ELF_NOTE_ATI_INPUT_SAMPLER_MAP  Input to sampler binding table.

ScratchBuffer ELF_NOTE_ATI_SCRATCH_BUFFER  Scratch memory usage mask.

PersistentBuffer ELF_NOTE_ATI_PERSISTANT_BUFFER  Persistent memory usage mask.

ProgramInfo ELF_NOTE_ATI_PROGRAM_INFO  Device configuration table.

0 Likes

Hi jg1,

Note that the inner elf has about 5 'program' sections 3 of which overlap some of the normal text sections,

this is helpful when calculating the new section offsets and sizes. The program sections have their own

program headers instead of the regular section headers.

1. Replace the gcn binary code in Inner file: second text section.

2. Calculate and change all required sizes in section headers and program headers.

3. Change a few parameters in the inner file 4th (index 3)  'program' section.

  (this includes things like the no. of SGPRs, VGPRs, lds  and gds memory requirements)

After packing the inner elf

4. Change appropriate section sizes in outer elf section headers.

5. In the outer elf ".rodata" section change the lds memory maximum if needed (see attached notes/code snippets)

6. In the outer symbol table entries, change the size of the section with the kernel name (need to look at strtab for the name,

again there is a code snippet in the attached notes.)

Then, pack that thing!

There is more info in the notes that I couldn't paste here because this editor keeps inserting everything into tables.

The notes also mention how to change the parameters in the 4th program section in the inner file. Basically you look

for a 32 bit key code first and then read/write the data in the following int. There is a table of important key codes in

the notes.

Note, some of the data output from my program does not need to be changed, i.e., the uav memr buffer register assignments and type.

These are just for reference and/or can be used for programming at a higher level, like a compiler.

Good luck, it's really not that difficult, only made a little complicated by the elf format.

Sorry for the jumbled notes.

drallan

Hi drallan,

Thank you for your reply. It gives quite a lot of information. I think the linkdat struct is what I am looking for. I will do some tests and let you know the results. Thanks a lot.

P.S. By the way, where in AMD's documents can I find information of the linkdat struct?

0 Likes

jg1 wrote:

Hi drallan,

Thank you for your reply. It gives quite a lot of information. I think the linkdat struct is what I am looking for. I will do some tests and let you know the results. Thanks a lot.

P.S. By the way, where in AMD's documents can I find information of the linkdat struct?

Hi jg1,

The linkdat structure is from my elf repacker and is a partial list of entries that I have found useful. I don't know of any AMD document that lists their meaning and values. Some I have dug out and I am looking to see if I have some additional info, I think I must.

The 5 elf program sections that I mentioned are what the CAL programming manual refers to as the PT_NOTE segment. The CAL programming guide describes the basic structure of the inner elf but omits some useful information. The 2 'program' sections that overlap the elf sections contain the executable and load data. Usually you only need to fix the executable section.

The other 3 program sections (0, 1, 3), contain information about the program and set up. section 3 is the one with the important information.

One of the CAL entries (stuff with the ATI_CAL string) contains a large block of info that contains the information fields found in linkdat. In a sense, they are below, or contained inside, the CAL entry format description in the programmers manual.

To help visualize this, I have attached dumps of the 3 'program' sections, and also the ".rodata" section. Most of the rodata is ascii and is a copy of info you would see from the Kernel analyzer output. There is a small binary section after the ascii part where I change the lds memory allocation. You may not need that.

If I find any more good info I will post it here.

drallan

0 Likes

Hi drallan,

Thank you again for your kind help. I added your inputs to my readelf program and attached the outputs here. Next I am going to do the repacking. I will post my findings here.

The linkdat structure looks like the "device configuration table" mentioned in the CAL Programming Guide. The declaration of the CALProgramInfoEntry is as follows:

typedef struct {

     Elf32_Word address; /* device address */

     Elf32_Word value; /* value */

} CALProgramInfoEntry;

It is probably used by the driver to program the GPU chips through device I/O ports.

jg1

0 Likes

The linkdat structure looks like the "device configuration table" mentioned in the CAL Programming Guide.  [...]

typedef struct {

     Elf32_Word address; /* device address */

     Elf32_Word value; /* value */

} CALProgramInfoEntry;

It is probably used by the driver to program the GPU chips through device I/O ports.

Yes, I think that must be correct. The "device address" also looks like a PCI bus IO transaction.

I checked, the entries in linkdat were all dug from code, I don't have any other docs on the device configuration table values.

drallan

0 Likes

Hi drallan and realhet,

My repacker is now working. Thank you for all your help. I really appreciate it.

For the purpose of repacking ISA code (with the same kernel function signature), drallan's instructions are good enough.

I also find good information in this file. Look for struct si_bin_enc_dict_entry_t, and enum _E_SC_USER_DATA_CLASS. You might want to look at other files at the web site to see how the dots are connected. I modified my readelf program again and its output is in the attachment.

I still have not found information about the use of GDS_MAX, LDS_MAX, and LDS_AUX. But they do not seem to matter with the level of complexity of my current kernel. I will post here if I find anything.

jg1

0 Likes

Wow! Nice find -> multi2sim

http://www.multi2sim.org/svn/multi2sim/trunk/src/arch/southern-islands/emu/machine.c

This seems to be an accurate 'documentation on every GCN instructions

Also there's a complete* inverse assembler -> http://www.multi2sim.org/svn/multi2sim/trunk/src/arch/southern-islands/asm/asm.c

I think don't you  wanna understand all the values in the .elf, only patch that few you really use and leave the rest for the ocl compiler.

GDS_MAX, LDS_MAX, and LDS_AUX: That's new to me too: I only tried LDS on CAL, but there the only thing I had to change was the control word marked with PGM2_COUMPUTE_RESOURCES in the disasm. If I recall bits8:15 (or something like this) represented the LDS allocation size. But seems like OCL stores it redundantly.

"as10.txt.zip" <- Well done!

Edit: Oh noes, multi2sim is full of inumplemented instructions

0 Likes

Hi realhet,

You are right. Multi2sim can be a good reference when we need to find out execution details of an ISA instructions. I have not studied yet but it is good to know where to look to when I need.

Thanks for the info of LDS. They are bits 8:16 in COMPUTE_PGM_RSRC2. So you are only a tiny "bit" off.

0 Likes

jg1 wrote:

Hi drallan and realhet,

My repacker is now working. Thank you for all your help. I really appreciate it.

For the purpose of repacking ISA code (with the same kernel function signature), drallan's instructions are good enough.

I also find good information in this file. Look for struct si_bin_enc_dict_entry_t, and enum _E_SC_USER_DATA_CLASS. You might want to look at other files at the web site to see how the dots are connected. I modified my readelf program again and its output is in the attachment.

I still have not found information about the use of GDS_MAX, LDS_MAX, and LDS_AUX. But they do not seem to matter with the level of complexity of my current kernel. I will post here if I find anything.

That was fast! Congratulations.

I have the multi2sim files. They are very good for gcn code but they don't have the device configuration table info (Where is it!?)

The _E_SC_USER_DATA_CLASS looks real interesting but the numbers don't seem to correlate with anything. Maybe specific to multi2sim?

The Linux Mesa/Gallium project also has some (too many ! ) files but they are a bit less clear.

You can google mesa southern islands.

I would be very interested if you or realhet find any more good info, and happy to share anything I find too...

drallan

0 Likes

Thanks, drallan.

Re: the device configuration table and data class enum:

in asm.h, look for this:

/* Encoding dictionary entry */

struct si_bin_enc_dict_entry_t

{

...

          unsigned int userElementCount;

          struct si_bin_enc_user_element_t userElements[16];

...

};

I think the name "si_binenc_dict_entry_t" is incorrect. For my understanding, this structure is actually the device configuration table. It is one of the Note Headers in the second Note segment. The following lines are taken from my readelf program output (see my last post):

CAL Note Segment:

TypeField                 Type PayloadSize

0x00000002                (TBD)        0x0

0x00000003                (TBD)        0x0

0x00000010                (TBD)       0x40

0x00000004                (TBD)        0x4

0x00000005                (TBD)        0x0

0x00000006                (TBD)        0x0

0x00000007                (TBD)        0x0

0x00000008                (TBD)        0x4

0x00000009                (TBD)        0x0

0x0000000a                (TBD)       0x10

0x0000000b                (TBD)        0x0

0x0000000d                (TBD)        0x4

0x0000000c                (TBD)        0x0

0x00000001                (TBD)      0x390    <----- Device config table

0x0000000e                (TBD)        0x0

0x0000000f                (TBD)        0x4

0x00000011                (TBD)       0x80

The payload of the header is a list of CALProgramInfoEntry structs

typedef struct {

     Elf32_Word address; /* device address */

     Elf32_Word value; /* value */

} CALProgramInfoEntry;

and they look like this:

Device configuration table (Program Information):

TypeField Type                      Value

0x80001000 USER_ELEMENTS               0x3

0x80001001 UE0_DataClass              0x17

0x80001002 UE0_APISlot                 0x0

0x80001003 UE0_SGPRStart               0x2

0x80001004 UE0_SGPRCount               0x2

0x80001005 UE1_DataClass               0x2

0x80001006 UE1_APISlot                 0x0

0x80001007 UE1_SGPRStart               0x4

0x80001008 UE1_SGPRCount               0x4

0x80001009 UE2_DataClass               0x2

0x8000100a UE2_APISlot                 0x1

0x8000100b UE2_SGPRStart               0x8

...

The first item is the user elements count, following by 16 user element structs defined as:

struct si_bin_enc_user_element_t

{

          unsigned int dataClass;

          unsigned int apiSlot;

          unsigned int startUserReg;

          unsigned int userRegCount;

};

The first field is the data class. Its value is defined by _E_SC_USER_DATA_CLASS. In the above example, the first user element, UE0, has a data class value of 0x17, which is a PTR_UAV_TABLE. Likewise, UE1 and UE2 are of IMM_CONST_BUFFER data class.

I will look at Mesa/Gallium, thanks. I am happy to share my findings, too.

0 Likes

Hey,

Do you wanna make your ISA wrapper to be able to define kernel parameters as well?

I think it gonna be really usefull, but where to get all the information for it from

I'm thinking it more simpler: Don't wanna understand how those 16 elements works and also understand HOW OPENCL PASSES BUFFER ptrs. I'll need to use them from the asm side also. And what if AMD changes that structure (the can rewrite it, and generate different amd_il code, you know)?!

I'm thinking about to compile skeleton opencl code and then patch it automatically.

Anyways, now I try to make a 99% OpenCl compatible patcher which works without LDS/GDS. And after I have some question marks about LDS/GDS.

0 Likes

realhet wrote:

Hey,

Do you wanna make your ISA wrapper to be able to define kernel parameters as well?

I think it gonna be really usefull, but where to get all the information for it from

I'm thinking it more simpler: Don't wanna understand how those 16 elements works and also understand HOW OPENCL PASSES BUFFER ptrs. I'll need to use them from the asm side also. And what if AMD changes that structure (the can rewrite it, and generate different amd_il code, you know)?!

I'm thinking about to compile skeleton opencl code and then patch it automatically.

Anyways, now I try to make a 99% OpenCl compatible patcher which works without LDS/GDS. And after I have some question marks about LDS/GDS.

Yes, I am interested in a more complete solution. I do not need it at this moment but knowing the use of various bits gives me more confidence that any hacking/patching will work in different conditions. I will look into it when I have more time.

jg1

0 Likes

Something's not round with gds/lds:

drallan:

{0x80000081,"GDS_MAX  "},       //???

{0x80000082,"LDS_MAX  "},       //also  80001045 see below

{0x80001045,"LDS_AUX  "}        //also  80001045 see below

+ .rodata

multi2sim:

{ "AMU_ABI_LDS_SIZE_AVAIL",          0x80000081 },

{ "AMU_ABI_LDS_SIZE_USED",          0x80000082 },

I say:

key 0x80000081 is always $80000, whatever I do o.o

key Computr_pgm_rsc2 bits[16:24] is the actual LDS size

What's the truth?   On CAL my version works fine, I tested it. Doing some lds size comparison soon.

Other stuff:

There is the 0th program entry in the cal.elf. It's type in $70000002 and inside there are offset/size pairs referencing to the amd_il and isa sections offseted with a misterious offset. Do you know this? I can shift it anyways according to the new isa size. Maybe it's an offset relative to the OpenCl.elf image...

0 Likes

realhet wrote:

Other stuff:

There is the 0th program entry in the cal.elf. It's type in $70000002 and inside there are offset/size pairs referencing to the amd_il and isa sections offseted with a misterious offset. Do you know this? I can shift it anyways according to the new isa size. Maybe it's an offset relative to the OpenCl.elf image...

I can take this question. Here is the how this LOPROC+2 Program header looks (in as10.txt) (the Program header contains what is called Encoding Dictionary in the CAL Programming Guide)

Encoding Dictionary:

      Type     Offset       Size      Flags

0x00000004      0x2b4    0x8e3dd        0x0

0x00000004    0x8e691     0xbc59        0x0

and the Program Headers

Program Headers:

     Type      Offset      VAddr      PAddr    FileSiz     MemSiz

  LOPROC+2       0xd4        0x0        0x0       0x28        0x0

      NOTE      0x2b4        0x0        0x0      0x234        0x0

      LOAD      0x4e8        0x0        0x0    0x8e1a9    0x8e1a9

      NOTE    0x8e691        0x0        0x0      0x5c4        0x0

      LOAD    0x8ec55        0x0        0x0     0xb695     0xb695

So the "mysterious offset" is really the offset of the 1st Note segment. And the Size (0x8e3dd) is the sum of the 1st Note+Load segments, Is this what you are looking for?

jg1


0 Likes

I see now, thanks. (I was dumb enough not to search in the program header lol) Now I can regenerate that part from actual program header information.

Then I installed 12.8 driver (11.12 was a bit old for lds/gds tests hehe)

LDS:

I've find out how  works (mostly like drallan said):

- ocl.elf.rodata -> see last 8 dwords, dword[3] is the LDS size in bytes (also specified in the ascii dump in the form "hwlocal:n" but let's hope opencl runtime doesn't rely on that)

- cal.elf notes key $00002e13 bits[15:23] (not [16:24] as in multi2sim) specified in 256byte units

- cal.elf notes key $80000082 in 1byte units

GDS:

I've tried it using CAL: I allocated 256bytes of gds, and also used it for an atomic addition. Then I changed the size to 512 bytes, and the only thing changed in cal.elf was that byte 256->512 inside the amd_il code.

So I think we can use GDS under OpenCL as far as no two kernels uses the same parts of the whole GDS.

Unimportant parts:

- amd_il .text section

- both .data sections filled with 4KB zeroes

Although the ascii part of .rodata is important for the disassembler and also for the opencl runtime. I guess the amd_il note part is also unimportant, but I was lazy to write code to replace a program entry.

The ocl elf size reduced from 13.5KB to 3.5KB, and still works.

Attached a 'pseudo' code on how to do the patching:

0 Likes

Good job on the LDS/GDS!

Although the ascii part of .rodata is important for the disassembler and also for the opencl runtime. I guess the amd_il note part is also unimportant, but I was lazy to write code to replace a program entry.

The ocl elf size reduced from 13.5KB to 3.5KB, and still works.

I believe that the 1st (amdil) Note segment is not used (in the CAL image, or nested ELF), except there is no ISA code present, in which case, the driver will have to compile the IL into ISA.

jg1

0 Likes

hi realhet,

Just curious what script language are you using in your attachment?

One question:

SetCalNote($00002e13,(opts.ldsSizeBytes+255)shr 8 shl 15,$FFF07FFF{and mask});

maybe it is a typo of {or mask}?

I verified this with my kernel. the LDS size is indeed in bits [15:23].

jg1

0 Likes

{ } in my code is just a pascal comment. Setnote(key,incr,mask=0) just searches the key and apply the and mask and finally incrementing the value.

0 Likes

LDS:

I've find out how  works (mostly like drallan said):

- ocl.elf.rodata -> see last 8 dwords, dword[3] is the LDS size in bytes (also specified in the ascii dump in the form "hwlocal:n" but let's hope opencl runtime doesn't rely on that)

I ran some experiments to get to the bottom of the confusing rodata question. Surprisingly, its the master setting for the kernel's lds memory. It always works and will override settings in compute_pgm_rsrc2. When the rodata setting is  0, compute_pgm_rsrc2 is used. Compute_pgm_rsrc2 may not work when an ocl skeleton kernel uses lds memory because this sets rodata. If both rodata and compute_pgm_rsrc2 settings are used but have different values, rodata defines the amount of lds memory.

I compiled a small ocl kernel and replaced it with a gcn isa program. The isa program reads and writes a series of 256 byte blocks of lds. The ocl kernel was compiled with and without any lds memory then the lds memory settings were changed in different locations before running the isa code. The isa program shows which parts of the lds memory were active.

Output during setup.

             +--------------------lds setting from ocl kernel

             |      +-------------lds setting at repack

             |      |       +-----location of setting

lds memory: 100 -> 100 (compute_pgm_rsrc2)

Results: OCL kernel defines some lds memory   (note 77777777 means LDS is active)

lds memory: 100 -> 100 (compute_pgm_rsrc2)

lds memory: 100 -> 400 (abi_lds_size_used)

lds memory: 100 -> 100 (.rodata kernel header)

77777777 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ( kernel output)

0x000       0x100      0x200       0x300      0x400     0x500      0x600      0x700  (lds address)

lds memory: 100 -> 400 (compute_pgm_rsrc2)

lds memory: 100 -> 100 (abi_lds_size_used)

lds memory: 100 -> 100 (.rodata kernel header)

77777777 00000000 00000000 00000000 00000000 00000000 00000000 00000000

lds memory: 100 -> 100 (compute_pgm_rsrc2)

lds memory: 100 -> 100 (abi_lds_size_used)

lds memory: 100 -> 400 (.rodata kernel header)

77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000

OCL kernel defines no lds memory

lds memory: 0 -> 400 (compute_pgm_rsrc2)

lds memory: 0 -> 0 (abi_lds_size_used)

lds memory: 0 -> 0 (.rodata kernel header)

77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000

lds memory: 0 -> 0 (compute_pgm_rsrc2)

lds memory: 0 -> 400 (abi_lds_size_used)

lds memory: 0 -> 0 (.rodata kernel header)

00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

lds memory: 0 -> 0 (compute_pgm_rsrc2)

lds memory: 0 -> 0 (abi_lds_size_used)

lds memory: 0 -> 400 (.rodata kernel header)

77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000

lds memory: 100 -> 400 (compute_pgm_rsrc2) rodata overrides compute_pgm_rsrc2

lds memory: 100 -> 100 (abi_lds_size_used)

lds memory: 100 -> 200 (.rodata kernel header)

77777777 77777777 00000000 00000000 00000000 00000000 00000000 00000000

0 Likes

Useful info, thanks!

Today I'm going to fiddle with different kernel function headers, maybe it's not that hard and and then my program can make that 'element table' manually.

0 Likes

Something's not round with gds/lds:

drallan:

{0x80000081,"GDS_MAX  "},       //???

{0x80000082,"LDS_MAX  "},       //also  80001045 see below

{0x80001045,"LDS_AUX  "}        //also  80001045 see below

+ .rodata

multi2sim:

{ "AMU_ABI_LDS_SIZE_AVAIL",          0x80000081 },

{ "AMU_ABI_LDS_SIZE_USED",          0x80000082 },

multi2sim is more round!

0x80000082 LDS_SIZE_USED is the same as LDS_MAX (max used by a workgroup)

0x80000081 is LDS_SIZE_AVAIL (LDS hardware limit, so it affects nothing)

So GDS_MAX is wrong and may not exist because GDS is global and seen by all CUs and workgroups.

At least the question marks were right !

.rodata Under some conditions  LDS values had problems.  The value of LDS_USED also appears in the 8 integer block at the end of the .rodata section just after the text part. After changing both locations, the problems disappeared.

drallan

0 Likes

There is one more problem I just discovered but have not investigated:

I created an OpenCL kernel source file which contains two kernel functions (KFs). The first KF uses small amount of resources (registers, LDS, etc.) and the second KF uses a lot more. I ran KSA2 and exported the binary. The device configuration in the resulting ELF only shows the resources used by the first KF.

The question is if I decide to run the second KF, where does the driver find the resource (device config.) needed by the 2nd KF?

jg1

0 Likes