Hello,
I've been doing some work with the output of the OpenCL compiler for a project and am currently looking for the raw ISA bytecode output within the resulting binary. I've deduced the output of the OpenCL compiler looks something like this (with most sections omitted for brevity):
OpenCL Elf
The trouble is, I don't trust that second .text section in the AMD runtime elf as being strictly ISA bytecode. It generally looks like three different data sections, separated by chunks of zeros (in the sample programs I am compiling). I was wondering if someone could give me a bit of insight into the structure of this section. The r600 ISA spec is public, so I could try to decompile it by hand to see where it starts making sense, but I'd like to avoid the tedium if at all possible.
Thank you.
Solved! Go to Solution.
Read those architecture manuals:
For older Very Large Instruction Word hardware -> http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf
For Graphic Core Next -> http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Southern_Islands_Instruction_Set_Architecture.pdf
^^^^ Finally it's out
I'm 100% sure that the ".text: AMD runtime elf" is the actual CAL elf image that runs. You can recompile it yourself using the ".amdil: Text IL" section and calClCompile().
(Just remove the TextIL (opencl, and llvm) section from the OpenCl elf if you want to run the ".text: AMD runtime elf")
I'm sorry, you misunderstand. The output of the compiler is an elf file. The first .text section of that elf file is a nested elf file (in a slightly custom AMD specific format). The second .text section of that nested elf file is what I am talking about here. I am almost positive that section contains the raw ISA bytecode, but I am also almost positive it contains additional binary data. I want just the raw ISA bytecode.
Is that more clear?
Thank you.
I know it's an elf inside an elf.
"I want just the raw ISA bytecode."
Then simply get that section you've already mentioned. (the second .text section in that nested .elf)
Additional binary data is:
- ATI option data (like constant buffer sizes, LDS size, Wavefrontsize, and many more parameters, it's in an ELF compatible format)
- around 1 kbytes of zeroes before the ISA code.
- then comes the actual microcode. (on VLIW it has two parts separated with 512(?)bytes zero padding: program flow - and the actual alu code)
The above is all marked in the elf's program_header. (program header in the .elf inside the opencl .elf)
Thank you realhet. I may be misunderstanding you, but I believe our information is conflicting. For reference, I am compiling for RV730.
Here is my complete analysis of the internal ELF file:
Obviously, this analysis has some holes in it, and knowing everything specifically would be ideal. Now, if I'm understanding you correctly, the second .text section of the internal elf is the ISA bytecode, but is broken up into two parts (by the 192 bytes of 0 padding). Is that correct? Is there perhaps some structure I could see which describes this data?
Thank you.
relpats_eht,
The easiest way to see the structure is to disassemble the text and see what the various sections are. You can use calclDisassemble for the calBinary or -save-temps when compilng to the binary with the OpenCL api.
Yes, the second text section in the inner elf file contains the ISA binary code (instructions), word for word, and only the ISAcode. What might be confusing is that the inner elf has many other sections that point to various places, a second indexing. Some of these are "program" sections prg0, prg1, etc. But what you want is the second text section. The following is cryptic but might help as you have already gotten most of the way. It is the output dump from an assembler (144 bytes) showing the elf unpack and repack process. It is a bit long but probably clearer than trying to describe the process in words.
Other sections contain start up information used by the program like memory address's, memory descriptors, and constant data. The CAL CAL CAL part contains setup info such as pointers, code lengths, number of registers used, etc.
I have seen bits and pieces of documentation which leads me to believe some of this information may be available from AMD, possibly on request?
It would be nice to know where to find/how to get such info.
//---------------------------------------------------------------------------------
RUN OPENCL
ASSEMBLER OUTPUT:
asm:source 1721 bytes src
source errors: 0
code length: 144 bytes [36] <---------- (GENERATED CODE)
first address: 00000000
last address: 0000008C
optim. insn: 0
savebino: 1 bins, size=[15996, 0, 0]
//---------------------------------------------------------------------------------
DUMP OF ELF UNPACK STARTS HERE
//---------------------------------------------------------------------------------
wr_bin: bin1 -> bin/isatest.bn0 size=15996
Elf:filesize = 3E7C (15996)
Elf:sh table @ 15716, 7 sections
Elf:ph table @ 0, 0 programs
read nametab at 52 size =50
sh 0 type= 0 link(0) start 0 size 0 []
sh 1 type= 3 link(0) start 52 size 50 [.shstrtab]
sh 2 type= 3 link(0) start 102 size 66 [.strtab]
sh 3 type= 2 link(2) start 168 size 64 [.symtab]
sh 4 type= 1 link(0) start 232 size 391 [.rodata]
text(0) index = 5
sh 5 type= 1 link(0) start 623 size 15026 [.text]
sh 6 type= 1 link(0) start 15649 size 67 [.comment]
----------------------------------------------
enter the inner elf
----------------------------------------------
Elf:filesize = 3AB2 (15026)
Elf:sh table @ 292, 10 sections
Elf:ph table @ 52, 5 programs
ph 0 type=70000002 start 212 size 40 memsize 0
ph 1 type=00000004 start 692 size 532 memsize 0
ph 2 type=00000001 start 1224 size 7429 memsize 7429
ph 3 type=00000004 start 8653 size 1444 memsize 0
ph 4 type=00000001 start 10097 size 4929 memsize 4929
read nametab at 252 size =40
sh 0 type= 0 link(0) start 0 size 0 []
sh 1 type= 3 link(0) start 252 size 40 [.shstrtab]
text(0) index = 2
sh 2 type= 1 link(0) start 1224 size 2592 [.text]
sh 3 type= 1 link(0) start 3816 size 4736 [.data]
sh 4 type= 2 link(5) start 8552 size 80 [.symtab]
sh 5 type= 3 link(0) start 8632 size 21 [.strtab]
text(1) index = 6
sh 6 type= 1 link(0) start 10097 size 92 [.text]
sh 7 type= 1 link(0) start 10189 size 4736 [.data]
sh 8 type= 2 link(9) start 14925 size 80 [.symtab]
sh 9 type= 3 link(0) start 15005 size 21 [.strtab]
//---------------------------------------------------------------------------------
enter and work on inner elf
//---------------------------------------------------------------------------------
pass2 2 text sections
p2text0: offset=1224 size=2592
p2text1: offset=10097 size=92
FIX CODE section 6 .text[1] <----- (note: this is the second text section (of 0,1), inner elf)
codesize 92 -> 144 bytes (original 92 byte section replaced with 144 byte new program)
FIX CODE(1) end
SEC(10) move phtab by 52
SEC(10) move shtab by 52 (start moving things around)
EXPAND program 4 by 52
EXPAND section 6 by 52
MOVE section 7 by 52
MOVE section 8 by 52
MOVE section 9 by 52
(WOW)p34sum 6373 --> 6425
wr pgr0 start 212[ 212] size 40[ 40]
wr pgr1 start 692[ 692] size 532[ 532]
wr pgr2 start 1224[ 1224] size 7429[ 7429]
user elements: 3
uav memr buffer: s[4,7] type 10
constant buffer: s[8,11] type 0
argument buffer: s[12,15] type 1
vgpr registers: 4 -> 10
sgpr registers: 19 -> 25
lds aux mvalue: 001009A0 -> 00000000
gds max memory: 8000 -> 0
lds max memory: 2000 -> 8000
------------------------------------------
work done, now start reassembly, report what goes where
------------------------------------------
wr pgr3 start 8653[ 8653] size 1444[ 1444]
wr pgr4 start 10097[ 10097] size 4981[ 4929]
wr sec0 start 0[ 0] size 0[ 0]
wr sec1 start 252[ 252] size 40[ 40]
wr sec2 start 1224[ 1224] size 2592[ 2592]
wr sec3 start 3816[ 3816] size 4736[ 4736]
symtab(4) at 8552
=strtab(5) size 21
symbol(0) ndx= 0 size= 0 info=0 val= 0 []
symbol(1) ndx=16 size= 0 info=0 val= 0 [uav9]
symbol(2) ndx=16 size= 0 info=0 val= 1 [uav10]
symbol(3) ndx=10 size= 0 info=0 val= 0 [cb1]
symbol(4) ndx=10 size= 0 info=0 val= 1 [cb0]
wr sec4 start 8552[ 8552] size 80[ 80]
wr sec5 start 8632[ 8632] size 21[ 21]
save data section(6)
wr sec6 start 10097[ 10097] size 144[ 92]
wr sec7 start 10241[ 10189] size 4736[ 4736]
symtab(8) at 14925
=strtab(9) size 21
symbol(0) ndx= 0 size= 0 info=0 val= 0 []
symbol(1) ndx=16 size= 0 info=0 val= 0 [uav9]
symbol(2) ndx=16 size= 0 info=0 val= 1 [uav10]
symbol(3) ndx=10 size= 0 info=0 val= 0 [cb1]
symbol(4) ndx=10 size= 0 info=0 val= 1 [cb0]
wr sec8 start 14977[ 14925] size 80[ 80]
wr sec9 start 15057[ 15005] size 21[ 21]
wr pent 0 at 212 offset 0 (pent means elf "program entry")
wr pent 1 at 692 offset 0
wr pent 2 at 1224 offset 0
wr pent 3 at 8653 offset 0
wr pent 4 at 10097 offset 0
wr sent 0 at 0 offset 0 (sent means elf section entry)
wr sent 1 at 252 offset 0
wr sent 2 at 1224 offset 0
wr sent 3 at 3816 offset 0
wr sent 4 at 8552 offset 0
wr sent 5 at 8632 offset 0
wr sent 6 at 10097 offset 0
wr sent 7 at 10241 offset 0
wr sent 8 at 14977 offset 0
wr sent 9 at 15057 offset 0
----------------------------------------------
inner elf now repacked, need to resize outer elf
----------------------------------------------
p1text1: newsize=15026 oldsize=15078
SEC(7) move phtab by 52
SEC(7) move shtab by 52
EXPAND section 5 by 52
MOVE section 6 by 52
wr sec0 start 0[ 0] size 0[ 0]
wr sec1 start 52[ 52] size 50[ 50]
wr sec2 start 102[ 102] size 66[ 66]
symtab(3) at 168
=strtab(2) size 66
symbol(0) ndx= 0 size= 0 info=0 val= 0 []
symbol(1) ndx= 4 size= 359 info=1 val= 0 [__OpenCL_test_metadata]
symbol(2) ndx= 5 size=15026 info=2 val= 0 [__OpenCL_test_kernel]
found _kernel symbol, section 5 size 15026 --> 15078
symbol(3) ndx= 4 size= 32 info=1 val= 359 [__OpenCL_test_header]
wr sec3 start 168[ 168] size 64[ 64]
rodata:lds memory: 2000 -> 8000
wr sec4 start 232[ 232] size 391[ 391]
save data section(5)
wr sec5 start 623[ 623] size 15078[ 15026]
wr sec6 start 15701[ 15649] size 67[ 67]
wr sent 0 at 0 offset 0
wr sent 1 at 52 offset 0
wr sent 2 at 102 offset 0
wr sent 3 at 168 offset 0
wr sent 4 at 232 offset 0
wr sent 5 at 623 offset 0
wr sent 6 at 15701 offset 0
==========================================================
/
Thanks for the additional information. What tools did you use to generate that output? It still seems rather odd to me for bytecode to have so many 0's in it. I should probably review the ISA spec some to see why.
The output is from an ISA binary re-loader I wrote, the debug information helped me figure out the file structure as I wrote it.
If you are using a public elf tool, it might not be so easy because the inner elf is a little non standard, but it should still work.
The code should not contain 0s, below, p2text1 is the second of the 2 text sections in the inner elf. It has only 92 bytes and
no zero (words) Every 32 bit entry is a hardware instruction. If you see a large pile of 0s, maybe you have not
at to the second text section? The first text section (inner elf) usually has a lot of 0 padding.
pass2 2 text sections
p2text0: offset=1224 size=2592
p2text1: offset=10097 size=92
FIX CODE section 6 .text[1] <----- (note: this is the second text section (of 0,1), inner elf)
codesize 92 -> 144 bytes (original 92 byte section replaced with 144 byte new program)
No, it's definitely the second text section. Though, for me, the first text section is solid IL bytecode. No zeros, the closest thing to padding is the comments.
VLIW machine code usually DOES contain zeroes -> those are the padding between the program-flow and the alu instruction streams.
Here's an example: (section 6) http://x.pgy.hu/~worm/het/vliw_isa.JPG
(in contrast to this, GCN machine code doesn't contain any padding zeroes because it has only a single continuous instruction stream.)
And then: sec7 is that mistique 4736 bytes of zeroes
sec8 is the symbol table
sec9 is the string table for the symtab
Finally the 4th (0based) entry in the program header encapsulates sec6,sec7,sec8 and sec9, that's what is loaded up to the driver.
Also prog3 contains the ATI CAL key/value pairs. Those are a bit complex, first you have to understand the format of them: It's called "Note section" in the ELF format docs.
Thank you realhet. I've not heard the acronyms VLIW and GCN before, what do these mean?
Were I to try to load this code directly to the card, would I need to remove the padding or modify the code in any way?
Thank you.
Read those architecture manuals:
For older Very Large Instruction Word hardware -> http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf
For Graphic Core Next -> http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_Southern_Islands_Instruction_Set_Architecture.pdf
^^^^ Finally it's out
Thank you again. I didn't know that documentation existed.
My research happens to require reloading of ISA microcode into a opencl ELF. The information provided by drallan's reply , with outputs of his reloader, is very useful to me. In particular, I need to know if a new binary (the ISA code in the second .text section of the nested ELF) is injected into an existing OpenCL ELF, what are other parts of the ELF file that need to be changed, besides the length adjustments of various sections. For example, the output of drallan's reloader include the following lines:
(WOW)p34sum 6373 --> 6425
wr pgr0 start 212[ 212] size 40[ 40]
wr pgr1 start 692[ 692] size 532[ 532]
wr pgr2 start 1224[ 1224] size 7429[ 7429]
user elements: 3
uav memr buffer: s[4,7] type 10
constant buffer: s[8,11] type 0
argument buffer: s[12,15] type 1
vgpr registers: 4 -> 10
sgpr registers: 19 -> 25
lds aux mvalue: 001009A0 -> 00000000
gds max memory: 8000 -> 0
lds max memory: 2000 -> 8000
This fragment seems to suggest that some kernel program information, e.g., "uav memr buffer" and "vgpr registers", presumably stored in the NOTE segment, must be modified. The only documentation I can find is AMD's CAL Programming Guide, which gives the structure of the records in the NOTE segment but it does not explain how these program information are stored. Could AMD support staff or drallan shed some light on this?
Thank you.
Can you please give me a link for that CAL Programming Guide? I can't find it on developer.amd.com o.0
Things in the note section I know about:
- user elements: initial configuration for resource constants. In the example there are 2 resources: uav, constbuf, argument buf. For example: In the ISA code you can access the UAV with the registers s[4:7]. Resource Constants are similar to memory range descriptors in an x86 system. (More info in -> GCN ISA manual)
You basically don't have to modify these, just use it in the form as OpenCL compiled it. The only exception is the size information at a constant buffer declaration.
"which gives the structure of the records in the NOTE segment but it does not explain how these program information are stored." You mean, how the standard ELF note section is stored? -> Here it is http://www.skyfree.org/linux/references/ELF_Format.pdf Search for "Note Section" in it.
Is there a note key list in that "CAL Programming Guide" you mentioned? I have some keys that I had hacked out of the cal image, but if there's a complete list, I'd be glad to see it.
Hi realhet,
I could not remember where I found the document. If you let me know how I can send you a copy, I would be happy to do so.
Thank you for your help. Sorry I did not explain very clearly what I need in my first message. I have read the ELF format document, the GCN architecture ppt, and the SI-GPU ISA document. The ELF format pdf specifies the general ELF format. The Appendix B of the CAL Programming Guide specifies how the ELF format is used in the CAL context, that includes, for examples, why there are two NOTE and LOAD segments (thus two .text sections) and what information is stored in the NOTE segment. However, it does not provide any information about where the kernel program info (like # of vgprs, sgprs, lds memory, etc.) are stored. It might be explained in drallan's attachment -- I will verify this in my program.
Hi realhet,
To answer the question in your message: Yes, the document includes the definition of the CALNoteHeader struct,
typedef struct {
Elf32_Word namesz; /* size of the name field. Must be 8 */
Elf32_Word descsz; /* size of the data payload */
Elf32_Word type; /* type of the payload */
char name[8]; /* note header string. Must be “ATI CAL” */
} CALNoteHeader;
and Type Identifiers:
Note Type Description
Inputs ELF_NOTE_ATI_INPUTS Inputs used by the program.
Outputs ELF_NOTE_ATI_OUTPUTS Outputs written by the program.
FloatConsts ELF_NOTE_ATI_FLOATCONSTS Float constant data segment desc
IntConsts ELF_NOTE_ATI_INTCONSTS Integer constant data segment desc
BoolConsts ELF_NOTE_ATI_BOOLCONSTS Boolean constant data segment d
EarlyExit ELF_NOTE_ATI_EARLYEXIT Program termination description.
GlobalBuffers ELF_NOTE_ATI_GLOBAL_BUFFERS Global import/export buffer.
ConstantBuffers ELF_NOTE_ATI_CONSTANT_BUFFERS Constant buffer usage mask.
SamplerMap ELF_NOTE_ATI_INPUT_SAMPLER_MAP Input to sampler binding table.
ScratchBuffer ELF_NOTE_ATI_SCRATCH_BUFFER Scratch memory usage mask.
PersistentBuffer ELF_NOTE_ATI_PERSISTANT_BUFFER Persistent memory usage mask.
ProgramInfo ELF_NOTE_ATI_PROGRAM_INFO Device configuration table.
Hi jg1,
Note that the inner elf has about 5 'program' sections 3 of which overlap some of the normal text sections,
this is helpful when calculating the new section offsets and sizes. The program sections have their own
program headers instead of the regular section headers.
1. Replace the gcn binary code in Inner file: second text section.
2. Calculate and change all required sizes in section headers and program headers.
3. Change a few parameters in the inner file 4th (index 3) 'program' section.
(this includes things like the no. of SGPRs, VGPRs, lds and gds memory requirements)
After packing the inner elf
4. Change appropriate section sizes in outer elf section headers.
5. In the outer elf ".rodata" section change the lds memory maximum if needed (see attached notes/code snippets)
6. In the outer symbol table entries, change the size of the section with the kernel name (need to look at strtab for the name,
again there is a code snippet in the attached notes.)
Then, pack that thing!
There is more info in the notes that I couldn't paste here because this editor keeps inserting everything into tables.
The notes also mention how to change the parameters in the 4th program section in the inner file. Basically you look
for a 32 bit key code first and then read/write the data in the following int. There is a table of important key codes in
the notes.
Note, some of the data output from my program does not need to be changed, i.e., the uav memr buffer register assignments and type.
These are just for reference and/or can be used for programming at a higher level, like a compiler.
Good luck, it's really not that difficult, only made a little complicated by the elf format.
Sorry for the jumbled notes.
drallan
Hi drallan,
Thank you for your reply. It gives quite a lot of information. I think the linkdat struct is what I am looking for. I will do some tests and let you know the results. Thanks a lot.
P.S. By the way, where in AMD's documents can I find information of the linkdat struct?
jg1 wrote:
Hi drallan,
Thank you for your reply. It gives quite a lot of information. I think the linkdat struct is what I am looking for. I will do some tests and let you know the results. Thanks a lot.
P.S. By the way, where in AMD's documents can I find information of the linkdat struct?
Hi jg1,
The linkdat structure is from my elf repacker and is a partial list of entries that I have found useful. I don't know of any AMD document that lists their meaning and values. Some I have dug out and I am looking to see if I have some additional info, I think I must.
The 5 elf program sections that I mentioned are what the CAL programming manual refers to as the PT_NOTE segment. The CAL programming guide describes the basic structure of the inner elf but omits some useful information. The 2 'program' sections that overlap the elf sections contain the executable and load data. Usually you only need to fix the executable section.
The other 3 program sections (0, 1, 3), contain information about the program and set up. section 3 is the one with the important information.
One of the CAL entries (stuff with the ATI_CAL string) contains a large block of info that contains the information fields found in linkdat. In a sense, they are below, or contained inside, the CAL entry format description in the programmers manual.
To help visualize this, I have attached dumps of the 3 'program' sections, and also the ".rodata" section. Most of the rodata is ascii and is a copy of info you would see from the Kernel analyzer output. There is a small binary section after the ascii part where I change the lds memory allocation. You may not need that.
If I find any more good info I will post it here.
drallan
Hi drallan,
Thank you again for your kind help. I added your inputs to my readelf program and attached the outputs here. Next I am going to do the repacking. I will post my findings here.
The linkdat structure looks like the "device configuration table" mentioned in the CAL Programming Guide. The declaration of the CALProgramInfoEntry is as follows:
typedef struct {
Elf32_Word address; /* device address */
Elf32_Word value; /* value */
} CALProgramInfoEntry;
It is probably used by the driver to program the GPU chips through device I/O ports.
jg1
The linkdat structure looks like the "device configuration table" mentioned in the CAL Programming Guide. [...]
typedef struct {
Elf32_Word address; /* device address */
Elf32_Word value; /* value */
} CALProgramInfoEntry;
It is probably used by the driver to program the GPU chips through device I/O ports.
Yes, I think that must be correct. The "device address" also looks like a PCI bus IO transaction.
I checked, the entries in linkdat were all dug from code, I don't have any other docs on the device configuration table values.
drallan
Hi drallan and realhet,
My repacker is now working. Thank you for all your help. I really appreciate it.
For the purpose of repacking ISA code (with the same kernel function signature), drallan's instructions are good enough.
I also find good information in this file. Look for struct si_bin_enc_dict_entry_t, and enum _E_SC_USER_DATA_CLASS. You might want to look at other files at the web site to see how the dots are connected. I modified my readelf program again and its output is in the attachment.
I still have not found information about the use of GDS_MAX, LDS_MAX, and LDS_AUX. But they do not seem to matter with the level of complexity of my current kernel. I will post here if I find anything.
jg1
Wow! Nice find -> multi2sim
http://www.multi2sim.org/svn/multi2sim/trunk/src/arch/southern-islands/emu/machine.c
This seems to be an accurate 'documentation on every GCN instructions
Also there's a complete* inverse assembler -> http://www.multi2sim.org/svn/multi2sim/trunk/src/arch/southern-islands/asm/asm.c
I think don't you wanna understand all the values in the .elf, only patch that few you really use and leave the rest for the ocl compiler.
GDS_MAX, LDS_MAX, and LDS_AUX: That's new to me too: I only tried LDS on CAL, but there the only thing I had to change was the control word marked with PGM2_COUMPUTE_RESOURCES in the disasm. If I recall bits8:15 (or something like this) represented the LDS allocation size. But seems like OCL stores it redundantly.
"as10.txt.zip" <- Well done!
Edit: Oh noes, multi2sim is full of inumplemented instructions
Hi realhet,
You are right. Multi2sim can be a good reference when we need to find out execution details of an ISA instructions. I have not studied yet but it is good to know where to look to when I need.
Thanks for the info of LDS. They are bits 8:16 in COMPUTE_PGM_RSRC2. So you are only a tiny "bit" off.
jg1 wrote:
Hi drallan and realhet,
My repacker is now working.
Thank you for all your help. I really appreciate it.
For the purpose of repacking ISA code (with the same kernel function signature), drallan's instructions are good enough.
I also find good information in this file. Look for struct si_bin_enc_dict_entry_t, and enum _E_SC_USER_DATA_CLASS. You might want to look at other files at the web site to see how the dots are connected. I modified my readelf program again and its output is in the attachment.
I still have not found information about the use of GDS_MAX, LDS_MAX, and LDS_AUX. But they do not seem to matter with the level of complexity of my current kernel. I will post here if I find anything.
That was fast! Congratulations.
I have the multi2sim files. They are very good for gcn code but they don't have the device configuration table info (Where is it!?)
The _E_SC_USER_DATA_CLASS looks real interesting but the numbers don't seem to correlate with anything. Maybe specific to multi2sim?
The Linux Mesa/Gallium project also has some (too many ! ) files but they are a bit less clear.
You can google mesa southern islands.
I would be very interested if you or realhet find any more good info, and happy to share anything I find too...
drallan
Thanks, drallan.
Re: the device configuration table and data class enum:
in asm.h, look for this:
/* Encoding dictionary entry */
struct si_bin_enc_dict_entry_t
{
...
unsigned int userElementCount;
struct si_bin_enc_user_element_t userElements[16];
...
};
I think the name "si_binenc_dict_entry_t" is incorrect. For my understanding, this structure is actually the device configuration table. It is one of the Note Headers in the second Note segment. The following lines are taken from my readelf program output (see my last post):
CAL Note Segment:
TypeField Type PayloadSize
0x00000002 (TBD) 0x0
0x00000003 (TBD) 0x0
0x00000010 (TBD) 0x40
0x00000004 (TBD) 0x4
0x00000005 (TBD) 0x0
0x00000006 (TBD) 0x0
0x00000007 (TBD) 0x0
0x00000008 (TBD) 0x4
0x00000009 (TBD) 0x0
0x0000000a (TBD) 0x10
0x0000000b (TBD) 0x0
0x0000000d (TBD) 0x4
0x0000000c (TBD) 0x0
0x00000001 (TBD) 0x390 <----- Device config table
0x0000000e (TBD) 0x0
0x0000000f (TBD) 0x4
0x00000011 (TBD) 0x80
The payload of the header is a list of CALProgramInfoEntry structs
typedef struct {
Elf32_Word address; /* device address */
Elf32_Word value; /* value */
} CALProgramInfoEntry;
and they look like this:
Device configuration table (Program Information):
TypeField Type Value
0x80001000 USER_ELEMENTS 0x3
0x80001001 UE0_DataClass 0x17
0x80001002 UE0_APISlot 0x0
0x80001003 UE0_SGPRStart 0x2
0x80001004 UE0_SGPRCount 0x2
0x80001005 UE1_DataClass 0x2
0x80001006 UE1_APISlot 0x0
0x80001007 UE1_SGPRStart 0x4
0x80001008 UE1_SGPRCount 0x4
0x80001009 UE2_DataClass 0x2
0x8000100a UE2_APISlot 0x1
0x8000100b UE2_SGPRStart 0x8
...
The first item is the user elements count, following by 16 user element structs defined as:
struct si_bin_enc_user_element_t
{
unsigned int dataClass;
unsigned int apiSlot;
unsigned int startUserReg;
unsigned int userRegCount;
};
The first field is the data class. Its value is defined by _E_SC_USER_DATA_CLASS. In the above example, the first user element, UE0, has a data class value of 0x17, which is a PTR_UAV_TABLE. Likewise, UE1 and UE2 are of IMM_CONST_BUFFER data class.
I will look at Mesa/Gallium, thanks. I am happy to share my findings, too.
Hey,
Do you wanna make your ISA wrapper to be able to define kernel parameters as well?
I think it gonna be really usefull, but where to get all the information for it from
I'm thinking it more simpler: Don't wanna understand how those 16 elements works and also understand HOW OPENCL PASSES BUFFER ptrs. I'll need to use them from the asm side also. And what if AMD changes that structure (the can rewrite it, and generate different amd_il code, you know)?!
I'm thinking about to compile skeleton opencl code and then patch it automatically.
Anyways, now I try to make a 99% OpenCl compatible patcher which works without LDS/GDS. And after I have some question marks about LDS/GDS.
realhet wrote:
Hey,
Do you wanna make your ISA wrapper to be able to define kernel parameters as well?
I think it gonna be really usefull, but where to get all the information for it from
I'm thinking it more simpler: Don't wanna understand how those 16 elements works and also understand HOW OPENCL PASSES BUFFER ptrs. I'll need to use them from the asm side also. And what if AMD changes that structure (the can rewrite it, and generate different amd_il code, you know)?!
I'm thinking about to compile skeleton opencl code and then patch it automatically.
Anyways, now I try to make a 99% OpenCl compatible patcher which works without LDS/GDS. And after I have some question marks about LDS/GDS.
Yes, I am interested in a more complete solution. I do not need it at this moment but knowing the use of various bits gives me more confidence that any hacking/patching will work in different conditions. I will look into it when I have more time.
jg1
Something's not round with gds/lds:
drallan:
{0x80000081,"GDS_MAX "}, //???
{0x80000082,"LDS_MAX "}, //also 80001045 see below
{0x80001045,"LDS_AUX "} //also 80001045 see below
+ .rodata
multi2sim:
{ "AMU_ABI_LDS_SIZE_AVAIL", 0x80000081 },
{ "AMU_ABI_LDS_SIZE_USED", 0x80000082 },
I say:
key 0x80000081 is always $80000, whatever I do o.o
key Computr_pgm_rsc2 bits[16:24] is the actual LDS size
What's the truth? On CAL my version works fine, I tested it. Doing some lds size comparison soon.
Other stuff:
There is the 0th program entry in the cal.elf. It's type in $70000002 and inside there are offset/size pairs referencing to the amd_il and isa sections offseted with a misterious offset. Do you know this? I can shift it anyways according to the new isa size. Maybe it's an offset relative to the OpenCl.elf image...
realhet wrote:
Other stuff:
There is the 0th program entry in the cal.elf. It's type in $70000002 and inside there are offset/size pairs referencing to the amd_il and isa sections offseted with a misterious offset. Do you know this? I can shift it anyways according to the new isa size. Maybe it's an offset relative to the OpenCl.elf image...
I can take this question. Here is the how this LOPROC+2 Program header looks (in as10.txt) (the Program header contains what is called Encoding Dictionary in the CAL Programming Guide)
Encoding Dictionary:
Type Offset Size Flags
0x00000004 0x2b4 0x8e3dd 0x0
0x00000004 0x8e691 0xbc59 0x0
and the Program Headers
Program Headers:
Type Offset VAddr PAddr FileSiz MemSiz
LOPROC+2 0xd4 0x0 0x0 0x28 0x0
NOTE 0x2b4 0x0 0x0 0x234 0x0
LOAD 0x4e8 0x0 0x0 0x8e1a9 0x8e1a9
NOTE 0x8e691 0x0 0x0 0x5c4 0x0
LOAD 0x8ec55 0x0 0x0 0xb695 0xb695
So the "mysterious offset" is really the offset of the 1st Note segment. And the Size (0x8e3dd) is the sum of the 1st Note+Load segments, Is this what you are looking for?
jg1
I see now, thanks. (I was dumb enough not to search in the program header lol) Now I can regenerate that part from actual program header information.
Then I installed 12.8 driver (11.12 was a bit old for lds/gds tests hehe)
LDS:
I've find out how works (mostly like drallan said):
- ocl.elf.rodata -> see last 8 dwords, dword[3] is the LDS size in bytes (also specified in the ascii dump in the form "hwlocal:n" but let's hope opencl runtime doesn't rely on that)
- cal.elf notes key $00002e13 bits[15:23] (not [16:24] as in multi2sim) specified in 256byte units
- cal.elf notes key $80000082 in 1byte units
GDS:
I've tried it using CAL: I allocated 256bytes of gds, and also used it for an atomic addition. Then I changed the size to 512 bytes, and the only thing changed in cal.elf was that byte 256->512 inside the amd_il code.
So I think we can use GDS under OpenCL as far as no two kernels uses the same parts of the whole GDS.
Unimportant parts:
- amd_il .text section
- both .data sections filled with 4KB zeroes
Although the ascii part of .rodata is important for the disassembler and also for the opencl runtime. I guess the amd_il note part is also unimportant, but I was lazy to write code to replace a program entry.
The ocl elf size reduced from 13.5KB to 3.5KB, and still works.
Attached a 'pseudo' code on how to do the patching:
Good job on the LDS/GDS!
Although the ascii part of .rodata is important for the disassembler and also for the opencl runtime. I guess the amd_il note part is also unimportant, but I was lazy to write code to replace a program entry.
The ocl elf size reduced from 13.5KB to 3.5KB, and still works.
I believe that the 1st (amdil) Note segment is not used (in the CAL image, or nested ELF), except there is no ISA code present, in which case, the driver will have to compile the IL into ISA.
jg1
hi realhet,
Just curious what script language are you using in your attachment?
One question:
SetCalNote($00002e13,(opts.ldsSizeBytes+255)shr 8 shl 15,$FFF07FFF{and mask});
maybe it is a typo of {or mask}?
I verified this with my kernel. the LDS size is indeed in bits [15:23].
jg1
{ } in my code is just a pascal comment. Setnote(key,incr,mask=0) just searches the key and apply the and mask and finally incrementing the value.
LDS:
I've find out how works (mostly like drallan said):
- ocl.elf.rodata -> see last 8 dwords, dword[3] is the LDS size in bytes (also specified in the ascii dump in the form "hwlocal:n" but let's hope opencl runtime doesn't rely on that)
I ran some experiments to get to the bottom of the confusing rodata question. Surprisingly, its the master setting for the kernel's lds memory. It always works and will override settings in compute_pgm_rsrc2. When the rodata setting is 0, compute_pgm_rsrc2 is used. Compute_pgm_rsrc2 may not work when an ocl skeleton kernel uses lds memory because this sets rodata. If both rodata and compute_pgm_rsrc2 settings are used but have different values, rodata defines the amount of lds memory.
I compiled a small ocl kernel and replaced it with a gcn isa program. The isa program reads and writes a series of 256 byte blocks of lds. The ocl kernel was compiled with and without any lds memory then the lds memory settings were changed in different locations before running the isa code. The isa program shows which parts of the lds memory were active.
Output during setup.
+--------------------lds setting from ocl kernel
| +-------------lds setting at repack
| | +-----location of setting
lds memory: 100 -> 100 (compute_pgm_rsrc2)
Results: OCL kernel defines some lds memory (note 77777777 means LDS is active)
lds memory: 100 -> 100 (compute_pgm_rsrc2)
lds memory: 100 -> 400 (abi_lds_size_used)
lds memory: 100 -> 100 (.rodata kernel header)
77777777 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ( kernel output)
0x000 0x100 0x200 0x300 0x400 0x500 0x600 0x700 (lds address)
lds memory: 100 -> 400 (compute_pgm_rsrc2)
lds memory: 100 -> 100 (abi_lds_size_used)
lds memory: 100 -> 100 (.rodata kernel header)
77777777 00000000 00000000 00000000 00000000 00000000 00000000 00000000
lds memory: 100 -> 100 (compute_pgm_rsrc2)
lds memory: 100 -> 100 (abi_lds_size_used)
lds memory: 100 -> 400 (.rodata kernel header)
77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000
OCL kernel defines no lds memory
lds memory: 0 -> 400 (compute_pgm_rsrc2)
lds memory: 0 -> 0 (abi_lds_size_used)
lds memory: 0 -> 0 (.rodata kernel header)
77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000
lds memory: 0 -> 0 (compute_pgm_rsrc2)
lds memory: 0 -> 400 (abi_lds_size_used)
lds memory: 0 -> 0 (.rodata kernel header)
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
lds memory: 0 -> 0 (compute_pgm_rsrc2)
lds memory: 0 -> 0 (abi_lds_size_used)
lds memory: 0 -> 400 (.rodata kernel header)
77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000
lds memory: 100 -> 400 (compute_pgm_rsrc2) rodata overrides compute_pgm_rsrc2
lds memory: 100 -> 100 (abi_lds_size_used)
lds memory: 100 -> 200 (.rodata kernel header)
77777777 77777777 00000000 00000000 00000000 00000000 00000000 00000000
Useful info, thanks!
Today I'm going to fiddle with different kernel function headers, maybe it's not that hard and and then my program can make that 'element table' manually.
Something's not round with gds/lds:
drallan:
{0x80000081,"GDS_MAX "}, //???
{0x80000082,"LDS_MAX "}, //also 80001045 see below
{0x80001045,"LDS_AUX "} //also 80001045 see below
+ .rodata
multi2sim:
{ "AMU_ABI_LDS_SIZE_AVAIL", 0x80000081 },
{ "AMU_ABI_LDS_SIZE_USED", 0x80000082 },
multi2sim is more round!
0x80000082 LDS_SIZE_USED is the same as LDS_MAX (max used by a workgroup)
0x80000081 is LDS_SIZE_AVAIL (LDS hardware limit, so it affects nothing)
So GDS_MAX is wrong and may not exist because GDS is global and seen by all CUs and workgroups.
At least the question marks were right !
.rodata Under some conditions LDS values had problems. The value of LDS_USED also appears in the 8 integer block at the end of the .rodata section just after the text part. After changing both locations, the problems disappeared.
drallan
There is one more problem I just discovered but have not investigated:
I created an OpenCL kernel source file which contains two kernel functions (KFs). The first KF uses small amount of resources (registers, LDS, etc.) and the second KF uses a lot more. I ran KSA2 and exported the binary. The device configuration in the resulting ELF only shows the resources used by the first KF.
The question is if I decide to run the second KF, where does the driver find the resource (device config.) needed by the 2nd KF?
jg1