46 Replies Latest reply on Oct 24, 2012 7:20 AM by realhet

    Raw ISA bytecode location

    relpats_eht

      Hello,

       

      I've been doing some work with the output of the OpenCL compiler for a project and am currently looking for the raw ISA bytecode output within the resulting binary. I've deduced the output of the OpenCL compiler looks something like this (with most sections omitted for brevity):

       

      OpenCL Elf

      • .amdil: Text IL
      • .rodata: text runtime information
      • .text: AMD runtime elf
        • .text: IL bytecode
        • .text: Data containing ISA bytecode

      The trouble is, I don't trust that second .text section in the AMD runtime elf as being strictly ISA bytecode. It generally looks like three different data sections, separated by chunks of zeros (in the sample programs I am compiling). I was wondering if someone could give me a bit of insight into the structure of this section. The r600 ISA spec is public, so I could try to decompile it by hand to see where it starts making sense, but I'd like to avoid the tedium if at all possible.

       

      Thank you.

        • Re: Raw ISA bytecode location
          realhet

          I'm 100% sure that the ".text: AMD runtime elf" is the actual CAL elf image that runs. You can recompile it yourself using the ".amdil: Text IL" section and calClCompile().

          (Just remove the TextIL (opencl, and llvm) section from the OpenCl elf if you want to run the ".text: AMD runtime elf")

            • Re: Raw ISA bytecode location
              relpats_eht

              I'm sorry, you misunderstand. The output of the compiler is an elf file. The first .text section of that elf file is a nested elf file (in a slightly custom AMD specific format). The second .text section of that nested elf file is what I am talking about here. I am almost positive that section contains the raw ISA bytecode, but I am also almost positive it contains additional binary data. I want just the raw ISA bytecode.

               

              Is that more clear?

               

              Thank you.

                • Re: Raw ISA bytecode location
                  realhet

                  I know it's an elf inside an elf.

                   

                  "I want just the raw ISA bytecode."

                  Then simply get that section you've already mentioned. (the second .text section in that nested .elf)

                   

                  Additional binary data is:

                  - ATI option data (like constant buffer sizes, LDS size, Wavefrontsize, and many more parameters, it's in an ELF compatible format)

                  - around 1 kbytes of zeroes before the ISA code.

                  - then comes the actual microcode. (on VLIW it has two parts separated with 512(?)bytes zero padding: program flow - and the actual alu code)

                  The above is all marked in the elf's program_header. (program header in the .elf inside the opencl .elf)

                    • Re: Raw ISA bytecode location
                      relpats_eht

                      Thank you realhet. I may be misunderstanding you, but I believe our information is conflicting. For reference, I am compiling for RV730.

                       

                      Here is my complete analysis of the internal ELF file:

                      • nameless section: 0 bytes
                      • .shstrtab
                      • Unlisted section.
                        • Miscellaneous unknown data
                        • Contains many "ATI CAL" lines (see below)
                      • .text
                        • IL bytecode
                      • .data
                        • 4736 bytes of 0's
                      • .symtab
                      • .strtab
                      • Unlisted section.
                        • Many lines of "ATI CAL"
                          • Has the string "ATI CAL" followed by a few bytes of unknown data
                        • followed by register settings
                        • followed by unknown key value pairs in the format 0x8000000X value
                          • Value is usually 0, except for 6 it is 20 and A it is 1.
                        • More "ATI CAL" lines
                        • More 256 bytes of 0's
                      • .text
                        • ISA bytecode?
                        • 64 bytes of data
                        • 192 bytes of 0
                        • 308 bytes of data
                        • 92 bytes of 0
                        • 32 bytes of data
                      • .data
                        • 4736 bytes of 0
                      • .symtab
                      • .strtab

                       

                      Obviously, this analysis has some holes in it, and knowing everything specifically would be ideal. Now, if I'm understanding you correctly, the second .text section of the internal elf is the ISA bytecode, but is broken up into two parts (by the 192 bytes of 0 padding). Is that correct? Is there perhaps some structure I could see which describes this data?

                       

                      Thank you.

                • Re: Raw ISA bytecode location
                  drallan

                  Yes, the second text section in the inner elf file contains the ISA binary code (instructions), word for word, and only the ISAcode. What might be confusing is that the inner elf has many other sections that point to various places, a second indexing. Some of these are "program" sections prg0, prg1, etc. But what you want is the second text section. The following is cryptic but might help as you have already gotten most of the way. It is the output dump from an assembler (144 bytes) showing the elf unpack and repack process. It is a bit long but probably clearer than trying to describe the process in words.

                   

                  Other sections contain start up information used by the program like memory address's,  memory descriptors, and constant data. The CAL CAL CAL part contains setup info such as pointers, code lengths, number of registers used, etc.

                   

                  I have seen bits and pieces of documentation  which leads me to believe some of this information may be available from AMD, possibly on request?

                  It would be nice to know where to find/how to get such info.

                   

                  //---------------------------------------------------------------------------------

                  RUN OPENCL

                  ASSEMBLER OUTPUT:

                  asm:source 1721 bytes src

                  source errors: 0

                  code length:   144 bytes [36]      <---------- (GENERATED CODE)

                  first address: 00000000

                  last address:  0000008C

                  optim. insn:   0

                  savebino: 1 bins, size=[15996, 0, 0]

                  //---------------------------------------------------------------------------------

                  DUMP OF ELF UNPACK STARTS HERE

                  //---------------------------------------------------------------------------------

                  wr_bin: bin1 -> bin/isatest.bn0 size=15996

                  Elf:filesize =    3E7C (15996)

                  Elf:sh table @ 15716,      7 sections

                  Elf:ph table @     0,      0 programs

                  read nametab at 52 size =50

                  sh 0 type= 0 link(0) start      0 size      0 []

                  sh 1 type= 3 link(0) start     52 size     50 [.shstrtab]

                  sh 2 type= 3 link(0) start    102 size     66 [.strtab]

                  sh 3 type= 2 link(2) start    168 size     64 [.symtab]

                  sh 4 type= 1 link(0) start    232 size    391 [.rodata]

                  text(0) index = 5

                  sh 5 type= 1 link(0) start    623 size  15026 [.text]

                  sh 6 type= 1 link(0) start  15649 size     67 [.comment]

                  ----------------------------------------------

                  enter the inner elf

                  ----------------------------------------------

                  Elf:filesize =    3AB2 (15026)

                  Elf:sh table @   292,     10 sections

                  Elf:ph table @    52,      5 programs

                  ph 0 type=70000002 start    212 size     40 memsize      0

                  ph 1 type=00000004 start    692 size    532 memsize      0

                  ph 2 type=00000001 start   1224 size   7429 memsize   7429

                  ph 3 type=00000004 start   8653 size   1444 memsize      0

                  ph 4 type=00000001 start  10097 size   4929 memsize   4929

                  read nametab at 252 size =40

                  sh 0 type= 0 link(0) start      0 size      0 []

                  sh 1 type= 3 link(0) start    252 size     40 [.shstrtab]

                  text(0) index = 2

                  sh 2 type= 1 link(0) start   1224 size   2592 [.text]

                  sh 3 type= 1 link(0) start   3816 size   4736 [.data]

                  sh 4 type= 2 link(5) start   8552 size     80 [.symtab]

                  sh 5 type= 3 link(0) start   8632 size     21 [.strtab]

                  text(1) index = 6

                  sh 6 type= 1 link(0) start  10097 size     92 [.text]

                  sh 7 type= 1 link(0) start  10189 size   4736 [.data]

                  sh 8 type= 2 link(9) start  14925 size     80 [.symtab]

                  sh 9 type= 3 link(0) start  15005 size     21 [.strtab]

                  //---------------------------------------------------------------------------------

                  enter and work on inner elf

                  //---------------------------------------------------------------------------------

                  pass2 2 text sections

                  p2text0: offset=1224 size=2592

                  p2text1: offset=10097 size=92

                  FIX CODE section 6 .text[1]   <-----   (note: this is the second text section (of 0,1), inner elf)

                  codesize 92 -> 144 bytes              (original 92 byte section replaced with 144 byte new program)

                  FIX CODE(1) end

                  SEC(10) move phtab by 52

                  SEC(10) move shtab by 52               (start moving things around)

                  EXPAND program 4 by 52

                  EXPAND section 6 by 52

                  MOVE section 7 by 52

                  MOVE section 8 by 52

                  MOVE section 9 by 52

                  (WOW)p34sum 6373 --> 6425

                  wr pgr0  start    212[   212] size     40[    40]

                  wr pgr1  start    692[   692] size    532[   532]

                  wr pgr2  start   1224[  1224] size   7429[  7429]

                  user elements: 3

                  uav memr buffer: s[4,7] type 10

                  constant buffer: s[8,11] type 0

                  argument buffer: s[12,15] type 1

                  vgpr registers: 4 -> 10

                  sgpr registers: 19 -> 25

                  lds aux mvalue: 001009A0 -> 00000000

                  gds max memory: 8000 -> 0

                  lds max memory: 2000 -> 8000

                  ------------------------------------------

                    work done, now start reassembly, report what goes where

                  ------------------------------------------

                  wr pgr3  start   8653[  8653] size   1444[  1444]

                  wr pgr4  start  10097[ 10097] size   4981[  4929]

                  wr sec0  start      0[     0] size      0[     0]

                  wr sec1  start    252[   252] size     40[    40]

                  wr sec2  start   1224[  1224] size   2592[  2592]

                  wr sec3  start   3816[  3816] size   4736[  4736]

                  symtab(4) at 8552

                  =strtab(5) size     21

                  symbol(0) ndx= 0 size=    0 info=0 val=    0 []

                  symbol(1) ndx=16 size=    0 info=0 val=    0 [uav9]

                  symbol(2) ndx=16 size=    0 info=0 val=    1 [uav10]

                  symbol(3) ndx=10 size=    0 info=0 val=    0 [cb1]

                  symbol(4) ndx=10 size=    0 info=0 val=    1 [cb0]

                  wr sec4  start   8552[  8552] size     80[    80]

                  wr sec5  start   8632[  8632] size     21[    21]

                  save data section(6)

                  wr sec6  start  10097[ 10097] size    144[    92]

                  wr sec7  start  10241[ 10189] size   4736[  4736]

                  symtab(8) at 14925

                  =strtab(9) size     21

                  symbol(0) ndx= 0 size=    0 info=0 val=    0 []

                  symbol(1) ndx=16 size=    0 info=0 val=    0 [uav9]

                  symbol(2) ndx=16 size=    0 info=0 val=    1 [uav10]

                  symbol(3) ndx=10 size=    0 info=0 val=    0 [cb1]

                  symbol(4) ndx=10 size=    0 info=0 val=    1 [cb0]

                  wr sec8  start  14977[ 14925] size     80[    80]

                  wr sec9  start  15057[ 15005] size     21[    21]

                  wr pent 0 at    212 offset      0                             (pent means elf  "program entry")

                  wr pent 1 at    692 offset      0                           

                  wr pent 2 at   1224 offset      0

                  wr pent 3 at   8653 offset      0

                  wr pent 4 at  10097 offset      0

                  wr sent 0 at      0 offset      0                            (sent means elf section entry)

                  wr sent 1 at    252 offset      0

                  wr sent 2 at   1224 offset      0

                  wr sent 3 at   3816 offset      0

                  wr sent 4 at   8552 offset      0

                  wr sent 5 at   8632 offset      0

                  wr sent 6 at  10097 offset      0

                  wr sent 7 at  10241 offset      0

                  wr sent 8 at  14977 offset      0

                  wr sent 9 at  15057 offset      0

                  ----------------------------------------------

                  inner elf now repacked, need to resize outer elf

                  ----------------------------------------------

                  p1text1: newsize=15026 oldsize=15078

                  SEC(7) move phtab by 52

                  SEC(7) move shtab by 52

                  EXPAND section 5 by 52

                  MOVE section 6 by 52

                  wr sec0  start      0[     0] size      0[     0]

                  wr sec1  start     52[    52] size     50[    50]

                  wr sec2  start    102[   102] size     66[    66]

                  symtab(3) at 168

                  =strtab(2) size     66

                  symbol(0) ndx= 0 size=    0 info=0 val=    0 []

                  symbol(1) ndx= 4 size=  359 info=1 val=    0 [__OpenCL_test_metadata]

                  symbol(2) ndx= 5 size=15026 info=2 val=    0 [__OpenCL_test_kernel]

                  found _kernel  symbol, section 5 size 15026 --> 15078

                  symbol(3) ndx= 4 size=   32 info=1 val=  359 [__OpenCL_test_header]

                  wr sec3  start    168[   168] size     64[    64]

                  rodata:lds memory: 2000 -> 8000

                  wr sec4  start    232[   232] size    391[   391]

                  save data section(5)

                  wr sec5  start    623[   623] size  15078[ 15026]

                  wr sec6  start  15701[ 15649] size     67[    67]

                  wr sent 0 at      0 offset      0

                  wr sent 1 at     52 offset      0

                  wr sent 2 at    102 offset      0

                  wr sent 3 at    168 offset      0

                  wr sent 4 at    232 offset      0

                  wr sent 5 at    623 offset      0

                  wr sent 6 at  15701 offset      0

                  ==========================================================

                  /

                  1 of 1 people found this helpful
                    • Re: Raw ISA bytecode location
                      relpats_eht

                      Thanks for the additional information. What tools did you use to generate that output? It still seems rather odd to me for bytecode to have so many 0's in it. I should probably review the ISA spec some to see why.

                      • Re: Raw ISA bytecode location
                        jg1

                        My research happens to require reloading of ISA microcode into a opencl ELF. The information provided by drallan's reply , with outputs of his reloader, is very useful to me. In particular, I need to know if a new binary (the ISA code in the second .text section of the nested ELF) is injected into an existing OpenCL ELF, what are other parts of the ELF file that need to be changed, besides the length adjustments of various sections. For example, the output of drallan's reloader include the following lines:

                         

                        (WOW)p34sum 6373 --> 6425

                        wr pgr0  start    212[   212] size     40[    40]

                        wr pgr1  start    692[   692] size 532[   532]

                        wr pgr2  start   1224[  1224] size   7429[ 7429]

                        user elements: 3

                        uav memr buffer: s[4,7] type 10

                        constant buffer: s[8,11] type 0

                        argument buffer: s[12,15] type 1

                        vgpr registers: 4 -> 10

                        sgpr registers: 19 -> 25

                        lds aux mvalue: 001009A0 -> 00000000

                        gds max memory: 8000 -> 0

                        lds max memory: 2000 -> 8000

                         

                        This fragment seems to suggest that some kernel program information, e.g., "uav memr buffer" and "vgpr registers", presumably stored in the NOTE segment, must be modified. The only documentation I can find is AMD's CAL Programming Guide, which gives the structure of the records in the NOTE segment but it does not explain how these program information are stored. Could AMD support staff or drallan shed some light on this?

                         

                        Thank you.

                          • Re: Raw ISA bytecode location
                            realhet

                            Can you please give me a link for that  CAL Programming Guide? I can't find it on developer.amd.com o.0

                             

                            Things in the note section I know about:

                            - user elements: initial configuration for resource constants.  In the example there are 2 resources: uav, constbuf, argument buf. For example: In the ISA code you can access the UAV with the registers s[4:7]. Resource Constants are similar to memory range descriptors in an x86 system. (More info in -> GCN ISA manual)

                            You basically don't have to modify these, just use it in the form as OpenCL compiled it. The only exception is the size information at a constant buffer declaration.

                             

                            "which gives the structure of the records in the NOTE segment but it does not explain how these program information are stored." You mean, how the standard ELF note section is stored? -> Here it is http://www.skyfree.org/linux/references/ELF_Format.pdf  Search for "Note Section" in it.

                             

                            Is there a note key list in that "CAL Programming Guide" you mentioned? I have some keys that I had hacked out of the cal image, but if there's a complete list, I'd be glad to see it.

                              • Re: Raw ISA bytecode location
                                jg1

                                Hi realhet,

                                 

                                I could not remember where I found the document. If you let me know how I can send you a copy, I would be happy to do so.

                                 

                                Thank you for your help. Sorry I did not explain very clearly what I need in my first message. I have read the ELF format document, the GCN architecture ppt, and the SI-GPU ISA document. The ELF format pdf specifies the general ELF format. The Appendix B of the CAL Programming Guide specifies how the ELF format is used in the CAL context, that includes, for examples, why there are two NOTE and LOAD segments (thus two .text sections) and what information is stored in the NOTE segment. However, it does not provide any information about where the kernel program info (like # of vgprs, sgprs, lds memory, etc.) are stored. It might be explained in drallan's attachment -- I will verify this in my program.

                                • Re: Raw ISA bytecode location
                                  jg1

                                  Hi realhet,

                                   

                                  To answer the question in your message: Yes, the document includes the definition of the CALNoteHeader struct,

                                  typedef struct {

                                  Elf32_Word namesz; /* size of the name field. Must be 8 */

                                  Elf32_Word descsz; /* size of the data payload */

                                  Elf32_Word type; /* type of the payload */

                                  char name[8]; /* note header string. Must be “ATI CAL” */

                                  } CALNoteHeader;

                                   

                                  and Type Identifiers:

                                  Note Type Description

                                  Inputs ELF_NOTE_ATI_INPUTS  Inputs used by the program.

                                  Outputs ELF_NOTE_ATI_OUTPUTS  Outputs written by the program.

                                  FloatConsts ELF_NOTE_ATI_FLOATCONSTS  Float constant data segment desc

                                  IntConsts ELF_NOTE_ATI_INTCONSTS  Integer constant data segment desc

                                  BoolConsts ELF_NOTE_ATI_BOOLCONSTS  Boolean constant data segment d

                                  EarlyExit ELF_NOTE_ATI_EARLYEXIT  Program termination description.

                                  GlobalBuffers ELF_NOTE_ATI_GLOBAL_BUFFERS  Global import/export buffer.

                                  ConstantBuffers ELF_NOTE_ATI_CONSTANT_BUFFERS  Constant buffer usage mask.

                                  SamplerMap ELF_NOTE_ATI_INPUT_SAMPLER_MAP  Input to sampler binding table.

                                  ScratchBuffer ELF_NOTE_ATI_SCRATCH_BUFFER  Scratch memory usage mask.

                                  PersistentBuffer ELF_NOTE_ATI_PERSISTANT_BUFFER  Persistent memory usage mask.

                                  ProgramInfo ELF_NOTE_ATI_PROGRAM_INFO  Device configuration table.

                                • Re: Raw ISA bytecode location
                                  drallan

                                  Hi jg1,

                                   

                                  Note that the inner elf has about 5 'program' sections 3 of which overlap some of the normal text sections,

                                  this is helpful when calculating the new section offsets and sizes. The program sections have their own

                                  program headers instead of the regular section headers.

                                   

                                  1. Replace the gcn binary code in Inner file: second text section.

                                  2. Calculate and change all required sizes in section headers and program headers.

                                  3. Change a few parameters in the inner file 4th (index 3)  'program' section.

                                    (this includes things like the no. of SGPRs, VGPRs, lds  and gds memory requirements)

                                  After packing the inner elf

                                  4. Change appropriate section sizes in outer elf section headers.

                                  5. In the outer elf ".rodata" section change the lds memory maximum if needed (see attached notes/code snippets)

                                  6. In the outer symbol table entries, change the size of the section with the kernel name (need to look at strtab for the name,

                                  again there is a code snippet in the attached notes.)

                                  Then, pack that thing!

                                   

                                  There is more info in the notes that I couldn't paste here because this editor keeps inserting everything into tables.

                                  The notes also mention how to change the parameters in the 4th program section in the inner file. Basically you look

                                  for a 32 bit key code first and then read/write the data in the following int. There is a table of important key codes in

                                  the notes.

                                   

                                  Note, some of the data output from my program does not need to be changed, i.e., the uav memr buffer register assignments and type.

                                  These are just for reference and/or can be used for programming at a higher level, like a compiler.

                                   

                                  Good luck, it's really not that difficult, only made a little complicated by the elf format.

                                  Sorry for the jumbled notes.

                                   

                                  drallan

                                    • Re: Raw ISA bytecode location
                                      jg1

                                      Hi drallan,

                                       

                                      Thank you for your reply. It gives quite a lot of information. I think the linkdat struct is what I am looking for. I will do some tests and let you know the results. Thanks a lot.

                                       

                                      P.S. By the way, where in AMD's documents can I find information of the linkdat struct?

                                        • Re: Raw ISA bytecode location
                                          drallan

                                          jg1 wrote:

                                           

                                          Hi drallan,

                                           

                                          Thank you for your reply. It gives quite a lot of information. I think the linkdat struct is what I am looking for. I will do some tests and let you know the results. Thanks a lot.

                                           

                                          P.S. By the way, where in AMD's documents can I find information of the linkdat struct?

                                          Hi jg1,

                                           

                                          The linkdat structure is from my elf repacker and is a partial list of entries that I have found useful. I don't know of any AMD document that lists their meaning and values. Some I have dug out and I am looking to see if I have some additional info, I think I must.

                                           

                                          The 5 elf program sections that I mentioned are what the CAL programming manual refers to as the PT_NOTE segment. The CAL programming guide describes the basic structure of the inner elf but omits some useful information. The 2 'program' sections that overlap the elf sections contain the executable and load data. Usually you only need to fix the executable section.

                                           

                                          The other 3 program sections (0, 1, 3), contain information about the program and set up. section 3 is the one with the important information.

                                          One of the CAL entries (stuff with the ATI_CAL string) contains a large block of info that contains the information fields found in linkdat. In a sense, they are below, or contained inside, the CAL entry format description in the programmers manual.

                                           

                                          To help visualize this, I have attached dumps of the 3 'program' sections, and also the ".rodata" section. Most of the rodata is ascii and is a copy of info you would see from the Kernel analyzer output. There is a small binary section after the ascii part where I change the lds memory allocation. You may not need that.

                                           

                                          If I find any more good info I will post it here.

                                           

                                          drallan

                                            • Re: Raw ISA bytecode location
                                              jg1

                                              Hi drallan,

                                               

                                              Thank you again for your kind help. I added your inputs to my readelf program and attached the outputs here. Next I am going to do the repacking. I will post my findings here.

                                               

                                              The linkdat structure looks like the "device configuration table" mentioned in the CAL Programming Guide. The declaration of the CALProgramInfoEntry is as follows:

                                              typedef struct {

                                                   Elf32_Word address; /* device address */

                                                   Elf32_Word value; /* value */

                                              } CALProgramInfoEntry;

                                               

                                              It is probably used by the driver to program the GPU chips through device I/O ports.

                                               

                                              jg1

                                                • Re: Raw ISA bytecode location
                                                  drallan

                                                  The linkdat structure looks like the "device configuration table" mentioned in the CAL Programming Guide.  [...]

                                                  typedef struct {

                                                       Elf32_Word address; /* device address */

                                                       Elf32_Word value; /* value */

                                                  } CALProgramInfoEntry;

                                                   

                                                  It is probably used by the driver to program the GPU chips through device I/O ports.

                                                   

                                                  Yes, I think that must be correct. The "device address" also looks like a PCI bus IO transaction.

                                                  I checked, the entries in linkdat were all dug from code, I don't have any other docs on the device configuration table values.

                                                   

                                                  drallan

                                                • Re: Raw ISA bytecode location
                                                  jg1

                                                  Hi drallan and realhet,

                                                   

                                                  My repacker is now working. Thank you for all your help. I really appreciate it.

                                                   

                                                  For the purpose of repacking ISA code (with the same kernel function signature), drallan's instructions are good enough.

                                                   

                                                  I also find good information in this file. Look for struct si_bin_enc_dict_entry_t, and enum _E_SC_USER_DATA_CLASS. You might want to look at other files at the web site to see how the dots are connected. I modified my readelf program again and its output is in the attachment.

                                                   

                                                  I still have not found information about the use of GDS_MAX, LDS_MAX, and LDS_AUX. But they do not seem to matter with the level of complexity of my current kernel. I will post here if I find anything.

                                                   

                                                  jg1

                                                    • Re: Raw ISA bytecode location
                                                      realhet

                                                      Wow! Nice find -> multi2sim

                                                      http://www.multi2sim.org/svn/multi2sim/trunk/src/arch/southern-islands/emu/machine.c

                                                      This seems to be an accurate 'documentation on every GCN instructions

                                                      Also there's a complete* inverse assembler -> http://www.multi2sim.org/svn/multi2sim/trunk/src/arch/southern-islands/asm/asm.c

                                                       

                                                      I think don't you  wanna understand all the values in the .elf, only patch that few you really use and leave the rest for the ocl compiler.

                                                       

                                                      GDS_MAX, LDS_MAX, and LDS_AUX: That's new to me too: I only tried LDS on CAL, but there the only thing I had to change was the control word marked with PGM2_COUMPUTE_RESOURCES in the disasm. If I recall bits8:15 (or something like this) represented the LDS allocation size. But seems like OCL stores it redundantly.

                                                       

                                                      "as10.txt.zip" <- Well done!

                                                       

                                                      Edit: Oh noes, multi2sim is full of inumplemented instructions

                                                        • Re: Raw ISA bytecode location
                                                          jg1

                                                          Hi realhet,

                                                           

                                                          You are right. Multi2sim can be a good reference when we need to find out execution details of an ISA instructions. I have not studied yet but it is good to know where to look to when I need.

                                                           

                                                          Thanks for the info of LDS. They are bits 8:16 in COMPUTE_PGM_RSRC2. So you are only a tiny "bit" off.

                                                        • Re: Raw ISA bytecode location
                                                          drallan

                                                          jg1 wrote:

                                                           

                                                          Hi drallan and realhet,

                                                           

                                                          My repacker is now working. Thank you for all your help. I really appreciate it.

                                                           

                                                          For the purpose of repacking ISA code (with the same kernel function signature), drallan's instructions are good enough.

                                                           

                                                          I also find good information in this file. Look for struct si_bin_enc_dict_entry_t, and enum _E_SC_USER_DATA_CLASS. You might want to look at other files at the web site to see how the dots are connected. I modified my readelf program again and its output is in the attachment.

                                                           

                                                          I still have not found information about the use of GDS_MAX, LDS_MAX, and LDS_AUX. But they do not seem to matter with the level of complexity of my current kernel. I will post here if I find anything.

                                                           

                                                          That was fast! Congratulations.

                                                           

                                                          I have the multi2sim files. They are very good for gcn code but they don't have the device configuration table info (Where is it!?)

                                                          The _E_SC_USER_DATA_CLASS looks real interesting but the numbers don't seem to correlate with anything. Maybe specific to multi2sim?

                                                          The Linux Mesa/Gallium project also has some (too many ! ) files but they are a bit less clear.

                                                          You can google mesa southern islands.

                                                           

                                                          I would be very interested if you or realhet find any more good info, and happy to share anything I find too...

                                                           

                                                          drallan

                                                            • Re: Raw ISA bytecode location
                                                              jg1

                                                              Thanks, drallan.

                                                               

                                                              Re: the device configuration table and data class enum:

                                                              in asm.h, look for this:

                                                              /* Encoding dictionary entry */

                                                              struct si_bin_enc_dict_entry_t

                                                              {

                                                              ...

                                                                        unsigned int userElementCount;

                                                                        struct si_bin_enc_user_element_t userElements[16];

                                                              ...

                                                              };

                                                              I think the name "si_binenc_dict_entry_t" is incorrect. For my understanding, this structure is actually the device configuration table. It is one of the Note Headers in the second Note segment. The following lines are taken from my readelf program output (see my last post):

                                                              CAL Note Segment:

                                                              TypeField                 Type PayloadSize

                                                              0x00000002                (TBD)        0x0

                                                              0x00000003                (TBD)        0x0

                                                              0x00000010                (TBD)       0x40

                                                              0x00000004                (TBD)        0x4

                                                              0x00000005                (TBD)        0x0

                                                              0x00000006                (TBD)        0x0

                                                              0x00000007                (TBD)        0x0

                                                              0x00000008                (TBD)        0x4

                                                              0x00000009                (TBD)        0x0

                                                              0x0000000a                (TBD)       0x10

                                                              0x0000000b                (TBD)        0x0

                                                              0x0000000d                (TBD)        0x4

                                                              0x0000000c                (TBD)        0x0

                                                              0x00000001                (TBD)      0x390    <----- Device config table

                                                              0x0000000e                (TBD)        0x0

                                                              0x0000000f                (TBD)        0x4

                                                              0x00000011                (TBD)       0x80

                                                               

                                                              The payload of the header is a list of CALProgramInfoEntry structs

                                                              typedef struct {

                                                                   Elf32_Word address; /* device address */

                                                                   Elf32_Word value; /* value */

                                                              } CALProgramInfoEntry;

                                                               

                                                              and they look like this:

                                                               

                                                              Device configuration table (Program Information):

                                                              TypeField Type                      Value

                                                              0x80001000 USER_ELEMENTS               0x3

                                                              0x80001001 UE0_DataClass              0x17

                                                              0x80001002 UE0_APISlot                 0x0

                                                              0x80001003 UE0_SGPRStart               0x2

                                                              0x80001004 UE0_SGPRCount               0x2

                                                              0x80001005 UE1_DataClass               0x2

                                                              0x80001006 UE1_APISlot                 0x0

                                                              0x80001007 UE1_SGPRStart               0x4

                                                              0x80001008 UE1_SGPRCount               0x4

                                                              0x80001009 UE2_DataClass               0x2

                                                              0x8000100a UE2_APISlot                 0x1

                                                              0x8000100b UE2_SGPRStart               0x8

                                                              ...

                                                              The first item is the user elements count, following by 16 user element structs defined as:

                                                              struct si_bin_enc_user_element_t

                                                              {

                                                                        unsigned int dataClass;

                                                                        unsigned int apiSlot;

                                                                        unsigned int startUserReg;

                                                                        unsigned int userRegCount;

                                                              };

                                                               

                                                              The first field is the data class. Its value is defined by _E_SC_USER_DATA_CLASS. In the above example, the first user element, UE0, has a data class value of 0x17, which is a PTR_UAV_TABLE. Likewise, UE1 and UE2 are of IMM_CONST_BUFFER data class.

                                                               

                                                              I will look at Mesa/Gallium, thanks. I am happy to share my findings, too.

                                                                • Re: Raw ISA bytecode location
                                                                  realhet

                                                                  Hey,

                                                                  Do you wanna make your ISA wrapper to be able to define kernel parameters as well?

                                                                  I think it gonna be really usefull, but where to get all the information for it from

                                                                   

                                                                  I'm thinking it more simpler: Don't wanna understand how those 16 elements works and also understand HOW OPENCL PASSES BUFFER ptrs. I'll need to use them from the asm side also. And what if AMD changes that structure (the can rewrite it, and generate different amd_il code, you know)?!

                                                                   

                                                                  I'm thinking about to compile skeleton opencl code and then patch it automatically.

                                                                   

                                                                  Anyways, now I try to make a 99% OpenCl compatible patcher which works without LDS/GDS. And after I have some question marks about LDS/GDS.

                                                                    • Re: Raw ISA bytecode location
                                                                      jg1

                                                                      realhet wrote:

                                                                       

                                                                      Hey,

                                                                      Do you wanna make your ISA wrapper to be able to define kernel parameters as well?

                                                                      I think it gonna be really usefull, but where to get all the information for it from

                                                                       

                                                                      I'm thinking it more simpler: Don't wanna understand how those 16 elements works and also understand HOW OPENCL PASSES BUFFER ptrs. I'll need to use them from the asm side also. And what if AMD changes that structure (the can rewrite it, and generate different amd_il code, you know)?!

                                                                       

                                                                      I'm thinking about to compile skeleton opencl code and then patch it automatically.

                                                                       

                                                                      Anyways, now I try to make a 99% OpenCl compatible patcher which works without LDS/GDS. And after I have some question marks about LDS/GDS.

                                                                      Yes, I am interested in a more complete solution. I do not need it at this moment but knowing the use of various bits gives me more confidence that any hacking/patching will work in different conditions. I will look into it when I have more time.

                                                                       

                                                                      jg1

                                                                    • Re: Raw ISA bytecode location
                                                                      realhet

                                                                      Something's not round with gds/lds:

                                                                       

                                                                      drallan:

                                                                      {0x80000081,"GDS_MAX  "},       //???

                                                                      {0x80000082,"LDS_MAX  "},       //also  80001045 see below

                                                                      {0x80001045,"LDS_AUX  "}        //also  80001045 see below

                                                                      + .rodata

                                                                       

                                                                      multi2sim:

                                                                      { "AMU_ABI_LDS_SIZE_AVAIL",          0x80000081 },

                                                                      { "AMU_ABI_LDS_SIZE_USED",          0x80000082 },

                                                                       

                                                                      I say:

                                                                      key 0x80000081 is always $80000, whatever I do o.o

                                                                      key Computr_pgm_rsc2 bits[16:24] is the actual LDS size

                                                                       

                                                                      What's the truth?   On CAL my version works fine, I tested it. Doing some lds size comparison soon.

                                                                       

                                                                      Other stuff:

                                                                      There is the 0th program entry in the cal.elf. It's type in $70000002 and inside there are offset/size pairs referencing to the amd_il and isa sections offseted with a misterious offset. Do you know this? I can shift it anyways according to the new isa size. Maybe it's an offset relative to the OpenCl.elf image...

                                                                       

                                                                        • Re: Raw ISA bytecode location
                                                                          jg1

                                                                          realhet wrote:

                                                                           

                                                                          Other stuff:

                                                                          There is the 0th program entry in the cal.elf. It's type in $70000002 and inside there are offset/size pairs referencing to the amd_il and isa sections offseted with a misterious offset. Do you know this? I can shift it anyways according to the new isa size. Maybe it's an offset relative to the OpenCl.elf image...

                                                                           

                                                                           

                                                                          I can take this question. Here is the how this LOPROC+2 Program header looks (in as10.txt) (the Program header contains what is called Encoding Dictionary in the CAL Programming Guide)

                                                                          Encoding Dictionary:

                                                                                Type     Offset       Size      Flags

                                                                          0x00000004      0x2b4    0x8e3dd        0x0

                                                                          0x00000004    0x8e691     0xbc59        0x0

                                                                           

                                                                          and the Program Headers

                                                                          Program Headers:

                                                                               Type      Offset      VAddr      PAddr    FileSiz     MemSiz

                                                                            LOPROC+2       0xd4        0x0        0x0       0x28        0x0

                                                                                NOTE      0x2b4        0x0        0x0      0x234        0x0

                                                                                LOAD      0x4e8        0x0        0x0    0x8e1a9    0x8e1a9

                                                                                NOTE    0x8e691        0x0        0x0      0x5c4        0x0

                                                                                LOAD    0x8ec55        0x0        0x0     0xb695     0xb695

                                                                           

                                                                          So the "mysterious offset" is really the offset of the 1st Note segment. And the Size (0x8e3dd) is the sum of the 1st Note+Load segments, Is this what you are looking for?

                                                                           

                                                                          jg1


                                                                            • Re: Raw ISA bytecode location
                                                                              realhet

                                                                              I see now, thanks. (I was dumb enough not to search in the program header lol) Now I can regenerate that part from actual program header information.

                                                                               

                                                                              Then I installed 12.8 driver (11.12 was a bit old for lds/gds tests hehe)

                                                                               

                                                                              LDS:

                                                                              I've find out how  works (mostly like drallan said):

                                                                              - ocl.elf.rodata -> see last 8 dwords, dword[3] is the LDS size in bytes (also specified in the ascii dump in the form "hwlocal:n" but let's hope opencl runtime doesn't rely on that)

                                                                              - cal.elf notes key $00002e13 bits[15:23] (not [16:24] as in multi2sim) specified in 256byte units

                                                                              - cal.elf notes key $80000082 in 1byte units

                                                                               

                                                                              GDS:

                                                                              I've tried it using CAL: I allocated 256bytes of gds, and also used it for an atomic addition. Then I changed the size to 512 bytes, and the only thing changed in cal.elf was that byte 256->512 inside the amd_il code.

                                                                              So I think we can use GDS under OpenCL as far as no two kernels uses the same parts of the whole GDS.

                                                                               

                                                                              Unimportant parts:

                                                                              - amd_il .text section

                                                                              - both .data sections filled with 4KB zeroes

                                                                              Although the ascii part of .rodata is important for the disassembler and also for the opencl runtime. I guess the amd_il note part is also unimportant, but I was lazy to write code to replace a program entry.

                                                                              The ocl elf size reduced from 13.5KB to 3.5KB, and still works.

                                                                               

                                                                              Attached a 'pseudo' code on how to do the patching:

                                                                                • Re: Raw ISA bytecode location
                                                                                  jg1

                                                                                  Good job on the LDS/GDS!

                                                                                  Although the ascii part of .rodata is important for the disassembler and also for the opencl runtime. I guess the amd_il note part is also unimportant, but I was lazy to write code to replace a program entry.

                                                                                  The ocl elf size reduced from 13.5KB to 3.5KB, and still works.

                                                                                  I believe that the 1st (amdil) Note segment is not used (in the CAL image, or nested ELF), except there is no ISA code present, in which case, the driver will have to compile the IL into ISA.

                                                                                   

                                                                                  jg1

                                                                                  • Re: Raw ISA bytecode location
                                                                                    jg1

                                                                                    hi realhet,

                                                                                     

                                                                                    Just curious what script language are you using in your attachment?

                                                                                    One question:

                                                                                    SetCalNote($00002e13,(opts.ldsSizeBytes+255)shr 8 shl 15,$FFF07FFF{and mask});

                                                                                    maybe it is a typo of {or mask}?

                                                                                    I verified this with my kernel. the LDS size is indeed in bits [15:23].

                                                                                     

                                                                                    jg1

                                                                                    • Re: Raw ISA bytecode location
                                                                                      drallan

                                                                                      LDS:

                                                                                      I've find out how  works (mostly like drallan said):

                                                                                      - ocl.elf.rodata -> see last 8 dwords, dword[3] is the LDS size in bytes (also specified in the ascii dump in the form "hwlocal:n" but let's hope opencl runtime doesn't rely on that)

                                                                                       

                                                                                      I ran some experiments to get to the bottom of the confusing rodata question. Surprisingly, its the master setting for the kernel's lds memory. It always works and will override settings in compute_pgm_rsrc2. When the rodata setting is  0, compute_pgm_rsrc2 is used. Compute_pgm_rsrc2 may not work when an ocl skeleton kernel uses lds memory because this sets rodata. If both rodata and compute_pgm_rsrc2 settings are used but have different values, rodata defines the amount of lds memory.

                                                                                       

                                                                                      I compiled a small ocl kernel and replaced it with a gcn isa program. The isa program reads and writes a series of 256 byte blocks of lds. The ocl kernel was compiled with and without any lds memory then the lds memory settings were changed in different locations before running the isa code. The isa program shows which parts of the lds memory were active.

                                                                                       

                                                                                      Output during setup.

                                                                                                   +--------------------lds setting from ocl kernel

                                                                                                   |      +-------------lds setting at repack

                                                                                                   |      |       +-----location of setting

                                                                                      lds memory: 100 -> 100 (compute_pgm_rsrc2)

                                                                                       

                                                                                      Results: OCL kernel defines some lds memory   (note 77777777 means LDS is active)

                                                                                       

                                                                                      lds memory: 100 -> 100 (compute_pgm_rsrc2)

                                                                                      lds memory: 100 -> 400 (abi_lds_size_used)

                                                                                      lds memory: 100 -> 100 (.rodata kernel header)

                                                                                      77777777 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ( kernel output)

                                                                                      0x000       0x100      0x200       0x300      0x400     0x500      0x600      0x700  (lds address)

                                                                                       

                                                                                      lds memory: 100 -> 400 (compute_pgm_rsrc2)

                                                                                      lds memory: 100 -> 100 (abi_lds_size_used)

                                                                                      lds memory: 100 -> 100 (.rodata kernel header)

                                                                                      77777777 00000000 00000000 00000000 00000000 00000000 00000000 00000000

                                                                                       

                                                                                      lds memory: 100 -> 100 (compute_pgm_rsrc2)

                                                                                      lds memory: 100 -> 100 (abi_lds_size_used)

                                                                                      lds memory: 100 -> 400 (.rodata kernel header)

                                                                                      77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000

                                                                                       

                                                                                      OCL kernel defines no lds memory

                                                                                       

                                                                                      lds memory: 0 -> 400 (compute_pgm_rsrc2)

                                                                                      lds memory: 0 -> 0 (abi_lds_size_used)

                                                                                      lds memory: 0 -> 0 (.rodata kernel header)

                                                                                      77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000

                                                                                       

                                                                                      lds memory: 0 -> 0 (compute_pgm_rsrc2)

                                                                                      lds memory: 0 -> 400 (abi_lds_size_used)

                                                                                      lds memory: 0 -> 0 (.rodata kernel header)

                                                                                      00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

                                                                                       

                                                                                      lds memory: 0 -> 0 (compute_pgm_rsrc2)

                                                                                      lds memory: 0 -> 0 (abi_lds_size_used)

                                                                                      lds memory: 0 -> 400 (.rodata kernel header)

                                                                                      77777777 77777777 77777777 77777777 00000000 00000000 00000000 00000000

                                                                                       

                                                                                      lds memory: 100 -> 400 (compute_pgm_rsrc2) rodata overrides compute_pgm_rsrc2

                                                                                      lds memory: 100 -> 100 (abi_lds_size_used)

                                                                                      lds memory: 100 -> 200 (.rodata kernel header)

                                                                                      77777777 77777777 00000000 00000000 00000000 00000000 00000000 00000000

                                                                                  • Re: Raw ISA bytecode location
                                                                                    drallan

                                                                                    Something's not round with gds/lds:

                                                                                     

                                                                                    drallan:

                                                                                    {0x80000081,"GDS_MAX  "},       //???

                                                                                    {0x80000082,"LDS_MAX  "},       //also  80001045 see below

                                                                                    {0x80001045,"LDS_AUX  "}        //also  80001045 see below

                                                                                    + .rodata

                                                                                    multi2sim:

                                                                                    { "AMU_ABI_LDS_SIZE_AVAIL",          0x80000081 },

                                                                                    { "AMU_ABI_LDS_SIZE_USED",          0x80000082 },

                                                                                     

                                                                                    multi2sim is more round!

                                                                                    0x80000082 LDS_SIZE_USED is the same as LDS_MAX (max used by a workgroup)

                                                                                    0x80000081 is LDS_SIZE_AVAIL (LDS hardware limit, so it affects nothing)

                                                                                    So GDS_MAX is wrong and may not exist because GDS is global and seen by all CUs and workgroups.

                                                                                    At least the question marks were right !

                                                                                     

                                                                                    .rodata Under some conditions  LDS values had problems.  The value of LDS_USED also appears in the 8 integer block at the end of the .rodata section just after the text part. After changing both locations, the problems disappeared.

                                                                                     

                                                                                    drallan

                                                                                      • Re: Raw ISA bytecode location
                                                                                        jg1

                                                                                        There is one more problem I just discovered but have not investigated:

                                                                                        I created an OpenCL kernel source file which contains two kernel functions (KFs). The first KF uses small amount of resources (registers, LDS, etc.) and the second KF uses a lot more. I ran KSA2 and exported the binary. The device configuration in the resulting ELF only shows the resources used by the first KF.

                                                                                         

                                                                                        The question is if I decide to run the second KF, where does the driver find the resource (device config.) needed by the 2nd KF?

                                                                                         

                                                                                        jg1

                                                                                          • Re: Raw ISA bytecode location
                                                                                            drallan

                                                                                            There is one more problem I just discovered but have not investigated:

                                                                                            I created an OpenCL kernel source file which contains two kernel functions (KFs). The first KF uses small amount of resources (registers, LDS, etc.) and the second KF uses a lot more. I ran KSA2 and exported the binary. The device configuration in the resulting ELF only shows the resources used by the first KF.

                                                                                             

                                                                                            The question is if I decide to run the second KF, where does the driver find the resource (device config.) needed by the 2nd KF?

                                                                                             

                                                                                            When you have 2 kernels, the structure of the inner and outer elf is the same as for 1 kernel and data for the 2 kernels is combined within the outer elf sections. The outer elf has 4 sections of interest. text, rodata, symtab, and strtab. The data is combined like this.

                                                                                            text = text1 + text2

                                                                                            rodata = rodata1 + rodata2

                                                                                            When you call the elf re-packer, it only sees text1 and rodata1 which come first in the sections.

                                                                                            To see the second kernel, you need to look at the symtab of the outer elf which has symbols for both kernels.

                                                                                            Something like:

                                                                                                   name           sec. index    "value"

                                                                                            __OpenCL_KF1_metadata      4         0

                                                                                            __OpenCL_KF1_kernel        5         0

                                                                                            __OpenCL_KF1_header        4         356

                                                                                            __OpenCL_KF2_metadata      4         388

                                                                                            __OpenCL_KF3_kernel        5         14912

                                                                                            __OpenCL_KF4_header        4         744

                                                                                             

                                                                                            The elf symbol structure member "value" contains the offset within the sections (4=rodata, 5=text).

                                                                                            Use these offsets to determine what part of the sections you give to the inner elf processor.

                                                                                            Since you only run one kernel at a time, you only change and repack that kernel, so size calculations

                                                                                            can be done the same way as before.

                                                                                             

                                                                                            drallan


                                                                                             

                                                                                            Message was edited by: Robert Allan Barker

                                                                                              • Re: Raw ISA bytecode location
                                                                                                jg1

                                                                                                Excellent! I cannot find this in the AMD docs I have. Could you point me where this is documented?

                                                                                                 

                                                                                                jg1

                                                                                                  • Re: Raw ISA bytecode location
                                                                                                    drallan

                                                                                                    jg1 wrote:

                                                                                                     

                                                                                                    Excellent! I cannot find this in the AMD docs I have. Could you point me where this is documented?

                                                                                                     

                                                                                                    jg1

                                                                                                    I don't see this in any documents either but it's easy to see. Duplicate a kernel in a file and rename it so you have 2 identical kernels and compile. The elf dump section sizes are now exactly double but nothing else changes .

                                                                                                    Only the symtab now refers to both kernels with offsets for kernel 2 into the second half of each section.

                                                                                          • Re: Raw ISA bytecode location
                                                                                            drallan

                                                                                            Re: the device configuration table and data class enum:

                                                                                            in asm.h, look for this:

                                                                                            /* Encoding dictionary entry */

                                                                                            struct si_bin_enc_dict_entry_t

                                                                                            {

                                                                                                      unsigned int userElementCount;

                                                                                                      struct si_bin_enc_user_element_t userElements[16];

                                                                                            };

                                                                                            I think the name "si_binenc_dict_entry_t" is incorrect............[more]............

                                                                                            That all looks good, the dots are connecting fast. Most things in multi2sim now fit together.

                                                                                             

                                                                                            The other part is the buffers (referred to by user elements) used by the program. Most basic programs use 3,

                                                                                            1. UAV_TABLE or UAV_TABLE_PTR - has the 128 bit buffer descriptor(s) used by buffer_read/write insns.

                                                                                            2. IMM_CONST_BUFFER used for the NDrange information and other program constants

                                                                                            3. IMM_CONST_BUFFER containing the kernel arguments

                                                                                            The UAV_TABLE entries must be described by the struct si_buffer_resource_t.

                                                                                             

                                                                                            What's interesting is the reference (in m2sim) to "Table 8.5 of the SI documentation" that describes the buffer resource. That's been in multi2sim for a long time, so there must be one more document somewhere.

                                                                            • Re: Raw ISA bytecode location
                                                                              realhet

                                                                              So I checked hogy kernel function parameters are passed, and that turned out that it's quiet overcomplicated, too hard to  generate it by hand 

                                                                               

                                                                              A (__global uint *a) -> (__global uint *a, __global uint *b) changes almost everything, except the 4KB zero sections. Here's a list:

                                                                               

                                                                              ;  userElements[0]    = IMM_UAV, 10, s[4:7]   //a

                                                                              ;  userElements[1]    = IMM_CONST_BUFFER, 0, s[8:11]  //a offset

                                                                              ;  userElements[2]    = IMM_CONST_BUFFER, 1, s[12:15]  //kernel range parameters

                                                                              became

                                                                              ;  userElements[0]    = PTR_UAV_TABLE, 0, s[2:3]      //points to resource constants a,b

                                                                              ;  userElements[1]    = IMM_CONST_BUFFER, 0, s[4:7]   //kernel range parameters

                                                                              ;  userElements[2]    = IMM_CONST_BUFFER, 1, s[8:11]   //offsets for a and b

                                                                              There is a limit that it can pass only 3 userelements only, and if it runs out then PTR_UAV_TABLE and/or PTR_CONST_TABLE comes in. I don't understand why it can't use more userElements, and why pack it into arrays because under cal you can use a lot more uavs/cb-es than 3.

                                                                               

                                                                              COMPUTE_PGM_RSRC2:USER_SGPR      = 16

                                                                              became

                                                                              COMPUTE_PGM_RSRC2:USER_SGPR      = 12

                                                                              Now I know were s16=WorkGroupID comes from.

                                                                               

                                                                              Different uav/cb configs changing the symbol tables in the cal.elf

                                                                               

                                                                              ocl.elf, .rodata: there is 2 new lines in the text part:

                                                                              ;pointer:a:u32:1:1:0:uav:10:4:RW:0:0

                                                                              ;pointer:b:u32:1:1:16:uav:11:4:RO:0:0  <- new line, containing variable name and read/write info +more, mapping to IL (uav10, uav11)

                                                                               

                                                                              ocl.elf .rodata text part, exact type identifiers:

                                                                              ;reflection:0:uint*

                                                                              ;reflection:1:uint*

                                                                               

                                                                              (.rodata binary part left unchanged)

                                                                               

                                                                              Here are some example function headers and how to access each parameters:  [ <- denotes an indirect read (dword  address)

                                                                               

                                                                              __kernel void Test(__global uint *a){

                                                                                a[get_global_id(0)]=5;}

                                                                              COMPUTE_PGM_RSRC2:USER_SGPR      = 16

                                                                              ;  userElements[0]    = IMM_UAV, 10, s[4:7]              //a uav

                                                                              ;  userElements[1]    = IMM_CONST_BUFFER, 0, s[8:11]     //range params

                                                                              ;  userElements[2]    = IMM_CONST_BUFFER, 1, s[12:15]    //a offset

                                                                              a[0]: resource(s[4:7]) offset s[12:15][0]

                                                                               

                                                                              __kernel void Test(__global uint *a,  __global uint *b)

                                                                                a[get_global_id(0)] = b[get_global_id(0)];

                                                                              ;  userElements[0]    = PTR_UAV_TABLE, 0, s[2:3]         //points to a,b uav

                                                                              ;  userElements[1]    = IMM_CONST_BUFFER, 0, s[4:7]      //range params

                                                                              ;  userElements[2]    = IMM_CONST_BUFFER, 1, s[8:11]     //a,b offsets

                                                                              a[0] at resource(s[2:3][0x50:0x53]) offset s[8:11][0]

                                                                              b[0] at resource(s[2:3][0x58:0x5b]) offset s[8:11][4]

                                                                               

                                                                              __kernel void Test(__global uint *a,  __global uint *b, __global uint *c){

                                                                                a[get_global_id(0)] = b[3]+c[5]; }

                                                                              userElementCount     = 3;

                                                                              ;  userElements[0]    = PTR_UAV_TABLE, 0, s[2:3]         //points to a,b uav

                                                                              ;  userElements[1]    = IMM_CONST_BUFFER, 0, s[4:7]      //range params

                                                                              ;  userElements[2]    = IMM_CONST_BUFFER, 1, s[8:11]     //a,b,c offsets

                                                                              a[0] at resource(s[2:3][0x50:0x53]) offset s[8:11][0]

                                                                              b[0] at resource(s[2:3][0x58:0x5b]) offset s[8:11][4]

                                                                              c[0] at resource(s[2:3][0x60:0x63]) offset s[8:11][8]

                                                                               

                                                                              I gotta find out __constants later...

                                                                                • Re: Raw ISA bytecode location
                                                                                  drallan

                                                                                  realhet wrote:

                                                                                   

                                                                                   

                                                                                  COMPUTE_PGM_RSRC2:USER_SGPR      = 16

                                                                                  became

                                                                                  COMPUTE_PGM_RSRC2:USER_SGPR      = 12

                                                                                  Now I know were s16=WorkGroupID comes from.

                                                                                  Great work, I've been looking for that one for a long time.

                                                                                    • Re: Raw ISA bytecode location
                                                                                      realhet

                                                                                      Hello,

                                                                                      here I decoded some __kernel parameter combinations.

                                                                                      They are complicated but there is logic in them.

                                                                                      And this time I don't used get_global_id() so it uses one less constant buffer, this way a 2 parameter kernel need no additional resource_constant reads from ptr_[uav/const]_tables. I think a 0 based WorkGroupID (sn) and a WorkItemID (v0) is more than enough, so I choose not to understand that kernel parameter buffer.

                                                                                       

                                                                                      ;opencl kernel parameters

                                                                                      ;legend: g=get_global_id   u=UAV   c=CB

                                                                                      ; declarations: u: __global uint *x    c: __constant uint *x

                                                                                      ; dw(a,b) read dword from res_const(a) + dword offset b

                                                                                      ; dwx4() read 4x dwords

                                                                                      ; ofs x  means byte offset x

                                                                                       

                                                                                      type     GroupID     Params

                                                                                      u          12          a=s[4:7] ofs dw(s[8:11],0)

                                                                                      uu         16          a=s[4:7] ofs dw(s[12:15],0)

                                                                                                      b=s[8:11] ofs dw(s[12:15],4)

                                                                                      uuu         8      a=dwx4(s[2:3],0x50) ofs dw(s[4:7],0)

                                                                                                      b=dwx4(s[2:3],0x58) ofs dw(s[4:7],4)

                                                                                                      c=dwx4(s[2:3],0x60) ofs dw(s[4:7],8)

                                                                                      uuuu        8      a=dwx4(s[2:3],0x50) ofs dw(s[4:7],0)

                                                                                                      b=dwx4(s[2:3],0x58) ofs dw(s[4:7],4)

                                                                                                      c=dwx4(s[2:3],0x60) ofs dw(s[4:7],8)

                                                                                                      d=dwx4(s[2:3],0x68) ofs dw(s[4:7],0xC)

                                                                                      uc         16      a=s[4:7] ofs dw(s[8:11],0)

                                                                                                      b=s[12:15] ofs 0

                                                                                      uuc        12          a=dwx4(s[2:3],0x50) ofs dw(s[4:7],0)

                                                                                                      b=dwx4(s[2:3],0x58) ofs dw(s[4:7],4)

                                                                                                      c=s[8 :11] ofs 0

                                                                                      uuuc       12          a=dwx4(s[2:3],0x50) ofs dw(s[4:7],0)

                                                                                                      b=dwx4(s[2:3],0x58) ofs dw(s[4:7],4)

                                                                                                      c=dwx4(s[2:3],0x60) ofs dw(s[4:7],8)

                                                                                                      d=s[8:11] ofs 0

                                                                                      ucc         8          a=s[4:7] ofs dw(dw4x(s[2:3],4),0)

                                                                                                      b=dw4x(s[2:3],8) ofs 0

                                                                                                      c=dw4x(s[2:3],12) ofs 0

                                                                                      uucc       12          a=s[4:7] ofs dw(dw4x(s[2:3],4),0)

                                                                                                      b=s[8:11] ofs dw(dw4x(s[2:3],4),4)

                                                                                                      c=dw4x(s[2:3],8) ofs 0

                                                                                                      d=dw4x(s[2:3],12) ofs 0

                                                                                      uccc        8          a=s[4:7] ofs dw(dw4x(s[2:3],4),0)

                                                                                                      b=dw4x(s[2:3],8) ofs 0

                                                                                                      c=dw4x(s[2:3],12) ofs 0

                                                                                                      d=dw4x(s[2:3],16) ofs 0

                                                                                       

                                                                                      Btw, why it uses an additional offset for every UAV? They could easily put that offset into the resource constant, why make the ocl program slower when it could be faster?

                                                                                      Even I can bet it's always zero and they was used in the past when systems had only 1 or 8 UAVs max. I'll figure that out soon.

                                                                                      Another test I wanna check: what if I write into a constant buffer? Does it send that back the data to the host, as in cal? If so, then we can have 3 buffer parameters without additional cost.