20 Replies Latest reply on Sep 4, 2008 7:58 PM by sgratton

    Complete IL reference ?

    kos
      Where can I find one?

      I've got the CAL SDK and am happy to use it, but I am interested in the possible arguments for some instructions, such as dcl_input[_usage(usage)] dst[.mask]. I've found this: http://www.warthman.com/projects-ati-CAL-IL.htm, but I don't know where to get it.

        • Complete IL reference ?
          bjang

          You can find it in the "doc" directory of the CAL SDK on your machine after installing the SDK. The document is named il.pdf.

            • Complete IL reference ?
              kos

              I don't think that's the complete reference. It's only 186 pages and doesn't contain enough information. I need to understand how to work with textures and how samplers really work.

                • Complete IL reference ?
                  ryta1203
                  I absolutely agree. AMD has been VERY scarce on providing good documentation. The IL Reference Manual leaves A LOT to be desired, namely good descriptions and full functionality.
                    • Complete IL reference ?
                      kos

                      I think we need to ask for it via email, because it must physically exist somewhere inside AMD. I don't see any AMD moderators in this topic... why?

                        • Complete IL reference ?
                          ryta1203
                          Why should you have to ask for it via email? All the information you need to program in the SDK should be in the docs and it's not. There are way too many details left out and unexplained.

                          Having to run back to this forum for every little question that was left out of the documentation and hoping that someone from AMD will answer within a week is really not good.
                            • Complete IL reference ?
                              michael.chu
                              Hi ryta1203,

                              I completely agree that you should not have to go to the forum for your normal programming needs. You'll see a much cleaned-up doc in AMD Stream SDK v1.2-beta.

                              As far as textures and samplers, the published IL reference was edited to show what we believed to be relevant from a compute standpoint. However, your request is noted and I'll ask them to see if they can provide more texture and sampler information.

                              In addition to texture and sampler information, what other specific things are you looking for so I can make the proper request?

                              And, I apologize for being a little scarce on the forums recently! Too many duties, too little time!

                              Michael.
                                • Complete IL reference ?
                                  kos

                                  OK, can I access a texture element by its coordinates from any thread I want? How do I get the thread number? One thing I can't understand: when I write an IL kernel, am I writing the code for a single thread, or something else?

                                  A proper request? I think the complete manual really exists...

                                  • Complete IL reference ?
                                    ryta1203
                                    It's not clear how all the pieces fit together, for example, where to get the inputs from in a kernel is not clear to me. That is, what register/memory are they stored in and how do I get them? How do I access each element?

                                    The special registers need to be explained more like vWinCoord:

                                    Enum: IL_REGTYPE_WINCOORD
                                    Text Syntax: vWinCoord
                                    Common Name: Window Coordinate Register
                                    Number of Components per Register: 2
                                    Description:
                                    . The first and second components are the X and Y position of the pixel's in window.
                                    . The third component is the Z coordinate of the pixel in window space.
                                    . The fourth component is W.
                                    . This is a read-only register. It cannot be the destination of any instruction.
                                    . This register cannot be used with relative addressing.
                                    . It is an error to use this register in a real time kernel.

                                    This is a poor description and limited. ALSO, it's put in terms of "pixels" even though we are talking about GPGPU (where most have NO graphics background and don't care to).

                                    Also, it says that any register prefixed with "v" is an input register, but v1 is NOT a register whereas v0 is, so if I have 8 input streams to a kernel, how do I access them?

                                    Also, examples for the CAL sdk are very limited, there just aren't that many of them.

                                    I think that there needs to be a programming guide for IL also, not just a reference manual. It's just hard to put the CAL and IL together.

                        • Complete IL reference ?
                          MicahVillmow
                          Ryta,
                          The new documentation that we have been working on hopefully covers these concerns. As for the v0-v7 registers, please see calCtxRunProgramParams in cal_ext.h. The only example we have using this is the brook+ example. The basics of it is that you specify three corners of the rectangle you want to run for each of the eight inputs as a CALparam object and then you access them via the v0-v7 in the kernel. I have not used them myself, so can't really give much more information outside of that, but the brook+ source code does have a usage example.
                            • Complete IL reference ?
                              ryta1203
                              Micah,

                              This is the problem I am talking about:

                              1. The CAL documentation names these registers and says they are used for inputs and yet AMD only has Brook+ examples?? (which is actually not true anyways)

                              2. IMO, users shouldn't have to dig through header files and 100000+ lines of code to find something that should be in the documentation anyways.

                              3. I'm not sure what you mean because the HELLOCAL example uses the v0 register for getting the input (which is why in another thread I was trying to use v1 to no avail, so there is a CAL example that uses these registers), while other examples use the vWinCoord or vObjIndex register, so it can be fairly confusing.

                              4. It seems that there may just be very poor inter-group communication at AMD on this project. This is just a vague/blurry observation.
                            • Complete IL reference ?
                              MicahVillmow
                              Ryta,
                              I'm sorry if what I said was confusing. I was not talking about Brook+ examples; the Brook+ runtime itself is the only example we currently have that uses this feature. I understand your frustration with the documentation and examples, but we are working hard on it, and I'll add this to the list of samples we need to develop.

                              As for Hellocal, the difference between v0 and vWinCoord is negligible; they are in essence the same thing if you use calCtxRunProgram. However, only through calCtxRunProgramParams can you change the behavior of v0 and get access to v1-v7.

                              The sample probably won't make it into the next release, but I will see if we can get it added to the following release.
                                • Complete IL reference ?
                                  ryta1203
                                  Micah,

                                  The one thing I am really confused on is this:

                                  Let's say I have 8 inputs. How do I access these inputs in CAL? How do I put them into registers? From vWinCoord? From v0-v7? I just think the documentation needs to fill the gap better. It's almost like AMD had one person doing one doc and another person doing another doc and the two people never spoke, so it's very broken with plenty of gaps in the docs.

                                  It's like teaching what a controller is and what an architecture is but never really explaining how the two communicate or interact, IMO.
                                • Complete IL reference ?
                                  MicahVillmow
                                  Ryta,
                                  Understood. These inputs are read-only variables created by the hardware. They are values that are interpolated over the domain of execution. The only way to access them is via the registers in the IL kernel. vWinCoord0 is the default interpolated value, set up based on your execution domain. vWinCoord0 can also be called v0. Via the calCtxRunProgramParams call you can tell CAL to set up interpolated values that differ from the domain of execution; however, the rectangle that the v0-v7 values are interpolated over needs to be specified in the CALparams structure.
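
One way to picture the interpolation described above is the following CPU-side sketch in Python. It is not CAL code and does not call calCtxRunProgramParams; the function name interpolate_input, the corner ordering, and the half-unit thread-centre offset are all assumptions made for illustration only.

```python
def interpolate_input(corner00, corner10, corner01, width, height):
    """Model of one interpolated input (a hypothetical stand-in for one v# register).

    corner00, corner10 and corner01 are the (u, v) values at three corners of
    the rectangle: origin, end of the x axis, end of the y axis (assumed order).
    Returns a height x width grid holding the value each thread would see.
    """
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Fractional position of this thread's centre inside the domain
            # (the +0.5 centre offset is an assumption of this sketch).
            fx = (x + 0.5) / width
            fy = (y + 0.5) / height
            u = corner00[0] + fx * (corner10[0] - corner00[0]) + fy * (corner01[0] - corner00[0])
            v = corner00[1] + fx * (corner10[1] - corner00[1]) + fy * (corner01[1] - corner00[1])
            row.append((u, v))
        grid.append(row)
    return grid


# Rectangle equal to the 4x2 execution domain: each thread sees its own position.
default_like = interpolate_input((0.0, 0.0), (4.0, 0.0), (0.0, 2.0), 4, 2)

# Rectangle twice as wide as the domain: the x value advances two units per thread.
stretched = interpolate_input((0.0, 0.0), (8.0, 0.0), (0.0, 2.0), 4, 2)
```

In this model the first call plays the role of the default vWinCoord0/v0 behaviour, while the second shows what a rectangle that differs from the execution domain would change: the same threads run, but each one receives a different precomputed coordinate.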

                                  Hope this helps until the newer docs are released; they should explain this in more detail.
                                    • Complete IL reference ?
                                      ryta1203
                                      Micah,

                                      Sorry, I must just be really thick headed. This didn't help at all.

                                      This didn't really explain to me where the inputs are being stored (I understand they are read-only, just like in Brook+, but where are they stored so that I may access them?) or how to bring them into the registers (like r0, r1, r2, etc) or where to bring them in from.
                                      • Complete IL reference ?
                                        kos

                                        Hey guys, take a look at this: http://www.warthman.com/projects-ati-CAL-IL.htm. Is this the real manual that we need?

                                          • Complete IL reference ?
                                            sgratton

                                            Hi there,

                                            Glad there seems to be agreement now on how vWinCoord works. If I end up using the v# registers I'll try and provide a link to an example.

                                            Kos, it seems you have found a picture of the front page of a later revision of the il.pdf document that comes with the cal sdk. Perhaps that version, or a later one still, will come with the forthcoming release. From Michael and Micah's comments I look forward to seeing the new docs!

                                            Best,
                                            Steven.
                                        • Complete IL reference ?
                                          MicahVillmow
                                          Ryta,
                                          OK, let's try it by example then.

                                          The following very simple IL example:
                                          il_ps_2_0
                                          dcl_input_position_interp(linear_noperspective)_centered vWinCoord0.xy__
                                          mov g[0], vWinCoord0.xy
                                          end

                                          Produces the following ISA:
                                          ;PS; -------- Disassembly --------------------
                                          00 ALU: ADDR(32) CNT(6)
                                                0  z: MOV R0.z, 0.0f
                                                   w: MOV R0.w, 0.0f
                                                1  x: MOV R1.x, 0.0f
                                                   y: MOV R1.y, 0.0f
                                                   z: MOV R1.z, 0.0f
                                                   w: MOV R1.w, 0.0f
                                          01 MEM_GLOBAL_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)
                                          02 EXP_DONE: PIX0, R1
                                          END_OF_PROGRAM


                                          So, basically, what is occurring is that the ISA copies zero to the z and w components of register 0 and then zero to all components of register 1. It writes register 0 out to the global buffer and then writes register 1 out to color buffer 0. Now, where is vWinCoord0? When the hardware creates each individual thread, it places the interpolated values of the execution domain in R0.xy, so no copy is needed.

                                          So, can you access these values outside of the kernel? The answer is no, as they are dynamically generated by the hardware at thread-spawn time. You can bring the values into registers in IL by doing a mov r0, vWinCoord0/v0, or you can save virtual registers by using vWinCoord0/v0 directly wherever you want that value.


                                            • Complete IL reference ?
                                              ryta1203
                                              Micah,

                                              So if I have multiple inputs "mov r0, vWinCoord0/v0" will bring in multiple inputs? I guess that doesn't make sense to me. It seems to me that's only going to bring in one value from one input? Is that right? Will "mov r1, vWinCoord0/v0" bring in the next value from the next input? If not, how does that work?

                                              This is assuming that all inputs have the same domain of execution and scatter is not needed.
                                                • Complete IL reference ?
                                                  sgratton

                                                  Hi there,

                                                  I think you need to think about vWinCoord0 and the v#'s as indices (or pointers) into arrays (the dcl_input_... instructions tell the compiler to set them up like this for you). "mov r0, vWinCoord0" moves the vWinCoord0 index itself into r0, NOT the associated value in an input array; vWinCoord0 is not automatically dereferenced. To get the value you first need to link up an array to a resource id using various CAL functions on the C++ side and dcl_resource_id(x)_... on the IL side. For 8 input streams you'd declare 8 resources. Then you need to use e.g. sample_resource(3)_... r10, r0 to actually get the value in input array 3 corresponding to the position now stored in r0 into register r10. As Micah says, you could just write e.g. sample_resource(3)_... r10, vWinCoord0.xy if you wanted to. However, by loading vWinCoord0 into r0 you can play with the index (e.g. multiply by 4, say) first. You then load from multiple input arrays by using multiple sample_resource(x) instructions, and, by operating on the source register in between, you can load from different positions in each array.

                                                  Never having tried v1, v2..., I'm not sure, but from what I understand, they are just multiple indices set up for you and you'd just get any values you want by sampling as before, but with the appropriate index, like sample_resource(3)_... r13,v3. The point seems to be that you can get the hardware to precalculate input indices for you rather than you having to do it yourself.

                                                  The vWinCoord0 and v# registers in IL seem to be an abstraction and don't seem to correspond to any special registers in the GPUISA. Rather, the hardware "secretly" preinitializes the first few R# physical registers for you with the appropriate per-thread values before the shader runs.
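
To make the index-versus-value distinction concrete, here is a rough CPU-side model in Python. It is not CAL or IL code: run_kernel, sample_resource and add_two_inputs are names made up for this illustration, integer coordinates are assumed for simplicity, and the comments only point at the IL instruction each line loosely imitates.

```python
def run_kernel(domain_width, domain_height, inputs, kernel):
    """Run `kernel` once per thread; every thread is handed only its own index."""
    output = [[None] * domain_width for _ in range(domain_height)]
    for y in range(domain_height):
        for x in range(domain_width):
            win_coord = (x, y)  # plays the role of vWinCoord0.xy: an index, not data
            output[y][x] = kernel(win_coord, inputs)
    return output


def sample_resource(inputs, resource_id, coord):
    """Dereference an index into one bound input array (hypothetical helper)."""
    x, y = coord
    return inputs[resource_id][y][x]


def add_two_inputs(win_coord, inputs):
    r0 = win_coord                         # loosely: mov r0, vWinCoord0
    r10 = sample_resource(inputs, 0, r0)   # loosely: sample_resource(0)_... r10, r0
    r11 = sample_resource(inputs, 1, r0)   # loosely: sample_resource(1)_... r11, r0
    return r10 + r11


a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
result = run_kernel(2, 2, [a, b], add_two_inputs)
print(result)  # [[11, 22], [33, 44]]
```

Reading from eight inputs would, in this model, simply mean eight sample_resource calls with resource ids 0 to 7, each using the same index or one the kernel has modified first.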

                                                  Hopefully that's not too far off!

                                                  Best,
                                                  Steven.
                                                    • Complete IL reference ?
                                                      ryta1203
                                                      This is what I thought. Thanks, Steven, this is really a great description. Hopefully the documentation will be this detailed and straightforward.