kos
Journeyman III

Complete IL reference?

Where can I find it?

I've got the CAL SDK and am happy to use it. But I am interested in the possible arguments for some instructions, like dcl_input[_usage(usage)] dst[.mask]. I've found this: http://www.warthman.com/projects-ati-CAL-IL.htm, but I don't know where to get it.

0 Likes
20 Replies
bjang
Journeyman III

You can get it in "doc" directory of CAL SDK on your machine after installing CAL SDK. The doc is named il.pdf.

0 Likes
kos
Journeyman III

That's not a complete reference, I think. There are just 186 pages, and there is not enough information. I need to understand how to work with textures and how samplers really work.

0 Likes

I absolutely agree. AMD has been VERY scarce on providing good documentation. The IL Reference Manual leaves A LOT to be desired, namely good descriptions and full functionality.
0 Likes

I think we need to ask for that via email... Because it physically exists somewhere inside AMD. I don't see any AMD moderators in this topic... why?

0 Likes

Why should you have to ask for it via email? All the information you need to program in the SDK should be in the docs and it's not. There are way too many details left out and unexplained.

Having to run back to this forum for every little question that was left out of the documentation and hoping that someone from AMD will answer within a week is really not good.
0 Likes

Hi ryta1203,

I completely agree that you should not have to go to the forum for your normal programming needs. You'll see a much cleaned up doc in AMD Stream SDK v1.2-beta.

As far as textures and samplers, the published IL reference was edited to show what we believed to be relevant from a compute standpoint. However, your request is noted and I'll ask them to see if they can provide more texture and sampler information.

In addition to texture and sampler information, what other specific things are you looking for so I can make the proper request?

And, I apologize for being a little scarce on the forums recently! Too many duties, too little time!

Michael.
0 Likes

OK, can I access a texture element by its coordinates from any thread I want? How do I get the thread number? There is one thing I can't understand: when writing an IL kernel, am I writing the code for a single thread or something else?

Proper request? I think that the complete manual really exists...

0 Likes

It's not clear how all the pieces fit together, for example, where to get the inputs from in a kernel is not clear to me. That is, what register/memory are they stored in and how do I get them? How do I access each element?

The special registers need to be explained more like vWinCoord:

Enum: IL_REGTYPE_WINCOORD
Text Syntax: vWinCoord
Common Name: Window Coordinate Register
Number of Components per Register: 2
Description:
. The first and second components are the X and Y position of the pixel's in window.
. The third component is the Z coordinate of the pixel in window space.
. The fourth component is W.
. This is a read-only register. It cannot be the destination of any instruction.
. This register cannot be used with relative addressing.
. It is an error to use this register in a real time kernel.

This is a poor description and limited. ALSO, it's put in terms of "pixels" even though we are talking about GPGPU (where most have NO graphics background and don't care to).

Also, it says that any register prefixed with "v" is an input register; but v1 is NOT a register whereas v0 is, so if I have 8 input streams to a kernel how do I access them?

Also, examples for the CAL SDK are very limited; there just aren't that many of them.

I think that there needs to be a programming guide for IL also, not just a reference manual. It's just hard to put the CAL and IL together.

0 Likes

Ryta,
The new documentation that we have been working on hopefully covers these concerns. As for the v0-v7 registers, please see calCtxRunProgramParams in cal_ext.h. The only example we have using this is the Brook+ example. The basic idea is that you specify three corners of the rectangle you want to run for each of the eight inputs as a CALparam object and then you access them via v0-v7 in the kernel. I have not used them myself, so I can't really give much more information beyond that, but the Brook+ source code does have a usage example.
0 Likes

Micah,

This is the problem I am talking about:

1. The CAL documentation names these registers and says they are used for inputs and yet AMD only has Brook+ examples?? (which is actually not true anyways)

2. IMO, users shouldn't have to dig through header files and 100000+ lines of code to find something that should be in the documentation anyways.

3. I'm not sure what you mean because the HELLOCAL example uses the v0 register for getting the input (which is why in another thread I was trying to use v1 to no avail, so there is a CAL example that uses these registers), while other examples use the vWinCoord or vObjIndex register, so it can be fairly confusing.

4. It seems that there may just be very poor inter-group communication at AMD on this project. This is just a vague/blurry observation.
0 Likes

Ryta,
I'm sorry if what I said was confusing. I was not talking about Brook+ examples, but the Brook+ runtime itself is the only example we currently have that uses this feature. I understand your frustration with the documentation and examples, but we are working hard on it, and I'll add this to the list of samples we need to develop.

As for Hellocal, the differences between v0 and vWinCoord are negligible; they are in essence the same thing if you use calCtxRunProgram. However, only through calCtxRunProgramParams can you change the behavior of v0 and get access to v1-v7.

The sample probably won't make it into the next release, but I will see if we can get it added to the following release.
0 Likes

Micah,

The one thing I am really confused on is this:

Let's say I have 8 inputs. How do I access these inputs in CAL? How do I put them into registers? From vWinCoord? From v0-v7? I just think the documentation needs to fill the gap better. It's almost like AMD had one person doing one doc and another person doing another doc and the two people never spoke, so it's very broken with plenty of gaps in the docs.

It's like teaching what a controller is and what an architecture is but never really explaining how the two communicate or interact, IMO.
0 Likes

Ryta,
Understood. These inputs are read-only variables created by the hardware. They are values that are interpolated over the domain of execution. The only way to access them is via the registers in the IL kernel. vWinCoord0 is the interpolated value that is set up by default, and it is based on your execution domain. vWinCoord0 can also be called v0. Via the calCtxRunProgramParams call you can tell CAL to set up interpolated values that differ from the domain of execution; however, the rectangle that the v0-v7 values are interpolated over needs to be specified in the CALparams structure.
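To make "interpolated over a rectangle" concrete, here is a rough Python model, not CAL or IL code. The function name, the corner order (top-left, top-right, bottom-left) and the choice to evaluate at cell centres are all assumptions made for illustration; the real layout of the CALparams structure is not shown in this thread.

```python
def interpolate(corners, width, height, x, y):
    """Linearly interpolate a 2-component value for thread (x, y).

    corners: (top_left, top_right, bottom_left), each a (u, v) pair.
    The corner order is an assumption for illustration only.
    """
    (u0, v0), (u1, v1), (u2, v2) = corners
    # Fractional position of the thread's cell centre inside the domain.
    fx = (x + 0.5) / width
    fy = (y + 0.5) / height
    u = u0 + fx * (u1 - u0) + fy * (u2 - u0)
    v = v0 + fx * (v1 - v0) + fy * (v2 - v0)
    return (u, v)


# Default case: the rectangle equals the 4x4 execution domain, so the
# value is simply the thread's own position (the role vWinCoord0/v0 plays).
domain = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
default_v0 = interpolate(domain, 4, 4, 2, 1)

# A different rectangle for a hypothetical v1: twice the extent of the domain.
scaled = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))
custom_v1 = interpolate(scaled, 4, 4, 2, 1)
```

In this model the same thread sees a different value in each input register, depending only on the rectangle supplied for that register.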

Hope this helps until the newer docs are released; they should explain this in more detail.
0 Likes

Micah,

Sorry, I must just be really thick headed. This didn't help at all.

This didn't really explain to me where the inputs are being stored (I understand they are read-only, just like in Brook+, but where are they stored so that I may access them?) or how to bring them into the registers (like r0, r1, r2, etc) or where to bring them in from.
0 Likes

Hey guys, take a look at this: http://www.warthman.com/projects-ati-CAL-IL.htm. Is it the real manual that we need?

0 Likes


Hi there,

Glad there seems to be agreement now on how vWinCoord works. If I end up using the v# registers I'll try and provide a link to an example.

Kos, it seems you have found a picture of the front page of a later revision of the il.pdf document that comes with the CAL SDK. Perhaps that version, or a later one still, will come with the forthcoming release. From Michael's and Micah's comments I look forward to seeing the new docs!

Best,
Steven.
0 Likes

Ryta,
Ok, let's try it by example then.

The following very simple IL example:
il_ps_2_0
dcl_input_position_interp(linear_noperspective)_centered vWinCoord0.xy__
mov g[0], vWinCoord0.xy
end

Produces the following ISA:
;PS; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(6)
      0  z: MOV  R0.z, 0.0f
         w: MOV  R0.w, 0.0f
      1  x: MOV  R1.x, 0.0f
         y: MOV  R1.y, 0.0f
         z: MOV  R1.z, 0.0f
         w: MOV  R1.w, 0.0f
01 MEM_GLOBAL_WRITE: DWORD_PTR[0], R0, ELEM_SIZE(3)
02 EXP_DONE: PIX0, R1
END_OF_PROGRAM


So, basically what is occurring is that the ISA copies zero values to the z and w components of register 0 and then zero to all components of register 1. It writes out register 0 to the global buffer and then writes out register 1 to color buffer 0. Now, where is vWinCoord0? When the hardware creates each individual thread, it places the interpolated values of the execution domain in R0.xy, so no copies are needed.

So, can you access these values outside of the kernel? The answer is no, as they are dynamically generated by the hardware at thread-spawn time. You can bring the values into registers in IL by doing a mov r0, vWinCoord0/v0, or you can save virtual registers and just use vWinCoord0/v0 wherever you want to use that value.
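A small Python model of that behaviour may help; it is only a sketch and not real hardware or CAL code. The names spawn_threads and kernel are made up for illustration, and whether the hardware supplies integer positions or cell centres is an assumption here.

```python
def spawn_threads(width, height):
    """Yield one register file per thread with R0.xy already filled in."""
    for y in range(height):
        for x in range(width):
            # The "hardware" writes the thread position before the kernel runs.
            yield {"R0": [float(x), float(y), None, None]}


def kernel(regs):
    # Mirrors the ISA above: only z and w are written. x and y are
    # already present, so no copy from vWinCoord0 is needed.
    regs["R0"][2] = 0.0
    regs["R0"][3] = 0.0
    return tuple(regs["R0"])  # stands in for the write of R0 to g[0]


results = [kernel(regs) for regs in spawn_threads(2, 2)]
```

Each entry of results differs only in its first two components, which the kernel itself never wrote.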


0 Likes

Micah,

So if I have multiple inputs "mov r0, vWinCoord0/v0" will bring in multiple inputs? I guess that doesn't make sense to me. It seems to me that's only going to bring in one value from one input? Is that right? Will "mov r1, vWinCoord0/v0" bring in the next value from the next input? If not, how does that work?

This is assuming that all inputs have the same domain of execution and scatter is not needed.
0 Likes


Hi there,

I think you need to think about vWinCoord0 and the v#'s as indices (or pointers) into arrays (the dcl_input_... instructions tell the compiler to set them up like this for you). "mov r0, vWinCoord0" moves the vWinCoord0 index itself into r0, NOT the associated value in an input array; vWinCoord0 is not automatically dereferenced. To get the value you first need to link up an array to a resource id using various CAL functions on the C++ side and the dcl_resource_id(x)_... on the IL side. For 8 input streams you'd declare 8 resources. Then you need to use e.g. sample_resource(3)_... r10, r0 to actually get the value in input array 3 corresponding to the position now stored in r0 into register r10. As Micah says, you could just write e.g. sample_resource(3)_... r10, vWinCoord0.xy if you wanted to. However, by loading vWinCoord0 into r0 you can play with the index (e.g. multiply it by 4, say) first. You then load from multiple input arrays by using multiple sample_resource(x) instructions, and, by operating on the source register in between, you can load from different positions in each array.
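Here is the same idea as a toy Python model, in case it helps; it is not IL and does not call CAL. The resources dictionary, the sample_resource function and the fill values are hypothetical stand-ins for arrays bound to resource ids 0 to 7, and the index is multiplied by 2 rather than 4 only so that it stays inside the small arrays.

```python
WIDTH, HEIGHT = 4, 4

# Hypothetical input arrays, one per resource id. Each element encodes
# its resource id and position so the lookups are easy to follow.
resources = {
    rid: [[rid * 100 + y * WIDTH + x for x in range(WIDTH)]
          for y in range(HEIGHT)]
    for rid in range(8)
}


def sample_resource(rid, coord):
    """Model of sample_resource(rid): dereference a position into array rid."""
    x, y = coord
    return resources[rid][int(y)][int(x)]


def kernel(v_win_coord):
    r0 = v_win_coord              # like mov r0, vWinCoord0: copies the index only
    r10 = sample_resource(3, r0)  # value at this thread's position in input 3
    r1 = (r0[0] * 2, r0[1])       # operate on the index before sampling again
    r11 = sample_resource(5, r1)  # different position, different input array
    return r10, r11


out = kernel((1, 1))
```

The point of the model is that nothing is read from an input array until sample_resource is called; the coordinate register on its own only says where to look.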

Never having tried v1, v2..., I'm not sure, but from what I understand, they are just multiple indices set up for you and you'd just get any values you want by sampling as before, but with the appropriate index, like sample_resource(3)_... r13,v3. The point seems to be that you can get the hardware to precalculate input indices for you rather than you having to do it yourself.

The vWinCoord0 and v# registers in IL seem to be an abstraction and don't seem to correspond to any special registers in the GPUISA. Rather, the hardware "secretly" preinitializes the first few R# physical registers for you with the appropriate per-thread values before the shader runs.

Hopefully that's not too far off!

Best,
Steven.
0 Likes

This is what I thought, and thanks, Steven, this is really a great description. Hopefully the documentation will be this detailed and straightforward.
0 Likes