Archives Discussions

sgratton · ‎07-31-2008

Two different codes need two different interpretations

Hi there,

I wonder if anybody else has come up against the following. I have two different CAL/IL programs, each of which runs kernels on multiple subdomains of a domain. To get each program to work as intended, a given CALdomain {a,b,c,d} seems to have to mean a different thing, either:

A: a,b the coords (in the main domain) of the top-left corner of the subdomain, and c,d the subdomain's width and height

or

B: a,b the coords (in the main domain) of the top-left corner of the subdomain, and c,d the coords (in the main domain) of the subdomain's bottom-right corner.

(The top-left corner has to be away from 0,0 to see any difference!)
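
To make the two readings concrete, here is a minimal C++ sketch of which pixels each one would cover. The names Domain4, Rect, coveredA, coveredB and pixelCount are made up for illustration and are not part of CAL, and treating the bottom-right corner in B as exclusive is an assumption, not something confirmed by the runtime.

```cpp
#include <cassert>

// Hypothetical stand-in for the four numbers {a, b, c, d} given as a domain.
// This is NOT the cal.h definition; it only keeps the four values in order.
struct Domain4 {
    unsigned a, b, c, d;
};

// Half-open pixel rectangle: columns [x0, x1), rows [y0, y1).
struct Rect {
    unsigned x0, y0, x1, y1;
};

// Interpretation A: (a, b) is the top-left corner, (c, d) is width and height.
Rect coveredA(const Domain4& dom) {
    return Rect{dom.a, dom.b, dom.a + dom.c, dom.b + dom.d};
}

// Interpretation B: (a, b) is the top-left corner, (c, d) is the bottom-right
// corner. Assumption: the bottom-right corner is exclusive.
Rect coveredB(const Domain4& dom) {
    assert(dom.c >= dom.a && dom.d >= dom.b);
    return Rect{dom.a, dom.b, dom.c, dom.d};
}

// Number of pixels inside a half-open rectangle.
unsigned pixelCount(const Rect& r) {
    return (r.x1 - r.x0) * (r.y1 - r.y0);
}
```

Under these assumptions, {1,2,4,10} covers columns 1 to 4 and rows 2 to 11 (40 pixels) under A, but only columns 1 to 3 and rows 2 to 9 (24 pixels) under B. With the corner at 0,0 the two rectangles coincide, which is why the difference only shows up for an offset subdomain.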

The first case uses regular input and output buffers whereas the second one outputs to a global buffer only. Otherwise there doesn't seem to be any major difference between the programs.

Has anybody else seen such behaviour? (It is very confusing.) From the names of the fields in cal.h I think interpretation A is the intended one. So is behaviour B an issue relating to having no regular output buffers?

Thanks,
Steven.

sgratton · ‎08-06-2008

Hi there,

In order to illustrate this, I've put two simple codes in here: see domainout.cpp and domainglobal.cpp. Each sets up the appropriate buffer, then writes out vObjIndex0 and vWinCoord.xy for a subdomain.

To run them, compile with something like the following (on 64-bit Linux):

g++ -m64 domainout.cpp -o domout.go -I/usr/local/amdcal/include -L/usr/local/amdcal/lib64 -lamdcalcl -lamdcalrt

and then run like

./domout.go > tmpout

To see the output, open tmpout in a text editor like gedit (with the text wrapping turned off in the preferences!).

On my machine, both versions give the same output when the domain is e.g. the full size of the buffer, but give different outputs as mentioned in my previous post for a domain of {1,2,4,10} say.

Also of interest in the outputs is the relation of vObjIndex0.x to vWinCoord.xy. It doesn't seem to be the "y*width+x" of the cal-readme, but might rather be related to the way a domain is broken into wavefronts; it "snakes" around 8x8 blocks in the domain, based around 2x2 quads (try different values, including odd ones, for the domain parameters to see this).

Anyhow, any advice on what is going on with subdomains and global buffers (or suggestions on what I might be doing wrong in the global case leading to the apparent reinterpretation of the domain) would be much appreciated!

Thanks a lot,
Steven.

MicahVillmow · ‎10-09-2008

I'll have someone look at the domain issue and see if it is a problem.

As for the 'snaking' behavior you are noticing: you are right that it is not y * width + x. It follows the rasterization pattern of the graphics card, as specified in section 1.2 of the 1.2 SDK Stream Computing User Guide. Because pixel shader mode is specifically for graphics, the thread index is a little different. If you want linear indexing, you need to use a compute shader rather than a pixel shader in IL.
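
For reference, the linear "y * width + x" mapping mentioned above can be sketched in C++ as follows. This only illustrates the row-major formula and its inverse; it does not reproduce the rasterization order used in pixel shader mode, which is not spelled out in this thread. The names Coord, linearIndex and fromLinearIndex are hypothetical.

```cpp
#include <cassert>

// A 2D position inside a domain of a given width.
struct Coord {
    unsigned x, y;
};

// Row-major (linear) index of position (x, y) in a domain `width` wide.
unsigned linearIndex(unsigned x, unsigned y, unsigned width) {
    assert(width > 0 && x < width);
    return y * width + x;
}

// Inverse mapping: recover (x, y) from a row-major index.
Coord fromLinearIndex(unsigned index, unsigned width) {
    assert(width > 0);
    return Coord{index % width, index / width};
}
```

If the index written out by a kernel matches linearIndex for every position, the indexing is linear; any position where it differs shows that the threads are being numbered in some other order.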

