

FirstTimeRight
Journeyman III

CAL on 4870x2

Hi all,

I'm new to this group, as I am to GPU programming. I do have some experience in assembler programming, though, on all sorts of CPUs and DSPs. Before I start a related project I would like to ask your advice.

I have this algorithm, a nearest-neighbor search in high-dimensional spaces, which has a runtime on the order of months on our PC cluster and which I would like to port to a GPU, or several of them. I have looked into the architecture of the 4870x2 (RV770), and it appears to be an ideal platform. I have installed the latest Brook and CAL SDKs, which apparently require some learning effort.
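For readers unfamiliar with the problem, here is a minimal brute-force sketch of nearest-neighbor search. The post does not describe the actual algorithm, so the exhaustive scan, the Euclidean metric, and the names `nearest_neighbor`, `points`, and `queries` are all assumptions made for illustration only.

```python
import math

def nearest_neighbor(query, points):
    """Return (index, distance) of the point closest to `query`.

    Assumes a Euclidean metric and an exhaustive scan; the thread does not
    say which metric or search structure the real project uses.
    """
    best_index, best_dist2 = -1, math.inf
    for i, p in enumerate(points):
        # Squared distance avoids a sqrt per candidate.
        d2 = sum((a - b) ** 2 for a, b in zip(query, p))
        if d2 < best_dist2:
            best_index, best_dist2 = i, d2
    return best_index, math.sqrt(best_dist2)

# Hypothetical 4-dimensional data set and queries.
points = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0), (3.0, 0.0, 4.0, 0.0)]
queries = [(0.9, 1.1, 1.0, 1.0), (3.0, 0.0, 3.0, 0.0)]

# Each query is independent of every other query.
results = [nearest_neighbor(q, points) for q in queries]
print(results)
```

Because no query depends on the result of another, a scan of this shape maps naturally onto one GPU thread per query, which is presumably why the hardware looks attractive for this workload.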

Now, what I would like to do is program the GPU on the assembler (ISA) level, including the use of the new "Local Data Share" and "Global Data Share" buffers.

I have read through some documentation, but it is unclear to me if the tools would support my plans. In particular, there is no RV770 ISA document and the GSA wouldn't produce any RV770 code. Furthermore, I have read statements from R. Koduri that the support for CAL would be dropped in the near future.

So this leaves me wondering: given the current tool set, do I have a chance to finish this project?  Are there any updates in the very near future I would have to wait for? Are my plans unrealistic anyway? Any advice/opinion is highly appreciated.

Thanks, later

 

0 Likes
7 Replies

FirstTimeRight,
As far as I know, CAL support will not be dropped; it is the intermediate layer between the high-level languages and the ATI graphics cards. The higher-level languages could evolve over time, but the CAL layer will be there. If you could let me know which website is reporting that we are dropping CAL support, it would be greatly appreciated, so we can get it corrected. As for ISA-level programming, I would strongly advise against it. The assembler is currently supported only for the R6XX series of graphics cards, not for R7XX or later. Programming at the IL level is the recommended way to use the Local Data Share, which will be exposed in our next release, due out soon.

0 Likes

Originally posted by: FirstTimeRight Furthermore, I have read statements from R. Koduri that the support for CAL would be dropped in the near future.


This made me sad, fearful, and slightly angry ...

Originally posted by: MicahVillmow As far as I know CAL support will not be dropped.


... but this makes me happy again. I'm all for standardization (as in OpenCL), but CAL+IL allow one to squeeze the most juice out of the hardware. Eventually, I suspect that something like OpenCL will make it possible to generate comparable code (perhaps with some vendor-specific extensions a la OpenGL), but it will take a while for the driver support and compilers to get to that level. For now, I'm quite happy with CAL and IL.

If you could let me know what website is reporting that we are dropping CAL support, it would be greatly appreciated so we can get it corrected.


Most sites just mention CTM, but some throw in CAL as well. I guess the only thing better than an acronym is two acronyms! There's www.cycore-clan.com, for instance, and

http://www.computerbase.de/news/hardware/grafikkarten/ati/2008/august/ati_cal_dx11

(the latter is in German, which I do not speak, but Google was nice enough to translate).

Thank you for clarifying this, Micah.

FirstTimeRight, I too advise working at the IL level rather than ISA. It's basically portable assembly that provides some future-proofing for your project. If you work at the ISA level, by the time you finish your project, the next generation of cards will be out and you will have to rewrite it...

Just my $0.02CDN.

 

0 Likes

lpw,

 

thanks for your comments. You say that for now, you're quite happy with CAL and IL. May I ask if you are targeting an RV770 chip? If so, do you have any further advice to give concerning the Local Data Share?

Thanks, later

0 Likes

Originally posted by: FirstTimeRight May I ask if you are targeting an RV770 chip?


Not yet, but I'm hoping to soon...

If so, do you have any further advice to give concerning the Local Data Share?


 

Sorry, but no. Right now, I have refactored my algorithms to eliminate data dependencies between threads as much as possible. I will definitely give those buffers some consideration once I have the hardware and documentation in hand, though.

Cheers,

Lukasz

0 Likes

Micah,

thanks for your reply.

The website which apparently reported it first (at least most others are referring to it) is

http://www.tgdaily.com/content/view/38764/140/

also:

http://www.dagdaily.co.za/index.php?option=com_content&task=view&id=10011&Itemid=218

But they only mention CTM. CAL was mentioned by a few others, mostly German websites:

http://www.computerbase.de/news/hardware/grafikkarten/ati/2008/august/ati_cal_dx11/

http://www.hardware-infos.com/news.php?news=2287

http://www.hardwareluxx.de/category.php?id=11

http://www.cycore-clan.com/

and quite a few more.

Still, I think that for a non-expert reader like my humble self, looking at figure 1.1 on page 5 of the "Compute Abstraction Layer Programming Guide", it's natural to link CAL and CTM.

Thanks for your comments on ISA and IL. I do have a mid-September deadline for this project. Is there any hope that the next release will be available by then?

Thanks a lot, later

 

0 Likes

Micah,

allow me to ask some further questions. I'm aware of the fact that the data might not be for public release.

Until the Local Data Share has made it into the tools, I thought I could use the "Constant Cache" instead. But performance will depend on the size and organization of this unit, and the literature is not conclusive about what that unit actually is. Can you shed some light on this? In particular,

- can you make any statement about the size?

- assuming all threads want to access the same constant at the same time, will there be a conflict?

- what access patterns should be avoided?

The Programming Guide says, "... Before the kernel executes, the buffers are copied directly to a special cache on the GPU (see Figure 1.4 – Instruction and Constant Cache)." This implies that the cache can hold the maximum aggregate size of the constant buffers. Is that so? Or are there still cache line replacements going on?

Thanks a lot, later

0 Likes

FirstTimeRight,
Support for CTM was actually dropped a while back, as CAL can be considered the successor to CTM. Higher-level tools, such as Brook+, are being built on top of CAL.

As for your question about the constants, please see the r600isa.pdf file, section 4.6.4, "ALU Constants". As for the size of the constant buffers themselves, 15 constant buffers are accessible via the CAL interface, with a maximum buffer size of 4096 float4s.
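To put those limits in bytes, here is a quick back-of-the-envelope sketch. It assumes a float4 is four 32-bit floats (16 bytes) and that the 4096-float4 limit applies to each buffer. The variable names are made up for the calculation and are not CAL identifiers.

```python
# Assumption: a float4 is four 32-bit (4-byte) floats.
BYTES_PER_FLOAT = 4
FLOATS_PER_FLOAT4 = 4

# Limits quoted in the reply above.
MAX_FLOAT4_PER_BUFFER = 4096
NUM_CONSTANT_BUFFERS = 15

bytes_per_float4 = BYTES_PER_FLOAT * FLOATS_PER_FLOAT4
bytes_per_buffer = MAX_FLOAT4_PER_BUFFER * bytes_per_float4
total_bytes = NUM_CONSTANT_BUFFERS * bytes_per_buffer

print(f"per buffer: {bytes_per_buffer} bytes ({bytes_per_buffer // 1024} KiB)")
print(f"all buffers: {total_bytes} bytes ({total_bytes // 1024} KiB)")
```

Under those assumptions each buffer holds 64 KiB and the 15 buffers together hold 960 KiB. This says nothing about the size of the on-chip constant cache itself, which the reply does not state.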
0 Likes