Hi all,
I'm new to this group, and to GPU programming as well. I do, however, have some experience in assembler programming on all sorts of CPUs and DSPs. Before I start a new project, I would like to ask for your advice.
I have this algorithm, nearest-neighbor search in high-dimensional spaces, which has a runtime on the order of months on our PC cluster, and which I would like to port to one or more GPUs. I have looked into the architecture of the 4870x2 (RV770), and it appears to be an ideal platform. I have installed the latest Brook and CAL SDKs, which apparently take some effort to learn.
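For readers new to the problem: the post doesn't say which nearest-neighbor method is used, so the following is only a minimal CPU reference sketch, assuming brute-force search with squared Euclidean distance (the names `nn_search` and `sq_dist` are hypothetical). It shows why the problem looks GPU-friendly: each query is handled independently of all others, so one query could map to one GPU thread.

```c
#include <float.h>
#include <stddef.h>

/* Squared Euclidean distance between two dim-dimensional points.
   Comparing squared distances avoids a sqrt per pair. */
static float sq_dist(const float *a, const float *b, size_t dim) {
    float s = 0.0f;
    for (size_t k = 0; k < dim; ++k) {
        float d = a[k] - b[k];
        s += d * d;
    }
    return s;
}

/* Brute-force nearest neighbour (hypothetical sketch, not the poster's code).
   points:  n * dim floats, row-major (n must be > 0)
   queries: m * dim floats, row-major
   result:  m indices into points
   Each iteration of the outer loop is independent of the others, which is
   what would let one GPU thread handle one query. */
void nn_search(const float *points, size_t n,
               const float *queries, size_t m,
               size_t dim, size_t *result) {
    for (size_t q = 0; q < m; ++q) {
        const float *qp = queries + q * dim;
        float best = FLT_MAX;
        size_t best_i = 0;
        for (size_t i = 0; i < n; ++i) {
            float d = sq_dist(points + i * dim, qp, dim);
            if (d < best) {   /* strict < keeps the lowest index on ties */
                best = d;
                best_i = i;
            }
        }
        result[q] = best_i;
    }
}
```

The cost is O(n * m * dim), which is why this kind of search is attractive to spread across many GPU threads.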
Now, what I would like to do is program the GPU at the assembler (ISA) level, including the use of the new "Local Data Share" and "Global Data Share" buffers.
I have read through some documentation, but it is unclear to me whether the tools would support my plans. In particular, there is no RV770 ISA document, and GSA doesn't produce any RV770 code. Furthermore, I have read statements from R. Koduri that the support for CAL would be dropped in the near future.
So this leaves me wondering: given the current tool set, do I have a chance to finish this project? Are there any updates in the very near future I would have to wait for? Are my plans unrealistic anyway? Any advice/opinion is highly appreciated.
Thanks, later
Originally posted by: FirstTimeRight
Furthermore, I have read statements from R. Koduri that the support for CAL would be dropped in the near future.
This made me sad, fearful, and slightly angry ...
Originally posted by: MicahVillmow
As far as I know, CAL support will not be dropped.
... but this makes me happy again. I'm all for standardization (as in OpenCL), but CAL+IL allow one to squeeze the most juice out of the hardware. Eventually, I suspect that something like OpenCL will make it possible to generate comparable code (perhaps with some vendor-specific extensions à la OpenGL), but it will take a while for driver support and compilers to get to that level. For now, I'm quite happy with CAL and IL.
If you could let me know what website is reporting that we are dropping CAL support, it would be greatly appreciated so we can get it corrected.
Most sites just mention CTM, but some throw in CAL as well. I guess the only thing better than an acronym is two acronyms! There's www.cycore-clan.com, for instance, and
http://www.computerbase.de/news/hardware/grafikkarten/ati/2008/august/ati_cal_dx11
(the latter is in German, which I do not speak, but Google was nice enough to translate).
Thank you for clarifying this, Micah.
FirstTimeRight, I too advise working at the IL level rather than ISA. It's basically portable assembly that provides some future-proofing for your project. If you work at the ISA level, by the time you finish your project, the next generation of cards will be out and you will have to rewrite it...
Just my $0.02CDN.
lpw,
thanks for your comments. You say that, for now, you're quite happy with CAL and IL. May I ask if you are targeting an RV770 chip? If so, do you have any further advice concerning the Local Data Share?
Thanks, later
Originally posted by: FirstTimeRight
May I ask if you are targeting an RV770 chip?
Not yet, but I'm hoping to soon...
If so, do you have any further advice concerning the Local Data Share?
Sorry, but no. Right now, I have refactored my algorithms to eliminate data dependencies between threads as far as possible. I will definitely give those buffers some consideration once I have the hardware and documentation in hand, though.
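The post doesn't describe the refactoring itself, so here is only a generic illustration of the idea (the functions are hypothetical, not Lukasz's code): an in-place 3-point smoothing loop carries a dependency from one iteration to the next, because element i reads element i-1 after it has already been overwritten. The out-of-place version reads only the unchanged input, so every output element could be computed by a separate thread, in any order.

```c
#include <stddef.h>

/* In-place 3-point smoothing: a[i] reads a[i-1], which was already
   overwritten in the previous iteration, so iterations must run in order. */
void smooth_in_place(float *a, size_t n) {
    for (size_t i = 1; i + 1 < n; ++i)
        a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0f;
}

/* Out-of-place version: every out[i] depends only on the unchanged input,
   so there are no dependencies between iterations (one thread per i). */
void smooth_out_of_place(const float *in, float *out, size_t n) {
    if (n == 0) return;
    out[0] = in[0];          /* endpoints are copied unchanged */
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
```

The two versions give different results on the same input; the out-of-place form is the one whose result does not depend on evaluation order, at the cost of a second buffer.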
Cheers,
Lukasz
Micah,
thanks for your reply.
The website that apparently reported it first (at least, most others refer to it) is
http://www.tgdaily.com/content/view/38764/140/
also:
http://www.dagdaily.co.za/index.php?option=com_content&task=view&id=10011&Itemid=218
But they only mention CTM. CAL was mentioned by a few others, mostly German websites:
http://www.computerbase.de/news/hardware/grafikkarten/ati/2008/august/ati_cal_dx11/
http://www.hardware-infos.com/news.php?news=2287
http://www.hardwareluxx.de/category.php?id=11
http://www.cycore-clan.com/
and quite a few more.
For a non-expert reader like my humble self, looking at Figure 1.1 (page 5) of the "Compute Abstraction Layer Programming Guide", it is natural to link CAL and CTM, I think.
Thanks for your comments on ISA and IL. I do have a mid-September deadline for this project; is there any hope that the next release will be available by then?
Thanks a lot, later
Micah,
allow me to ask some further questions. I'm aware that this information might not be for public release.
Until the Local Data Share is supported by the tools, I thought I could use the "Constant Cache" instead. But performance will depend on the size and organization of this unit, and the literature is not conclusive about what that unit actually is. Can you shed some light on this? In particular,
- can you make any statement about the size?
- assuming all threads want to access the same constant at the same time, will there be a conflict?
- what are access patterns to avoid?
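To make the access pattern in the second question concrete, here is a hedged CPU sketch of a tiled brute-force search. A small block of reference points is copied into a local buffer (standing in for what would sit in a constant buffer; `DIM`, `TILE` and `nn_tiled` are hypothetical and are not RV770 constant-cache parameters), and in the inner loop every query reads the same tile entry at the same step, i.e. uniform (broadcast) access. Whether the RV770 constant cache serves that pattern without conflicts is exactly what is being asked, and this code does not answer it.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

#define DIM 4     /* hypothetical fixed dimensionality */
#define TILE 8    /* hypothetical tile size, in reference points */

/* Tiled brute-force nearest neighbour (requires n > 0).
   points: n * DIM floats; queries: m * DIM floats; result and best: m each.
   The outer loop copies up to TILE reference points into a small buffer;
   the t loop then makes every query read the same tile entry at the same
   step (uniform access), which is the pattern in question. */
void nn_tiled(const float *points, size_t n,
              const float *queries, size_t m,
              size_t *result, float *best) {
    float tile[TILE][DIM];
    for (size_t q = 0; q < m; ++q) best[q] = FLT_MAX;
    for (size_t base = 0; base < n; base += TILE) {
        size_t cnt = (n - base < TILE) ? n - base : TILE;
        memcpy(tile, points + base * DIM, cnt * DIM * sizeof(float));
        for (size_t t = 0; t < cnt; ++t) {          /* same t for every query */
            for (size_t q = 0; q < m; ++q) {        /* "one thread per query" */
                float s = 0.0f;
                for (size_t k = 0; k < DIM; ++k) {
                    float d = queries[q * DIM + k] - tile[t][k];
                    s += d * d;
                }
                if (s < best[q]) { best[q] = s; result[q] = base + t; }
            }
        }
    }
}
```

On a CPU this loop order is not necessarily faster; it is shown only because it mirrors the lockstep, same-address reads that the question is about.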
The Programming Guide says, "... Before the kernel executes, the buffers are copied directly to a special cache on the GPU (see Figure 1.4 – Instruction and Constant Cache)." This implies that the cache can hold the maximum aggregate size of the constant buffers. Is that so? Or are there still cache line replacements going on?
Thanks a lot, later