Clues to runtime tuning of ATI's OpenCL

October 28, 2009


I have just completed the OpenCL portion of my application, using hardcoded work and work-group dimensions, on a 2-GPU Mac OS X machine. The program will be supported on OS X and Windows. I want to build a calibration routine that takes into account the OS and the OpenCL information about each device, then, through timed test runs of each kernel, determines the best work and work-group sizes for each device.
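
To make the idea concrete, here is a rough sketch of the calibration loop I have in mind, written in Python so the logic stands apart from the OpenCL host code. Everything in it is an assumption for illustration: `run_kernel` is a hypothetical callback that would enqueue the kernel with the given global and local sizes and block until it finishes (for example by calling clFinish), and `max_work_group_size` would come from something like CL_DEVICE_MAX_WORK_GROUP_SIZE or CL_KERNEL_WORK_GROUP_SIZE. It only tries power-of-two work-group sizes, which may well not be the right candidate set for every device.

```python
import time


def candidate_local_sizes(max_work_group_size):
    """Power-of-two work-group sizes from 1 up to the reported limit."""
    sizes = []
    size = 1
    while size <= max_work_group_size:
        sizes.append(size)
        size *= 2
    return sizes


def round_up(problem_size, local_size):
    """Smallest multiple of local_size that is >= problem_size."""
    return ((problem_size + local_size - 1) // local_size) * local_size


def calibrate(run_kernel, problem_size, max_work_group_size,
              repeats=3, timer=time.perf_counter):
    """Time run_kernel for each candidate size and return the fastest.

    run_kernel(global_size, local_size) is a hypothetical callback that
    must not return until the kernel has completed. The global size is
    padded to a multiple of the local size, so the kernel itself is
    assumed to ignore work-items past problem_size.
    Returns (global_size, local_size, best_seconds).
    """
    best = None
    for local_size in candidate_local_sizes(max_work_group_size):
        global_size = round_up(problem_size, local_size)
        timings = []
        for _ in range(repeats):
            start = timer()
            run_kernel(global_size, local_size)
            timings.append(timer() - start)
        fastest = min(timings)  # best of several runs reduces noise
        if best is None or fastest < best[2]:
            best = (global_size, local_size, fastest)
    if best is None:
        raise ValueError("max_work_group_size must be at least 1")
    return best
```

The result would be stored per device and per kernel, so the timed runs only need to happen once on each machine.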

I can find no guidance on how to determine sizes for ATI GPUs. Is this because it is simply a matter of getting the number of processors? Is I/O latency hiding important? Are there any magic numbers like NVIDIA's warp size? Will specifying too large a work-group size cause a crash? Are the max compute units, max work-item sizes, and max work-group sizes the same across the entire GPU product line?

I have not been able to actually try my program on an ATI GPU due to my use of textures. This has not been an issue until now. Developing on more than one system at a time is too much work. I would like to pencil something in for Windows / ATI while I am here, though.