Hi,
Here in the lab we have a Mac Pro with a Radeon HD 4870 and an Intel Xeon.
When we query the capabilities of the OpenCL GPU device "Radeon HD 4870" and ask for CL_DEVICE_IMAGE_SUPPORT, we receive a surprising and paradoxical "no".
The funny thing is that the OpenCL CPU device "Xeon" does support images, even though it is not really meant to work with them...
We realized that the missing image support on the GPU is not new; it has been reported before:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=123152&enterthread=y
At this point we have some questions here...
When will the Radeon HD 4870 support images? Is there a planned date?
Is there at the current time any AMD/ATI GPU that supports images?
If not, why are they not supported? Is it because AMD advises against using them due to low performance? (That would sound strange.)
Shouldn't the use of images improve the performance of OpenCL-based applications, including on AMD's OpenCL implementation?
We are trying to understand this...
Thank you in advance,
Diego
Hi Diego,
The current beta release (beta4) does not support any extensions on the GPU.
We agree that image support is important, and the developers are actively working on it for a future release.
Is there a way other than images to get data loaded into the texture caches in the meantime?
Originally posted by: kbrafford Is there a way other than images to get data loaded into the texture caches in the meantime?
No, there is no other way. On the GPU, images map to textures.
Images will be supported in an upcoming release.
In the meantime, you can use CAL or Brook+ for this purpose.
If one wanted to implement an algorithm that uses lookup tables, would images be the way to port it to the GPU? Is there another way you would do that?
I'm sure the compiler will handle it correctly, as in produce correct results. But I'm interested in making sure that I am doing it correctly, so that I am not incurring unnecessary global reads.
I have about 32KB total worth of 16-bit (short int) constants in 4 or so lookup tables of different sizes. I'd like to be able to access them in parallel from different threads as quickly as possible. Architecturally it would seem like the texture cache is ideal, but if I just place them in the CL kernel file and tag them with the __constant specifier, will they be located somewhere that can be accessed quickly?
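For the size and layout side of the question above, here is a minimal host-side sketch in C. It assumes the tables are packed back to back into one buffer, so that a kernel could take a single __constant pointer plus per-table offsets. The names pack_tables, TableLayout, fits_constant_budget and MAX_TABLES are hypothetical. The 64 KB figure used below is the minimum value the OpenCL 1.0 full profile requires for CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE; the actual limit should be queried from the device. Whether __constant data ends up in a fast cache on a given GPU is implementation-dependent, and this sketch does not answer that.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TABLES 8  /* hypothetical upper bound for this sketch */

typedef struct {
    size_t offset[MAX_TABLES]; /* first element of each table in the packed buffer */
    size_t total;              /* number of 16-bit elements packed */
} TableLayout;

/* Copy n tables of 16-bit entries back to back into dst.
 * Returns 0 on success, -1 if there are too many tables or dst is too small. */
int pack_tables(const short *const tables[], const size_t lengths[],
                size_t n, short *dst, size_t dst_capacity,
                TableLayout *layout)
{
    size_t i;
    if (n > MAX_TABLES) return -1;
    layout->total = 0;
    for (i = 0; i < n; ++i) {
        /* total never exceeds dst_capacity, so the subtraction cannot wrap */
        if (lengths[i] > dst_capacity - layout->total) return -1;
        layout->offset[i] = layout->total;
        memcpy(dst + layout->total, tables[i], lengths[i] * sizeof(short));
        layout->total += lengths[i];
    }
    return 0;
}

/* Nonzero if the packed data fits in a constant buffer of limit_bytes. */
int fits_constant_budget(const TableLayout *layout, size_t limit_bytes)
{
    return layout->total * sizeof(short) <= limit_bytes;
}
```

In the kernel, entry i of table k would then be read as packed[offset_k + i]. With roughly 32KB of 16-bit entries, fits_constant_budget(&layout, 64 * 1024) holds with room to spare.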
From the Master's thesis of Rahul Garg:
"AMD calls each ALU as stream processor (SP). Each SP can execute one fp32 or one int32 multiply-and-add (MAD) operation per cycle."
Really? I'm under the strong impression that there is no integer MAD implemented in hardware (only IL's imad, which is translated into 2 operations), and that in fact only the "t" unit can perform integer multiplications, which reduces peak integer-multiply performance to 1/5.
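To put a number on that 1/5, here is a small C sketch of the arithmetic. It assumes five ALU lanes per unit, of which only the "t" lane can do an integer multiply, and the commonly quoted figure of 800 stream processors for the HD 4870. Both figures are assumptions for illustration and are not confirmed in this thread; peak_ops_per_cycle is a hypothetical helper.

```c
/* Peak operations per clock when only capable_lanes out of every
 * lanes_per_unit ALU lanes can execute the operation in question. */
unsigned peak_ops_per_cycle(unsigned total_lanes, unsigned lanes_per_unit,
                            unsigned capable_lanes)
{
    unsigned units = total_lanes / lanes_per_unit; /* number of multi-lane units */
    return units * capable_lanes;
}
```

Under those assumptions, peak_ops_per_cycle(800, 5, 5) gives 800 for an operation every lane can issue, while peak_ops_per_cycle(800, 5, 1) gives 160 for one restricted to the "t" lane, which is the 1/5 ratio mentioned above.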