




Here in the lab we have a Mac Pro with a Radeon HD 4870 and an Intel Xeon.

When we query the OpenCL capabilities of the OpenCL GPU device "Radeon HD 4870", asking for CL_DEVICE_IMAGE_SUPPORT, we receive a tremendous and paradoxical "no".

The funny thing is that the OpenCL CPU device "Xeon" supports images, even though it is not really meant to work with them...

We realized that this is not a new thing


At this point we have some questions here...

When will the Radeon HD 4870 support images? Is there a planned date?

Is there at the current time any AMD/ATI GPU that supports images?

If not, why are they not supported? Is it because AMD recommends against using them for performance reasons? (This would sound strange.)

Shouldn't the use of images improve the performance of OpenCL-based applications, including on AMD's OpenCL implementation?

We are trying to understand this...

Thank you in advance,


8 Replies

Hi Diego,

The current beta release (beta4) does not support any extensions on the GPU.

We agree that image support is important, and the developers are actively working on it for a future release.


Is there a way other than images to get data loaded into the texture caches in the meantime?


Originally posted by: kbrafford Is there a way other than images to get data loaded into the texture caches in the mean time?


There is no other way. On the GPU, images are textures.

Images will be supported in an upcoming release.

You can use CAL or Brook+ for this purpose.


If one wanted to implement an algorithm that uses lookup tables, would images be the way to port it to the GPU? Is there another way you would do that?


If the lookup table is known at compile time, just create the lookup table inside the source file. The OpenCL stack should handle it correctly.

I'm sure the compiler will handle it correctly, as in produce correct results. But I want to make sure that I am doing it in a way that does not incur unnecessary global reads.

I have about 32KB total of 16-bit (short int) constants in 4 or so lookup tables of different sizes. I'd like to be able to access them in parallel from different threads as quickly as possible. Architecturally, the texture cache would seem ideal, but if I just place the tables in the CL kernel file and tag them with the __constant specifier, will they be located somewhere that can be accessed quickly?


Although it is not in the current release, once this is fully implemented, data placed in a constant address space array in the kernel file will be stored in a constant buffer. Peak constant buffer bandwidth is around 10x the L1 bandwidth on the 770 (L1 is ~480GB/s), though still slower than register file access.
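
As a rough illustration of the approach described above, here is a minimal sketch of a compile-time lookup table declared in the constant address space. The table contents, the size, and the names square_lut, lookup_square and apply_lut are all hypothetical, and whether the table actually ends up in a constant buffer depends on the release, as noted above. The guard on __OPENCL_VERSION__ is only there so the same file also builds as plain C, which makes the lookup logic easy to check on the host.

```c
/* Sketch only: table contents, names and sizes are hypothetical. */
#ifdef __OPENCL_VERSION__
#define LUT_SPACE __constant      /* OpenCL C build: constant address space */
#else
#define LUT_SPACE static const    /* plain C build, for host-side checks */
#endif

#define SQUARE_LUT_SIZE 8

/* Program-scope table whose contents are known at compile time. */
LUT_SPACE short square_lut[SQUARE_LUT_SIZE] = { 0, 1, 4, 9, 16, 25, 36, 49 };

/* Clamp the index so out-of-range input never reads past the table. */
short lookup_square(int i)
{
    if (i < 0) {
        i = 0;
    }
    if (i >= SQUARE_LUT_SIZE) {
        i = SQUARE_LUT_SIZE - 1;
    }
    return square_lut[i];
}

#ifdef __OPENCL_VERSION__
/* One work-item per element: read an index, write the table entry. */
__kernel void apply_lut(__global const int *in, __global short *out)
{
    size_t gid = get_global_id(0);
    out[gid] = lookup_square(in[gid]);
}
#endif
```

With several tables, each one would get its own constant array in the same way. It is worth querying CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE to confirm that the combined size (about 32KB in the question above) fits on the target device.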

Source: Master's Thesis of Rahul Garg

AMD calls each ALU as stream processor (SP). Each SP can execute one fp32 or one int32 multiply-and-add (MAD) operation per cycle


Really? I'm under the strong impression that there is no integer MAD implemented in hardware (only IL's imad, which is translated into 2 operations), and that in fact only the "t" unit can perform integer multiplications, thus reducing peak performance to 1/5.
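
To make the "1/5" figure concrete, here is a small sketch of the arithmetic. It takes the poster's claim at face value (only the "t" lane of each 5-wide unit can do an integer multiply) and plugs in figures that are not stated in this thread: 800 stream processors and a 750 MHz core clock for the Radeon HD 4870. The function names are made up for illustration.

```c
/* Assumed figures for the Radeon HD 4870 (not taken from the thread). */
enum {
    STREAM_PROCESSORS   = 800,  /* total ALU lanes */
    LANES_PER_VLIW_UNIT = 5,    /* x, y, z, w and t */
    CORE_CLOCK_MHZ      = 750
};

/* Peak fp32 MADs per clock: every lane can issue one. */
int fp32_mads_per_clock(void)
{
    return STREAM_PROCESSORS;
}

/* Peak int32 multiplies per clock if only the "t" lane of each
   5-wide unit can execute them (the claim in the post above). */
int int32_muls_per_clock(void)
{
    return STREAM_PROCESSORS / LANES_PER_VLIW_UNIT;
}

/* Convert operations per clock into giga-operations per second. */
double giga_ops_per_second(int ops_per_clock)
{
    return ops_per_clock * (double)CORE_CLOCK_MHZ / 1000.0;
}
```

Under these assumptions, the integer multiply rate is 160 per clock against 800 fp32 MADs per clock, which is where the factor of 1/5 comes from.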