You need to use the vstore_half function.
Thanks for your reply, but I'm still confused. Based on the OpenCL 1.1 spec, section 6.11.7 (p. 222), it appears to me that the vload/vstore stuff is part of the OpenCL "C language", i.e. the kernel language. Also, the half stuff requires the cl_khr_fp16 extension, which my Radeon HD5770 does not seem to support.
So, I wasn't trying to use half or half4 inside the kernel. If I read the OpenCL 1.1 spec correctly, Table 5.7 ("Min. list of supported image formats", pp. 83-84) suggests that images with channel_order/channel_type = CL_RGBA/CL_HALF_FLOAT are among the required formats (independent of the cl_khr_fp16 extension). A device query on my HD5770 seems to confirm support for CL_HALF_FLOAT.
What I was trying to do then was simply to use clCreateImage2D() to create an image of channel_type = CL_HALF_FLOAT on the host side. Then on the kernel side I thought I would be able to read it with something like:
__kernel void Image2dAdd(__read_only image2d_t aImg,
                         __read_only image2d_t bImg,
                         __write_only image2d_t cImg,
                         const sampler_t sampler,
                         int numRows, int numCols)
{
    const int2 coord = (int2)(get_global_id(0), get_global_id(1)); // pixel location
    float4 pixA = read_imagef(aImg, sampler, coord); // read a pixel from image A
    ...
}
Note that I'm not trying to read the data into a half4 in the kernel with read_imageh(), which would require the cl_khr_fp16 extension (see spec p 302). I'm content with using floats in the kernel given the limitations of my HD5770, but I still want to use the CL_HALF_FLOAT for image storage since all my data won't fit in memory as floats.
Am I wrong in thinking that I should be able to use CL_HALF_FLOAT channel type even though my card does not support cl_khr_fp16? If not, I'm still baffled on how to create the image with clCreateImage2D().
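To be concrete, here is roughly what I was attempting on the host side (a sketch, not tested; context, numCols, numRows, and hostPixels are stand-ins for my actual variables, and error checking is omitted):

```c
/* Host-side sketch: create a CL_RGBA / CL_HALF_FLOAT image.
 * Each pixel is 4 halfs = 8 bytes, so the row pitch below is in bytes.
 * hostPixels must already contain half-precision data (or pass NULL
 * and a row pitch of 0, then fill the image later). */
cl_image_format fmt;
fmt.image_channel_order     = CL_RGBA;
fmt.image_channel_data_type = CL_HALF_FLOAT;

cl_int err;
cl_mem img = clCreateImage2D(context,
                             CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                             &fmt,
                             numCols, numRows,              /* width, height */
                             numCols * 4 * sizeof(cl_half), /* row pitch in bytes */
                             hostPixels,
                             &err);
```

My problem is what to put in hostPixels, since I have no host-side way to produce half-precision values from my floats.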
Half precision is going to give you terrible accuracy with only 10 bits of mantissa. You could try 16-bit fixed point if you don't need the dynamic range. That being said, I don't know how you're going to convert your data to half precision on the host without writing your own typecasting functions (which should be doable if you don't care about denormals). Just take the top 10 bits of the float mantissa, re-bias the exponent from the float's 127 down to the half's 15 so it fits in 5 bits, copy the sign bit, and put them all in a half, first checking for overflow. Throw in some handwaving for rounding and it should work.
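A rough C sketch of that conversion (float_to_half is a hypothetical helper name; denormals flush to zero and rounding is plain truncation, per the handwaving above):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a 32-bit float to a 16-bit half
 * (1 sign bit, 5 exponent bits, 10 mantissa bits). Denormal results
 * flush to signed zero; mantissa rounding is truncation. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* type-pun without aliasing issues */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15; /* re-bias */
    uint32_t mant = bits & 0x007FFFFFu;

    if (exp >= 31)   /* overflow (also maps Inf/NaN): clamp to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (exp <= 0)    /* underflow/denormal: flush to signed zero */
        return sign;

    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}
```

For example, float_to_half(1.0f) yields 0x3C00, the half-precision encoding of 1.0.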
That being said, I don't know how you're going to convert your data to half precision on the host without writing your own typecasting functions
which is exactly what vstore_half would do, if it were available on the host. The fact that it isn't is a strange and puzzling omission. Maybe one of the buffer copying functions can do the typecasting implicitly?
That said, even if this feature were available on the host, it would be very slow compared to uploading the data to the device piece by piece and then using a simple kernel to convert float to half.
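Such a conversion kernel could look something like this (an untested sketch in OpenCL C, not host code; note that vstore_half and pointers to half are core OpenCL C and do not require cl_khr_fp16, which only gates half arithmetic):

```c
/* Device-side sketch: convert floats already uploaded to the device
 * into packed half values via the core vstore_half built-in. */
__kernel void FloatToHalf(__global const float *src,
                          __global half *dst) /* pointer-to-half is legal
                                                 without cl_khr_fp16 */
{
    size_t i = get_global_id(0);
    vstore_half(src[i], i, dst);  /* converts one float and stores the half */
}
```

The resulting half buffer could then be copied into the CL_HALF_FLOAT image with clEnqueueCopyBufferToImage.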
But benCC32's problem is a shortage of space for getting his float buffer onto the GPU in the first place. Obviously, where possible, the conversion should be done kernel-side from a performance perspective.
You can also try performing whatever operation you intend on half of the buffer at a time: process the first half, read back the output, release the cl_mem object, then do the same for the second half. I'm not sure whether this approach applies to your case, but it is a popular one.
Thanks for all your input. I have resigned myself to using int16 or dividing the problem up. This appears to me to be a fundamental flaw in the OpenCL specification. After all, what is the point in supporting a "CL_HALF_FLOAT" image format if there is no reasonable way to create one?
I explored the OpenGL world, since it occurred to me that those doing high-dynamic-range rendering must do this all the time. It turns out to be fairly straightforward in OpenGL, since the API for creating a 2D texture (analogous to OpenCL's clCreateImage2D) has separate arguments for "internal format" and [external or host] "format/type". In the following excerpt from the OpenGL Redbook, 7th Ed., pp. 400-401, the "format"/"type" arguments seem to be analogous to the channel_order/channel_type arguments of OpenCL's clCreateImage2D(), while the "internal format" argument seems to have no equivalent in OpenCL.
void glTexImage2D(GLenum target, GLint level, GLint internalFormat,
                  GLsizei width, GLsizei height, GLint border,
                  GLenum format, GLenum type, const GLvoid *data);

internalFormat: Specifies the number of color components in the texture.
Must be one of the following symbolic constants: ... GL_R16F ...
format: Specifies the format of the pixel data.
The following symbolic values are accepted: ... GL_RED ...
type: Specifies the data type of the pixel data.
The following symbolic values are accepted: ... GL_FLOAT ...
With that, OpenGL seems to support creating a 2D texture with "internal format" GL_R16F from a host-side buffer of "type" GL_FLOAT and "format" GL_RED, which is precisely what I was trying to do.
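In other words, a call along these lines (a sketch, assuming a current GL context; w, h, and hostFloats are placeholders) would let the driver do the float-to-half conversion during upload:

```c
/* Sketch: upload a host buffer of 32-bit floats into a half-float
 * (GL_R16F) texture; the GL driver converts on the way in. */
glTexImage2D(GL_TEXTURE_2D, /* target */
             0,             /* mipmap level */
             GL_R16F,       /* internal format: one 16-bit half per texel */
             w, h,          /* texture dimensions */
             0,             /* border (must be 0) */
             GL_RED,        /* host data format: one channel */
             GL_FLOAT,      /* host data type: 32-bit float */
             hostFloats);   /* pointer to w*h floats */
```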
I'm still hoping I'm just missing something, but for now I'll probably just live with int16 and manage the dynamic range myself with scaling the data.
I think the only problem you have is converting a float to a half-float on the host.
As per the spec, that becomes possible once the cl_khr_fp16 extension is supported. This extension is presently not supported in AMD's implementation.
You can still create a buffer of half-float values directly, initialize it, and use it as image data.