You need to use the vstore_half function.
Thanks for your reply, but I'm still confused. Based on the OpenCL 1.1 spec, section 6.11.7 (p. 222), it appears to me that the vload/vstore stuff is part of the OpenCL "C language", i.e. the kernel language. Also, the half stuff requires the cl_khr_fp16 extension, which my Radeon HD5770 does not seem to support.
So, I wasn't trying to use half or half4 inside the kernel. If I read the OpenCL 1.1 spec correctly, Table 5.7 ("Min. list of supported image formats", pp. 83-84) suggests that images with channel_order/channel_type = CL_RGBA/CL_HALF_FLOAT are among the required formats (independent of the cl_khr_fp16 extension). A device query on my HD5770 seems to confirm support for CL_HALF_FLOAT.
What I was trying to do then was simply to use clCreateImage2D() to create an image of channel_type = CL_HALF_FLOAT on the host side. Then on the kernel side I thought I would be able to read it with something like:
__kernel void Image2dAdd(__read_only image2d_t aImg,
                         __read_only image2d_t bImg,
                         __write_only image2d_t cImg,
                         const sampler_t sampler,
                         int numRows, int numCols)
{
    const int2 coord = (int2)(get_global_id(0), get_global_id(1)); // pixel location
    float4 pixA = read_imagef(aImg, sampler, coord); // read a pixel from image A
    ...
}
Note that I'm not trying to read the data into a half4 in the kernel with read_imageh(), which would require the cl_khr_fp16 extension (see spec p 302). I'm content with using floats in the kernel given the limitations of my HD5770, but I still want to use the CL_HALF_FLOAT for image storage since all my data won't fit in memory as floats.
Am I wrong in thinking that I should be able to use CL_HALF_FLOAT channel type even though my card does not support cl_khr_fp16? If not, I'm still baffled on how to create the image with clCreateImage2D().
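To be concrete, here is roughly what I was attempting on the host side (a sketch, not tested; context, numCols, numRows, and hostPixels are stand-ins for my actual variables, and error checking is omitted):

```c
/* Host-side sketch: create a CL_RGBA / CL_HALF_FLOAT image.
 * Each pixel is 4 halfs = 8 bytes, so the row pitch below is in bytes.
 * hostPixels must already contain half-precision data (or pass NULL
 * and a row pitch of 0, then fill the image later). */
cl_image_format fmt;
fmt.image_channel_order     = CL_RGBA;
fmt.image_channel_data_type = CL_HALF_FLOAT;

cl_int err;
cl_mem img = clCreateImage2D(context,
                             CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                             &fmt,
                             numCols, numRows,              /* width, height */
                             numCols * 4 * sizeof(cl_half), /* row pitch in bytes */
                             hostPixels,
                             &err);
```

My problem is what to put in hostPixels, since I have no host-side way to produce half-precision values from my floats.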
Half precision is going to give you terrible accuracy with only 10 bits of mantissa. You could try 16-bit fixed point if you don't need the dynamic range. That being said, I don't know how you're going to convert your data to half precision on the host without writing your own typecasting functions (which should be doable if you don't care about denormals). Just take the top 10 bits of the float mantissa, re-bias the exponent from the float's 127 down to the half's 15 so it fits in 5 bits, copy the sign bit, and put them all in a half, first checking for overflow. Throw in some handwaving for rounding and it should work.
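A rough C sketch of that conversion (float_to_half is a hypothetical helper name; denormals flush to zero and rounding is plain truncation, per the handwaving above):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a 32-bit float to a 16-bit half
 * (1 sign bit, 5 exponent bits, 10 mantissa bits). Denormal results
 * flush to signed zero; mantissa rounding is truncation. */
static uint16_t float_to_half(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* type-pun without aliasing issues */

    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15; /* re-bias */
    uint32_t mant = bits & 0x007FFFFFu;

    if (exp >= 31)   /* overflow (also maps Inf/NaN): clamp to infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (exp <= 0)    /* underflow/denormal: flush to signed zero */
        return sign;

    return (uint16_t)(sign | ((uint32_t)exp << 10) | (mant >> 13));
}
```

For example, float_to_half(1.0f) yields 0x3C00, the half-precision encoding of 1.0.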
That being said, I don't know how you're going to convert your data to half precision on the host without writing your own typecasting functions
which is exactly what vstore_half would do, if it were available on the host. The fact that it isn't is a strange and puzzling omission. Maybe one of the buffer copying functions can do the typecasting implicitly?
That said, even if this feature were available on the host, it would be very slow compared to uploading the data to the device piece by piece and then using a simple kernel to convert float to half.
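Such a conversion kernel could look something like this (an untested sketch in OpenCL C, not host code; note that vstore_half and pointers to half are core OpenCL C and do not require cl_khr_fp16, which only gates half arithmetic):

```c
/* Device-side sketch: convert floats already uploaded to the device
 * into packed half values via the core vstore_half built-in. */
__kernel void FloatToHalf(__global const float *src,
                          __global half *dst) /* pointer-to-half is legal
                                                 without cl_khr_fp16 */
{
    size_t i = get_global_id(0);
    vstore_half(src[i], i, dst);  /* converts one float and stores the half */
}
```

The resulting half buffer could then be copied into the CL_HALF_FLOAT image with clEnqueueCopyBufferToImage.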
But benCC32's problem is a shortage of space for getting his float buffer onto the GPU in the first place. Obviously, where possible, the conversion should be done kernel-side from a performance perspective.
You can also try performing whatever operation you intend on half of the buffer at a time: process the first half, read back the output, release the cl_mem object, then do the same for the second half. I'm not sure whether this approach applies to your case, but it is a popular one.
Thanks for all your input. I have resigned myself to using int16 or dividing the problem up. This appears to me to be a fundamental flaw in the OpenCL specification. After all, what is the point in supporting a "CL_HALF_FLOAT" image format if there is no reasonable way to create one?
I explored the OpenGL world, since it occurred to me that those doing high-dynamic-range rendering must do this all the time. It turns out to be fairly straightforward in OpenGL, since the API for creating a 2D texture (analogous to OpenCL's clCreateImage2D) has separate arguments for "internal format" and [external or host] "format/type". In the following excerpt from the OpenGL Redbook, 7th Ed., pp. 400-401, the "format"/"type" arguments seem to be analogous to the channel_order/channel_type arguments of OpenCL's clCreateImage2D(), while the "internal format" argument seems to have no equivalent in OpenCL.
void glTexImage2D(GLenum target, GLint level, GLint internalFormat,
                  GLsizei width, GLsizei height, GLint border,
                  GLenum format, GLenum type, const GLvoid *data);

internalFormat: Specifies the number of color components in the texture.
Must be one of the following symbolic constants: ... GL_R16F ...
format: Specifies the format of the pixel data.
The following symbolic values are accepted: ... GL_RED ...
type: Specifies the data type of the pixel data.
The following symbolic values are accepted: ... GL_FLOAT ...
With that, OpenGL seems to support creating a 2D texture with "internal format" GL_R16F from a host-side buffer of "type" GL_FLOAT and "format" GL_RED, which is precisely what I was trying to do.
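In other words, a call along these lines (a sketch, assuming a current GL context; w, h, and hostFloats are placeholders) would let the driver do the float-to-half conversion during upload:

```c
/* Sketch: upload a host buffer of 32-bit floats into a half-float
 * (GL_R16F) texture; the GL driver converts on the way in. */
glTexImage2D(GL_TEXTURE_2D, /* target */
             0,             /* mipmap level */
             GL_R16F,       /* internal format: one 16-bit half per texel */
             w, h,          /* texture dimensions */
             0,             /* border (must be 0) */
             GL_RED,        /* host data format: one channel */
             GL_FLOAT,      /* host data type: 32-bit float */
             hostFloats);   /* pointer to w*h floats */
```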
I'm still hoping I'm just missing something, but for now I'll probably just live with int16 and manage the dynamic range myself with scaling the data.
I think the only problem you have is converting a float to a half-float on the host.
As per the spec, that becomes possible once the cl_khr_fp16 extension is supported. This extension is presently not supported in AMD's implementation.
You can still create a buffer of half-float values directly, initialize it, and use it as image data.