7 Replies Latest reply on Mar 7, 2011 3:51 AM by himanshu.gautam

    Novice needs help creating an Image2D of type CL_HALF_FLOAT

    bennC32
      I don't understand how to set up the buffer needed by clCreateImage2D()

      I'm trying to use an Image2D with the channel format of type CL_HALF_FLOAT.

      I have no trouble using CL_FLOAT images, but I would like to use the CL_HALF_FLOAT type since my application will not fit in the available memory using floats.  The part I can't seem to figure out is how to prepare the buffer that gets passed to clCreateImage2D().  I'm assuming that the buffer needs to be (cl_half *), but it's not clear to me how to create such a thing.  I have tried something like

         cl_half *halfData = (cl_half *)malloc(numRows*numCols*sizeof(cl_half));
         for (int i=0; i<numRows*numCols; i++) {
            halfData[i] = (cl_half)(floatData[i]);      // where floatData is an array of floats
         }

      That does not appear to work, which is not surprising given how cl_half is defined in CL/cl_platform.h (typedef uint16_t  cl_half for Linux).

      I have pored over the OpenCL spec looking for some way to convert a float to a cl_half but I didn't find anything.  Any suggestions would be appreciated.  I'm obviously missing something here.  Can anyone point me to an example of using clCreateImage2D() with type CL_HALF_FLOAT?

      Thanks,

      bennC32

        • Novice needs help creating an Image2D of type CL_HALF_FLOAT
          eugenek

          You need to use function vstore_half.

            • Novice needs help creating an Image2D of type CL_HALF_FLOAT
              bennC32

              eugenek,

              Thanks for your reply, but I'm still confused.  Based on the OpenCL 1.1 spec, section 6.11.7 (p. 222), it appears to me that the vload/vstore stuff is part of the OpenCL "C language" or kernel language.  Also, the half stuff requires the cl_khr_fp16 extension, which my Radeon HD5770 does not seem to support.

              So, I wasn't trying to use half or half4 inside the kernel.  If I read the OpenCL 1.1 spec correctly, Table 5.7 ("Min. list of supported image formats", pp. 83-84) suggests that images with channel_order/channel_type = CL_RGBA/CL_HALF_FLOAT are among the required formats (independent of the cl_khr_fp16 extension).  A device query for my HD5770 seems to confirm support for CL_HALF_FLOAT.

              What I was trying to do then was simply to use clCreateImage2D() to create an image of channel_type = CL_HALF_FLOAT on the host side.  Then on the kernel side I thought I would be able to read it with something like:

              __kernel void Image2dAdd(__read_only image2d_t aImg,__read_only image2d_t bImg,__write_only image2d_t cImg,const sampler_t sampler,int numRows,int numCols)
              {
                 ...
                 const int2 coord = (int2)(get_global_id(0),get_global_id(1)); // pix loc
                 ...
                 float4 pixA = read_imagef(aImg,sampler,coord);  // get a pixel from the image A
                 ...
              }

              Note that I'm not trying to read the data into a half4 in the kernel with read_imageh(), which would require the cl_khr_fp16 extension (see spec p 302).  I'm content with using floats in the kernel given the limitations of my HD5770, but I still want to use the CL_HALF_FLOAT for image storage since all my data won't fit in memory as floats.

              Am I wrong in thinking that I should be able to use CL_HALF_FLOAT channel type even though my card does not support cl_khr_fp16?  If not, I'm still baffled on how to create the image with clCreateImage2D().

              bennC32

                • Novice needs help creating an Image2D of type CL_HALF_FLOAT
                  rick.weber

                  Half precision is going to give you terrible accuracy with only 10 bits of mantissa. You could try 16-bit fixed point if you don't need the dynamic range. That being said, I don't know how you're going to convert your data to half precision on the host without writing your own typecasting functions (which, if you don't care about denormals, should be doable). Just take the top 10 bits of the float mantissa, re-bias the float exponent into the 5-bit half exponent, and copy the sign bit into a half, first checking for overflow. Throw in some handwaving for rounding and it should work.
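The host-side conversion described above can be sketched in plain C. This is a minimal sketch, not an OpenCL API: the names `float_to_half_bits` and `floats_to_halves` are hypothetical, and the code assumes `float` is IEEE 754 binary32 and that `cl_half` is a 16-bit unsigned integer (as the `typedef uint16_t cl_half` quoted earlier in the thread suggests). Note that the exponent cannot simply be truncated; it has to be re-biased from 127 (float) to 15 (half).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert one binary32 float to binary16 bits.
   Rounds to nearest (ties to even); overflow becomes +/-Inf and very
   small values become half denormals or signed zero. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret bits safely */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xFFu;        /* 8-bit biased exponent   */
    uint32_t man  = x & 0x7FFFFFu;            /* 23-bit mantissa         */

    if (exp == 0xFFu)                         /* Inf or NaN              */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* re-bias: 127 -> 15      */

    if (e >= 31)                              /* too large: +/-Inf       */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                             /* half denormal or zero   */
        if (e < -10)
            return sign;                      /* underflows to zero      */
        man |= 0x800000u;                     /* restore implicit 1 bit  */
        uint32_t shift    = (uint32_t)(14 - e);
        uint32_t half_man = man >> shift;
        uint32_t rem      = man & ((1u << shift) - 1u);
        uint32_t halfway  = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_man & 1u)))
            half_man++;                       /* round to nearest even   */
        return (uint16_t)(sign | half_man);
    }

    /* Normal number: keep the top 10 mantissa bits, round on the rest.
       A rounding carry may spill into the exponent, which is correct. */
    uint32_t half = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem  = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;
    return (uint16_t)(sign | half);
}

/* Hypothetical helper: convert n floats into a 16-bit buffer. */
static void floats_to_halves(const float *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = float_to_half_bits(src[i]);
}
```

A buffer filled this way could then, in principle, be handed to clCreateImage2D() as the host pointer for an image whose channel type is CL_HALF_FLOAT; that last step is an assumption here and has not been checked against any particular device.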

                    • Novice needs help creating an Image2D of type CL_HALF_FLOAT
                      eugenek

                       

                      "That being said, I don't know how you're going to convert your data to half precision on the host without writing your own typecasting functions"


                      which is exactly what vstore_half would do, if it were available on the host. The fact that it isn't, is a strange and puzzling omission. Maybe one of the buffer copying functions can do typecasting implicitly?

                      That said, even if this feature was available on the host, it would be very slow, compared to uploading data to the device piece by piece and then using a simple kernel to convert float to half.

                        • Novice needs help creating an Image2D of type CL_HALF_FLOAT
                          himanshu.gautam

                          But bennC32's problem is that there is not enough space to bring his float buffer onto the GPU. Obviously, if possible, the conversion should be done kernel-side for performance reasons.

                          bennC32,

                          You can also try performing whatever operation you intend on half of the buffer at a time: read back the output and release the cl_mem object, then do the same for the second half of the buffer. I am not sure whether this approach applies to your case, but it is a popular one.
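The chunking idea above can be sketched on the host side as a simple driver loop. This is only an illustration under stated assumptions: `chunk_fn`, `process_in_chunks`, and `double_chunk` are hypothetical names, and the callback stands in for the real per-chunk work (creating the image or buffer, running the kernel, reading the result back, and releasing the cl_mem object), none of which is shown here.

```c
#include <stddef.h>

/* Hypothetical callback: process `count` elements of `in` into `out`.
   In a real program this is where the OpenCL work for one chunk would
   go. Returns 0 on success, nonzero on failure. */
typedef int (*chunk_fn)(const float *in, float *out, size_t count, void *user);

/* Walk the full buffer in pieces of at most `chunk` elements so that
   only one piece needs to be resident on the device at a time. */
static int process_in_chunks(const float *in, float *out, size_t total,
                             size_t chunk, chunk_fn fn, void *user)
{
    if (chunk == 0)
        return -1;                        /* avoid an infinite loop */
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        int err = fn(in + off, out + off, n, user);
        if (err != 0)
            return err;                   /* stop at the first failure */
    }
    return 0;
}

/* Placeholder callback for illustration only: doubles each value on
   the CPU and counts how many chunks were processed via `user`. */
static int double_chunk(const float *in, float *out, size_t count, void *user)
{
    int *calls = (int *)user;
    for (size_t i = 0; i < count; i++)
        out[i] = 2.0f * in[i];
    (*calls)++;
    return 0;
}
```

This only works directly when each chunk can be processed independently; operations that need neighbouring pixels across a chunk boundary would need overlapping chunks.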

                          Thanks

                            • Novice needs help creating an Image2D of type CL_HALF_FLOAT
                              bennC32

                              Thanks for all your input.  I have resigned myself to using int16 or dividing the problem up.  This appears to me to be a fundamental flaw in the OpenCL specification.  After all, what is the point of supporting a "CL_HALF_FLOAT" image format if there is no reasonable way to create one?

                              I explored the OpenGL world since it occurred to me that those doing high dynamic range rendering must do this all the time.  It turns out that it appears to be fairly straightforward to do in OpenGL, since the API for creating a 2D texture (analogous to OpenCL's clCreateImage2D) has separate arguments for "internal format" and [external or host] "format/type".  In the following excerpt from the OpenGL Redbook 7th Ed. pp. 400-401, the "format"/"type" arguments seem to be analogous to the channel_order/channel_type arguments to the OpenCL clCreateImage2D(), while the "internal format" argument seems to have no equivalent in OpenCL.

                              ---------------------------

                              void glTexImage2D(    GLenum      target,
                                   GLint      level,
                                   GLint      internalFormat,
                                   GLsizei      width,
                                   GLsizei      height,
                                   GLint      border,
                                   GLenum      format,
                                   GLenum      type,
                                   const GLvoid *      data);

                              internalFormat
                                                      Specifies the number of color components in the texture.
                                                      Must be one of the following symbolic constants: ... GL_R16F ...

                              format
                                                     Specifies the format of the pixel data.
                                                      The following symbolic values are accepted: ... GL_RED ...

                              type

                                                      Specifies the data type of the pixel data.
                                                      The following symbolic values are accepted: ... GL_FLOAT ...
                              ----------------------------

                              With that, it seems to support the creation of a 2D texture with "internal format" GL_R16F from a host-side buffer of "type" GL_FLOAT and "format" GL_RED, which is precisely what I was trying to do.

                              I'm still hoping I'm just missing something, but for now I'll probably just live with int16 and manage the dynamic range myself with scaling the data.
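The int16-with-scaling fallback mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: `quantize_to_int16` is a hypothetical name, a single scale factor is used for the whole buffer, and the input is assumed to contain no NaN or Inf values. The returned scale would have to be kept alongside the data so that values can be recovered as `q * scale`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map floats onto the symmetric range
   [-32767, 32767] using one scale factor for the whole buffer.
   Returns the scale such that src[i] is approximately dst[i] * scale. */
static float quantize_to_int16(const float *src, int16_t *dst, size_t n)
{
    /* Find the largest magnitude so the full int16 range is used. */
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = (src[i] < 0.0f) ? -src[i] : src[i];
        if (a > max_abs)
            max_abs = a;
    }

    if (max_abs == 0.0f) {                /* all zeros: any scale works */
        for (size_t i = 0; i < n; i++)
            dst[i] = 0;
        return 1.0f;
    }

    float scale = max_abs / 32767.0f;
    for (size_t i = 0; i < n; i++) {
        float v = src[i] / scale;
        /* Round half away from zero, then clamp against float error. */
        long q = (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        if (q > 32767)
            q = 32767;
        if (q < -32767)
            q = -32767;
        dst[i] = (int16_t)q;
    }
    return scale;
}
```

The trade-off is the one raised earlier in the thread: 16-bit fixed point gives uniform precision across the range but no extra dynamic range, so a single very large value makes every small value coarser.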

                              Thanks,

                              bennC32