14 Replies Latest reply on Jan 15, 2011 6:45 PM by Raistmer

    How to use image as 1D array?

      Need to cache function values

      I want to use an image as a cache for function values over a given range.

      I create the image like this:

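      float* cache = (float*) malloc(8192 * sizeof(cl_float));

      // Tabulate the function at 8192 evenly spaced points, chisqr in [1, 11).
      for (int i = 0; i < 8192; ++i) {
          double chisqr = 1.0 + (double) i / 8192.0 * 10.0;
          cache[i] = (float)lcgf(0.5*gauss_dof, std::max(chisqr*0.5*gauss_bins, 0.5*gauss_dof+1));
      }

      cl_image_format image_format;
      image_format.image_channel_data_type = CL_FLOAT;  // 32-bit float per channel
      image_format.image_channel_order = CL_R;          // single channel: one float per texel

      // 8192 x 1 image; row_pitch 0 lets the runtime compute it. `context` is the application's cl_context.
      // host_ptr must be the table itself (cache), not &cache; with CL_MEM_USE_HOST_PTR it must stay allocated.
      cl_int err;
      cl_mem gauss_cache = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                           &image_format, 8192, 1, 0, cache, &err);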

      Then I use it in a kernel like this:

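      // Normalized coordinates, clamped to the edge, linear interpolation between table entries.
      const sampler_t read_sampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;

      float calc_GaussFit_score_cached(float chisqr, float null_chisqr, float score_offset,
                                       image2d_t gauss_cache, image2d_t null_cache) { // <- gauss_pot_length constant across whole package
          float chisqr_cache = (chisqr - 1.0f) / 10.f;           // map [1, 11) to normalized [0, 1)
          float null_chisqr_cache = (null_chisqr - 1.0f) / 10.f;
          // Normalized coords must be float2; the image is one row high, so y = 0.5f (row centre).
          return score_offset +
                 read_imagef(gauss_cache, read_sampler, (float2)(chisqr_cache, 0.5f)).x +
                 read_imagef(null_cache, read_sampler, (float2)(null_chisqr_cache, 0.5f)).x;
      }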

      The app crashes when it reaches the corresponding kernel.

      What is wrong? Should I use float4 in the CPU array? I see that read_imagef always returns a float4, but I need only one float value per image element...
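On the float4 question: no float4 array should be needed on the CPU side. With a single-channel format, read_imagef still returns a float4, but only one component carries the texel value, as listed in the channel-order table of the OpenCL 1.1 specification: CL_R yields (r, 0, 0, 1) and CL_A yields (0, 0, 0, a). The small C++ model below is only a sketch of that table for the two orders that come up in this thread; `expand_texel`, `ChannelOrder` and `Float4` are hypothetical names, and this is not OpenCL code.

```cpp
#include <array>
#include <cassert>

// Hypothetical CPU model of how read_imagef expands one texel of a
// single-channel image into a float4 (x, y, z, w).
enum class ChannelOrder { R, A };

using Float4 = std::array<float, 4>;

Float4 expand_texel(ChannelOrder order, float value) {
    switch (order) {
        case ChannelOrder::R: return Float4{{value, 0.0f, 0.0f, 1.0f}};  // data in .x
        case ChannelOrder::A: return Float4{{0.0f, 0.0f, 0.0f, value}};  // data in .w
    }
    assert(false && "unknown channel order");
    return Float4{};
}
```

So with CL_R the kernel reads `.x`, as the code in the question already does; with CL_A the value would be in `.w`.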
        • How to use image as 1D array?

          I also tried using 1D images, but with image_channel_order set to CL_A. Every sample except (int2)(0,0) returned undefined values. I have had no problems with 2D and 3D images using CL_RGBA; they work fine. In the end I just used a buffer instead of a single-channel 1D image.

          • How to use image as 1D array?
            Thanks for the answer.
            There is no fast local memory available on HD4xxx GPUs, and this array will be accessed randomly because it holds cached values of a function, so I expect poor performance if I just use global memory.
            To speed up access, I am trying to use the texture cache by storing the table in an image.
            Another option would be to use the constant memory cache, but then I would lose the "free" linear interpolation between table points, which means worse precision.
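To make the "free" interpolation concrete, here is a CPU reference sketch (C++, not OpenCL) of what a sampler with normalized coordinates, clamp-to-edge addressing and linear filtering computes on a single-row float image, following the linear filtering formula in the OpenCL specification (x direction only). It also shows a detail that affects precision with the (chisqr - 1) / 10 mapping in the kernel above: linear filtering treats texel centres as sitting at (i + 0.5) / width, so a coordinate of i / width lands halfway between table entries i - 1 and i. The names `sample_linear_clamp`, `clamp_index` and `chisqr_to_coord` are hypothetical; the [1, 11) range comes from the question's host code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Clamp an index into [0, width - 1] (CLK_ADDRESS_CLAMP_TO_EDGE behaviour).
int clamp_index(int i, int width) {
    if (i < 0) return 0;
    if (i > width - 1) return width - 1;
    return i;
}

// CPU model of read_imagef on a one-row float image with
// CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR.
// s is the normalized x coordinate. GPUs may compute the blend weight with
// reduced precision, so hardware results can differ slightly from this.
float sample_linear_clamp(const std::vector<float>& table, float s) {
    assert(!table.empty());
    const int width = static_cast<int>(table.size());
    const float u = s * width - 0.5f;  // position relative to texel centres
    const float base = std::floor(u);
    const float a = u - base;          // weight of the right-hand neighbour
    const int i0 = clamp_index(static_cast<int>(base), width);
    const int i1 = clamp_index(static_cast<int>(base) + 1, width);
    return (1.0f - a) * table[i0] + a * table[i1];
}

// Hypothetical mapping for a table tabulated at chisqr_i = 1 + 10 * i / width:
// the extra 0.5 / width (half a texel) puts chisqr_i exactly on texel i's centre.
float chisqr_to_coord(float chisqr, int width) {
    return (chisqr - 1.0f) / 10.0f + 0.5f / width;
}
```

For example, with a 4-entry table {0, 10, 20, 30}, chisqr = 3.5 corresponds to entry 1: `chisqr_to_coord` returns exactly 10 from it, while the plain (3.5 - 1) / 10 = 0.25 coordinate returns 5, the average of entries 0 and 1. Run over a `__constant` or `__global` float array inside a kernel, the same few lines are roughly the manual interpolation the constant-memory option would need.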

            Any comments from AMD staff? What is wrong with my 1D image usage? Are there any examples of using images as a cache for function values? I've seen such usage mentioned in the manual, by the way, but without concrete samples.
            • How to use image as 1D array?

              Since you're the second person I've seen ask how to do this, I'll add some API and kernel support for it in clUtil (http://code.google.com/p/clutil/). OpenCL (as of version 1.1) does not actually support 1D images. You have the right idea in emulating one with a 2D image (laid out over multiple rows, that gives you up to 8192 × 8192 elements, about 67 million, instead of 8k). If I recall correctly, you always use float4 when reading from and writing to images; only the channels present in the image format carry data when you sample (for CL_R the result is (r, 0, 0, 1), so the value is in .x).
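The 2D emulation comes down to index arithmetic: a table longer than the maximum image width is stored row by row, so element `index` lives at column `index % width`, row `index / width`. Below is a minimal sketch of that mapping, assuming a width of 8192 as in the question; `TexelCoord`, `index_to_texel` and `rows_needed` are hypothetical helpers, not clUtil's API.

```cpp
#include <cassert>
#include <cstddef>

// Location of one element of a long 1D table stored row by row in a 2D image.
struct TexelCoord {
    int x;  // column
    int y;  // row
};

TexelCoord index_to_texel(std::size_t index, int width) {
    assert(width > 0);
    return TexelCoord{static_cast<int>(index % width), static_cast<int>(index / width)};
}

// Image height needed to hold `count` elements (the last row may be partly unused).
int rows_needed(std::size_t count, int width) {
    assert(width > 0);
    return static_cast<int>((count + width - 1) / width);
}
```

In a kernel, such coordinates would be read as (int2)(x, y) with an unnormalized, CLK_FILTER_NEAREST sampler. Hardware linear filtering does not carry over from the end of one row into the start of the next, so interpolating across a row boundary would have to be done by hand.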

              • How to use image as 1D array?
                Thanks, everyone, for the answers.
                I'll try to implement your suggestions in the next app versions.

                OpenCL was chosen to support the largest possible number of devices, but it looks like performance requirements will push toward code divergence anyway. Many kernels run better with different implementations for NV and ATI, different kernels are required for the HD4xxx and HD5xxx generations, and probably for HD6xxx too. cuFFT is still faster on NV cards than the OpenCL implementations, so a CUDA port looks unavoidable. So CAL++ looks like a possible solution too, especially if it gives access to more HD4xxx hardware features.