6 Replies Latest reply on Feb 25, 2011 4:28 PM by rick.weber

    clUtil - a library for making OpenCL as easy to use as CUDA

    rick.weber

      I've written a library that makes OpenCL really easy to use. Consider this trivial program that writes the number 20 to every element in an array:

       

      #include <CL/cl.h> // on Linux; macOS uses <OpenCL/cl.h>

      char const* kernelSource = "__kernel void fill(__global float* array, unsigned int arrayLength, float val)"
      "{"
      "    if(get_global_id(0) < arrayLength)"
      "    {"
      "        array[get_global_id(0)] = val;"
      "    }"
      "}";

      int main(int argc, char** argv)
      {
          float val = 20.0f;
          float array[2000];
          cl_int err;
          cl_platform_id platform;
          cl_device_id device;
          cl_context context;
          cl_command_queue commandQueue;
          cl_mem buffer;
          cl_program program;
          cl_kernel kernel;
          unsigned int length = 2000;

          //Initialization
          err = clGetPlatformIDs(1, &platform, NULL);
          err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ANY, 1, &device, NULL);
          context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
          commandQueue = clCreateCommandQueue(context, device, 0, &err);
          program = clCreateProgramWithSource(context, 1, &kernelSource, 0, &err);
          err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
          kernel = clCreateKernel(program, "fill", &err);

          //Allocate memory
          buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(array), NULL, &err);

          //Actually call the kernel
          err = clSetKernelArg(kernel, 0, sizeof(buffer), &buffer);
          err = clSetKernelArg(kernel, 1, sizeof(length), &length);
          err = clSetKernelArg(kernel, 2, sizeof(val), &val);

          //Round the global size up to a multiple of the local size
          size_t global;
          size_t local = 64;

          global = length % local == 0 ? length : (length / local + 1) * local;

          err = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL, &global, &local, 0, NULL, NULL);

          //Copy data back
          err = clEnqueueReadBuffer(commandQueue, buffer, CL_TRUE, 0, sizeof(array), array, 0, NULL, NULL);

          //Free the data on the GPU
          clReleaseMemObject(buffer);
      }
      This is the same program using clUtil:

      kernel.cl:

      __kernel void fill(__global float* array, unsigned int arrayLength, float val)
      {
          if(get_global_id(0) < arrayLength)
          {
              array[get_global_id(0)] = val;
          }
      }

      main.cc:

      #include <clUtil.h>

      int main(int argc, char** argv)
      {
          const char* kernelFiles[] = {"kernel.cl"};
          cl_mem buffer;
          float array[2000];
          unsigned int length = 2000;
          float val = 20.0f;

          clUtilInitialize(kernelFiles, 1);
          clUtilAlloc(sizeof(array), &buffer);

          clUtilEnqueueKernel("fill", clUtilGrid(length, 64), buffer, length, val);
          clUtilDeviceGet(array, sizeof(array), buffer);

          clUtilFree(buffer);
      }
      It makes assumptions about devices and platforms to reduce the number of handles you have floating around while using C++0x constructs to make calling kernels significantly easier.
      It currently runs on Linux.
      Source and documentation are at http://code.google.com/p/clutil/
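
      The plain OpenCL version rounds the global work size up to a multiple of the local size by hand, while the clUtil version passes clUtilGrid(length, 64). A minimal sketch of that rounding is below. The name roundUpGlobalSize is hypothetical and used only for illustration; that clUtilGrid performs exactly this computation is an assumption, not something taken from the library's documentation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of clUtil): returns the smallest multiple of
// `local` that is >= `length`, so that every array element gets a work-item.
std::size_t roundUpGlobalSize(std::size_t length, std::size_t local)
{
    assert(local > 0);
    return (length + local - 1) / local * local;
}
```

      For the 2000-element array with a local size of 64, this gives 2048 work-items, which is why the kernel guards its write with get_global_id(0) < arrayLength.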


        • rick.weber

          I've updated clUtil to support 1D images (emulated on 2D images) and asynchronous data transfers/executions. I've also added examples of each of these features in the examples directory. You need gcc 4.4+ to compile the library and gcc 4.5+ to use lambdas (as is done in the Asynchronous example) with -std=c++0x.

            • LeeHowes

              Hi rick,

              It's good to see people working on this kind of thing, and particularly interesting to see C++0x features being used (variadic templates should have been in C++ years ago). Anything to get us away from C. When I have a chance I'll try to take a look at your code, though I'm not a big Linux user at the moment, so it may be a few weeks.

              Do you feel that this sort of util library is useful in production (if you extend it to optionally not assume platforms and devices, anyway) or more as a learning tool?

              Lee

                • rick.weber

                  The library significantly eases coding and debugging OpenCL programs, so I would say that it is a good learning tool. However, I would also say that it significantly reduces the amount of stuff you have to take care of as a developer. For example, every kernel is shoved into a std::map when you call clUtilInitialize(), so when you call clUtilEnqueueKernel(), it looks for the kernel by name in the map associated with the currently selected device. That way, you don't have to call clCreateKernel() every time you call a routine, and then figure out what to do with it when you're done. You also don't have to call clSetKernelArg() or any of that stupidness that comes with the territory of C. I hope clUtil will be extremely useful in production by dramatically increasing programmer productivity, reducing code bloat, and improving readability.
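
                  To illustrate the idea described above (kernels stored in a std::map by name, and a variadic template that sets every argument in one call), here is a rough sketch. It is not clUtil's actual code: MockKernel stands in for cl_kernel so the example runs without an OpenCL runtime, and kernelRegistry, setArgs, and enqueueByName are hypothetical names chosen for this example. It also uses a single map rather than one per device.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for cl_kernel. Each entry holds the raw bytes that a real
// implementation would hand to clSetKernelArg(kernel, index, size, pointer).
struct MockKernel
{
    std::vector<std::vector<unsigned char>> args;
};

// Hypothetical registry: kernel name -> kernel, filled once at startup.
std::map<std::string, MockKernel>& kernelRegistry()
{
    static std::map<std::string, MockKernel> registry;
    return registry;
}

// Base case: no arguments left to set.
inline void setArgs(MockKernel&, std::size_t) {}

// Recursive case: record argument `index`, then handle the remaining ones.
template <typename T, typename... Rest>
void setArgs(MockKernel& kernel, std::size_t index, const T& first, const Rest&... rest)
{
    assert(index == kernel.args.size());
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&first);
    kernel.args.emplace_back(bytes, bytes + sizeof(T));
    setArgs(kernel, index + 1, rest...);
}

// Look the kernel up by name and set all of its arguments in one call.
template <typename... Args>
MockKernel& enqueueByName(const std::string& name, const Args&... args)
{
    auto it = kernelRegistry().find(name);
    if (it == kernelRegistry().end())
        throw std::runtime_error("unknown kernel: " + name);
    it->second.args.clear();
    setArgs(it->second, 0, args...);
    return it->second;
}
```

                  With something like this in place, a call such as enqueueByName("fill", buffer, length, val) replaces one clCreateKernel() call and three clSetKernelArg() calls, and the argument sizes come from the C++ types instead of hand-written sizeof expressions.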

                  I've been thinking about a Windows port, but I don't have Visual Studio, and I'm not sure if it supports rvalue references and variadic templates yet (both of which are required by clUtil). I know it was one of the first compilers to support C++0x lambdas (probably for their PPL library).

                  As for the assumptions clUtil makes about platforms and devices, they're generally pessimistic. clUtil assumes you want to compile all source files for every device in every platform. You effectively get a flat device list of all the devices on the system. This assumption has a small esoteric problem that I'm looking at addressing now, namely what happens if a given device is supported by more than one platform?
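
                  The flat device list, and the duplicate-device question it raises, can be sketched as follows. This is only an illustration under stated assumptions: MockPlatform and FlatDevice are hypothetical types standing in for the results of clGetPlatformIDs() and clGetDeviceIDs(), and identifying a device by its name string is a simplification made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for what the OpenCL platform/device queries return.
struct MockPlatform
{
    std::string name;
    std::vector<std::string> devices;
};

struct FlatDevice
{
    std::size_t platformIndex; // which platform exposed this entry
    std::string device;
};

// Concatenate the devices of every platform into one flat list.
std::vector<FlatDevice> flattenDevices(const std::vector<MockPlatform>& platforms)
{
    std::size_t total = 0;
    for (const MockPlatform& p : platforms)
        total += p.devices.size();

    std::vector<FlatDevice> flat;
    flat.reserve(total);
    for (std::size_t p = 0; p < platforms.size(); ++p)
        for (const std::string& d : platforms[p].devices)
            flat.push_back(FlatDevice{p, d});

    assert(flat.size() == total);
    return flat;
}

// Count how many flat entries refer to the same physical device name.
std::size_t countEntries(const std::vector<FlatDevice>& flat, const std::string& device)
{
    std::size_t count = 0;
    for (const FlatDevice& f : flat)
        if (f.device == device)
            ++count;
    return count;
}
```

                  If two platforms both expose the same CPU, the flat list contains it twice, once per platform, which is the case that needs a policy (keep both entries, or pick one platform per device).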

                    • LeeHowes

                      Excellent, it sounds like you're putting a good amount of effort in. If I get a chance at some point I may experiment with VC++ on your behalf. Keep us informed!

                        • rick.weber

                          I've added new features to clUtil. You can now request out-of-order execution before calling clUtilInitialize, and it will be enabled on platforms that support it.

                          I've added clUtilFinalize, which does the opposite of clUtilInitialize; it frees all the contexts, command queues, and what have you that clUtil uses behind the scenes.

                          Also, I've started writing examples. If anyone wants to see a specific example, just let me know; they generally aren't hard to write.

                          Additionally, I've started adding library functions callable from kernels that are executed on a thread block level. For example, radixSortLG sorts an array of length n using all threads in get_local_size(0). This is useful if you want to sort many arrays in parallel (e.g. sort each column or row of a matrix). I've also included sum, scan, and max. Currently, I'm just adding functions that I need for my own projects, but if anyone has requests (or wants to contribute), that would be great.
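
                          As a rough picture of what a block-level function such as sum involves, here is a CPU simulation of a work-group tree reduction. It is not the clUtil implementation, and workGroupSum is a hypothetical name. The vector plays the role of __local scratch memory, the inner loop plays the role of the work-items running in parallel, and each pass of the outer loop corresponds to the code between two barrier(CLK_LOCAL_MEM_FENCE) calls. The sketch assumes the local size is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates one work-group cooperatively summing its scratch buffer.
// `scratch` is taken by value so the caller's data is left untouched.
float workGroupSum(std::vector<float> scratch)
{
    const std::size_t localSize = scratch.size();
    // This simple scheme requires a non-zero, power-of-two local size.
    assert(localSize > 0 && (localSize & (localSize - 1)) == 0);

    // Halve the number of active work-items on every step.
    for (std::size_t stride = localSize / 2; stride > 0; stride /= 2)
    {
        // On a GPU these iterations run concurrently, one per work-item.
        for (std::size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
        // A real kernel would place a local-memory barrier here.
    }
    return scratch[0];
}
```

                          Because each work-group only touches its own scratch buffer, many such reductions can run side by side, for example one per matrix row or column.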

                          To use the aforementioned functions, just #include <clUtil.cl> in your kernel files. 

                          Also, I've updated the Makefile, so you can now install clUtil into /usr/include and /usr/lib by running make install.