9 Replies Latest reply on May 26, 2012 8:46 AM by Jawed

    Idiot's guide to OpenCL with c++

    jazpearson

      Hi,

       

      I am a C++ programmer, but still very much learning. My experience with OpenCL is nil, and my attempts at hello-world OpenCL tutorials haven't gone well: I can never get them working. I can get the sample code from the SDK installs working, but I'd like to be able to do something simpler before I delve into anything more complicated.

       

      I've just ordered the book "Heterogeneous Computing with OpenCL", and hopefully this will help somewhat. However, what I'd really like to get started is an idiot's guide to OpenCL.

       

      For example: let's say that I want to add up all the numbers from 1 to 100. In C++, this is obviously very simple. A straightforward for loop would suffice.

       

      int total = 0;
      for (int i = 1; i <= 100; i++)
        total += i;
      

       

      But how could I do this properly in OpenCL?

      Once I understand the very basics, I'll be able to get going.

      What header files are needed? What OpenCL files do we need?

       

      I appreciate your help.

        • Re: Idiot's guide to OpenCL with c++
          jazpearson

          (I'm working with the v2.6 SDK, OpenCL 1.1. I have a FirePro V4800 graphics card.)

          • Re: Idiot's guide to OpenCL with c++
            Rom1

            Hello,

            What you asked for is not the simplest thing to start with.

            Your loop needs the result of the previous iteration, so it's not really parallel.

            This kind of problem is called a reduction. It can be sped up with parallelism/OpenCL, but how to do that is not obvious.

            Try to parallelize this first:

             

            int A[1000], B[1000], C[1000];
            ...
            for(int i=0;i<1000;i++) C[i]=A[i]+B[i];
            
              • Re: Idiot's guide to OpenCL with c++
                jazpearson

                Ah yes, of course. I see what you mean in terms of my original post.

                 

                So how would I go about writing the OpenCL code for the one you suggested? What I want to do, within a C++ program / environment, is alter the code so that it runs on the GPU rather than the CPU. All the tutorials I've looked at seem to jump a few steps ahead, and I need those first few steps!

                 

                Here is some code that addresses what you ask for

                 

                __kernel void vec_add (__global float *a, __global float *b, __global float *c)
                {
                  int gid = get_global_id(0);
                  c[gid] = a[gid] + b[gid];
                }
                

                 

                However, where does this code go? In the cpp file? In a cl file? I assume this part of the code would be in the cpp file?

                (I read this particular example somewhere else.)

                Daft question (amongst many others I'll be asking), but are __kernel and __global keywords specific to OpenCL?

                I know there's a lot more code to write to actually execute the calculations. The question is: what, and where does it go?

                The parameters I pass in: do they have to be pointers? Could they be passed in as references (given that I was passing in a class which contained an array of floats, say)?

                  • Re: Idiot's guide to OpenCL with c++
                    Wenju

                    Hi, jazpearson

                    You are a C++ programmer, right? So in my opinion, the kernel is just a function. But it is a rather special one: you must prepare many things in order to invoke it. For example,

                    __kernel void vec_add (__global float *a, __global float *b, __global float *c)
                    {
                      int gid = get_global_id(0);
                      c[gid] = a[gid] + b[gid];
                    }

                    This is your kernel, and you want to invoke it. As you know, every program has to run under the host's control. So how do you run and manage your kernel? A platform, a device, a context, a program, a command queue and so on: these are all needed to run a kernel (you can find information on them almost anywhere). You need to learn the steps for running a kernel.

                    As for the kernel itself, it is a special function, so it has its own keywords and built-in functions, such as "__kernel" and "get_global_id()". I suggest you look at more samples and tutorials. You can get all the answers from those OpenCL tutorials.

                    Thank you.

                      • Re: Idiot's guide to OpenCL with c++
                        mflamer

                        There are a few good books available that cover the OpenCL basics. The one you mentioned is pretty good; I would also recommend "OpenCL in Action". It covers all the basics in a very easy-to-understand manner. Good luck. Also, most of the time my OpenCL kernel code goes in .cl files that get compiled at runtime.

                  • Re: Idiot's guide to OpenCL with c++
                    cyndwith

                    I am new to OpenCL too, but I tried some code with the help of a book and it worked.

                    I hope it helps you. In this code I have written the kernel in a text file. The kernel has to be stored as a string; the OpenCL functions take this string and compile it to generate the executable code, which is loaded into the device context. Here is the code:

                     

                     

                     

                    // System includes
                    #include <stdio.h>
                    #include <stdlib.h>

                    // OpenCL includes
                    #include <CL/cl.h>

                    // Name of the text file that holds the kernel source; readSource()
                    // below loads it at runtime. (Alternatively, the kernel can be embedded
                    // as a string literal and passed straight to clCreateProgramWithSource.)
                    // The file should contain a kernel like this:
                    /*
                    __kernel
                    void vecadd(__global int *A,
                                __global int *B,
                                __global float *C)   // float, to match the host array C
                    {
                       int idx = get_global_id(0);
                       C[idx] = A[idx] + B[idx];
                    }
                    */
                    const char *SourceFile = "vectormul.txt";

                    // Constants, globals
                    const int ELEMENTS = 10;   // elements in each vector

                    // Signatures
                    char* readSource(const char *sourceFilename);

                    int main(int argc, char ** argv)
                    {
                       printf("Running Vector Addition program\n\n");

                       size_t datasize = sizeof(int)*ELEMENTS;

                       int *A, *B;   // Input arrays
                       float *C;     // Output array

                       // Allocate space for input/output data
                       A = (int*)malloc(datasize);
                       B = (int*)malloc(datasize);
                       C = (float*)malloc(datasize);
                       if(A == NULL || B == NULL || C == NULL) {
                          perror("malloc");
                          exit(-1);
                       }

                       // Initialize the input data
                       for(int i = 0; i < ELEMENTS; i++) {
                          A[i] = i;
                          B[i] = i;
                       }

                       // Print both input arrays
                       for(int i = 0; i < ELEMENTS; i++) {
                          printf("%d ",A[i]);
                       }
                       printf("\n");

                       for(int i = 0; i < ELEMENTS; i++) {
                          printf("%d ",B[i]);
                       }
                       printf("\n");

                       cl_int status;  // use as return value for most OpenCL functions

                       cl_uint numPlatforms = 0;
                       cl_platform_id *platforms;

                    /////////////////////////////////////////////
                    // STEP 1: Discover and initialize platforms
                    /////////////////////////////////////////////

                       // Query for the number of recognized platforms
                       status = clGetPlatformIDs(0, NULL, &numPlatforms);
                       if(status != CL_SUCCESS) {
                          printf("clGetPlatformIDs failed\n");
                          exit(-1);
                       }

                       // Make sure some platforms were found
                       if(numPlatforms == 0) {
                          printf("No platforms detected.\n");
                          exit(-1);
                       }

                       // Allocate enough space for each platform
                       platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
                       if(platforms == NULL) {
                          perror("malloc");
                          exit(-1);
                       }

                       // Fill in platforms (the return value must be stored to be checked)
                       status = clGetPlatformIDs(numPlatforms, platforms, NULL);
                       if(status != CL_SUCCESS) {
                          printf("clGetPlatformIDs failed\n");
                          exit(-1);
                       }

                       // Print out some basic information about each platform
                       printf("%u platforms detected\n", numPlatforms);
                       for(unsigned int i = 0; i < numPlatforms; i++) {
                          char buf[100];
                          printf("Platform %u: \n", i);
                          status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                                           sizeof(buf), buf, NULL);
                          printf("\tVendor: %s\n", buf);
                          status |= clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                                           sizeof(buf), buf, NULL);
                          printf("\tName: %s\n", buf);

                          if(status != CL_SUCCESS) {
                             printf("clGetPlatformInfo failed\n");
                             exit(-1);
                          }
                       }
                       printf("\n");

                    /////////////////////////////////////////////
                    // STEP 2: Discover and initialize devices
                    /////////////////////////////////////////////

                       cl_uint numDevices = 0;
                       cl_device_id *devices;

                       // Retrieve the number of GPU devices on the first platform
                       status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 0, NULL,
                                               &numDevices);
                       if(status != CL_SUCCESS) {
                          printf("clGetDeviceIDs failed\n");
                          exit(-1);
                       }

                       // Make sure some devices were found
                       if(numDevices == 0) {
                          printf("No devices detected.\n");
                          exit(-1);
                       }

                       // Allocate enough space for each device
                       devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
                       if(devices == NULL) {
                          perror("malloc");
                          exit(-1);
                       }

                       // Fill in devices
                       status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, numDevices,
                                         devices, NULL);
                       if(status != CL_SUCCESS) {
                          printf("clGetDeviceIDs failed\n");
                          exit(-1);
                       }

                       // Print out some basic information about each device
                       printf("%u devices detected\n", numDevices);
                       for(unsigned int i = 0; i < numDevices; i++) {
                          char buf[100];
                          printf("Device %u: \n", i);
                          status = clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR,
                                           sizeof(buf), buf, NULL);
                          printf("\tVendor: %s\n", buf);
                          status |= clGetDeviceInfo(devices[i], CL_DEVICE_NAME,
                                           sizeof(buf), buf, NULL);
                          printf("\tName: %s\n", buf);

                          if(status != CL_SUCCESS) {
                             printf("clGetDeviceInfo failed\n");
                             exit(-1);
                          }
                       }
                       printf("\n");

                       // START Execution Model

                    /////////////////////////////////////////////
                    // STEP 3: Create a Context
                    /////////////////////////////////////////////

                       cl_context context;

                       // Create a context and associate it with the devices
                       context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);
                       if(status != CL_SUCCESS || context == NULL) {
                          printf("clCreateContext failed\n");
                          exit(-1);
                       }

                    /////////////////////////////////////////////
                    // STEP 4: Create a Command Queue
                    /////////////////////////////////////////////

                       cl_command_queue cmdQueue;

                       // Create a command queue and associate it with the device you
                       // want to execute on
                       cmdQueue = clCreateCommandQueue(context, devices[0], 0, &status);
                       if(status != CL_SUCCESS || cmdQueue == NULL) {
                          printf("clCreateCommandQueue failed\n");
                          exit(-1);
                       }

                    /////////////////////////////////////////////
                    // STEP 5: Create Device buffers
                    /////////////////////////////////////////////

                       cl_mem d_A, d_B;  // Input buffers on device
                       cl_mem d_C;       // Output buffer on device

                       // Create a buffer object (d_A) that contains the data from the host ptr A
                       d_A = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
                                       datasize, A, &status);
                       if(status != CL_SUCCESS || d_A == NULL) {
                          printf("clCreateBuffer failed\n");
                          exit(-1);
                       }

                       // Create a buffer object (d_B) that contains the data from the host ptr B
                       d_B = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
                                       datasize, B, &status);
                       if(status != CL_SUCCESS || d_B == NULL) {
                          printf("clCreateBuffer failed\n");
                          exit(-1);
                       }

                       // Create a buffer object (d_C) with enough space to hold the output data
                       d_C = clCreateBuffer(context, CL_MEM_READ_WRITE,
                                       datasize, NULL, &status);
                       if(status != CL_SUCCESS || d_C == NULL) {
                          printf("clCreateBuffer failed\n");
                          exit(-1);
                       }

                    /////////////////////////////////////////////
                    // STEP 6: Create and compile the program
                    /////////////////////////////////////////////

                       cl_program program;
                       char *source;

                       // readSource() reads in the kernel source from the file named by SourceFile
                       printf("Reading kernel source file\n");
                       source = readSource(SourceFile);
                       printf("Done reading kernel source file\n");

                       // Create a program. The 'source' string is the code read from the file.
                       program = clCreateProgramWithSource(context, 1, (const char**)&source,
                                                  NULL, &status);
                       if(status != CL_SUCCESS) {
                          printf("clCreateProgramWithSource failed\n");
                          exit(-1);
                       }

                       cl_int buildErr;
                       // Build (compile & link) the program for the devices.
                       // Save the return value in 'buildErr' (the following
                       // code will print any compilation errors to the screen)
                       buildErr = clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);

                       // If there are build errors, print them to the screen
                       if(buildErr != CL_SUCCESS) {
                          printf("Program failed to build.\n");
                          cl_build_status buildStatus;
                          for(unsigned int i = 0; i < numDevices; i++) {
                             clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_STATUS,
                                              sizeof(cl_build_status), &buildStatus, NULL);
                             // Skip devices for which the build succeeded
                             if(buildStatus == CL_BUILD_SUCCESS) {
                                continue;
                             }

                             // First query the size of the log, then fetch the log itself
                             char *buildLog;
                             size_t buildLogSize;
                             clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
                                              0, NULL, &buildLogSize);
                             buildLog = (char*)malloc(buildLogSize);
                             if(buildLog == NULL) {
                                perror("malloc");
                                exit(-1);
                             }
                             clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
                                              buildLogSize, buildLog, NULL);
                             buildLog[buildLogSize-1] = '\0';
                             printf("Device %u Build Log:\n%s\n", i, buildLog);
                             free(buildLog);
                          }
                          exit(-1);
                       }
                       else {
                          printf("No build errors\n");
                       }

                    /////////////////////////////////////////////
                    // STEP 7: Create the kernel
                    /////////////////////////////////////////////

                       cl_kernel kernel;

                       // Create a kernel from the vector addition function (named "vecadd")
                       kernel = clCreateKernel(program, "vecadd", &status);
                       if(status != CL_SUCCESS) {
                          printf("clCreateKernel failed\n");
                          exit(-1);
                       }

                    /////////////////////////////////////////////
                    // STEP 8: Set the kernel arguments
                    /////////////////////////////////////////////

                       // Associate the input and output buffers with the kernel
                       status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_A);
                       status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_B);
                       status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_C);
                       if(status != CL_SUCCESS) {
                          printf("clSetKernelArg failed\n");
                          exit(-1);
                       }

                    /////////////////////////////////////////////
                    // STEP 9: Configure the work-item structure
                    /////////////////////////////////////////////

                       // Define an index space (global work size) of work-items for execution.
                       // A workgroup size (local work size) is not required, but can be used.
                       size_t globalWorkSize[1];  // There are ELEMENTS work-items
                       globalWorkSize[0] = ELEMENTS;

                    /////////////////////////////////////////////
                    // STEP 10: Enqueue the kernel for execution
                    /////////////////////////////////////////////

                       // Execute the kernel.
                       // 'globalWorkSize' is the 1D dimension of the work-items
                       status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, globalWorkSize,
                                               NULL, 0, NULL, NULL);
                       if(status != CL_SUCCESS) {
                          printf("clEnqueueNDRangeKernel failed\n");
                          exit(-1);
                       }

                    /////////////////////////////////////////////
                    // STEP 11: Read the output buffer back to the host
                    /////////////////////////////////////////////

                       // Read the OpenCL output buffer (d_C) to the host output array (C).
                       // CL_TRUE makes this a blocking read.
                       status = clEnqueueReadBuffer(cmdQueue, d_C, CL_TRUE, 0, datasize, C,
                                      0, NULL, NULL);
                       if(status != CL_SUCCESS) {
                          printf("clEnqueueReadBuffer failed\n");
                          exit(-1);
                       }

                       // Verify correctness: A[i] and B[i] are both i, so each sum is i + i
                       bool result = true;
                       for(int i = 0; i < ELEMENTS; i++)
                       {
                          printf("%f ",C[i]);
                          if(C[i] != i + i) {
                             result = false;
                             break;
                          }
                       }
                       printf("\n");
                       if(result) {
                          printf("Output is correct\n");
                       }
                       else {
                          printf("Output is incorrect\n");
                       }

                    /////////////////////////////////////////////
                    // STEP 12:  Release OpenCL resources
                    /////////////////////////////////////////////

                       clReleaseKernel(kernel);
                       clReleaseProgram(program);
                       clReleaseCommandQueue(cmdQueue);
                       clReleaseMemObject(d_A);
                       clReleaseMemObject(d_B);
                       clReleaseMemObject(d_C);
                       clReleaseContext(context);

                       free(A);
                       free(B);
                       free(C);
                       free(source);
                       free(platforms);
                       free(devices);

                       getchar();   // keep the console window open until a key is pressed
                       return 0;
                    }

                    // Reads the whole file into a newly allocated, NUL-terminated string.
                    // The caller is responsible for calling free() on the result.
                    char* readSource(const char *sourceFilename) {

                       FILE *fp;
                       int err;
                       long size;
                       size_t bytesRead;
                       char *source;

                       fp = fopen(sourceFilename, "rb");
                       if(fp == NULL) {
                          printf("Could not open kernel file: %s\n", sourceFilename);
                          exit(-1);
                       }

                       // Seek to the end to find out how large the file is
                       err = fseek(fp, 0, SEEK_END);
                       if(err != 0) {
                          printf("Error seeking to end of file\n");
                          exit(-1);
                       }

                       size = ftell(fp);
                       if(size < 0) {
                          printf("Error getting file position\n");
                          exit(-1);
                       }

                       err = fseek(fp, 0, SEEK_SET);
                       if(err != 0) {
                          printf("Error seeking to start of file\n");
                          exit(-1);
                       }

                       source = (char*)malloc(size+1);
                       if(source == NULL) {
                          printf("Error allocating %ld bytes for the program source\n", size+1);
                          exit(-1);
                       }

                       bytesRead = fread(source, 1, size, fp);
                       if(bytesRead != (size_t)size) {
                          printf("only read %lu bytes\n", (unsigned long)bytesRead);
                          exit(-1);
                       }

                       source[size] = '\0';
                       fclose(fp);

                       return source;
                    }

                    • Re: Idiot's guide to OpenCL with c++
                      kbrafford

                      You should check out PyOpenCL.  It's a great way to get introduced to OpenCL without getting bogged down with gory details at first. 

                       

                      --Keith Brafford

                      • Re: Idiot's guide to OpenCL with c++
                        Jawed

                        The OpenCL Spec and Reference Card  are worth having by your side:

                         

                        http://www.khronos.org/registry/cl/