
jazpearson
Journeyman III

Idiot's guide to OpenCL with C++

Hi,

I am a C++ programmer, but still very much learning. My experience with OpenCL is nil, and my experience with hello-world OpenCL tutorials isn't great: I can never get them working. I can get the sample code from the SDK install working, but I'd like to do something simpler before I delve into anything more complicated.

I've just ordered the book "Heterogeneous Computing with OpenCL", and hopefully it will help. However, what I'd really like, to get started, is an idiot's guide to OpenCL.

For example, let's say I want to add up all the numbers from 1 to 100. In C++ this is obviously very simple; a straightforward for loop would suffice:

int total = 0;
for (int i = 1; i <= 100; i++)
  total += i;

But how could I do this properly in OpenCL?

Once I understand the very basics, I'll be able to get going.

What header files are needed? What OpenCL files do we need?

I appreciate your help.

jazpearson
Journeyman III

(I'm working with SDK v2.6 and OpenCL 1.1 on a FirePro V4800 graphics card.)

Rom1
Adept I

Hello,

What you asked for is not the simplest place to start.

Each iteration of your loop needs the result of the previous one, so it is not trivially parallel.

This kind of problem is called a reduction. Parallelism (and OpenCL) can speed it up, but not in an obvious way.
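To see why a reduction can still use parallel hardware, here is a plain C++ sketch (no OpenCL calls) of the pairwise "tree" strategy commonly used for reductions on GPUs. Each pass halves the number of partial sums, so 100 numbers (padded to 128) take 7 passes instead of 99 dependent additions. The name treeReduce is made up for illustration; in a real kernel each pass would be carried out by work-items in local memory, with a barrier(CLK_LOCAL_MEM_FENCE) between passes.

```cpp
#include <cassert>
#include <vector>

// Sums `values` the way a work-group reduction does: in each pass,
// "work-item" i adds the element `stride` places to its right.
// Passes run one after another, but the additions inside one pass
// touch disjoint elements, so on a GPU they could all run at once.
int treeReduce(std::vector<int> values) {
    if (values.empty()) return 0;

    // Pad with zeros up to a power of two so every pass pairs evenly.
    std::size_t n = 1;
    while (n < values.size()) n *= 2;
    values.resize(n, 0);

    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // One pass: these `stride` additions are independent of each other.
        for (std::size_t i = 0; i < stride; ++i) {
            values[i] += values[i + stride];
        }
    }
    return values[0];  // the total ends up in element 0
}
```

For the 1-to-100 example, filling a vector with 1..100 and calling treeReduce returns 5050, the same as the serial loop.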

Try to parallelize this first:

int A[1000], B[1000], C[1000];

...

for(int i=0;i<1000;i++) C[i]=A[i]+B[i];
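This loop maps onto OpenCL directly because each iteration reads and writes only its own index. As a CPU-only sketch of that mapping (the function names are hypothetical, not part of any OpenCL API), the loop body becomes a function of a single index, which is the role get_global_id(0) plays inside a kernel:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The loop body, written as a function of one index. In an OpenCL kernel,
// `gid` would come from get_global_id(0) instead of being passed in.
void vecAddWorkItem(std::size_t gid, const int *a, const int *b, int *c) {
    c[gid] = a[gid] + b[gid];
}

// Host-side stand-in for launching a 1-D range of `n` work-items.
// On a GPU these calls would run concurrently; running them in order here
// gives the same result because no work-item reads another's output.
void launchVecAdd(std::size_t n, const int *a, const int *b, int *c) {
    for (std::size_t gid = 0; gid < n; ++gid) {
        vecAddWorkItem(gid, a, b, c);
    }
}
```

The serial loop and launchVecAdd produce identical results; the OpenCL version simply hands the "for each gid" part to the device.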


Ah yes, of course. I see what you mean about my original post.

So how would I go about writing the OpenCL code for the one you suggested? What I want to do is take code in a C++ program and alter it so that it runs on the GPU rather than the CPU. All the tutorials I've looked at seem to jump a few steps ahead, and I need those first few steps!

Here is some code that addresses what you asked for:

__kernel void vec_add (__global float *a, __global float *b, __global float *c)
{
  int gid = get_global_id(0);
  c[gid] = a[gid] + b[gid];
}

However, where does this code go? In the .cpp file? In a .cl file? I assume this part of the code would go in the .cpp file?

(I read this particular example somewhere else.)

Daft question (among many others I'll be asking), but are __kernel and __global keywords specific to OpenCL?

I know there's a lot more code to write to actually execute the calculations. The question is: what is it, and where does it go?

The parameters I pass in: do they have to be pointers? Could they be passed in as references (say, if I were passing in a class that contained an array of floats)?
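On the pointers-versus-references question: OpenCL C 1.1 is based on C99, so kernels have no C++ references, and a C++ object cannot be handed to a kernel as-is. Buffer arguments are pointers to __global memory that the host fills from a contiguous block of bytes. A minimal sketch, using a hypothetical Samples class, of exposing the raw float array and its size in bytes, which is what clCreateBuffer needs:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side class that owns an array of floats.
class Samples {
public:
    explicit Samples(std::size_t n) : values_(n, 0.0f) {}

    float &operator[](std::size_t i) {
        assert(i < values_.size());  // bounds check in debug builds
        return values_[i];
    }

    // Pointer to the contiguous float storage: the kind of pointer
    // clCreateBuffer takes as host_ptr (e.g. with CL_MEM_COPY_HOST_PTR).
    float *data() { return values_.data(); }

    // Number of bytes to copy: the `size` argument clCreateBuffer expects.
    std::size_t byteSize() const { return values_.size() * sizeof(float); }

private:
    std::vector<float> values_;  // std::vector guarantees contiguous storage
};
```

On the host this would be used roughly as clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, s.byteSize(), s.data(), &status), while the kernel parameter stays a plain __global float *.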


Hi jazpearson,

You are a C++ programmer, right? In my opinion, a kernel is just a function, but a special one: you have to prepare many things before you can invoke it. For example, this is your kernel:

__kernel void vec_add (__global float *a, __global float *b, __global float *c)
{
  int gid = get_global_id(0);
  c[gid] = a[gid] + b[gid];
}

To invoke and manage it, the host program needs a platform, a device, a context, a program, a command queue, and so on; these are the necessities for running a kernel (information about them is easy to find). You need to know the steps involved in running a kernel. As for the kernel itself, it is a special function, so it has its own keywords and built-in functions, such as "__kernel" and "get_global_id()". I suggest looking at more samples and tutorials; they will answer most of your questions.

Thank you.


There are a few good books that cover the OpenCL basics. The one you mentioned is pretty good; I would also recommend "OpenCL in Action". It covers all the basics in a very easy-to-understand manner. Good luck. Also, most of the time my OpenCL kernel code goes in .cl files that get compiled at runtime.
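A minimal C++ sketch of loading such a .cl file into a std::string at runtime; the function name readKernelSource is illustrative, not from any library. The string's contents are what would then be passed to clCreateProgramWithSource.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read a whole kernel file (e.g. "vectoradd.cl") into a string.
// Throws std::runtime_error if the file cannot be opened.
std::string readKernelSource(const std::string &path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) {
        throw std::runtime_error("could not open kernel file: " + path);
    }
    // Copy every character from the stream into the string.
    return std::string(std::istreambuf_iterator<char>(file),
                       std::istreambuf_iterator<char>());
}
```

Usage on the host: std::string s = readKernelSource("vectoradd.cl"); const char *src = s.c_str(); then clCreateProgramWithSource(context, 1, &src, NULL, &status). Keeping the kernel in a separate file means it can be edited without recompiling the C++ program, since the OpenCL compiler runs when the program starts.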

cyndwith
Journeyman III

I am new to OpenCL too, but I tried some code from a book and it worked.

I hope it helps you. In this code I have written the kernel in a text file. The kernel has to be available as a string:

the OpenCL functions take this string and compile it at runtime into executable code for the devices in the context. Here is the code:

// System includes
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>

// OpenCL includes
#include <CL/cl.h>

// Name of the text file holding the kernel source; readSource() loads it
// at runtime. The file should contain:
//
//   __kernel void vecadd(__global int *A,
//                        __global int *B,
//                        __global int *C)
//   {
//      int idx = get_global_id(0);
//      C[idx] = A[idx] + B[idx];
//   }
//
// (cout cannot be used inside a kernel, and printf is not part of
// core OpenCL 1.1, so they have been removed.)
const char *SourceFile = "vectoradd.cl";

// Project includes

// Constants, globals
const int ELEMENTS = 10;   // elements in each vector

// Signatures
char* readSource(const char *sourceFilename);

int main(int argc, char ** argv)
{
   printf("Running Vector Addition program\n\n");

   size_t datasize = sizeof(int)*ELEMENTS;

   int *A, *B;   // Input arrays
   int *C;       // Output array (the kernel writes ints, so C is int too)

   // Allocate space for input/output data
   A = (int*)malloc(datasize);
   B = (int*)malloc(datasize);
   C = (int*)malloc(datasize);
   if(A == NULL || B == NULL || C == NULL) {
      perror("malloc");
      exit(-1);
   }

   // Initialize the input data
   for(int i = 0; i < ELEMENTS; i++) {
      A[i] = i;
      B[i] = i;
   }

   // Print the inputs
   for(int i = 0; i < ELEMENTS; i++) {
      printf("%d ", A[i]);
   }
   printf("\n");
   for(int i = 0; i < ELEMENTS; i++) {
      printf("%d ", B[i]);
   }
   printf("\n");

   cl_int status;  // use as return value for most OpenCL functions
   cl_uint numPlatforms = 0;
   cl_platform_id *platforms;

/////////////////////////////////////////////
// STEP 1: Discover and initialize platforms
/////////////////////////////////////////////

   // Query for the number of recognized platforms
   status = clGetPlatformIDs(0, NULL, &numPlatforms);
   if(status != CL_SUCCESS) {
      printf("clGetPlatformIDs failed\n");
      exit(-1);
   }

   // Make sure some platforms were found
   if(numPlatforms == 0) {
      printf("No platforms detected.\n");
      exit(-1);
   }

   // Allocate enough space for each platform
   platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
   if(platforms == NULL) {
      perror("malloc");
      exit(-1);
   }

   // Fill in platforms
   status = clGetPlatformIDs(numPlatforms, platforms, NULL);
   if(status != CL_SUCCESS) {
      printf("clGetPlatformIDs failed\n");
      exit(-1);
   }

   // Print out some basic information about each platform
   printf("%u platforms detected\n", numPlatforms);
   for(unsigned int i = 0; i < numPlatforms; i++) {
      char buf[100];
      printf("Platform %u: \n", i);
      status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                       sizeof(buf), buf, NULL);
      printf("\tVendor: %s\n", buf);
      status |= clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                       sizeof(buf), buf, NULL);
      printf("\tName: %s\n", buf);
      if(status != CL_SUCCESS) {
         printf("clGetPlatformInfo failed\n");
         exit(-1);
      }
   }
   printf("\n");

/////////////////////////////////////////////
// STEP 2: Discover and initialize devices
/////////////////////////////////////////////

   cl_uint numDevices = 0;
   cl_device_id *devices;

   // Retrieve the number of GPU devices on the first platform
   status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 0, NULL,
                           &numDevices);
   if(status != CL_SUCCESS) {
      printf("clGetDeviceIDs failed\n");
      exit(-1);
   }

   // Make sure some devices were found
   if(numDevices == 0) {
      printf("No devices detected.\n");
      exit(-1);
   }

   // Allocate enough space for each device
   devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
   if(devices == NULL) {
      perror("malloc");
      exit(-1);
   }

   // Fill in devices
   status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, numDevices,
                     devices, NULL);
   if(status != CL_SUCCESS) {
      printf("clGetDeviceIDs failed\n");
      exit(-1);
   }

   // Print out some basic information about each device
   printf("%u devices detected\n", numDevices);
   for(unsigned int i = 0; i < numDevices; i++) {
      char buf[100];
      printf("Device %u: \n", i);
      status = clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR,
                       sizeof(buf), buf, NULL);
      printf("\tVendor: %s\n", buf);
      status |= clGetDeviceInfo(devices[i], CL_DEVICE_NAME,
                       sizeof(buf), buf, NULL);
      printf("\tName: %s\n", buf);
      if(status != CL_SUCCESS) {
         printf("clGetDeviceInfo failed\n");
         exit(-1);
      }
   }
   printf("\n");

   // START Execution Model

/////////////////////////////////////////////
// STEP 3: Create a Context
/////////////////////////////////////////////

   cl_context context;

   // Create a context and associate it with the devices
   context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);
   if(status != CL_SUCCESS || context == NULL) {
      printf("clCreateContext failed\n");
      exit(-1);
   }

/////////////////////////////////////////////
// STEP 4: Create a Command Queue
/////////////////////////////////////////////

   cl_command_queue cmdQueue;

   // Create a command queue and associate it with the device you
   // want to execute on
   cmdQueue = clCreateCommandQueue(context, devices[0], 0, &status);
   if(status != CL_SUCCESS || cmdQueue == NULL) {
      printf("clCreateCommandQueue failed\n");
      exit(-1);
   }

/////////////////////////////////////////////
// STEP 5: Create Device buffers
/////////////////////////////////////////////

   cl_mem d_A, d_B;  // Input buffers on device
   cl_mem d_C;       // Output buffer on device

   // Create a buffer object (d_A) that contains the data from the host ptr A
   d_A = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
                   datasize, A, &status);
   if(status != CL_SUCCESS || d_A == NULL) {
      printf("clCreateBuffer failed\n");
      exit(-1);
   }

   // Create a buffer object (d_B) that contains the data from the host ptr B
   d_B = clCreateBuffer(context, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
                   datasize, B, &status);
   if(status != CL_SUCCESS || d_B == NULL) {
      printf("clCreateBuffer failed\n");
      exit(-1);
   }

   // Create a buffer object (d_C) with enough space to hold the output data
   d_C = clCreateBuffer(context, CL_MEM_READ_WRITE,
                   datasize, NULL, &status);
   if(status != CL_SUCCESS || d_C == NULL) {
      printf("clCreateBuffer failed\n");
      exit(-1);
   }

/////////////////////////////////////////////
// STEP 6: Create and compile the program
/////////////////////////////////////////////

   cl_program program;
   char *source;

   // Read the kernel source code from the file named by SourceFile
   printf("start reading source file!\n");
   source = readSource(SourceFile);
   printf("done! reading source file. \n");
   //printf("Program source is:\n%s\n", source);

   // Create a program. The 'source' string is the code from the
   // vectoradd.cl file.
   program = clCreateProgramWithSource(context, 1, (const char**)&source,
                              NULL, &status);
   if(status != CL_SUCCESS) {
      printf("clCreateProgramWithSource failed\n");
      exit(-1);
   }
   printf("done! creating program. \n");

   cl_int buildErr;

   // Build (compile & link) the program for the devices.
   // Save the return value in 'buildErr' (the following
   // code will print any compilation errors to the screen)
   buildErr = clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);
   printf("done! building program. \n");

   // If there are build errors, print them to the screen
   if(buildErr != CL_SUCCESS) {
      printf("Program failed to build.\n");
      cl_build_status buildStatus;
      for(unsigned int i = 0; i < numDevices; i++) {
         clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_STATUS,
                          sizeof(cl_build_status), &buildStatus, NULL);
         if(buildStatus == CL_BUILD_SUCCESS) {
            continue;
         }

         // Query the log size, then fetch and print the build log
         char *buildLog;
         size_t buildLogSize;
         clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
                          0, NULL, &buildLogSize);
         buildLog = (char*)malloc(buildLogSize);
         if(buildLog == NULL) {
            perror("malloc");
            exit(-1);
         }
         clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,
                          buildLogSize, buildLog, NULL);
         buildLog[buildLogSize-1] = '\0';
         printf("Device %u Build Log:\n%s\n", i, buildLog);
         free(buildLog);
      }
      exit(-1);
   }
   else {
      printf("No build errors\n");
   }

/////////////////////////////////////////////
// STEP 7: Create the kernel
/////////////////////////////////////////////

   cl_kernel kernel;

   // Create a kernel from the vector addition function (named "vecadd")
   kernel = clCreateKernel(program, "vecadd", &status);
   if(status != CL_SUCCESS) {
      printf("clCreateKernel failed\n");
      exit(-1);
   }

/////////////////////////////////////////////
// STEP 8: Set the kernel arguments
/////////////////////////////////////////////

   // Associate the input and output buffers with the kernel
   status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_A);
   status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_B);
   status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_C);
   if(status != CL_SUCCESS) {
      printf("clSetKernelArg failed\n");
      exit(-1);
   }

/////////////////////////////////////////////
// STEP 9: Configure the work-item structure
/////////////////////////////////////////////

   // Define an index space (global work size) of work-items for execution.
   // A work-group size (local work size) is not required, but can be used.
   size_t globalWorkSize[1];  // ELEMENTS work-items, one per vector element
   globalWorkSize[0] = ELEMENTS;

/////////////////////////////////////////////
// STEP 10: Enqueue the kernel for execution
/////////////////////////////////////////////

   // Execute the kernel.
   // 'globalWorkSize' is the 1D dimension of the work-items; passing NULL
   // for the local work size lets the OpenCL implementation choose it.
   status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, globalWorkSize,
                           NULL, 0, NULL, NULL);
   if(status != CL_SUCCESS) {
      printf("clEnqueueNDRangeKernel failed\n");
      exit(-1);
   }

/////////////////////////////////////////////
// STEP 11: Read the output buffer back to the host
/////////////////////////////////////////////

   // Read the OpenCL output buffer (d_C) to the host output array (C).
   // CL_TRUE makes this a blocking read, so C is ready when it returns.
   status = clEnqueueReadBuffer(cmdQueue, d_C, CL_TRUE, 0, datasize, C,
                  0, NULL, NULL);
   if(status != CL_SUCCESS) {
      printf("clEnqueueReadBuffer failed\n");
      exit(-1);
   }

   // Verify correctness: A[i] = B[i] = i, so C[i] should be i + i
   bool result = true;
   for(int i = 0; i < ELEMENTS; i++)
   {
      printf("%d ", (int)C[i]);
      if(C[i] != i + i) {
         result = false;
         break;
      }
   }
   printf("\n");
   if(result) {
      printf("Output is correct\n");
   }
   else {
      printf("Output is incorrect\n");
   }

/////////////////////////////////////////////
// STEP 12: Release OpenCL resources
/////////////////////////////////////////////

   clReleaseKernel(kernel);
   clReleaseProgram(program);
   clReleaseCommandQueue(cmdQueue);
   clReleaseMemObject(d_A);
   clReleaseMemObject(d_B);
   clReleaseMemObject(d_C);
   clReleaseContext(context);

   free(A);
   free(B);
   free(C);
   free(source);
   free(platforms);
   free(devices);

   getchar();  // wait for Enter before the console window closes
   return 0;
}

char* readSource(const char *sourceFilename) {
   FILE *fp;
   long size;
   size_t bytesRead;
   char *source;

   fp = fopen(sourceFilename, "rb");
   if(fp == NULL) {
      printf("Could not open kernel file: %s\n", sourceFilename);
      exit(-1);
   }

   // Determine the file size by seeking to the end
   if(fseek(fp, 0, SEEK_END) != 0) {
      printf("Error seeking to end of file\n");
      exit(-1);
   }
   size = ftell(fp);
   if(size < 0) {
      printf("Error getting file position\n");
      exit(-1);
   }
   if(fseek(fp, 0, SEEK_SET) != 0) {
      printf("Error seeking to start of file\n");
      exit(-1);
   }

   // One extra byte for the terminating '\0'
   source = (char*)malloc(size+1);
   if(source == NULL) {
      printf("Error allocating %ld bytes for the program source\n", size+1);
      exit(-1);
   }

   bytesRead = fread(source, 1, size, fp);
   fclose(fp);
   if(bytesRead != (size_t)size) {
      printf("only read %lu bytes\n", (unsigned long)bytesRead);
      exit(-1);
   }

   source[size] = '\0';
   return source;
}
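As an alternative to loading the kernel from a file with readSource(), the kernel text can live in the .cpp file itself, which is what the commented-out string near the top of the code was attempting. A minimal sketch, assuming a compiler that supports C++11 raw string literals (older compilers may not); the names vecaddSource and sourceDefinesVecadd are illustrative only. A raw string needs no per-line quotes or "\n" escapes, and quotes inside the kernel cannot end the string early.

```cpp
#include <cassert>
#include <cstring>

// The vecadd kernel kept directly in the C++ source as a raw string literal.
// Everything between R"CLC( and )CLC" is taken verbatim.
const char *vecaddSource = R"CLC(
__kernel void vecadd(__global const int *A,
                     __global const int *B,
                     __global int *C)
{
    int idx = get_global_id(0);
    C[idx] = A[idx] + B[idx];
}
)CLC";

// Quick sanity check that the embedded text defines the expected kernel.
bool sourceDefinesVecadd() {
    return std::strstr(vecaddSource, "__kernel void vecadd") != nullptr;
}
```

With this in place, STEP 6 would pass &vecaddSource to clCreateProgramWithSource (a NULL lengths argument means the string is NUL-terminated), and readSource() would no longer be needed.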


Thanks very much for all the help. I've actually got a working implementation now; I just need to try to understand it

and see how I can work it into existing code.

Thanks again.

kbrafford
Adept II

You should check out PyOpenCL. It's a great way to get introduced to OpenCL without getting bogged down in the gory details at first.

--Keith Brafford

Jawed
Adept II

The OpenCL Spec and Reference Card are worth having by your side:

http://www.khronos.org/registry/cl/
