14 Replies Latest reply on Jun 7, 2012 5:08 AM by yurtesen

    I am a beginner in OpenCL programming and was trying to implement some image processing algorithms. Enqueuing the kernel fails with an invalid work item size error. Can anyone please help me?

    cyndwith

      This is my code. I have never used image objects before. I have tried reducing the image size to 256x256 and 128x128.

      When I query the device with clGetDeviceInfo(), the output says it does not support image formats. I am using an AMD 6500 GPU.

      I will include the whole code if required.

          yurtesen

          Have you checked:

          http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html

          CL_INVALID_WORK_ITEM_SIZE if the number of work-items specified in any of local_work_size[0], ...  local_work_size[work_dim - 1] is greater than the corresponding values specified by CL_DEVICE_MAX_WORK_ITEM_SIZES[0], .... CL_DEVICE_MAX_WORK_ITEM_SIZES[work_dim - 1].
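A minimal sketch of the checks quoted above, assuming the device limits have already been read with clGetDeviceInfo (CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE). The function name check_local_size and its return codes are hypothetical, chosen only for illustration. It is plain C, so it can be tried without a device. It also applies the OpenCL 1.x rule that each global size must be a multiple of the matching local size.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the local_work_size validation that
   clEnqueueNDRangeKernel performs under OpenCL 1.x rules.
   Returns 0 if the sizes look valid,
           1 if a dimension exceeds CL_DEVICE_MAX_WORK_ITEM_SIZES
             (the runtime would report CL_INVALID_WORK_ITEM_SIZE),
           2 if the work-group is too large or does not divide the global
             size (the runtime would report CL_INVALID_WORK_GROUP_SIZE). */
int check_local_size(unsigned work_dim,
                     const size_t *global_ws,
                     const size_t *local_ws,
                     const size_t *max_item_sizes, /* from clGetDeviceInfo */
                     size_t max_group_size)        /* from clGetDeviceInfo */
{
    size_t group_items = 1;
    unsigned d;

    for (d = 0; d < work_dim; d++) {
        /* A zero local size is rejected here too; this is an assumption of
           the sketch and also avoids dividing by zero below. */
        if (local_ws[d] == 0 || local_ws[d] > max_item_sizes[d])
            return 1;
        if (global_ws[d] % local_ws[d] != 0)
            return 2;
        group_items *= local_ws[d];
    }
    return group_items > max_group_size ? 2 : 0;
}
```

With a 16x16 local size on a 256x256 global range, this passes only if the device allows at least 256 work-items per group and at least 16 per dimension.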

            cyndwith

            I have gone through the links, then I added code to report which error is returned.

            It gives CL_INVALID_WORK_ITEM_SIZE. I tried changing the global size (width_image, height_image) and the local size (16, 16), but nothing worked; the same error comes up again.

            I also tried reducing the image size to 256x256 and then 128x128, but that did not work either. :(

             

             

             

            // System includes

            #include <stdio.h>

            #include <stdlib.h>

            #include<windows.h>

            // OpenCL includes

            #include <CL/cl.h>

             

            const char *SourceFile = "image_con.txt";

            const char *sourceImage= "sourceImage.jpg";

            const char *outputImage= "outputImage.jpg";

             

             

            char* readSource(const char *sourceFilename);

            const int height=256;

            const int width=256;

            int main(int argc, char** argv)

            {

                 printf("Running Image Rotation program\n\n");

               

                

                 int filterWidth=3;

                int filterSize=filterWidth*filterWidth;//in case of square kernel

               

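                float *filter;  // float, to match the float-sized bufferFilter created below
                filter = (float*)malloc(filterSize * sizeof(float));  // size in bytes, not in elements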

             

               

            for(int i=0;i<filterWidth;i++)

            {

                for(int j=0;j<filterWidth;j++)

                {

                    filter[i*filterWidth+j]=0;

                }

            }

             

                 cl_int status;  // use as return value for most OpenCL functions

             

                cl_uint numPlatforms = 0;

                cl_platform_id *platforms;

               

             

             

             

             

             

            /////////////////////////////////////////////

            // STEP 1: Discover and initialize platforms

            /////////////////////////////////////////////

                        

               // Query for the number of recognized platforms

               status = clGetPlatformIDs(0, NULL, &numPlatforms);

               if(status != CL_SUCCESS) {

                  printf("clGetPlatformIDs failed\n");

                  exit(-1);

               }

             

               // Make sure some platforms were found

               if(numPlatforms == 0) {

                  printf("No platforms detected.\n");

                  //exit(-1);

               }

             

               // Allocate enough space for each platform

               platforms = (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));

               if(platforms == NULL) {

                  perror("malloc");

                  //exit(-1);

               }

             

               // Fill in platforms

               status = clGetPlatformIDs(numPlatforms, platforms, NULL);

               if(status != CL_SUCCESS) {

                  printf("clGetPlatformIDs failed\n");

                  //exit(-1);

               }

             

               // Print out some basic information about each platform

               printf("%u platforms detected\n", numPlatforms);

               for(unsigned int i = 0; i < numPlatforms; i++) {

                  char buf[100];

                  printf("Platform %u: \n", i);

                  status = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,

                                   sizeof(buf), buf, NULL);

                  printf("\tVendor: %s\n", buf);

                  status |= clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,

                                   sizeof(buf), buf, NULL);

                  printf("\tName: %s\n", buf);

             

                  if(status != CL_SUCCESS) {

                     printf("clGetPlatformInfo failed\n");

                     //exit(-1);

                  }

               }

               printf("\n");

             

            /////////////////////////////////////////////

            // STEP 2: Discover and initialize devices

            /////////////////////////////////////////////

             

               cl_uint numDevices = 0;

               cl_device_id *devices;

             

               // Retrieve the number of devices present

               status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, 0, NULL,

                                       &numDevices);

               if(status != CL_SUCCESS) {

                  printf("clGetDeviceIDs failed\n");

                  //exit(-1);

               }

             

               // Make sure some devices were found

               if(numDevices == 0) {

                  printf("No devices detected.\n");

                  //exit(-1);

               }

             

               // Allocate enough space for each device

               devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));

               if(devices == NULL) {

                  perror("malloc");

                  //exit(-1);

               }

             

               // Fill in devices

               status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, numDevices,

                                 devices, NULL);

               if(status != CL_SUCCESS) {

                  printf("clGetDeviceIDs failed\n");

                  //exit(-1);

               }  

             

               // Print out some basic information about each device

               printf("%u devices detected\n", numDevices);

               for(unsigned int i = 0; i < numDevices; i++) {

                  char buf[100];

                  printf("Device %u: \n", i);

                  status = clGetDeviceInfo(devices[i], CL_DEVICE_VENDOR,

                                   sizeof(buf), buf, NULL);

                  printf("\tDevice: %s\n", buf);

                  status |= clGetDeviceInfo(devices[i], CL_DEVICE_NAME,

                                   sizeof(buf), buf, NULL);

                  printf("\tName: %s\n", buf);

             

                  if(status != CL_SUCCESS) {

                     printf("clGetDeviceInfo failed\n");

                     //exit(-1);

                  }

               }

               printf("\n");

             

               // START Execution Model

             

            /////////////////////////////////////////////

            // STEP 3: Create a Context

            /////////////////////////////////////////////

             

               cl_context context;

             

               // Create a context and associate it with the devices

               context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);

               if(status != CL_SUCCESS || context == NULL) {

                  printf("clCreateContext failed\n");

                  //exit(-1);

               }

             

            /////////////////////////////////////////////

            // STEP 4: Create a Command Queue

            /////////////////////////////////////////////

             

               cl_command_queue cmdQueue;

             

               // Create a command queue and associate it with the device you

               // want to execute on

               cmdQueue = clCreateCommandQueue(context, devices[0], 0, &status);

               if(status != CL_SUCCESS || cmdQueue == NULL) {

                  printf("clCreateCommandQueue failed\n");

                  //exit(-1);

               }

             

             

            ////////////////////////////////////////////////////////////////

            // CONVOLUTION FILTER

            ///////////////////////////////////////////////////////////////

             

             

             

             

             

            // image format

             

             

            cl_image_format format;

             

            format.image_channel_order=CL_R;//single channel

            format.image_channel_data_type=CL_FLOAT;//float data type

             

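            cl_mem bufferSourceImage=clCreateImage2D(context,0,&format,width,height,0,NULL,&status);
            if(status != CL_SUCCESS) printf("clCreateImage2D (source) failed: %d\n", status);

            cl_mem bufferOutputImage=clCreateImage2D(context,0,&format,width,height,0,NULL,&status);
            if(status != CL_SUCCESS) printf("clCreateImage2D (output) failed: %d\n", status);

            cl_mem bufferFilter= clCreateBuffer(context,0,filterSize*sizeof(float),NULL,&status);
            if(status != CL_SUCCESS) printf("clCreateBuffer (filter) failed: %d\n", status);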

             

            ////////////////////////////////////////////////////////////////////////

            //// WRITE INPUT DATA

            ///////////////////////////////////////////////////////////////////////

             

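            size_t origin[3]={0,0,0};//offset to take pixel value from image
            size_t region[3]={width,height,1};//elements per dimension

            // sourceImage holds only a file name, not pixel data, so it cannot be
            // uploaded directly. The JPEG has to be decoded into width*height floats
            // first; decoding is not shown, so a zero-filled placeholder is used here.
            float *hostInput=(float*)calloc((size_t)width*height,sizeof(float));

            // Blocking writes, so the host buffers can be reused or freed afterwards.
            clEnqueueWriteImage(cmdQueue,bufferSourceImage,CL_TRUE,origin,region,0,0,hostInput,0,NULL,NULL);
            clEnqueueWriteBuffer(cmdQueue,bufferFilter,CL_TRUE,0,filterSize*sizeof(float),filter,0,NULL,NULL);
            free(hostInput);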

            //////////////////////////////////////////////////////////////////////

            //    sample object HOW TO ACCESS AN IMAGE

            /////////////////////////////////////////////////////////////////////

             

            //cl_sampler clCreateSampler(cl_context context,cl_bool normalized_coords,cl_addressing_mode addressing_mode,cl_filter_mode filter_mode,cl_int *errcode_ret)

             

            cl_sampler sampler = clCreateSampler(context,CL_FALSE,CL_ADDRESS_CLAMP_TO_EDGE,CL_FILTER_NEAREST,NULL);

             

             

             

            ////////////////////////////////////////////////////////////////////////

            /// COMPILE AND EXECUTE THE KERNEL

            ////////////////////////////////////////////////////////////////////////

            cl_program program;

              

               char *source;

               //const char *sourceFile = "vectoradd.cl";

               // This function reads in the source code of the program

               source = readSource(SourceFile);//File);

                printf("done! reading source file. \n");

              // printf("Program source is:\n%s\n", source);

             

               // Create a program. The 'source' string is the code from the
               // image_con.txt file (see SourceFile above).

               program = clCreateProgramWithSource(context, 1, (const char**)&source,//source,

                                          NULL, &status);

               if(status != CL_SUCCESS) {

                  printf("clCreateProgramWithSource failed\n");

                  //exit(-1);

               }

             

            printf("done! creating program from source. \n");

               cl_int buildErr;

               // Build (compile & link) the program for the devices.

               // Save the return value in 'buildErr' (the following

               // code will print any compilation errors to the screen)

               buildErr = clBuildProgram(program,0,NULL,NULL,NULL,NULL);//numDevices, devices, NULL, NULL, NULL);

                printf("done! building source file. \n");

               // If there are build errors, print them to the screen

               if(buildErr != CL_SUCCESS)

               {

                  printf("Program failed to build.\n");

                  cl_build_status buildStatus;

                  for(unsigned int i = 0; i < numDevices; i++)

                  {

                     clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_STATUS,

                                      sizeof(cl_build_status), &buildStatus, NULL);

                     if(buildStatus == CL_SUCCESS)

                     {

                        continue;

                     }

             

                     char *buildLog;

                     size_t buildLogSize;

                     clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,

                                      0, NULL, &buildLogSize);

                     buildLog = (char*)malloc(buildLogSize);

                     if(buildLog == NULL)

                     {

                        perror("malloc");

                        //exit(-1);

                     }

                     clGetProgramBuildInfo(program, devices[i], CL_PROGRAM_BUILD_LOG,

                                      buildLogSize, buildLog, NULL);

                     buildLog[buildLogSize-1] = '\0';

                     printf("Device %u Build Log:\n%s\n", i, buildLog);  

                     free(buildLog);

                  }

                  //exit(0);

               }

               else

               {

                  printf("No build errors\n");

               }

            /////////////////////////////////////////////

            // STEP 7: Create the kernel

            /////////////////////////////////////////////

             

               cl_kernel kernel;

             

               // Create a kernel from the convolution function (named "convolution")

               kernel = clCreateKernel(program, "convolution", &status);

               if(status != CL_SUCCESS) {

                  printf("clCreateKernel failed\n");

                  //exit(-1);

               }

             

               printf(" Done creating Kernel!!\n");

             

            /////////////////////////////////////////////

            // STEP 8: Set the kernel arguments

            /////////////////////////////////////////////

             

               // Associate the input and output buffers with the kernel

               // Use |= throughout so that an earlier failure is not overwritten
               status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&bufferSourceImage);
               status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*)&bufferOutputImage);
               status |= clSetKernelArg(kernel, 2, sizeof(cl_int), (void*)&height);
               status |= clSetKernelArg(kernel, 3, sizeof(cl_int), (void*)&width);
               // Pass the device buffer, not the host pointer 'filter'
               status |= clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&bufferFilter);
               status |= clSetKernelArg(kernel, 5, sizeof(cl_int), (void*)&filterWidth);
               // A sampler argument has size sizeof(cl_sampler)
               status |= clSetKernelArg(kernel, 6, sizeof(cl_sampler), (void*)&sampler);

               if(status != CL_SUCCESS) {

                  printf("clSetKernelArg failed\n");

                  //exit(-1);

               }

             

               printf("Done setting kernel arguments!!\n");

             

            /////////////////////////////////////////////

            // STEP 9: Configure the work-item structure

            /////////////////////////////////////////////

             

               // Define an index space (global work size) of threads for execution. 

               // A workgroup size (local work size) is not required, but can be used.

               //size_t globalWorkSize[1];  // There are ELEMENTS threads

               //globalWorkSize[0] = 1;

             

               size_t localws[2]={16,16};

               size_t globalws[2]={width,height};

             

            /////////////////////////////////////////////

            // STEP 10: Enqueue the kernel for execution

            /////////////////////////////////////////////

             

               // Execute the kernel.

               // 'globalWorkSize' is the 1D dimension of the work-items

              // size_t globalWorkSize=height*width;

               //status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL,&globalWorkSize,

                                    //  NULL, 0, NULL, NULL);

             

             

               status = clEnqueueNDRangeKernel(cmdQueue,kernel,2,NULL,globalws,localws,0,NULL,NULL);

               //status=clEnqueueNDRangeKernel(cmdQueue,kernel,2,NULL,globalWorkSize,NULL,0,NULL,NULL);

               if(status != CL_SUCCESS) {

                  printf("clEnqueueNDRangeKernel failed\n");

                  //exit(-1);

               }

                if(status==CL_INVALID_PROGRAM_EXECUTABLE)  // compare; '=' would overwrite status

                {

                printf("CL_INVALID_PROGRAM_EXECUTABLE\n");

                }

             

            if(status==CL_INVALID_COMMAND_QUEUE)

                {

                printf("CL_INVALID_COMMAND_QUEUE\n");

                }

             

            if(status==CL_INVALID_KERNEL)

                {

                printf("CL_INVALID_KERNEL\n");

                }

             

            if(status==CL_INVALID_CONTEXT)

                {

                printf("CL_INVALID_CONTEXT\n");

                }

             

            if(status==CL_INVALID_KERNEL_ARGS)

                {

                printf("CL_INVALID_KERNEL_ARGS\n");

                }

             

            if(status==CL_INVALID_WORK_DIMENSION)

                {

                printf("CL_INVALID_WORK_DIMENSION\n");

                }

             

            if(status==CL_INVALID_WORK_GROUP_SIZE)

                {

                printf("CL_INVALID_WORK_GROUP_SIZE\n");

                }

             

            if(status==CL_INVALID_WORK_ITEM_SIZE)

                {

                printf("CL_INVALID_WORK_ITEM_SIZE\n");

                }

            if(status==CL_INVALID_GLOBAL_OFFSET)

                {

                printf("CL_INVALID_GLOBAL_OFFSET");

                }

             

            if(status==CL_OUT_OF_RESOURCES)

                {

                printf("CL_OUT_OF_RESOURCES\n");

                }

             

            if(status==CL_MEM_OBJECT_ALLOCATION_FAILURE)

                {

                printf("CL_MEM_OBJECT_ALLOCATION_FAILURE \n");

                }

             

            if(status==CL_INVALID_EVENT_WAIT_LIST)
                {
                printf("CL_INVALID_EVENT_WAIT_LIST\n");
                }

             

            if(status==CL_OUT_OF_HOST_MEMORY)

            {

                printf("CL_OUT_OF_HOST_MEMORY\n");

            }

             

             

             

             

               printf("done with enqueuing kernel !!\n");

             

             

            /////////////////////////////////////////////////////////////////////////////

            ////////////// READ THE RESULT

            //////////////////////////////////////////////////////////////////////////////

             

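            // outputImage is only a file name (a string literal), so it must not be
            // used as the destination. Read the pixels into a host buffer instead;
            // encoding the result to a JPEG file is not shown.
            float *hostOutput=(float*)malloc((size_t)width*height*sizeof(float));
            clEnqueueReadImage(cmdQueue,bufferOutputImage,CL_TRUE,origin,region,0,0,hostOutput,0,NULL,NULL);
            free(hostOutput);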

             

            ///////////////////////////////////////////////////////////////////////

            ///

             

             

            ////////////////////////////////////////////////////////////

            ////////////////// RELEASE OPENCL RESOURCES

            ////////////////////////////////////////////////////////////

            // STEP 12:  Release OpenCL resources

            /////////////////////////////////////////////

             

               clReleaseKernel(kernel);

               clReleaseProgram(program);

               clReleaseCommandQueue(cmdQueue);

               clReleaseMemObject(bufferSourceImage);

               clReleaseMemObject(bufferOutputImage);

               //clReleaseMemObject(d_c);

               clReleaseContext(context);

             

               //free(height);

               //free(width);

               //free(C);

               free(source);

               free(platforms);

               free(devices);

             

             

                getchar();

                return 0;

            }

             

             

            char* readSource(const char *sourceFilename) {

             

               FILE *fp;

               int err;

               int size;

             

               char *source;

             

               fp = fopen(sourceFilename,"rb");

               if(fp == NULL) {
                  printf("Could not open kernel file: %s\n", sourceFilename);
                  exit(-1);  // continuing would call fseek/ftell on a NULL FILE*
               }

              

               err = fseek(fp, 0, SEEK_END);

               if(err != 0) {

                  printf("Error seeking to end of file\n");

                  //exit(-1);

               }

             

               size = ftell(fp);

               if(size < 0) {

                  printf("Error getting file position\n");

                 // exit(-1);

               }

             

               err = fseek(fp, 0, SEEK_SET);

               if(err != 0) {

                  printf("Error seeking to start of file\n");

                  //exit(-1);

               }

             

               source = (char*)malloc(size+1);

               if(source == NULL) {

                  printf("Error allocating %d bytes for the program source\n", size+1);

                  //exit(-1);

               }

             

               err = fread(source, 1, size, fp);

               if(err != size) {

                  printf("only read %d bytes\n", err);

                 // exit(0);

               }

             

               source[size] = '\0';

             

                return source;

            }
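The chain of status checks after clEnqueueNDRangeKernel in the listing above could be collapsed into a single lookup. This is only a sketch: cl_error_name and cl_error_table are hypothetical names, and the numeric values are copied from CL/cl.h as plain integers so that the snippet compiles without the OpenCL headers. Because the status is only compared and never assigned, the code reported is the one the call actually returned.

```c
#include <stddef.h>

/* Hypothetical lookup table; values as defined in CL/cl.h. */
struct cl_error_entry { int code; const char *name; };

static const struct cl_error_entry cl_error_table[] = {
    {   0, "CL_SUCCESS" },
    {  -4, "CL_MEM_OBJECT_ALLOCATION_FAILURE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    {  -6, "CL_OUT_OF_HOST_MEMORY" },
    { -34, "CL_INVALID_CONTEXT" },
    { -36, "CL_INVALID_COMMAND_QUEUE" },
    { -45, "CL_INVALID_PROGRAM_EXECUTABLE" },
    { -48, "CL_INVALID_KERNEL" },
    { -52, "CL_INVALID_KERNEL_ARGS" },
    { -53, "CL_INVALID_WORK_DIMENSION" },
    { -54, "CL_INVALID_WORK_GROUP_SIZE" },
    { -55, "CL_INVALID_WORK_ITEM_SIZE" },
    { -56, "CL_INVALID_GLOBAL_OFFSET" },
    { -57, "CL_INVALID_EVENT_WAIT_LIST" }
};

/* Returns the symbolic name for a status code, or a fallback string
   for codes that are not in the table. */
const char *cl_error_name(int status)
{
    size_t i;
    size_t count = sizeof(cl_error_table) / sizeof(cl_error_table[0]);

    for (i = 0; i < count; i++) {
        if (cl_error_table[i].code == status)
            return cl_error_table[i].name;
    }
    return "unknown OpenCL error";
}
```

A call site would then look like `printf("clEnqueueNDRangeKernel: %s (%d)\n", cl_error_name(status), status);`.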

              cyndwith

              And when I check with the device info function, it says that the device does not support image data. Why is that?

              Is the error because of that?

                cyndwith

                clinfo2.jpg

                 

                 

                Thanks a lot for your help. :)

                I see that most image processing applications using OpenCL are built on C# or C++ wrappers. Can you suggest a good tutorial for learning the OpenCL C++ API?

                I wrote the above code by referring to many websites and books, but I cannot find this error discussed in any other forum.

                  cyndwith

                  Unhandled exception at 0x52e9290d in image_processing.exe: 0xC0000005: Access violation reading location 0x00000004.

                   

                   

                  This is the error I get every time I try to compile it.

                      Meteorhead

                      Do I understand correctly that you get the access violation when compiling your kernel? While that seems very unlikely (though it can happen), I assume that even if it seems to occur at compile time, your host code is indexing something out of bounds.

                      Try running your application in debug mode to see where the program crashes and which indices overrun, then trace back why it happened.

                      Really, no offense, but please don't write novels into topic titles; it is frustrating to see posts in the topic list that take up 4x the space they should.

                       

                      For the tutorial, this seems like a good one:

                      http://enja.org/2010/07/13/adventures-in-opencl-part-1-getting-started/index.html

                      (I found this with a simple Google search; not much time invested.)

                       

                      As for learning, I have always advised fiddling with already working apps first (the AMD APP SDK samples are just fine), modifying a few values rather than trying to add new features at first. In the meantime, start reading the OpenCL spec:

                      http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

                      Yes, it is a hell of a big document, but believe me, everyone who knows OpenCL properly has read it, or at least 80% of it. You need not learn it by heart; the important thing is that you know what the API is capable of, so that when you write a program and are missing a feature, you know where to look to implement it.

                      The C++ API documentation is very scarce and builds on your knowledge of the underlying C API. If you are not very confident about learning something new like this (e.g. you have no prior CUDA, DX, or OpenGL experience), I would suggest starting with the C API, as it is much better documented and has many more samples. It's true that access violations are easier to run into in C, but gaining confidence with the C API will pay off.

                      There are some archived topics on this forum about getting started with OpenCL and which documents to read first, so I have explained this before, but I thought I'd write it again in case someone else just starting with OpenCL comes by.

                       

                      Cheers,

                           Máté

                      cyndwith

                      Thanks a lot for your reference.

                       

                      I tried adding breakpoints and observed that the error occurs at the kernel enqueue command.

                      When I ran the error-checking routine, it gave CL_INVALID_WORK_ITEM_SIZE. I chose images of resolution 256x256 (later changed to 128x128), but it is still the same. I understand that this error occurs if the work-item size exceeds the maximum limit, but according to the device info the max work-item size is 1024.
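Alongside the device limits, OpenCL 1.x also requires each global dimension to be a multiple of the matching local dimension. 256 and 128 are already multiples of 16, so the following is a sketch for other image sizes rather than an explanation of the error described here. The helper names round_up and padded_global_size are hypothetical; a kernel launched over a padded range would also need to skip coordinates outside the image.

```c
#include <stddef.h>

/* Hypothetical helper: smallest multiple of `multiple` that is >= value.
   `multiple` must be non-zero. */
size_t round_up(size_t value, size_t multiple)
{
    size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

/* Fill a 2D global work size that covers a width x height image with
   work-groups of local_ws[0] x local_ws[1] work-items. */
void padded_global_size(size_t width, size_t height,
                        const size_t local_ws[2], size_t global_ws[2])
{
    global_ws[0] = round_up(width, local_ws[0]);
    global_ws[1] = round_up(height, local_ws[1]);
}
```

For example, a 250x100 image with a 16x16 local size would be launched over a 256x112 global range.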

                       

                       

                      Sorry if I sound silly, but I have been trying to understand it. I was able to write some simple programs that manipulate buffer objects, but I cannot find a good example in OpenCL C that explains how to implement algorithms on image objects.