Archives Discussions

alexaverbuch · ‎09-22-2009

Hi,

I'm getting an error while trying to build my OpenCL program, and would be grateful if someone could provide any suggestions.

Please see attached code for an extract of the offending code.

When calling clBuildProgram I get a CL_INVALID_OPERATION error. In the OpenCL spec (version 1.0.43) this error is described as:

"if the build of a program executable for any of the devices listed in device_list by a previous call to clBuildProgram for program has not completed __OR__ if there are kernel objects attached to program"

This is the first call to clBuildProgram in this execution, as I only call it once. Could the error be caused by a previous execution?!

I only have 1 Kernel object and have not created it yet at this point, so I don't see how it could be attached to the program.

Thanks in advance for any help.

Alex

commandQueue = clCreateCommandQueue( context, devices[0], 0, &status); const char * filename = "EdgeDetect_Kernels.cl"; std::string sourceStr = convertToString(filename); const char * source = sourceStr.c_str(); size_t sourceSize[] = { strlen(source) }; program = clCreateProgramWithSource( context, 1, &source, sourceSize, &status); status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); // <------- BOOM!!

omkaranathan · ‎09-22-2009

You should check build log in case of error in clBuildProgram, it will give more specific error details.

alexaverbuch · ‎09-22-2009

Ok thanks, where can I find the build log?

omkaranathan · ‎09-22-2009

Here is a sample code to get the build log

/* create a cl program executable for all the devices specified */ status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); //error checking code if(!sampleCommon->checkVal(status,CL_SUCCESS,"clBuildProgram failed.")) { //print kernel compilation error char programLog[1024]; status = clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); std::cout<<programLog<<std::endl; return 0; }

alexaverbuch · ‎09-22-2009

Thanks again!

Ok, so apparently this is the error:

/tmp/OCL0K1Otq.cl(6): error: kernel must return void; pointer argument must
point to addrSpace __global, __local, or __constant

__kernel void edgeDetectKernel( __global uint ** output,

Is this occuring because I'm trying to pass in a 2-dim array instead of a 1-dim?

I see that all the example kernels have 1-dim arrays... if this is a requirement I can convert my code, but would rather not go through that process if the problem is something else

__kernel void edgeDetectKernel( __global uint ** output,

alexaverbuch · ‎09-22-2009

According to the "ATI_Stream_SDK_Release_Notes.pdf" document:

"It is not possible to query the compilation log generated by clBuildProgram(). An empty string is returned. Compiler errors are printed in the console on stderr."

And, unfortunately, I don't get any console printouts when clBuildProgram goes BOOM either... so I have no idea what is going wrong, beyond the returned "status" and what I can make from debugging my code (which is not much...)

Is there a way to enable a console stderr printout during clBuildProgram?

Thanks,

Alex

omkaranathan · ‎09-22-2009

Did you try adding the above code to your program?

alexaverbuch · ‎09-22-2009

Originally posted by: omkaranathan Did you try adding the above code to your program?

Yeah, sorry mate, we posted at the same time as each other so I didn't see your comments until I'd pulled the trigger.

What do you think of my previous comment (with build log)?

omkaranathan · ‎09-22-2009

Could you post the source code(host &kernel code)?.

alexaverbuch · ‎09-22-2009

kernel code

__kernel void edgeDetectKernel( __global uint ** output, __global uint4 ** input, __global uint ** clSobelOpX, __global uint ** clSobelOpY, const uint2 inputDimensions, const uint2 sobelDimensions) { uint tid = get_global_id(0); uint x = tid % inputDimensions.x; uint y = tid / inputDimensions.y; output = (input.x + input.y + input.z) * 3; }

alexaverbuch · ‎09-22-2009

host code

#include "EdgeDetect.hpp" ///////////////////////////////////////////////////////////////// // Util Methods ///////////////////////////////////////////////////////////////// void cvDisplay(IplImage* image, char windowName[], int x, int y) { CvSize imageSize = cvGetSize(image); cvNamedWindow(windowName); cvResizeWindow(windowName, imageSize.width, imageSize.height); cvMoveWindow(windowName, x, y); cvShowImage(windowName,image); } //converts raw image into intensity values cl_uint ***cvImageToClArray(IplImage* raw) { int width = raw->width; int height = raw->height; cl_uint ***imageArray = (cl_uint***)malloc(width * sizeof(cl_uint**)); for (int i = 0 ; i < width; i++) { imageArray = (cl_uint**)malloc(height * sizeof(cl_uint*)); for (int j = 0 ; j < height; j++) { imageArray = (cl_uint*)malloc(4 * sizeof(cl_uint)); } } //generate intensity image for (int y=0; y<raw->height; y++) for (int x=0; x<raw->width; x++) { CvScalar colourValue = cvGet2D(raw,y,x); imageArray[0] = colourValue.val[0]; //B imageArray[1] = colourValue.val[1]; //G imageArray[2] = colourValue.val[2]; //R imageArray[3] = 0; //A } return imageArray; } //converts raw image into intensity values IplImage* clArryToCvImage(cl_uint** output, int resultWidth, int resultHeight) { CvSize size; size.width = resultWidth; size.height = resultHeight; IplImage* resultImg = cvCreateImage(size,IPL_DEPTH_8U,1); //generate intensity image for (int y=0; y<resultHeight; y++) for (int x=0; x<resultWidth; x++) { CvScalar colourSelect; colourSelect.val[0] = output; cvSet2D(resultImg,y,x,colourSelect); } return resultImg; } // Converts the contents of a file into a string std::string convertToString(const char *filename) { size_t size; char* str; std::string s; std::fstream f(filename, (std::fstream::in | std::fstream::binary)); if(f.is_open()) { size_t fileSize; f.seekg(0, std::fstream::end); size = fileSize = f.tellg(); f.seekg(0, std::fstream::beg); str = new char[size+1]; if(!str) { f.close(); return NULL; } f.read(str, fileSize); f.close(); str[size] = '\0'; s = str; return s; } return NULL; } ///////////////////////////////////////////////////////////////// // Serial (OpenCV) Methods ///////////////////////////////////////////////////////////////// int cvDoFindEdges(IplImage* cvImg) { //gray scale representation of raw image IplImage* cvImgIntensity = cvGenerateIntensityImage(cvImg); //resultant image after Sobel operator applied to raw image IplImage* cvImgSobel = cvGenerateSobelImage(cvImgIntensity); cvDisplay(cvImg,"raw",0,0); cvDisplay(cvImgIntensity,"intensity",0,0); cvDisplay(cvImgSobel,"imgSobel",0,0); cvWaitKey(0); cvDestroyAllWindows(); cvReleaseImage(&cvImgSobel); } //converts raw image into intensity values IplImage* cvGenerateIntensityImage(IplImage* raw) { IplImage* intensityImage = cvCreateImage(cvGetSize(raw),IPL_DEPTH_8U,1); //generate intensity image for (int y=0; y<raw->height; y++) for (int x=0; x<raw->width; x++) { CvScalar colourValue = cvGet2D(raw,y,x); CvScalar colourSelect; colourSelect.val[0] = (colourValue.val[0]+colourValue.val[1]+colourValue.val[2])/3; cvSet2D(intensityImage,y,x,colourSelect); } return intensityImage; } //applies Sobel Operator IplImage* cvGenerateSobelImage(IplImage* intensityImage) { IplImage* sobelImg = cvCreateImage(cvGetSize(intensityImage),IPL_DEPTH_8U, 1); //matrix representation of Sobel image. to ensure negative values are not stored as 0 CvMat* sobelMat = cvCreateMat(intensityImage->height,intensityImage->width,CV_64FC1); //generate sobel image for (int y=0; y<intensityImage->height; y++) for (int x=0; x<intensityImage->width; x++) { double Gx; double Gy; double G; if ((y==0) || (y==intensityImage->height-1) || (x==0) || (x==intensityImage->width-1)) { G = cvGet2D(intensityImage,y,x).val[0]; } else { Gx = cvGet2D(intensityImage,y-1,x-1).val[0] * cvSobelOpX[0][0] + cvGet2D(intensityImage,y-1,x).val[0] * cvSobelOpX[0][1] + cvGet2D(intensityImage,y-1,x+1).val[0] * cvSobelOpX[0][2] + cvGet2D(intensityImage,y,x-1).val[0] * cvSobelOpX[1][0] + cvGet2D(intensityImage,y,x).val[0] * cvSobelOpX[1][1] + cvGet2D(intensityImage,y,x+1).val[0] * cvSobelOpX[1][2] + cvGet2D(intensityImage,y+1,x-1).val[0] * cvSobelOpX[2][0] + cvGet2D(intensityImage,y+1,x).val[0] * cvSobelOpX[2][1] + cvGet2D(intensityImage,y+1,x+1).val[0] * cvSobelOpX[2][2]; Gx = abs(Gx); Gy = cvGet2D(intensityImage,y-1,x-1).val[0] * cvSobelOpY[0][0] + cvGet2D(intensityImage,y-1,x).val[0] * cvSobelOpY[0][1] + cvGet2D(intensityImage,y-1,x+1).val[0] * cvSobelOpY[0][2] + cvGet2D(intensityImage,y,x-1).val[0] * cvSobelOpY[1][0] + cvGet2D(intensityImage,y,x).val[0] * cvSobelOpY[1][1] + cvGet2D(intensityImage,y,x+1).val[0] * cvSobelOpY[1][2] + cvGet2D(intensityImage,y+1,x-1).val[0] * cvSobelOpY[2][0] + cvGet2D(intensityImage,y+1,x).val[0] * cvSobelOpY[2][1] + cvGet2D(intensityImage,y+1,x+1).val[0] * cvSobelOpY[2][2]; Gy = abs(Gy); G = Gx + Gy; } CvScalar colourSelect; colourSelect.val[0] = G; cvmSet(sobelMat,y,x,G); cvSet2D(sobelImg,y,x,colourSelect); } return sobelImg; } ///////////////////////////////////////////////////////////////// // Parallel (OpenCL) Methods ///////////////////////////////////////////////////////////////// //todo: complete this //cl_mem clDoCreateImage( // char* filename, // cl_context context) //{ // //todo: temp test code // IplImage* tempRaw = cvLoadImage(filename, 1); // size_t width = (size_t)(tempRaw->width); // size_t height = (size_t)(tempRaw->height); // //todo: maybe let OpenCL deal with this (it will be width*bytes-per-pixel // size_t rowpitch = 0; //width*4; // // void* image = fopen(filename,"rb"); // if (image != NULL) { // std::cout<<"image loaded successfully: " << filename << "\n"; // } else { // std::cout<<"image could not be loaded: " << filename << "\n"; // } // // // // set the image format properties and option flags // cl_image_format format; // format.image_channel_order = CL_RGBA; // format.image_channel_data_type = CL_UNORM_INT8; // //format.image_channel_data_type = CL_UNSIGNED_INT8; // // cl_mem_flags flags; //// flags = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR; // flags = CL_MEM_READ_ONLY; // // cl_int error = CL_SUCCESS; // // std::cout<<"BEFORE\n"; // // cl_mem myClImage = clCreateImage2D( // context, // a valid OpenCL context // flags, // option flags [1] // &format, // image format properties [2] // width, // width of the image in pixels // height, // height of the image in pixels // rowpitch, // scan-line pitch in bytes [3] // image, // pointer to the image data // &error // on return, the result code // ); // // std::cout<<"AFTER\n"; // // if(image == 0 || error != CL_SUCCESS) // { // std::cout<<"Error: Could not create 2D image (clCreateImage2D)\n"; // } // return myClImage; //} // Host Initialization: Allocate & init memory on the host. Print input array. void clInitializeHost(IplImage* cvRawImg) { input = NULL; output = NULL; input = cvImageToClArray(cvRawImg); if(input==NULL) { std::cout<<"Error: Failed to allocate 'input' host memory. (input)\n"; return; } output = (cl_uint**)malloc(width * sizeof(cl_uint*)); for (int i = 0 ; i < width; i++) output = (cl_uint*)malloc(height * sizeof(cl_uint)); if(output==NULL) { std::cout<<"Error: Failed to allocate 'output' host memory. (input)\n"; return; } } // OpenCL related initialization // -> Create Context, Device list, Command Queue // -> Create OpenCL memory buffer objects // -> Load CL file, compile, link CL source // -> Build program and kernel objects void clInitialize(void) { cl_int status = 0; size_t deviceListSize; ///////////////////////////////////////////////////////////////// // Create an OpenCL context ///////////////////////////////////////////////////////////////// //todo: experiment with CL_DEVICE_TYPE_ ALL, DEFAULT, GPU, ACCELERATOR, CPU context = clCreateContextFromType(0, CL_DEVICE_TYPE_CPU, NULL, NULL, &status); if(status != CL_SUCCESS) { std::cout<<"Error: Creating Context. (clCreateContextFromType)\n"; return; } /* First, get the size of device list data */ status = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &deviceListSize); if(status != CL_SUCCESS) { std::cout<< "Error: Getting Context Info (device list size, clGetContextInfo)\n"; return; } ///////////////////////////////////////////////////////////////// // Detect OpenCL devices ///////////////////////////////////////////////////////////////// devices = (cl_device_id *)malloc(deviceListSize); if(devices == 0) { std::cout<<"Error: No devices found.\n"; return; } /* Now, get the device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, deviceListSize, devices, NULL); if(status != CL_SUCCESS) { std::cout<<"Error: Getting Context Info (device list, clGetContextInfo)\n"; return; } ///////////////////////////////////////////////////////////////// // Create an OpenCL command queue ///////////////////////////////////////////////////////////////// /* The block is to move the declaration of prop closer to its use */ //todo: set this to 0 later maybe cl_command_queue_properties prop = 0; if (PROFILE) { prop |= CL_QUEUE_PROFILING_ENABLE; } commandQueue = clCreateCommandQueue( context, devices[0], prop, &status); if(status != CL_SUCCESS) { std::cout<<"Creating Command Queue. (clCreateCommandQueue)\n"; return; } ///////////////////////////////////////////////////////////////// // Create OpenCL memory buffers ///////////////////////////////////////////////////////////////// inputBuffer = clCreateBuffer( context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * 4 * width * height, input, &status); if(status == CL_INVALID_CONTEXT) { //context not valid std::cout<<"Error: clCreateBuffer - invalid context - (inputBuffer)\n"; return; } if(status == CL_INVALID_VALUE) { //flags value not valid std::cout<<"Error: clCreateBuffer - invalid flags value - (inputBuffer)\n"; return; } if(status == CL_INVALID_BUFFER_SIZE) { //size==0 or size>CL_DEVICE_MAX_MEM_ALLOC_SIZE std::cout<<"Error: clCreateBuffer - invalid buffer size - (inputBuffer)\n"; return; } if(status == CL_INVALID_HOST_PTR) { //(host_ptr == NULL) && (CL_MEM_USE_HOST_PTR || CL_MEM_COPY_HOST_PTR in flags) //|| //(host_ptr != NULL) && (CL_MEM_COPY_HOST_PTR || CL_MEM_USE_HOST_PTR _not_ in flags) bool isNull = (input==NULL); std::cout<<"Error: clCreateBuffer - invalid host pointer - (inputBuffer) - NULL==" << isNull << "\n"; return; } if(status == CL_MEM_OBJECT_ALLOCATION_FAILURE) { //there is a failure to allocate memory for buffer object std::cout<<"Error: clCreateBuffer - mem object alloc failure - (inputBuffer)\n"; return; } if(status == CL_OUT_OF_HOST_MEMORY) { //there is a failure to allocate resources required by the OpenCL implementation on the host std::cout<<"Error: clCreateBuffer - out of host mem - (inputBuffer)\n"; return; } if(status != CL_SUCCESS) { std::cout<<"Error: clCreateBuffer (inputBuffer)\n"; return; } outputBuffer = clCreateBuffer( context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint) * width * height, output, &status); if(status != CL_SUCCESS) { std::cout<<"Error: clCreateBuffer (outputBuffer)\n"; return; } sobelOpXBuffer = clCreateBuffer( context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint ) * maskWidth * maskHeight, clSobelOpX, &status); if(status != CL_SUCCESS) { std::cout<<"Error: clCreateBuffer (sobelOpXBuffer)\n"; return; } sobelOpYBuffer = clCreateBuffer( context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uint ) * maskWidth * maskHeight, clSobelOpY, &status); if(status != CL_SUCCESS) { std::cout<<"Error: clCreateBuffer (sobelOpYBuffer)\n"; return; } ///////////////////////////////////////////////////////////////// // Load CL file, build CL program object, create CL kernel object ///////////////////////////////////////////////////////////////// const char * filename = "EdgeDetect_Kernels.cl"; std::string sourceStr = convertToString(filename); const char * source = sourceStr.c_str(); size_t sourceSize[] = { strlen(source) }; std::cout << source << "\n"; program = clCreateProgramWithSource( context, 1, &source, sourceSize, &status); if(status != CL_SUCCESS) { std::cout<<"Error: Loading Binary into cl_program (clCreateProgramWithSource)\n"; return; } /* create a cl program executable for all the devices specified */ status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); //error checking code streamsdk::SDKCommon sampleCommon; if(!sampleCommon.checkVal(status,CL_SUCCESS,"clBuildProgram failed.")) { //print kernel compilation error char programLog[1024]; status = clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); std::cout<<programLog<<std::endl; return; } if(status == CL_INVALID_PROGRAM) { //if program is not a valid program object. std::cout<<"Error: Invalid program object. (clBuildProgram)\n"; return; } if(status == CL_INVALID_VALUE) { // (device_list == NULL) && (num_devices > 0) // || // (device_list != NULL) && (num_devices ==0) // || // (pfn_notify == NULL) && (user_data != NULL) std::cout<<"Error: Invalid value - device_list==NULL:" << (devices==NULL) << " - (clBuildProgram)\n"; return; } if(status == CL_INVALID_DEVICE) { // OpenCL devices listed in device_list are not in the list of // devices associated with program. std::cout<<"Error: Invalid device. (clBuildProgram)\n"; return; } if(status == CL_INVALID_BINARY) { // if program is created with clCreateWithProgramBinary and // devices listed in device_list do not have a valid program binary loaded. std::cout<<"Error: Invalid binary. (clBuildProgram)\n"; return; } if(status == CL_INVALID_BUILD_OPTIONS) { // if the build options specified by options are invalid std::cout<<"Error: Invalid build options. (clBuildProgram)\n"; return; } if(status == CL_INVALID_OPERATION) { // if the build of a program executable for any of the devices // listed in device_list by a previous call to clBuildProgram for program has not // completed // || // if there are kernel objects attached to program. std::cout<<"Error: Invalid operation. (clBuildProgram)\n"; return; } if(status == CL_COMPILER_NOT_AVAILABLE) { // CL_COMPILER_NOT_AVAILABLE if program is created with // clCreateProgramWithSource and a compiler is not available i.e. // CL_DEVICE_COMPILER_AVAILABLE specified in table 4.3 is set to CL_FALSE. std::cout<<"Error: Compiler not available. (clBuildProgram)\n"; return; } if(status == CL_BUILD_PROGRAM_FAILURE) { // if there is a failure to build the program executable. // This error will be returned if clBuildProgram does not return until the build has // completed. std::cout<<"Error: Build program failure. (clBuildProgram)\n"; return; } if(status == CL_OUT_OF_HOST_MEMORY) { // if there is a failure to allocate resources required by the // OpenCL implementation on the host. std::cout<<"Error: Out of host memory. (clBuildProgram)\n"; return; } if(status != CL_SUCCESS) { std::cout<<"Error: Building Program (clBuildProgram)\n"; return; } /* get a kernel object handle for a kernel with the given name */ kernel = clCreateKernel(program, "edgeDetectKernel", &status); if(status != CL_SUCCESS) { std::cout<<"Error: Creating Kernel from program. (clCreateKernel)\n"; return; } } // Run OpenCL program // -> Bind host variables to kernel arguments // -> Run the CL kernel void clRunKernels(void) { cl_int status; cl_event events[2]; size_t globalThreads[1]; size_t localThreads[1]; globalThreads[0] = width*height; localThreads[0] = 1; ////////////////////////////////////////// // Set appropriate arguments to the kernel ////////////////////////////////////////// /* the output array to the kernel */ status = clSetKernelArg( kernel, 0, sizeof(cl_mem), (void *)&outputBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: Setting kernel argument. (output)\n"; return; } /* the input array to the kernel */ status = clSetKernelArg( kernel, 1, sizeof(cl_mem), (void *)&inputBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: Setting kernel argument. (input)\n"; return; } status = clSetKernelArg( kernel, 2, sizeof(cl_mem), (void *)&sobelOpXBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: Setting kernel argument. (sobelx)\n"; return; } status = clSetKernelArg( kernel, 3, sizeof(cl_mem), (void *)&sobelOpYBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: Setting kernel argument. (sobely)\n"; return; } cl_uint2 inputDimensions = {width, height}; status = clSetKernelArg( kernel, 3, sizeof(cl_uint2), (void *)&inputDimensions); if(status != CL_SUCCESS) { std::cout<<"Error: Setting kernel argument. (inputDimensions)\n"; return; } cl_uint2 sobelDimensions = {maskWidth, maskHeight}; status = clSetKernelArg( kernel, 4, sizeof(cl_uint2), (void *)&sobelDimensions); if(status != CL_SUCCESS) { std::cout<<"Error: Setting kernel argument. (sobelDimensions)\n"; return; } ////////////////////////////////////////// // Enqueue a kernel run call. ////////////////////////////////////////// status = clEnqueueNDRangeKernel( commandQueue, kernel, 1, NULL, globalThreads, localThreads, 0, NULL, &events[0]); if(status != CL_SUCCESS) { std::cout<<"Error: Enqueueing kernel onto command queue. (clEnqueueNDRangeKernel)\n"; return; } ////////////////////////////////////////// // wait for the kernel call to finish execution ////////////////////////////////////////// status = clWaitForEvents(1, &events[0]); if(status != CL_SUCCESS) { std::cout<<"Error: Waiting for kernel run to finish. (clWaitForEvents 0)\n"; return; } if (PROFILE) { long long kernelsStartTime; long long kernelsEndTime; status = clGetEventProfilingInfo( events[0], CL_PROFILING_COMMAND_START, sizeof(long long), &kernelsStartTime, NULL); if(status != CL_SUCCESS) { std::cout<<"Error: clGetEventProfilingInfo failed (start)\n"; return; } status = clGetEventProfilingInfo( events[0], CL_PROFILING_COMMAND_END, sizeof(long long), &kernelsEndTime, NULL); if(status != CL_SUCCESS) { std::cout<<"Error: clGetEventProfilingInfo failed (end)\n"; return; } /* Compute total time (also convert from nanoseconds to seconds) */ long long totalTime = (double)(kernelsEndTime - kernelsStartTime)/1e9; std::cout<<"TIME: " << totalTime << "\n"; } clReleaseEvent(events[0]); ////////////////////////////////////////// // Enqueue readBuffer ////////////////////////////////////////// status = clEnqueueReadBuffer( commandQueue, outputBuffer, CL_TRUE, 0, width * height * sizeof(cl_uint), output, 0, NULL, &events[1]); if(status != CL_SUCCESS) { std::cout <<"Error: clEnqueueReadBuffer failed. (clEnqueueReadBuffer)\n"; } ////////////////////////////////////////// // Wait for the read buffer to finish execution ////////////////////////////////////////// status = clWaitForEvents(1, &events[1]); if(status != CL_SUCCESS) { std::cout<<"Error: Waiting for read buffer call to finish. (clWaitForEvents)\n"; return; } clReleaseEvent(events[1]); } // Release OpenCL resources (Context, Memory etc.) void clCleanup(void) { cl_int status; status = clReleaseKernel(kernel); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseKernel \n"; return; } status = clReleaseProgram(program); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseProgram\n"; return; } status = clReleaseMemObject(inputBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseMemObject (inputBuffer)\n"; return; } status = clReleaseMemObject(outputBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseMemObject (outputBuffer)\n"; return; } status = clReleaseMemObject(sobelOpXBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseMemObject (sobelOpXBuffer)\n"; return; } status = clReleaseMemObject(sobelOpYBuffer); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseMemObject (sobelOpYBuffer)\n"; return; } status = clReleaseCommandQueue(commandQueue); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseCommandQueue\n"; return; } status = clReleaseContext(context); if(status != CL_SUCCESS) { std::cout<<"Error: In clReleaseContext\n"; return; } } // Releases program's resources void clCleanupHost(void) { if(input != NULL) { free(input); input = NULL; } if(output != NULL) { free(output); output = NULL; } if(clSobelOpX != NULL) { free(output); output = NULL; } if(clSobelOpY != NULL) { free(output); output = NULL; } if(devices != NULL) { free(devices); devices = NULL; } } /*Display OpenCL system info */ void clPrintInfo() { int MAX_DEVICES = 10; size_t p_size; size_t arr_tsize[3]; size_t ret_size; char param[100]; cl_uint entries; cl_ulong long_entries; cl_bool bool_entries; cl_device_id devices[MAX_DEVICES]; size_t num_devices; cl_device_local_mem_type mem_type; cl_device_type dev_type; cl_device_fp_config fp_conf; cl_device_exec_capabilities exec_cap; clGetDeviceIDs( NULL, CL_DEVICE_TYPE_DEFAULT, MAX_DEVICES, devices, &num_devices); printf("Found Devices:\t\t%d\n", num_devices); for (int i = 0; i < num_devices; i++) { printf("\nDevice: %d\n\n", i); clGetDeviceInfo(devices, CL_DEVICE_TYPE, sizeof(dev_type), &dev_type, &ret_size); printf("\tDevice Type:\t\t"); if (dev_type & CL_DEVICE_TYPE_GPU) printf("CL_DEVICE_TYPE_GPU "); if (dev_type & CL_DEVICE_TYPE_CPU) printf("CL_DEVICE_TYPE_CPU "); if (dev_type & CL_DEVICE_TYPE_ACCELERATOR) printf("CL_DEVICE_TYPE_ACCELERATOR "); if (dev_type & CL_DEVICE_TYPE_DEFAULT) printf("CL_DEVICE_TYPE_DEFAULT "); printf("\n"); clGetDeviceInfo(devices, CL_DEVICE_NAME, sizeof(param), param, &ret_size); printf("\tName: \t\t\t%s\n", param); clGetDeviceInfo(devices, CL_DEVICE_VENDOR, sizeof(param), param, &ret_size); printf("\tVendor: \t\t%s\n", param); clGetDeviceInfo(devices, CL_DEVICE_VENDOR_ID, sizeof(cl_uint), &entries, &ret_size); printf("\tVendor ID:\t\t%d\n", entries); clGetDeviceInfo(devices, CL_DEVICE_VERSION, sizeof(param), param, &ret_size); printf("\tVersion:\t\t%s\n", param); clGetDeviceInfo(devices, CL_DEVICE_PROFILE, sizeof(param), param, &ret_size); printf("\tProfile:\t\t%s\n", param); clGetDeviceInfo(devices, CL_DRIVER_VERSION, sizeof(param), param, &ret_size); printf("\tDriver: \t\t%s\n", param); clGetDeviceInfo(devices, CL_DEVICE_EXTENSIONS, sizeof(param), param, &ret_size); printf("\tExtensions:\t\t%s\n", param); clGetDeviceInfo(devices, CL_DEVICE_MAX_WORK_ITEM_SIZES, 3 * sizeof(size_t), arr_tsize, &ret_size); printf("\tMax Work-Item Sizes:\t(%d,%d,%d)\n", arr_tsize[0], arr_tsize[1], arr_tsize[2]); clGetDeviceInfo(devices, CL_DEVICE_MAX_WORK_GROUP_SIZE, sizeof(size_t), &p_size, &ret_size); printf("\tMax Work Group Size:\t%d\n", p_size); clGetDeviceInfo(devices, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &entries, &ret_size); printf("\tMax Compute Units:\t%d\n", entries); clGetDeviceInfo(devices, CL_DEVICE_MAX_CLOCK_FREQUENCY, sizeof(cl_uint), &entries, &ret_size); printf("\tMax Frequency (Mhz):\t%d\n", entries); clGetDeviceInfo(devices, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, sizeof(cl_uint), &entries, &ret_size); printf("\tCache Line (bytes):\t%d\n", entries); clGetDeviceInfo(devices, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, &ret_size); printf("\tGlobal Memory (MB):\t%llu\n", long_entries / 1024 / 1024); clGetDeviceInfo(devices, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(cl_ulong), &long_entries, &ret_size); printf("\tLocal Memory (MB):\t%llu\n", long_entries / 1024 / 1024); clGetDeviceInfo(devices, CL_DEVICE_LOCAL_MEM_TYPE, sizeof(cl_device_local_mem_type), &mem_type, &ret_size); if (mem_type & CL_LOCAL) printf("\tLocal Memory Type:\tCL_LOCAL\n"); else if (mem_type & CL_GLOBAL) printf("\tLocal Memory Type:\tCL_GLOBAL\n"); else printf("\tLocal Memory Type:\tUNKNOWN\n"); clGetDeviceInfo(devices, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(cl_ulong), &long_entries, &ret_size); printf("\tMax Mem Alloc (MB):\t%llu\n", long_entries / 1024 / 1024); clGetDeviceInfo(devices, CL_DEVICE_MAX_PARAMETER_SIZE, sizeof(size_t), &p_size, &ret_size); printf("\tMax Param Size (MB):\t%d\n", p_size); clGetDeviceInfo(devices, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(cl_uint), &entries, &ret_size); printf("\tBase Mem Align (bits):\t%d\n", entries); clGetDeviceInfo(devices, CL_DEVICE_ADDRESS_BITS, sizeof(cl_uint), &entries, &ret_size); printf("\tAddress Space (bits):\t%d\n", entries); clGetDeviceInfo(devices, CL_DEVICE_IMAGE_SUPPORT, sizeof(cl_bool), &bool_entries, &ret_size); printf("\tImage Support:\t\t%d\n", bool_entries); clGetDeviceInfo(devices, CL_DEVICE_TYPE, sizeof(fp_conf), &fp_conf, &ret_size); printf("\tFloat Functionality:\t"); if (fp_conf & CL_FP_DENORM) printf("DENORM support "); if (fp_conf & CL_FP_ROUND_TO_NEAREST) printf("Round to nearest support "); if (fp_conf & CL_FP_ROUND_TO_ZERO) printf("Round to zero support "); if (fp_conf & CL_FP_ROUND_TO_INF) printf("Round to +ve/-ve infinity support "); if (fp_conf & CL_FP_FMA) printf("IEEE754 fused-multiply-add support "); if (fp_conf & CL_FP_INF_NAN) printf("INF and NaN support "); printf("\n"); clGetDeviceInfo(devices, CL_DEVICE_ERROR_CORRECTION_SUPPORT, sizeof(cl_bool), &bool_entries, &ret_size); printf("\tECC Support:\t\t%d\n", bool_entries); clGetDeviceInfo(devices, CL_DEVICE_EXECUTION_CAPABILITIES, sizeof(cl_device_exec_capabilities), &exec_cap, &ret_size); printf("\tExec Functionality:\t"); if (exec_cap & CL_EXEC_KERNEL) printf("CL_EXEC_KERNEL "); if (exec_cap & CL_EXEC_NATIVE_KERNEL) printf("CL_EXEC_NATIVE_KERNEL "); printf("\n"); clGetDeviceInfo(devices, CL_DEVICE_ENDIAN_LITTLE, sizeof(cl_bool), &bool_entries, &ret_size); printf("\tLittle Endian Device:\t%d\n", bool_entries); clGetDeviceInfo(devices, CL_DEVICE_PROFILING_TIMER_RESOLUTION, sizeof(size_t), &p_size, &ret_size); printf("\tProfiling Res (ns):\t%d\n", p_size); clGetDeviceInfo(devices, CL_DEVICE_AVAILABLE, sizeof(cl_bool), &bool_entries, &ret_size); printf("\tDevice Available:\t%d\n", bool_entries); } } int main(int argc, char * argv[]) { ////////////////////////////// // Init ////////////////////////////// IplImage* cvRaw = cvLoadImage("raw.bmp", 1); maskWidth = 3; maskHeight = 3; width = cvRaw->width; height = cvRaw->height; ////////////////////////////// // Serial (OpenCV) ////////////////////////////// // cvDoFindEdges(cvRaw); ////////////////////////////// // Parallel (OpenCL) ////////////////////////////// clPrintInfo(); clInitializeHost(cvRaw); // Initialize Host application clInitialize(); // Initialize OpenCL resources clRunKernels(); // Run the CL program IplImage* clSobel = clArryToCvImage(output,width,height); cvDisplay(clSobel,"clSobel",0,0); clCleanup(); // Releases OpenCL resources clCleanupHost(); // Release host resources return 0; }

alexaverbuch · ‎09-22-2009

omkaranathan,

the code contains a lot of details you don't care about... in a nut shell:

1) I use OpenCV to find edges in a 2D image, using Sobel operators

2) I WANT TO use OpenCL to make a parallel version of this to compare the speedup on a DualCore CPU and then on my low end ATI X1400 GPU when/if support is added

I've included all the code...

Thanks!

omkaranathan · ‎09-22-2009

As per OpenCL specification(Section 6.8(a)), arguments to __kernel functions in a program cannot be declared as a pointer to a pointer(s).

nou · ‎09-22-2009

have you set the PATH enviroment variable or LD_LIBRARY_PATH on Linux??

MicahVillmow · ‎09-22-2009

alex,
the X1400 GPU is not supported by ATI Stream SDK and will not be supported by OpenCL. That GPU does not have the required hardware to execute OpenCL code.

alexaverbuch · ‎09-23-2009

Originally posted by: MicahVillmow alex, the X1400 GPU is not supported by ATI Stream SDK and will not be supported by OpenCL. That GPU does not have the required hardware to execute OpenCL code.

Damn... but thanks for the info... must be time to buy an HD5870

Originally posted by: nouhave you set the PATH enviroment variable or LD_LIBRARY_PATH on Linux??

yes, the samples run fine

Originally posted by: MicahVillmowAs per OpenCL specification(Section 6.8(a)), arguments to __kernel functions in a program cannot be declared as a pointer to a pointer(s).

Thanks... I've now changed it to:

in Host - 2dim cl_uint array with, length = X*Y, height = sizeof(cl_uint4)

in Kernel - 1dim array cl_uint4 array with, length = X*Y

the formats are different between Host & Kernel because using cl_uint4 in the Host gives me all sorts of problems like Segmentation Faults, and I'm not sure why

But now I get the problem that these formats dont map perfectly to each other, so the data I read in the Kernel is incorrect (padded, or shifted... I havent looked into it enough yet)

Does anyone have a suggestion/work-around for my problem? Or at least a starting point?

n0thing · ‎09-23-2009

cl_uint4 data structure that you read inside your kernel needs be aligned to 16 byte boundary( you are using CL_MEM_USE_HOST_PTR I assume).

Section 6.1.5 from specification :

A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary.
A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions.The OpenCL compiler is responsible for aligning data items to the appropriate alignment as required by the data type.

But on the host you use something like :

cl_uint myUint4[4]; [ Taken from the other thread ]

but this is aligned to a 4 byte boundary only. So to solve this, use the default cl_uint4 available which is always aligned (see cl_platform.h)

In case you don't want to use that also here is another one :

typedef union myUint4 { cl_uint u32[4]; } myUint4;

now this will be aligned to 16 byte boundary.

Section 6.10.1 from spec:

alignment of any given struct or union type is required by the ISO C
standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question and must also be a power of two.

alexaverbuch · ‎09-23-2009

n0thing,

Thank you very much!

Although I haven't used your suggestions [yet ], it has comfirmed something I was thinking about...

I've now changed my Host to use this representation:

cl_uint *imageArray = (cl_uint*)memalign(16, width * height * sizeof(cl_uint4));

and it is working fine for me... I hope there isn't any "bad practise" embedded in this solution

Once I'm done with this project I'll make my complete source available... maybe it will be useful to someone else

Thanks to everyone for all the help so far... when stuff works, this OpenCL stuff is quite exciting

Alex

alexaverbuch · ‎09-23-2009

Ok, now using your solution n0thing... my code looks more elegant using that method

Archives Discussions

clBuildProgram - Invalid Operation