cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

jbrussell
Journeyman III

clEnqueueNDRangeKernel returns -45 on 2nd GPU

Works OK on primary HD 5870 but not on secondary FirePro v8750

Greetings,

I am running SDK 2.5 on a Windows 7 box with an HD 5870 (the primary adapter) and a FirePro V8750 (secondary).  clGetDeviceInfo() on the 5870 shows OpenCL 1.1 while the V8750 shows version 1.0 (in code box below).  The V8750 also returns "ATI RV770" as its name, which is a little unsettling but the V8750 has an RV770 as its core so I guess that's OK.  Also, in order to activate the secondary card, I did the 3 resistors in the DVI-VGA adaptor trick that I saw somewhere in this forum. 

My kernel builds and runs fine when I target the 5870.  When targeting the the V8750, the build is OK but clEnqueueNDRangeKernel() returns -45, which is  CL_INVALID_PROGRAM_EXECUTABLE.

I suspect this has something to do with the V8750 as being version 1.0 instead of 1.1.  Is there a way to force the build to be a 1.0 build?  According to the OpenCL spec the only valid version option for clBuildProgram() is "-cl-std=CL1.1", so there doesn't appear to be a way to force it to build a 1.0-compatible executable.  It seems to imply that it will build to whatever the device's CL_DEVICE_OPENCL_C_VERSION is but I don't really know if that's the case.

Is there a way to upgrade the V8750 to 1.1-compatible?  I've loaded the latest driver, am running the latest SDK, etc.

Thanks,

JR

Num available OpenCl devices = 16 Device 0 settings: Device type = GPU Vendor = Advanced Micro Devices, Inc. Name = Cypress Global memory size = 800 MB Local memory size = 32 KB Max workgroup size = 256 Max dimensions = 3 Max workitem sizes = 256,256,256 Version = OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213) OpenCL C Version = OpenCL C 1.1 Device 1 settings: Device type = GPU Vendor = Advanced Micro Devices, Inc. Name = ATI RV770 Global memory size = 1024 MB Local memory size = 16 KB Max workgroup size = 256 Max dimensions = 3 Max workitem sizes = 256,256,256 Version = OpenCL 1.0 AMD-APP-SDK-v2.5 (684.213) OpenCL C Version = OpenCL C 1.0 Error: Enqueueing my_kernel onto command queue. Status = -45

0 Likes
15 Replies

jbrussell,
You cannot run binaries across different OpenCL versions. You will need to compile for both your 1.1 and 1.0 devices seperately. The R7XX series of chips will never support OpenCL 1.1 as they do not have the necessary hardware requirements.
0 Likes

Thanks, but is there a way with SDK 2.5 to force a compile to OpenCL 1.0?

0 Likes

Originally posted by: jbrussell Thanks, but is there a way with SDK 2.5 to force a compile to OpenCL 1.0?

 

Just remove -cl-std option from build options which allows compiler to select compiler version based on device.

0 Likes

The -cl-std option is not there.  I have the buildoptions parameter to clBuildProgram() set to NULL. 

0 Likes

Originally posted by: jbrussell The -cl-std option is not there.  I have the buildoptions parameter to clBuildProgram() set to NULL. 

Have you built kernel for all devices or first device?  Can you copy your runtime code and kernel code here which allows us to answer quickly?

0 Likes

In looking at the original code it looks like it builds just for device 0 since the num_devices parameter to clBuildProgram() is set to 1.  So now when I force it to only build the kernel for the device I intend to use, via a #define, it works fine for  devices[0], the 5870.  For devices[1], the V8750, clBuildProgram returns -11, CL_BUILD_PROGRAM_FAILURE.  When ported to an NVIDIA OpenCL 1.0 system this kernel builds and runs fine, so maybe there's a problem with the SDK setup?  The NVIDIA device happens to be device 0, so maybe the problem is for secondary devices?

I can't show the kernel code because it's proprietary but it's really only doing some fairly simple math on a ushort image in global memory...converts to float, does some math, converts back to ushort for output.

I've attached the portions of the runtime code that init and do the build...pretty standard, glommed from the template example.

So, to restate the problem(s):

When building only for the single device on which the kernel is to be run, clBuildProgram() returns -11  for the OpenCL 1.0 V8750 device (device 1), and returns CL_SUCCESS for the OPENCL 1.1 5870 device (device 0).

 

0 Likes

Looks like the code attachment failed.  2nd try:

 

cl_int initializeCL(void) { cl_int status = 0; size_t deviceListSize; /* * Have a look at the available platforms and pick either * the AMD one if available or a reasonable default. */ cl_uint numPlatforms; cl_platform_id platform = NULL; status = clGetPlatformIDs(0, NULL, &numPlatforms); if (status != CL_SUCCESS) { std::cout << "Error: Getting Platforms. (clGetPlatformsIDs)\n"; return 1; } if (numPlatforms > 0) { cl_platform_id* platforms = new cl_platform_id[numPlatforms]; status = clGetPlatformIDs(numPlatforms, platforms, NULL); if (status != CL_SUCCESS) { std::cout << "Error: Getting Platform Ids. (clGetPlatformsIDs)\n"; return 1; } // Print platform stats query_and_print_platform_info( platforms, numPlatforms ); for (cl_uint i=0; i < numPlatforms; ++i) { char pbuff[100]; status = clGetPlatformInfo( platforms, CL_PLATFORM_VENDOR, sizeof(pbuff), pbuff, NULL ); if (status != CL_SUCCESS) { std::cout << "Error: Getting Platform Info.(clGetPlatformInfo)\n"; return 1; } platform = platforms; if (!strcmp(pbuff, "Advanced Micro Devices, Inc.")) { break; } } delete platforms; } if (NULL == platform) { std::cout << "NULL platform found so Exiting Application." << std::endl; return 1; } /* * If we could find our platform, use it. Otherwise use just available platform. */ cl_context_properties cps[3] = { CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0 }; ///////////////////////////////////////////////////////////////// // Create an OpenCL context ///////////////////////////////////////////////////////////////// context = clCreateContextFromType( cps, CL_DEVICE_TYPE_GPU, NULL, NULL, &status ); if(status != CL_SUCCESS) { std::cout<<"Error: Creating Context. (clCreateContextFromType)\n"; return 1; } /* First, get the size of device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, 0, NULL, &deviceListSize ); if (status != CL_SUCCESS) { std::cout << "Error: Getting Context Info (device list size, clGetContextInfo)\n"; return 1; } else { std::cout << "Num available OpenCl devices = " << deviceListSize << std::endl; } if (deviceListSize == 0) { std::cout << "Error: No devices found.\n"; return 1; } ///////////////////////////////////////////////////////////////// // Detect OpenCL devices ///////////////////////////////////////////////////////////////// devices = (cl_device_id *)malloc(deviceListSize); if (devices == 0) { std::cout << "Error allocating mem for devices.\n"; return 1; } /* Now, get the device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, deviceListSize, devices, NULL ); if (status != CL_SUCCESS) { std::cout << "Error: Getting Context Info (device list, clGetContextInfo)\n"; return 1; } // Print stats about the OpenCL devices // if ( query_and_print_device_info( devices, deviceListSize ) != CL_SUCCESS ) if ( query_and_print_device_info( devices, 2 ) != CL_SUCCESS ) // craps out after 2 on PC { return 1; } ///////////////////////////////////////////////////////////////// // Create an OpenCL command queue ///////////////////////////////////////////////////////////////// commandQueue = clCreateCommandQueue( context, #ifdef RUN_ON_CYPRESS devices[0], #else #ifdef RUN_ON_FIREPRO devices[1], #else devices[0], #endif #endif CL_QUEUE_PROFILING_ENABLE, // enable profiling, in-order execution &status ); if (status != CL_SUCCESS) { std::cout << "Creating Command Queue. (clCreateCommandQueue)\n"; return 1; } ///////////////////////////////////////////////////////////////// // Create OpenCL device memory buffers ///////////////////////////////////////////////////////////////// // Create image buffers on device if ( create_image_device_buffers(context) != CL_SUCCESS ) { std::cout << "Error in create_image_device_buffers" << std::endl; return 1; } // Create func 0 buffers on device if ( create_func0_device_buffers(context) != CL_SUCCESS ) { std::cout << "Error in create_func0_device_buffers" << std::endl; return 1; } // Create func1 buffers on device if ( create_func1_device_buffers(context) != CL_SUCCESS ) { std::cout << "Error in create_func1_device_buffers" << std::endl; return 1; } ///////////////////////////////////////////////////////////////// // Load CL file, build CL program object, create CL kernel object ///////////////////////////////////////////////////////////////// const char * filename = "my_Kernels.cl"; std::string sourceStr = convertFileToString(filename); const char * source = sourceStr.c_str(); size_t sourceSize[] = { strlen(source) }; const char * buildOptions = NULL; program = clCreateProgramWithSource( context, 1, &source, sourceSize, &status ); if (status != CL_SUCCESS) { std::cout << "Error: Loading Binary into cl_program (clCreateProgramWithBinary)\n"; return 1; } /* create a cl program executable for all the devices specified */ #ifdef RUN_ON_CYPRESS status = clBuildProgram(program, 1, &devices[0], buildOptions, NULL, NULL); #else #ifdef RUN_ON_FIREPRO status = clBuildProgram(program, 1, &devices[1], buildOptions, NULL, NULL); #else // Original, builds across all devices status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); #endif #endif if (status != CL_SUCCESS) { std::cout << "Error: Building Program (clBuildProgram) = " << status << std::endl; return 1; } /* get a kernel object handle for a kernel with the given name */ func0_kernel = clCreateKernel(program, "func0_kernel", &status); if (status != CL_SUCCESS) { std::cout << "Error: Creating func0_kernel from program. (clCreateKernel) = " << status << std::endl; return 1; } func1_kernel = clCreateKernel(program, "func1_kernel", &status); if (status != CL_SUCCESS) { std::cout << "Error: Creating func1_kernel from program. (clCreateKernel) = " << status << std::endl; return 1; } return CL_SUCCESS; } // The enqueue is pretty standard...here's the snippet size_t globThreads[] = {imageWidth, imageHeight}; size_t locThreads[] = {16, 16}; status = clEnqueueNDRangeKernel( commandQueue, func0_kernel, 2, // num dimensions NULL, // global work offset globThreads, // global work size NULL, //locThreads, // local work size 0, // num events in wait list NULL, // ptr to event wait list &events[4] // event ); if (status != CL_SUCCESS) { std::cout << "Error: Enqueueing func0_kernel onto command queue. Status = " << status << std::endl; return 1; }

0 Likes

Originally posted by: jbrussell In looking at the original code it looks like it builds just for device 0 since the num_devices parameter to clBuildProgram() is set to 1.  So now when I force it to only build the kernel for the device I intend to use, via a #define, it works fine for  devices[0], the 5870.  For devices[1], the V8750, clBuildProgram returns -11, CL_BUILD_PROGRAM_FAILURE.  When ported to an NVIDIA OpenCL 1.0 system this kernel builds and runs fine, so maybe there's a problem with the SDK setup?  The NVIDIA device happens to be device 0, so maybe the problem is for secondary devices?

 

I can't show the kernel code because it's proprietary but it's really only doing some fairly simple math on a ushort image in global memory...converts to float, does some math, converts back to ushort for output.

 

I've attached the portions of the runtime code that init and do the build...pretty standard, glommed from the template example.

 

So, to restate the problem(s):

 

When building only for the single device on which the kernel is to be run, clBuildProgram() returns -11  for the OpenCL 1.0 V8750 device (device 1), and returns CL_SUCCESS for the OPENCL 1.1 5870 device (device 0).

 



When you get CL_BUILD_PROGRAM_FAILURE,  use clGetProgramBuildInfo(CL_PROGRAM_BUILD_LOG) to get build log. which should solve your problem.

Paste program build log here which helps us to answer quickly.

It looks like you have written code which does not support on V8750 device.

0 Likes

OK, now we're getting somewhere.  clGetProgramBuildInfo() returned this: 

C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl(40): error: write to < 32
          bits via pointer not allowed unless cl_khr_byte_addressable_store is
          enabled
      outImg[r*dimX + c] = (ushort)temp;
      ^

So I added this pragma line to the kernel code:

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

and got this message:

C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl(7): error: can't enable all OpenCL extensions or unrecognized OpenCL extension
  #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
^

So it appears I can't enable the cl_khr_byte_addressable_store extension that it says I need.  That extension is mentioned in the 1.0 spec but not the 1.1 spec.  So it appears that the compiler either doesn't recognize the V8750 as a 1.0 device, or the compiler doesn't do 1.0 builds?

Another interesting error message is this:

4 errors detected in the compilation of "C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl".
Internal error: compiler frontend invocation failed. Make sure ATISTREAMSDKROOT is set

Looks like a leftover from the stream days.  I added a new environment variable ATISTREAMSDKROOT to point to the same location as AMDSDKROOT but still got the same error.

 

0 Likes

Originally posted by: jbrussell OK, now we're getting somewhere.  clGetProgramBuildInfo() returned this: 

C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl(40): error: write to < 32           bits via pointer not allowed unless cl_khr_byte_addressable_store is           enabled       outImg[r*dimX + c] = (ushort)temp;       ^

So I added this pragma line to the kernel code:

#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

and got this message:

C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl(7): error: can't enable all OpenCL extensions or unrecognized OpenCL extension   #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable ^

So it appears I can't enable the cl_khr_byte_addressable_store extension that it says I need.  That extension is mentioned in the 1.0 spec but not the 1.1 spec.  So it appears that the compiler either doesn't recognize the V8750 as a 1.0 device, or the compiler doesn't do 1.0 builds?

Another interesting error message is this:

4 errors detected in the compilation of "C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl". Internal error: compiler frontend invocation failed. Make sure ATISTREAMSDKROOT is set

Looks like a leftover from the stream days.  I added a new environment variable ATISTREAMSDKROOT to point to the same location as AMDSDKROOT but still got the same error.





V8750 supports only OpenCL 1.0 and does not support cl_khr_byte_addressable_store.  Compiler is giving appropriate error.

Regarding ATISTREAMSDKROOT,  Please check whether you have more than one OpenCL libraries on your system.

AMDAPPSDKROOT is the right one. AMDSDKROOT is not correct one.

0 Likes

"V8750 supports only OpenCL 1.0 and does not support cl_khr_byte_addressable_store.  Compiler is giving appropriate error."

According to the online documentation cl_khr_byte_addressable_store is only an OpenCL 1.0 extension, i.e., not available for 1.1.  That tells me that the compiler is not recognizing the V8750 as a 1.0 device or, if it does see it as 1.0, the cl_khr_byte_addressable_store extension is not available.



 

 

 

 

 

 

 

 



 

0 Likes

Originally posted by: jbrussell "V8750 supports only OpenCL 1.0 and does not support cl_khr_byte_addressable_store.  Compiler is giving appropriate error."

 

According to the online documentation cl_khr_byte_addressable_store is only an OpenCL 1.0 extension, i.e., not available for 1.1.  That tells me that the compiler is not recognizing the V8750 as a 1.0 device or, if it does see it as 1.0, the cl_khr_byte_addressable_store extension is not available.

 





I should reframe my statement. V8750 supports only OpenCL 1.0 core spec and does not support few extension likes cl_khr_byte_addressable_store.

In OpenCL 1.1, cl_khr-byte_addressable_store became core spec.

0 Likes

"V8750 supports only OpenCL 1.0 core spec and does not support few extension likes cl_khr_byte_addressable_store."

Is that true of all RV770-based GPUs, specifically the HD 4850?  That's bad news if it is.

 

0 Likes

Originally posted by: jbrussell "V8750 supports only OpenCL 1.0 core spec and does not support few extension likes cl_khr_byte_addressable_store."

 

Is that true of all RV770-based GPUs, specifically the HD 4850?  That's bad news if it is.

 

RV7XX do not have sufficient hardware to support all features.

0 Likes

Originally posted by: jbrussell  

Another interesting error message is this:

4 errors detected in the compilation of "C:\Users\JBR\AppData\Local\Temp\OCL5183.tmp.cl". Internal error: compiler frontend invocation failed. Make sure ATISTREAMSDKROOT is set





Thank you for reporting this issue. I am able to reproduce this issue at my end.  Reported to developers and will be fixed in future releases.

0 Likes