
timchist
Elite

CL_INVALID_PLATFORM in clGetPlatformInfo

I have an ATI 5850 and an NVIDIA GTX 470 installed in one computer. I use 64-bit Windows 7, but compile a 32-bit application.

Sometimes when I try to find the AMD OpenCL platform, error -32 (CL_INVALID_PLATFORM) is returned from clGetPlatformInfo.

clGetPlatformIDs reports that 2 platforms are present and successfully returns the list of their IDs. Then I loop over the platforms. Platform 0 on my system is NVIDIA, so I get its name without errors. But when I call clGetPlatformInfo for the second platform ID (which is supposed to be AMD), I get CL_INVALID_PLATFORM.

What makes it difficult to debug is that the error is not always reproducible. In most cases the program just works normally, but some builds have this error.

cl_uint numPlatforms;
err = clGetPlatformIDs(0, NULL, &numPlatforms);
cl_platform_id* platforms = new cl_platform_id[numPlatforms];
err = clGetPlatformIDs(numPlatforms, platforms, &numPlatforms);
if(err == CL_SUCCESS)
{
    cl_platform_id amdPlatform = NULL;
    // Look for the platform whose vendor string identifies AMD.
    for(cl_uint i = 0; i < numPlatforms; i++)
    {
        char pbuf[100];
        err = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL);
        if(err != CL_SUCCESS)
            break;
        if(strcmp(pbuf, "Advanced Micro Devices, Inc.") == 0)
        {
            amdPlatform = platforms[i];
            break;
        }
    }
    if(!amdPlatform)
    {
        err = CL_DEVICE_NOT_FOUND;
    }
    else
    {
        // Check that the AMD platform exposes at least one GPU device.
        cl_uint devCount;
        err = clGetDeviceIDs(amdPlatform, CL_DEVICE_TYPE_GPU, 0, NULL, &devCount);
        if(devCount == 0)
        {
            err = CL_DEVICE_NOT_FOUND;
        }
    }
    if(err == CL_DEVICE_NOT_FOUND && Log!=NULL)
        strcat(Log, "Failed to find ATI GPU Device." PLZ_INSTALL_ATI_DRIVER);
}
else
{
    if(Log!=NULL)
        strcat(Log, "Failed to initialize OpenCL driver." PLZ_INSTALL_ATI_DRIVER);
}
delete[] platforms;
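A minimal sketch of the same vendor lookup, written so the loop can be exercised without an OpenCL runtime. `PlatformId`, `VendorQuery` and `findPlatformByVendor` are hypothetical names, not part of the OpenCL API; in a real build the callback would wrap `clGetPlatformInfo` with `CL_PLATFORM_VENDOR`. The point it illustrates is that each query receives a single element, `platforms[i]`, rather than the array pointer.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for cl_platform_id, so the sketch builds without OpenCL headers.
using PlatformId = const void*;

// Hypothetical callback: returns true and fills `vendor` on success.
// In real code this would wrap clGetPlatformInfo(id, CL_PLATFORM_VENDOR, ...).
using VendorQuery = std::function<bool(PlatformId, std::string&)>;

// Returns the first platform whose vendor string equals `wanted`,
// or nullptr if none matches or a query fails.
PlatformId findPlatformByVendor(const std::vector<PlatformId>& platforms,
                                const VendorQuery& queryVendor,
                                const std::string& wanted)
{
    for (std::size_t i = 0; i < platforms.size(); ++i) {
        std::string vendor;
        // Query one element, platforms[i], never the whole array.
        if (!queryVendor(platforms[i], vendor))
            return nullptr;  // like the original loop, stop at the first error
        if (vendor == wanted)
            return platforms[i];
    }
    return nullptr;
}
```

Holding the IDs in a `std::vector` also removes the need to free the array by hand on every exit path.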

13 Replies
himanshu_gautam
Grandmaster

Please post clInfo output.


clInfo output is attached

Number of platforms: 2
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.0 CUDA 3.2.1
  Platform Name: NVIDIA CUDA
  Platform Vendor: NVIDIA Corporation
  Platform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
  Platform Name: ATI Stream
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_khr_d3d10_sharing

  Platform Name: NVIDIA CUDA
Number of devices: 1
  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4318
  Max compute units: 14
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 64
  Max work group size: 1024
  Preferred vector width char: 1
  Preferred vector width short: 1
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Max clock frequency: 810Mhz
  Address bits: 32
  Max memory allocation: 327270400
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 4096
  Max image 2D height: 32768
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4352
  Alignment (bits) of base address: 4096
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 128
  Cache size: 229376
  Global memory size: 1309081600
  Constant buffer size: 65536
  Max number of constant args: 9
  Local memory type: Scratchpad
  Local memory size: 49152
  Profiling timer resolution: 1000
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 00000000004950B0
  Name: GeForce GTX 470
  Vendor: NVIDIA Corporation
  Driver version: 260.99
  Profile: FULL_PROFILE
  Version: OpenCL 1.0 CUDA
  Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Error : atomics mismatch!
Error : Bytes mismatch!
Error : d3d10Sharing mismatch!
Error : glSharing mismatch!
Error : images mismatch!
Error : printf mismatch!
Error : deviceAttributeQuery mismatch!
Failed!

  Platform Name: ATI Stream
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_CPU
  Device ID: 4098
  Max compute units: 8
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 3073Mhz
  Address bits: 64
  Max memory allocation: 1073741824
  Image support: No
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: No
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 32768
  Global memory size: 3221225472
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Profiling timer resolution: 333
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0000000003A33568
  Name: Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
  Vendor: GenuineIntel
  Driver version: 2.0
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
  Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf cl_khr_d3d10_sharing

  Device Type: CL_DEVICE_TYPE_GPU
  Device ID: 4098
  Max compute units: 18
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Max clock frequency: 765Mhz
  Address bits: 32
  Max memory allocation: 134217728
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 32768
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 536870912
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue properties:
    Out-of-Order: No
    Profiling : Yes
  Platform ID: 0000000003A33568
  Name: Cypress
  Vendor: Advanced Micro Devices, Inc.
  Driver version: CAL 1.4.739
  Profile: FULL_PROFILE
  Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
  Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops cl_khr_d3d10_sharing
Passed!


hi tim,

Could you run the OpenCL demos from GPU Caps Viewer v1.9.4 on both cards? You can also check the oclDeviceQuery sample from NVIDIA; its source is included, so you can look in detail at how it retrieves the platform and device info.

About your build, can you please give me some info about it, such as the SDK and display driver versions you're using from both AMD and NVIDIA? I'm trying to put a GTX460 and an HD5870 together in my Win7 Pro 64-bit build (Phenom II X4 965 + AMD 790FX/SB750 based mainboard), but clGetPlatformIDs() just crashes with an access violation error message. In which order did you install the drivers/SDKs? And what about PCI-E slot order: which of your cards takes the first slot?


Could you run the OpenCL demos from GPU Caps Viewer v1.9.4 on both cards? You can also check the oclDeviceQuery sample from NVIDIA; its source is included, so you can look in detail at how it retrieves the platform and device info.


Will do and post the results.

 

About your build, can you please give me some info about it, such as the SDK and display driver versions you're using from both AMD and NVIDIA? I'm trying to put a GTX460 and an HD5870 together in my Win7 Pro 64-bit build (Phenom II X4 965 + AMD 790FX/SB750 based mainboard), but clGetPlatformIDs() just crashes with an access violation error message. In which order did you install the drivers/SDKs? And what about PCI-E slot order: which of your cards takes the first slot?


ATI SDK 2.2, NVIDIA SDK 3.2 RC2. Driver 8.753.1.0 for ATI, 260.99 for NVIDIA. I don't remember the order in which I installed the drivers and SDKs.

I have the GTX 470 in PCI-E slot 1 and the 5850 in slot 2 (I cannot install them any other way, since the ATI card is too long and does not fit). In Windows I have to connect monitors to both cards, but tell Windows to use only one monitor (the one connected to the ATI GPU). Also, in order to avoid the crash (which I also experience, by the way), the following helps: disable the NVIDIA GPU in Device Manager (after you have switched to one monitor) and then enable it again.


Thanks for the tips. I'll try this as soon as possible and will report back on the unstable platform detection.


Hi timchist,

You reported that you face the problem only some of the time, not always. Do you see this issue with clinfo as well?

If not, please send a complete test case with which we can reproduce the issue directly.


It seems that I have found the problem. My application is linked with both cuda.lib and opencl.lib (the app uses either CUDA, for NVIDIA cards, or OpenCL, for ATI cards). For some reason clGetPlatformInfo is present in BOTH libraries. It seems that the linker sometimes puts the wrong function into the executable (the one from cuda.lib, not the one from opencl.lib). This function in turn calls another function directly from nvcuda.dll (not from opencl.dll), and the latter complains about the AMD platform.

I have no idea why clGetPlatformInfo is in cuda.lib. I'm trying to find a way to make the MS linker choose the function from the right library. I'm surprised that it didn't complain about the function being present in both libraries.
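One possible way around the ambiguity (a sketch only, not something confirmed in this thread) is to bypass the import libraries for the affected call and bind it at run time from a named DLL. `resolveFrom` is a hypothetical helper; `LoadLibraryA` and `GetProcAddress` are the standard Win32 calls. Outside Windows the sketch compiles to a stub that always reports failure.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the address of `symbol` exported by the DLL
// `library`, or nullptr if the DLL or the symbol cannot be found.
// The module is intentionally never freed, so the returned address stays valid.
void* resolveFrom(const char* library, const char* symbol)
{
#ifdef _WIN32
    HMODULE module = LoadLibraryA(library);
    if (module == NULL)
        return nullptr;
    return reinterpret_cast<void*>(GetProcAddress(module, symbol));
#else
    // Stub for non-Windows builds; the issue discussed here is specific to the MS linker.
    (void)library;
    (void)symbol;
    return nullptr;
#endif
}
```

With the OpenCL headers available, the result of `resolveFrom("OpenCL.dll", "clGetPlatformInfo")` could be cast to a pointer type such as `cl_int (CL_API_CALL *)(cl_platform_id, cl_platform_info, size_t, void*, size_t*)` and called in place of the import-library symbol, so the link order of cuda.lib and OpenCL.lib no longer decides which implementation runs.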


If you try to compile the program attached by using the following command line:

cl cl_sample.cpp /I "%OpenCLIncludePath%" /link "%OpenCLLibPath32%\OpenCL.lib" "%CUDA_LIB_PATH%\..\Win32\cuda.lib" "Delayimp.lib" /DELAYLOAD:"nvcuda.dll" /DELAYLOAD:"OpenCL.dll"

(you will need to declare %OpenCLLibPath32% and %CUDA_LIB_PATH% environment variables appropriately).

then it works. But if you change the order of the libraries on the command line:

cl cl_sample.cpp /I "%OpenCLIncludePath%" /link "%CUDA_LIB_PATH%\..\Win32\cuda.lib" "%OpenCLLibPath32%\OpenCL.lib" "Delayimp.lib" /DELAYLOAD:"nvcuda.dll" /DELAYLOAD:"OpenCL.dll"

then error -32 is returned from clGetPlatformInfo.

#include <CL/cl.h>
#include <stdio.h>
#include <string.h>
#include <windows.h>

// Print the error code and source line, unload the DLL and exit.
#define CHECK_ERR(err) if(err != CL_SUCCESS) { printf("Error %d at %d\n", err, __LINE__); FreeLibrary(oclLib); return 1; }

//------------------------------------------------------------------------------
int main(int argc, char* argv[])
{
    HMODULE oclLib = LoadLibrary("c:\\windows\\syswow64\\OpenCL.dll");
    if(oclLib == NULL)
    {
        printf("OpenCL.dll is not present\n");
        return 1;
    }

    cl_uint numPlatforms;
    cl_platform_id platform = NULL;
    cl_int err = clGetPlatformIDs(0, NULL, &numPlatforms);
    CHECK_ERR(err);
    if(numPlatforms == 0)
    {
        printf("No platforms found\n");
        FreeLibrary(oclLib);
        return 1;
    }
    printf("Num platforms: %d\n", (int)numPlatforms);

    cl_platform_id* platforms = new cl_platform_id[numPlatforms];
    err = clGetPlatformIDs(numPlatforms, platforms, &numPlatforms);
    if(err != CL_SUCCESS)
    {
        delete[] platforms;
        CHECK_ERR(err);
    }

    // Print every vendor string and stop at the AMD platform.
    for(cl_uint i = 0; i < numPlatforms; i++)
    {
        char pbuf[100];
        err = clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuf), pbuf, NULL);
        if(err != CL_SUCCESS)
        {
            delete[] platforms;
            CHECK_ERR(err);
        }
        printf("%i: %s\n", (int)i, pbuf);
        if(strcmp(pbuf, "Advanced Micro Devices, Inc.") == 0)
        {
            platform = platforms[i];
            break;
        }
    }
    delete[] platforms;

    if(platform == NULL)
    {
        printf("AMD platform is not found\n");
        FreeLibrary(oclLib);
        return 1;
    }

    FreeLibrary(oclLib);
    printf("All OK\n");
    return 0;
}


hi timchist,

i could finally try to mount both cards in my rig, and it went just fine! Thx to your tip:



tell Windows to use only one monitor (the one connected to the ATI GPU). Also, in order to avoid the crash (which I also experience, by the way), the following helps: disable NVIDIA GPU in Device Manager (after you switched to one monitor) and then enable it again




everything is perfect! Afterwards I could hook both monitors to the AMD card and after reboot everything runs just fine! No crashes whatsoever, no random invalid platform detection.

It seems that the order of the drivers/SDKs and even the PCI-E slot do not matter. I had the GTX460 in the first slot (with drivers+SDK installed), which I moved to the second slot; then I put the AMD card in the first slot and installed its drivers+SDK (Catalyst 10.10).

Using clGetPlatformIDs, the nvidia platform is given as the first one and amd as the second one. Maybe that's because I had nvidia installed first (or not?). I can use both the nvidia and amd SDKs with no problem and no performance penalty. I've tested your example and it runs fine. I'm using MS Visual Studio to compile the code, and on the command line I don't have cuda.lib or nvcuda.dll, just this:

compiling: /Od /I "C:\Program Files (x86)\ATI Stream\\include" /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MDd /Fo"Debug\\" /Fd"Debug\vc90.pdb" /W3 /nologo /c /ZI /TP /errorReport:prompt

linking: /OUT:"D:\test_CL\test_CL\Debug\test_CL.exe" /INCREMENTAL /NOLOGO /LIBPATH:"C:\Program Files (x86)\ATI Stream\\lib\x86" /MANIFEST /MANIFESTFILE:"Debug\test_CL.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"d:\teste_CL\test_CL\Debug\test_CL.pdb" /SUBSYSTEM:CONSOLE /DYNAMICBASE /NXCOMPAT /MACHINE:X86 /ERRORREPORT:PROMPT opencl.lib  kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

However, I do have a problem when it comes to using OpenGL/OpenCL interop. With the AMD card it runs just fine, but when I choose the nvidia platform, I get a weird error when trying to create the context with


cl_context_properties cpsGL[] = { CL_CONTEXT_PLATFORM, (cl_context_properties)platform,
                  CL_WGL_HDC_KHR, glCurrentDC,
                  CL_GL_CONTEXT_KHR, glCtx,
                  0};

cprops = (NULL == platform) ? NULL : cpsGL;

if (cprops == NULL)
    return 1;

context = clCreateContextFromType(cprops, CL_DEVICE_TYPE_GPU, NULL, NULL, &status);


context is NULL and status returns the error code -1000! It's probably because the OpenGL context is created for the amd card (which is the one hooked to the monitors) and it can't be associated with the nvidia one. Any suggestions on how to get it working? (I'm using GLUT to create the OpenGL context.)

EDIT: the GL/CL interop also works when creating the context with CL_DEVICE_TYPE_CPU (I have a Phenom II X4 965) using the amd platform
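For reference, a small lookup that turns the numeric status codes seen in this thread into their header names. `clErrorName` is a hypothetical helper and the table is deliberately partial; the values are those of the Khronos `cl.h` and `cl_gl.h` headers, so it is worth checking them against the headers shipped with your SDK.

```cpp
#include <string>

// Maps a few OpenCL status codes to their header names.
// Partial table; anything not listed is reported as unknown.
std::string clErrorName(int status)
{
    switch (status) {
    case 0:     return "CL_SUCCESS";
    case -1:    return "CL_DEVICE_NOT_FOUND";
    case -2:    return "CL_DEVICE_NOT_AVAILABLE";
    case -5:    return "CL_OUT_OF_RESOURCES";
    case -6:    return "CL_OUT_OF_HOST_MEMORY";
    case -30:   return "CL_INVALID_VALUE";
    case -31:   return "CL_INVALID_DEVICE_TYPE";
    case -32:   return "CL_INVALID_PLATFORM";
    case -33:   return "CL_INVALID_DEVICE";
    case -34:   return "CL_INVALID_CONTEXT";
    case -1000: return "CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR";  // from cl_gl.h
    default:    return "unknown status " + std::to_string(status);
    }
}
```

Under this mapping the -1000 status is `CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR`, which would be consistent with the guess above that the GL context belongs to a device on the other platform.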


laobrasuca,

Thank you very much for sharing your experience.

AFAIK, the GL context is always created for the default device, and GL interop will not function properly for a device other than the default.

 


Originally posted by: laobrasuca I'm using MS Visual Studio to compile code, and on the cmd line I don't have any of the cuda/lib or nvcuda.dll


That's because your application does not use CUDA directly (only through OpenCL). My application uses either the OpenCL API (for ATI cards) or the CUDA API directly (for NVIDIA cards). Using OpenCL on NVIDIA cards through the NVIDIA OpenCL platform gives worse results than using CUDA directly.

That is why I needed both libraries.


np Himanshu, it's the least i can do

For GL/CL interop, I've heard of GL share lists across different devices. I'll try to find a solution in this direction. But if anyone has any info that could help, please share.

tim, how much faster is CUDA than OpenCL (for your app)?

leo


Originally posted by: laobrasuca

 

tim, how much faster is CUDA than OpenCL (for your app)?

 

It is difficult to tell at this point, as significant effort is required to make the application use CUDA OpenCL. But our initial tests of the performance of single steps showed a difference of about 10%-20% on average. In some specific tests OpenCL could be twice as slow. However, the difference decreases as the task size increases, and at some point OpenCL begins to outperform CUDA, but such sizes are not used in real application work.
