Hello everyone,
I finally figured out why OpenCL programs fail to run on the cluster I use. If clGetPlatformIDs() fails with CL_PLATFORM_NOT_FOUND_KHR (-1001) even though your ICD registration is correct, a library that the client driver libatiocl32.so or libatiocl64.so depends on may be missing.
The problem is that the client driver is loaded at runtime, and it fails silently if a library is missing or if the ICD registration (a text file that specifies which libatiocl[32|64].so to load) is wrong. Normally you would get an error from the operating system if a dynamically linked library like libOpenCL is missing, but the client driver and its loading mechanism introduce a level of indirection.
In my case libGLU was missing on the cluster's nodes.
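Before looking at dependencies, it can help to rule out the registration itself. The following is only a rough sketch: it assumes the registration files live in /etc/OpenCL/vendors (the conventional location on Linux, which may differ on your installation) and that the first line of each .icd file names the driver library. The function name check_icd_files is made up for this example.

```shell
# Sketch: list each ICD registration file and the library it names.
# Assumption: registrations are in /etc/OpenCL/vendors unless another
# directory is passed as the first argument.
check_icd_files() {
    dir="${1:-/etc/OpenCL/vendors}"
    found=0
    for icd in "$dir"/*.icd; do
        # Skip the unexpanded pattern when no .icd file exists
        [ -f "$icd" ] || continue
        found=1
        # The first line is expected to hold the driver library name or path
        # (tr strips a stray carriage return)
        lib=$(head -n 1 "$icd" | tr -d '\r')
        echo "$icd: $lib"
    done
    if [ "$found" -eq 0 ]; then
        echo "no .icd files in $dir" >&2
        return 1
    fi
}

# Usage:
#   check_icd_files
#   check_icd_files /some/other/vendors/dir
```

If the library printed here is not the libatiocl32.so or libatiocl64.so you expect, the registration is the first thing to fix.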
So if your ICD registration is correct and you encounter the failure described above, check the output of the following command (analogously for 32 bit):
ldd $ATISTREAMSDKROOT/lib/x86_64/libatiocl64.so
which reported:
libdl.so.2 => /lib64/libdl.so.2 (0x00002ad7a2789000)
libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002ad7a298d000)
libGL.so.1 => /usr/lib64/libGL.so.1 (0x00002ad7a2c99000)
libGLU.so.1 => not found
librt.so.1 => /lib64/librt.so.1 (0x00002ad7a2f14000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002ad7a311d000)
libm.so.6 => /lib64/libm.so.6 (0x00002ad7a341e000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002ad7a36a1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ad7a38af000)
libc.so.6 => /lib64/libc.so.6 (0x00002ad7a3aca000)
/lib64/ld-linux-x86-64.so.2 (0x0000003961e00000)
libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002ad7a3e1d000)
libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x00002ad7a401f000)
libXext.so.6 => /usr/lib64/libXext.so.6 (0x00002ad7a4225000)
libXxf86vm.so.1 => /usr/lib64/libXxf86vm.so.1 (0x00002ad7a4436000)
libdrm.so.2 => /usr/lib64/libdrm.so.2 (0x00002ad7a463b000)
Every "not found" marks a missing library that the driver is linked against. After identifying the missing library, you only need to find the corresponding package for your distribution and install it. For example:
yum install mesa-libGLU.x86_64
on CentOS, RedHat or Fedora.
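To avoid scanning the ldd output above by eye, the check can be reduced to just the unresolved entries. This is a minimal sketch, not an official tool; the helper name missing_libs is made up for this example, and it relies on ldd marking unresolved libraries with "=> not found" as in the output shown earlier.

```shell
# Sketch: print only the libraries that ldd could not resolve.
# Reads ldd output on stdin so it can be used in a pipe.
missing_libs() {
    # ldd marks unresolved entries as "<name> => not found";
    # the first field is the library name
    awk '/=> not found/ { print $1 }'
}

# Usage (path taken from the post above):
#   ldd "$ATISTREAMSDKROOT/lib/x86_64/libatiocl64.so" | missing_libs
```

On yum-based systems, a query such as yum provides '*/libGLU.so.1' should then tell you which package ships the missing file.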
I hope this post saves someone from wasting as many hours as I did narrowing down the problem and fixing it.
It would also be a great idea for the AMD developers to add a helpful error message to the ICD loading mechanism.
nomac,
Thank you very much for sharing this with us. We will document this.