9 Replies Latest reply on Apr 25, 2012 10:01 AM by nbigaouette

    sem_wait() failed with APP 2.6

    gbilotta

      Hello,

       

      I recently upgraded a system of mine with a dual Xeon CPU from APP 2.4 to 2.6, and now all the OpenCL applications fail to start, aborting with

      ../../../thread/semaphore.cpp:87: sem_wait() failed

      The system runs 64-bit Linux with kernel 2.6.32.

      Google thinks this error has already been discussed on this forum, but I cannot find the old link (it probably got lost in the migration to the new forum). Does anybody know what I can do to fix the issue?

        • Re: sem_wait() failed with APP 2.6
          mcdelorme

          Was it just a CPU swap that you did, or did you wind up reloading everything after the upgrade?  What distribution are you trying to run this on?  Are you compiling and running the programs on the same machine?

           

          --

          Mike

          • Re: sem_wait() failed with APP 2.6
            mcdelorme

            Google seems to have cached a copy of a discussion on this error, however I can't find it on the AMD forums either.  The topic title was APP 2.6 regression: creating queue fails with "sem_wait() failed".  It was posted on Jan. 9, 2012.  The last update that I can find reads as follows:

             

            I've got some further information on this: I think it is some interaction between the AMD OpenCL implementation and the NVIDIA OpenGL library. Even though I'm not doing any GL interop, libamdocl64.so is linked against libGL.so.1, and on my system that by default picks up NVIDIA's libGL.so.1. If I use LD_LIBRARY_PATH to point things at a random build of Mesa's libGL.so.1 I happened to have lying around, all is well.

             

            I did notice that APP 2.5 had occasional random hangs and segfaults on shutdown inside libnvidia-tls. I guess 2.6 has turned occasional conflicts into a totally reproducible conflict :-/

             

            Your machine wouldn't happen to have had the NVIDIA drivers installed on it at one point, would it?

             

            --

            Mike

              • Re: sem_wait() failed with APP 2.6
                gbilotta

                Thanks, this was exactly the problem. The machine also has NVIDIA GPUs, and the OpenGL library that was found at load time was the one from NVIDIA. Switching to the mesa GL library fixed the problem.
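
To confirm which libGL.so.1 the dynamic loader would hand to AMD's library, the output of ldd can be filtered. The following is only a minimal sketch: the function name resolved_libgl is made up for illustration, and the library path in the usage comment (/usr/lib64/libamdocl64.so) is an assumption that will differ between distributions.

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the path
# that libGL.so.1 resolves to. Prints "not found" if ldd could not
# resolve it, and nothing at all if libGL.so.1 is not a dependency.
resolved_libgl() {
    awk '$1 == "libGL.so.1" && $2 == "=>" {
        if ($3 == "not") print "not found"; else print $3
    }'
}

# Usage (library path is an assumption, adjust to your install):
#   ldd /usr/lib64/libamdocl64.so | resolved_libgl
```

If the printed path, after following symlinks with readlink -f, ends up inside an NVIDIA directory, the conflict described above is probably what you are hitting.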

                • Re: sem_wait() failed with APP 2.6
                  cb_750_rider

                  I have the same problem with all of the AMD examples that I have tried, but this fix did not work for me. I built the examples with the attached make file. I do have an NVIDIA card and drivers installed. I am running Ubuntu 11.10.

                   

                  I tried setting the LD_LIBRARY_PATH so that it would exclude any libraries from the Nvidia set, but that did not help. I then tried running the program in gdb to see where the failure was, and everything worked. I have tried a few other examples and they all work in gdb, but fail from the command line.

                   

                  I have since gone back and tried both with and without the NVIDIA libraries linked, and there does not seem to be a difference.

                   

                  In all example programs, the program cannot find a GPU and chooses to run on the CPU, an AMD FX-8120. Is there possibly an issue with how the threads are handled on the FX that gdb is fixing? If so, are there any thoughts on how to fix this problem?

                    • Re: sem_wait() failed with APP 2.6
                      cb_750_rider

                      It looks like the crash happens when the code evaluates:

                       

                      cl::CommandQueue queue(context, devices[0], 0, &err);

                      • Re: sem_wait() failed with APP 2.6
                        gbilotta

                        I found that the only way to make absolutely sure the NVIDIA libGL was not being found was to physically remove it from the file system (rename it or redirect the symlink). Luckily enough, on Debian unstable both the AMD and NVIDIA OpenCL drivers are quite current, and they set up an alternatives system that allows you to switch OpenGL implementation on the fly (glx-alternatives). Not sure if/when Ubuntu will get something similar.
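
Before renaming anything, it can help to see where the system-wide libGL symlink actually ends up. This is a rough sketch, not a reliable detector: libgl_vendor is a hypothetical helper that only looks for "nvidia" in the resolved path, which happens to match layouts like /usr/lib64/opengl/nvidia/lib/ but is an assumption about how your distribution names things.

```shell
# Hypothetical helper: classify a resolved libGL path by a simple
# path-name heuristic. "other" usually means Mesa, but that is not
# guaranteed; inspect the path yourself if in doubt.
libgl_vendor() {
    case "$1" in
        *nvidia*|*NVIDIA*) echo nvidia ;;
        *)                 echo other ;;
    esac
}

# Usage (the symlink location is an assumption):
#   libgl_vendor "$(readlink -f /usr/lib64/libGL.so.1)"
```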

                          • Re: sem_wait() failed with APP 2.6
                            Bdot

                            I saw the same issue on SuSE 12.1 with some nvidia GPU and the nvidia driver, and unsetting LD_LIBRARY_PATH did not help. I have no other libGL on my system besides the nvidia one.

                             

                            Running my application under strace or gdb would not show this semaphore problem.

                             

                            Anyway, (re-)installing Mesa (seems like the nv driver installer deleted the Mesa libGL) and making sure this lib is used, made my app run again:

                             

                            zypper in -f Mesa

                            cd /usr/lib64

                            ln -sf libGL.so.1.2 libGL.so.1

                              • Re: sem_wait() failed with APP 2.6
                                martinn

                                I just upgraded my Linux machine[1] from Ubuntu 10.04 to 11.10, a complete reinstall, and in the process updated both the AMD APP SDK[2] and the NVIDIA display driver[3]. I have ever since experienced the "sem_wait() failed" error discussed in this thread. I have tried to apply the Mesa libGL trick mentioned here, but apparently my Linux skills aren't extensive enough and I'm afraid that I've made things worse in my attempts. I have the habit of documenting what I do whenever I don't fully understand what's supposed to happen and I'm now asking if anyone could have a look at the attached text file and help me figure out what I did wrong and how to fix it.

                                 

                                Thank you.

                                 

                                [1] CPU: AMD Phenom II X6 1055T

                                     GPU: NVIDIA GeForce GTX 460

                                [2] AMD-APP-SDK-v2.6-lnx64.tgz

                                [3] NVIDIA-Linux-x86_64-295.20.run

                                  • Re: sem_wait() failed with APP 2.6
                                    nbigaouette

                                    I might have found a semi-permanent solution. I'm testing it right now but I think it's the way to go.

                                     

                                    The main issue here is that AMD's OpenCL drivers, named libamdocl32.so and libamdocl64.so, depend on libGL.so, the OpenGL library. On a system where NVIDIA drivers are installed, libGL.so is provided by NVIDIA, and this causes a conflict: libamdocl{32,64}.so fail with the sem_wait() error when they try to use NVIDIA's libGL.so.

                                     

                                    The solution is to force libamdocl{32,64}.so to use Mesa's libGL.so. On my Gentoo system, this library is located under /usr/lib{32,64}/opengl/xorg-x11/lib/ and is provided by the package media-libs/mesa-7.11.2. On Ubuntu, the package should be named libgl1-mesa-dev (I'm not sure though, as I don't use Ubuntu). Mesa should already be installed on all systems, even those with an NVIDIA card (the NVIDIA installer might move the libGL.so file around to prevent conflicts, though). On a Gentoo system, there aren't any conflicts, as both libraries can be installed side by side and switched dynamically (using 'eselect opengl').

                                     

                                    A temporary solution is to run your code with LD_LIBRARY_PATH pointing to the directory that contains Mesa's libGL.so:

                                    $ LD_LIBRARY_PATH=/usr/lib64/opengl/xorg-x11/lib ./my_opencl_code
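
If you run several programs this way, the one-liner above can be wrapped in a small shell function. This is just a sketch: run_with_mesa and MESA_LIBDIR are names invented here, and the default directory is the Gentoo path quoted above, so it is an assumption on any other system. Unlike the one-liner, it prepends to an existing LD_LIBRARY_PATH instead of replacing it.

```shell
# Hypothetical default: the Gentoo Mesa directory quoted above.
# Override MESA_LIBDIR if Mesa's libGL.so lives somewhere else.
MESA_LIBDIR="${MESA_LIBDIR:-/usr/lib64/opengl/xorg-x11/lib}"

# Run one command with the Mesa directory searched first, keeping any
# entries that were already in LD_LIBRARY_PATH. The change applies only
# to that command, not to the rest of the shell session.
run_with_mesa() {
    LD_LIBRARY_PATH="${MESA_LIBDIR}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Usage:
#   run_with_mesa ./my_opencl_code
```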

                                     

                                    If on your distribution you cannot have _both_ mesa and nvidia (for example on ArchLinux), I would suggest you download the package file and extract it somewhere (for example, /usr/local/). Then use the path to Mesa's extracted libGL.so in the following procedure.

                                     

                                    So how to force AMD's libamdocl{32,64}.so to use mesa's libGL.so? Use PatchELF: http://nixos.org/patchelf.html

                                    On gentoo, the package is named dev-util/patchelf. For ArchLinux, it's available on AUR: https://aur.archlinux.org/packages.php?ID=39090. PatchELF seems to be packaged for many other distributions directly on their website: http://hydra.nixos.org/release/patchelf/patchelf-0.6

                                     

                                    Make a backup copy (just in case something goes wrong) before patching AMD's libamdocl{32,64}.so. Make sure to update locate's database first with "sudo updatedb":

                                    $ locate libamdocl64.so

                                    /usr/lib64/libamdocl64.so

                                    $ sudo cp /usr/lib64/libamdocl64.so /usr/lib64/libamdocl64.so.bak

                                    $ locate libamdocl32.so

                                    /usr/lib32/libamdocl32.so

                                    $ sudo cp /usr/lib32/libamdocl32.so /usr/lib32/libamdocl32.so.bak
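
One thing to watch with the plain cp commands: if you repeat them after patching, the .bak file gets overwritten with the already-patched library. A possible guard is sketched below; backup_once is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: copy FILE to FILE.bak only if no backup exists
# yet, so a pristine backup is never replaced by a patched library.
backup_once() {
    if [ -e "$1.bak" ]; then
        echo "backup already exists, leaving it alone: $1.bak" >&2
        return 0
    fi
    cp -p -- "$1" "$1.bak"
}

# Usage (needs write access to the directory, e.g. from a root shell,
# since sudo cannot call a shell function directly):
#   backup_once /usr/lib64/libamdocl64.so
```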

                                     

                                    Make sure to use the right location of Mesa's libGL.so:

                                    $ locate libGL.so

                                    /usr/lib32/libGL.so

                                    /usr/lib32/libGL.so.1

                                    /usr/lib32/opengl/nvidia/lib/libGL.so

                                    /usr/lib32/opengl/nvidia/lib/libGL.so.1

                                    /usr/lib32/opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib32/opengl/xorg-x11/lib/libGL.so

                                    /usr/lib32/opengl/xorg-x11/lib/libGL.so.1

                                    /usr/lib32/opengl/xorg-x11/lib/libGL.so.1.2

                                    /usr/lib64/libGL.so

                                    /usr/lib64/libGL.so.1

                                    /usr/lib64/opengl/nvidia/lib/libGL.so

                                    /usr/lib64/opengl/nvidia/lib/libGL.so.1

                                    /usr/lib64/opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib64/opengl/xorg-x11/lib/libGL.so

                                    /usr/lib64/opengl/xorg-x11/lib/libGL.so.1

                                    /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2

                                     

                                    Doing an ls on these files shows:

                                    /usr/lib32/libGL.so -> opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib32/libGL.so.1 -> opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib32/opengl/nvidia/lib/libGL.so -> libGL.so.295.20

                                    /usr/lib32/opengl/nvidia/lib/libGL.so.1 -> libGL.so.295.20

                                    /usr/lib32/opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib32/opengl/xorg-x11/lib/libGL.so -> libGL.so.1

                                    /usr/lib32/opengl/xorg-x11/lib/libGL.so.1 -> libGL.so.1.2

                                    /usr/lib32/opengl/xorg-x11/lib/libGL.so.1.2

                                    /usr/lib64/libGL.so -> opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib64/libGL.so.1 -> opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib64/opengl/nvidia/lib/libGL.so -> libGL.so.295.20

                                    /usr/lib64/opengl/nvidia/lib/libGL.so.1 -> libGL.so.295.20

                                    /usr/lib64/opengl/nvidia/lib/libGL.so.295.20

                                    /usr/lib64/opengl/xorg-x11/lib/libGL.so -> libGL.so.1

                                    /usr/lib64/opengl/xorg-x11/lib/libGL.so.1 -> libGL.so.1.2

                                    /usr/lib64/opengl/xorg-x11/lib/libGL.so.1.2

                                     

                                    So as you can see, the files under opengl/xorg-x11/lib/ are the ones from Mesa; everything else is NVIDIA's library or a symlink to it. Note that this is on Gentoo Linux. It's probably different on other systems! Find the right one.
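
The same sorting can be done mechanically by resolving every candidate and dropping those that end up at NVIDIA's library. This is a sketch under one assumption: that the NVIDIA files live under a path containing "nvidia", as in the listing above. mesa_candidates is a hypothetical name.

```shell
# Hypothetical helper: reads candidate paths on stdin (for example
# from `locate libGL.so`), follows symlinks, and prints only those
# paths whose final target is NOT inside an "nvidia" directory.
mesa_candidates() {
    while IFS= read -r path; do
        target=$(readlink -f -- "$path") || continue
        case "$target" in
            *nvidia*) ;;                   # resolves to NVIDIA: skip
            *) printf '%s\n' "$path" ;;    # keep as a Mesa candidate
        esac
    done
}

# Usage:
#   locate libGL.so | mesa_candidates
```

Whatever this prints is only a list of candidates; it is still worth checking with ls -l that they really belong to Mesa.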

                                     

                                    When you have found the right Mesa's libGL.so (or extracted it somewhere), patch AMD's libraries:

                                    $ sudo patchelf --set-rpath /usr/lib64/opengl/xorg-x11/lib /usr/lib64/libamdocl64.so

                                    $ sudo patchelf --set-rpath /usr/lib32/opengl/xorg-x11/lib /usr/lib32/libamdocl32.so

                                    Note that the first path after --set-rpath is the directory containing Mesa's libGL.so.
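
To double-check that the patch took effect, patchelf can print the rpath back with --print-rpath. The helper below is a small sketch for comparing that output against the directory you intended; rpath_contains is a made-up name, and the paths in the usage comment are the Gentoo ones from this post, so treat them as assumptions.

```shell
# Hypothetical helper: succeed if DIR ($2) is one of the entries in
# the colon-separated rpath string ($1). Matches whole entries only.
rpath_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   rpath_contains "$(patchelf --print-rpath /usr/lib64/libamdocl64.so)" \
#       /usr/lib64/opengl/xorg-x11/lib && echo "rpath looks right"
```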

                                     

                                    Now you should be able to run your code directly on AMD's platform without any error:

                                    $ ./my_opencl_code

                                     

                                    I hope that helped!