21 Replies Latest reply on Mar 13, 2014 6:32 AM by ravkum

    OpenCL miscompiles and I have a clean reproducible case

    msoos

      As explained in Bug 994 (OpenCL kernel miscompiles), the v9.5 OpenCL compiler (and many earlier ones) miscompiles the kernel in the minimal test case at msoos/amdmiscompile on GitHub. It's quite easy to check that the code is correct, and the kernel is not miscompiled if the OpenCL compiler is asked not to optimize, or if it is compiled for and run on the CPU. Please read the README.md in the GitHub repo, or read it online on the GitHub webpage.

       

      I believe this is quite a serious bug. Since I have worked hard to provide an extremely small test case, it should be easy to find and fix the bug in the OpenCL compiler. One of my kernels hits this bug, which prevents me from using my many AMD cards to accelerate a computation. Other kernels probably trigger it too, leading to wrong computations on AMD cards: wrong results for physics simulations, wrong reconstruction of X-ray images, etc. In other words, depending on the use case, it can have serious consequences. I would be very grateful if the bug were fixed as soon as possible.

       

      Thanks in advance!

        • Re: OpenCL miscompiles and I have a clean reproducible case
          ivan

          Hi there,

           

          Have you figured out what the problem is? It seems I have a similar bug, posted here. I still have no idea what to do, how to fix it, or where to report it.

          • Re: OpenCL miscompiles and I have a clean reproducible case
            ekondis

            I tried the code you uploaded to GitHub, and I have to note that it did not always behave the same way. On the CPU, it sometimes ran correctly and sometimes did not. More importantly, I tried it on an NVidia GPU and it did not produce correct results there either! Here is the output:

             

            Options you gave: 
            - Using GPU for computation
            - Optimizing compilation
            Num platforms: 1
             Platform name: NVIDIA CUDA
             Platform version: OpenCL 1.1 CUDA 4.2.1
            Num GPU device(s) recognized: 1
            Item size: 64
            Created command queue
            Building program..
            [opencl] device number: 0
            build status: specified program object for device was successful.
            ---- Build log ------ 
            
            
            
            
            ------ Build log end ---------
            Kernel-specific max workgroup size: 1024
            Local memory used by kernel: 0
            [opencl] Max compute units on device: 11
            Workgroup size: 64
            Set up graph mem 
            start value: 0
            Enqueuing kernel ...Done. 
            Read back 2048 chains
            Num times on this end : 1
            Num times on other end: 1
            Going through 2048 elements to test...
            Following data is WRONG!!!
            Keystart for this: 0
            Data here : 0x00000000
            Data there: 0x599d0010
            Following data is WRONG!!!
            Keystart for this: 1
            Data here : 0x00040000
            Data there: 0x59830010
            Following data is WRONG!!!
            Keystart for this: 2
            Data here : 0x00020000
            Data there: 0x59970010
            ...
            

             

            This makes me wonder whether it is really the compiler's fault.

              • Re: OpenCL miscompiles and I have a clean reproducible case
                msoos

                Oops, you are right: I issued clFlush instead of clFinish. clFlush only submits the queued commands to the device, whereas clFinish blocks until they have all completed. It should now work fine in --cpu mode and on NVidia. It still miscompiles, of course (I adapted the slimmed-down version from one where I waited for the event before reading out the results). I wouldn't be surprised if this still gives wrong results with NVidia's OpenCL, as both NVidia and AMD use LLVM in their compilers, which could have an optimization-pass bug that is triggered in both cases. Still, it would be *awesome* if you could re-check!

                 

                Thanks for pointing this out! Please get back to me so we can confirm!

                  • Re: OpenCL miscompiles and I have a clean reproducible case
                    ekondis

                    I tried the updated fuzzer.cpp (with clFinish) and I get errors on the NVidia for both optimized and unoptimized runs. On the AMD platform it is still unstable: the CPU version sometimes finishes correctly and sometimes gives errors.

                      • Re: OpenCL miscompiles and I have a clean reproducible case
                        msoos

                        I have minimized the test case even further. Could you please pull, rebuild, and test again? Sorry to bother you, but I think if we can get this minimal, we can convince AMD to fix it for both of us.

                         

                        Please include the displayed version number (e.g. 'Platform version: OpenCL 1.2 AMD-APP (1348.4)') and the md5sum of your AMD compiler library ('md5sum /usr/lib/libamdocl64.so'). Please make sure that libamdocl64.so does not exist anywhere else in the path, and that it does not exist in /opt/amd.., because the driver used to check that location first and it is the very old (2+ year) location. You can use 'strace' to be sure which one is loaded; even 2-year-old compilers can load and work. Mine is v9.5 and its md5sum is ece6d31454249c29e7b3b76c02462f54.

                          • Re: Re: OpenCL miscompiles and I have a clean reproducible case
                            ekondis

                            On the NVidia, the situation is the same: it gives errors whether optimization is enabled or not.

                             

                            On the AMD, I tried it on 64-bit and it behaves as you describe: the CPU works correctly, and the GPU works only for non-optimized execution. Here is a typical part of the output:

                             

                            Options you gave: 
                            - Using GPU for computation
                            - Optimizing compilation
                            Num platforms: 1
                             Platform name: AMD Accelerated Parallel Processing
                             Platform version: OpenCL 1.2 AMD-APP (1214.3)
                            Num GPU device(s) recognized: 1
                            Item size: 64
                            Created command queue
                            Building program..
                            [opencl] device number: 0
                            build status: specified program object for device was successful.
                            ---- Build log ------ 
                            
                            
                            ------ Build log end ---------
                            Kernel-specific max workgroup size: 256
                            Local memory used by kernel: 0
                            [opencl] Max compute units on device: 2
                            Workgroup size: 64
                            Set up graph mem 
                            start value: 0
                            Enqueuing kernel ...Done. 
                            Read back 2048 chains
                            Num times on this end : 1
                            Num times on other end: 1
                            Going through 2048 elements to test...
                            Following data is WRONG!!!
                            Keystart for this: 10
                            Data here : 0x80aaaaaa
                            Data there: 0x80aaaa2a
                            

                             

                            I have two copies of libamdocl64.so (I don't know why; it's an almost fresh installation).

                            Here are the md5sums:

                            e69955d7c54dde6ef24ac7623593cc81  /opt/AMDAPP/lib/x86_64/libamdocl64.so

                            bdcb8df0e3367890b8930e0ac1b63adf  /usr/lib/fglrx/libamdocl64.so


                            However, I'm still not convinced that it is the compiler's fault. I tried the workaround you describe in the README and it still produces errors.

                              • Re: Re: OpenCL miscompiles and I have a clean reproducible case
                                msoos

                                Hey,

                                 

                                First of all, thanks! Yes, it's meant to be compiled and used on a 64-bit machine; I forgot to say! It's cool that NVidia has the same bug. The AMD OpenCL bug seems to be confirmed: a kind AMD engineer got back to me with an educated guess at the exact bug in the compiler. I hope it will be fixed in the next public version. I'll report back to you and to this thread with the results of the fix.

                                 

                                Cheers again,

                                 

                                Mate

                                 

                                PS: It's really, really not a good idea to have two copies of libamdocl64.so. One of them will be loaded, and you won't know which one. Personally, I would delete all of /opt/AMDAPP and all of /usr/lib/fglrx and reinstall the drivers. That way you can be sure that next time something gets installed, it will be at the right path and will be the one used. A friend of mine was using a 1-year-old compiler without knowing it: the one from the old location (/opt/AMDAPP) was loaded by default, while the new drivers had put theirs in /usr/lib/.

                                  • Re: Re: OpenCL miscompiles and I have a clean reproducible case
                                    nou

                                    I never install the SDK. I just extract the include/CL folder, put it into /usr/include/CL, and that is all.

                                    • Re: Re: OpenCL miscompiles and I have a clean reproducible case
                                      ekondis

                                      msoos wrote:

                                      PS: It's really, really not a good idea to have two copies of libamdocl64.so. One of them will be loaded, and you won't know which one. Personally, I would delete all of /opt/AMDAPP and all of /usr/lib/fglrx and reinstall the drivers. That way you can be sure that next time something gets installed, it will be at the right path and will be the one used. A friend of mine was using a 1-year-old compiler without knowing it: the one from the old location (/opt/AMDAPP) was loaded by default, while the new drivers had put theirs in /usr/lib/.

                                       

                                      Thanks for the advice. Today I experienced crashes whenever I ran a 32-bit OpenCL application with the 14.1 beta driver, and this was due to the presence of two libamdocl32.so files (/opt/AMDAPP/lib/x86/libamdocl32.so and /usr/lib32/fglrx/libamdocl32.so). I removed the one in /opt/AMDAPP/... and replaced it with a symbolic link to the other one. Now it works fine without crashes.

                          • Re: OpenCL miscompiles and I have a clean reproducible case
                            ravkum

                            Hi,

                             

                            This is to confirm that this bug is fixed in Catalyst driver version 14.20.

                             

                            Regards,

                            Ravi