
msoos
Adept I

OpenCL miscompiles and I have a clean reproducible case

As explained in Bug 994 – OpenCL kernel miscompiles (a minimal test case is attached), the v9.5 OpenCL compiler (and many previous versions) miscompiles the kernel in msoos/amdmiscompile · GitHub. It's quite easy to check that the code is correct, and the code doesn't get miscompiled if the OpenCL compiler is asked not to optimize, or if it is asked to compile for and run on the CPU. Please read the README.md in the GitHub repo, or read it online on the GitHub webpage.

I believe this is quite a serious bug and given that I have worked a lot to provide an extremely trivial test-case, it should be easy to find the bug and fix it in the OpenCL compiler. This bug is hit by one of my kernels, preventing me from using the many AMD cards I have to accelerate a computation. This bug is probably also triggered by other kernels, leading to wrong computations on AMD cards. This may mean wrong results for physics simulations, wrong reconstruction of X-Ray images, etc. In other words, depending on the use-case, it can have serious consequences. I would be very grateful if the bug was fixed as soon as possible.

Thanks in advance!

0 Likes
20 Replies
ivan
Adept I

Hi there,

Have you figured out what the problem is? It seems that I have a similar bug, posted here. I still have no idea what to do, how to fix it, or where to report it.

We are in the same boat. About the code you posted... DES? I wrote a DES OpenCL kernel some time ago and guess what? It miscompiles! Also, it's about 3x slower if you use their new compiler, but I digress. Anyway, they ought to fix this. Their bitwise operations are often miscompiled. Also, their register allocation code is bad for the new compilers -- register spills for code that didn't use to register-spill. This whole show would be hilarious if it wasn't so sad.

0 Likes

Yeah, exactly. And the bad thing about it is that we're knocking on a closed door. There's no bug submission form and no reply from developers. I do hope they fix it in the next release.

0 Likes

First of all, there is a Bugzilla, it's just rarely used by AMD personnel (see my original post, there is even a bug number). As for the miscompile: it has been broken for over a year now. I submitted a less clean bug report to them about half a year ago, and nothing happened. I am hoping they will look at the bug now that it's so clear.

0 Likes

Hi Msoos,

Thanks. I will look into the sample and will get back to you on this.

In the meantime, if you are looking for a workaround and haven't already found one, have a look at this thread: Re: Problem with bit operations with openCL.

Regards,

Ravi

0 Likes

Hi Mate,

An update for you. The issue has been fixed and the fix should be available in the next driver. We will get back with exact details later.

Thanks again for your efforts on writing the reproducing code. We appreciate it.

Regards,

Ravi

0 Likes

Great response!

This is how AMD should address driver and SDK issues.

Thank you.

0 Likes

Yes. Though, I have to admit, I did have to wait half a year for it (yes, AMD had an internal bug number assigned, as I submitted a non-public, working, reproducible, bug-triggering code). So, next time, maybe, just maybe, AMD could do it in less than half a year. I mean, let's assume I have 3 more bugs (I do). Let's assume this is the way things are supposed to be. Then, in 'only' 1.5 years, all my bugs will be fixed.

To be honest, I'm a bit more hopeful, things *have* been improving lately. Maybe new management. I'm cautiously optimistic.

0 Likes

Hi Mate,

Please report the other 3 bugs. I will have a look at them.

Regards,

Ravi

0 Likes


msoos wrote:

Yes. Though, I have to admit, I did have to wait half a year for it (yes, AMD had an internal bug number assigned, as I submitted a non-public, working, reproducible, bug-triggering code). So, next time, maybe, just maybe, AMD could do it in less than half a year. I mean, let's assume I have 3 more bugs (I do). Let's assume this is the way things are supposed to be. Then, in 'only' 1.5 years, all my bugs will be fixed.

I guess they do it in a pipelined fashion, so it takes much less than 1.5 years. However, a half-year latency is too long.

0 Likes
ekondis
Adept II

I tried the code you've uploaded to GitHub and I have to note that it didn't always behave the same way. On the CPU, it sometimes ran OK and sometimes did not. More importantly, I tried it on an NVIDIA GPU and it also did not produce correct results! Here is the output:

Options you gave:
- Using GPU for computation
- Optimizing compilation
Num platforms: 1
Platform name: NVIDIA CUDA
Platform version: OpenCL 1.1 CUDA 4.2.1
Num GPU device(s) recognized: 1
Item size: 64
Created command queue
Building program..
[opencl] device number: 0
build status: specified program object for device was successful.
---- Build log ------
------ Build log end ---------
Kernel-specific max workgroup size: 1024
Local memory used by kernel: 0
[opencl] Max compute units on device: 11
Workgroup size: 64
Set up graph mem
start value: 0
Enqueuing kernel ...Done.
Read back 2048 chains
Num times on this end : 1
Num times on other end: 1
Going through 2048 elements to test...

Following data is WRONG!!!
Keystart for this: 0
Data here : 0x00000000
Data there: 0x599d0010

Following data is WRONG!!!
Keystart for this: 1
Data here : 0x00040000
Data there: 0x59830010

Following data is WRONG!!!
Keystart for this: 2
Data here : 0x00020000
Data there: 0x59970010

...


This fact makes me wonder if it is the compiler's fault.

0 Likes

Oops, you are right, I issued clFlush instead of clFinish (I adapted the slimmed-down version from one where I waited for the event before reading out the results). It should now work fine in --cpu mode for NVIDIA. It still miscompiles, of course. I wouldn't be surprised if this still gives wrong results with NVIDIA's OpenCL, as both NVIDIA and AMD use LLVM for their compilers, which could have an optimization pass bug that is triggered in both cases. However, it would be *awesome* if you could re-check!

Thanks for pointing this out! Please get back to me so we can see and confirm!

0 Likes

I tried with the updated fuzzer.cpp (with clFinish) and I get errors on NVIDIA for both optimized and unoptimized executions. On the AMD platform it is still unstable. The CPU version sometimes finishes correctly and other times gives errors.

0 Likes

I have minimized the test case even further. Can you please pull, rebuild, and test again? Sorry to bother you, but I think if we can get this minimal, we could convince AMD to fix it for the both of us.

Please attach the displayed version number like 'Platform version: OpenCL 1.2 AMD-APP (1348.4)' and please attach the md5sum of your amd compiler: 'md5sum /usr/lib/libamdocl64.so'. Please make sure that libamdocl64.so does not exist in any other place in the path, and it does not exist in /opt/amd.. --> the driver used to check that location first and it's the very old (+2yr) location. You can use 'strace' to be sure which one is loaded -- even 2yr old compilers can load and work. Mine is the v9.5 and it has md5sum of ece6d31454249c29e7b3b76c02462f54.
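For anyone who wants to automate the check described above, here is a minimal sketch in Python (the thread itself only uses `md5sum` and `strace`; the function names below are hypothetical). It walks a set of directories, finds every file named libamdocl64.so, and computes its MD5 so stray copies show up side by side. Which directories to search is an assumption you need to adapt to your own install.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_library_copies(roots, filename="libamdocl64.so"):
    """Walk each root and return {path: md5} for every regular file named `filename`.

    Roots that do not exist are silently skipped by os.walk.
    """
    found = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                full_path = os.path.join(dirpath, filename)
                if os.path.isfile(full_path):
                    found[full_path] = md5_of_file(full_path)
    return found
```

Calling `find_library_copies(["/usr/lib", "/opt"])` and printing the result gives one checksum per copy. More than one entry means you should verify which copy the driver actually loads before trusting any test result.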

0 Likes

For the NVidia the situation is the same. It gives errors whether using the optimized one or not.

On AMD I tried it on 64-bit and it behaves as you describe: the CPU works correctly and the GPU works only for non-optimized execution. Here is a typical part of the output:

Options you gave:
- Using GPU for computation
- Optimizing compilation
Num platforms: 1
Platform name: AMD Accelerated Parallel Processing
Platform version: OpenCL 1.2 AMD-APP (1214.3)
Num GPU device(s) recognized: 1
Item size: 64
Created command queue
Building program..
[opencl] device number: 0
build status: specified program object for device was successful.
---- Build log ------
------ Build log end ---------
Kernel-specific max workgroup size: 256
Local memory used by kernel: 0
[opencl] Max compute units on device: 2
Workgroup size: 64
Set up graph mem
start value: 0
Enqueuing kernel ...Done.
Read back 2048 chains
Num times on this end : 1
Num times on other end: 1
Going through 2048 elements to test...

Following data is WRONG!!!
Keystart for this: 10
Data here : 0x80aaaaaa
Data there: 0x80aaaa2a
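A quick way to read a mismatch like the one in the log above is to XOR the two words and list which bits differ. The sketch below is only an illustration in Python, not part of the fuzzer in the repo; `differing_bits` is a hypothetical helper, and the two values are copied from the "Data here" and "Data there" lines.

```python
def differing_bits(here, there, width=32):
    """Return the XOR mask of two words and the bit positions (0 = least significant) that differ."""
    mask = (here ^ there) & ((1 << width) - 1)
    positions = [bit for bit in range(width) if (mask >> bit) & 1]
    return mask, positions


# Values copied from the "Data here" / "Data there" lines of the AMD log.
mask, positions = differing_bits(0x80AAAAAA, 0x80AAAA2A)
print(f"xor mask: 0x{mask:08x}, differing bits: {positions}")
# prints: xor mask: 0x00000080, differing bits: [7]
```

For this pair the two results differ in a single bit. The same helper applied to the NVIDIA output earlier in the thread shows many differing bits per word, so the two failures do not look alike at the bit level.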


I have two versions of libamdocl64.so (I don't know why, it's an almost fresh installation). Here are the md5sums:

e69955d7c54dde6ef24ac7623593cc81  /opt/AMDAPP/lib/x86_64/libamdocl64.so
bdcb8df0e3367890b8930e0ac1b63adf  /usr/lib/fglrx/libamdocl64.so

However, I'm still not convinced that it is the compiler's fault. I tried the workaround you note in the README and it still outputs errors.

0 Likes

Hey,

First of all, thanks! Yes, it's meant to be compiled and used on a 64-bit machine, I forgot to say! It's cool that NVIDIA has the same bug. The OpenCL bug in AMD seems to be confirmed: a kind AMD engineer got back to me about it with an educated guess at the exact bug in the compiler. I'm hoping this will be fixed for the next public version. I'll get back to you and to this thread about the results of the fix.

Cheers again,

Mate

PS: It's really-really not a good idea to have 2 libamdocl64.so. One of them will be loaded, and you won't know which one. I personally would delete all of /opt/AMDAPP + all of /usr/lib/fglrx and reinstall the drivers. That way you'll be sure that next time something gets installed, it'll be at the right path, and it will be the one used. I had a friend who was using a 1 year old compiler and didn't know: it loaded the one from the old location (/opt/AMDAPP) by default and the new drivers put them at /usr/lib/.
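If you do end up with two copies, one way to see which one a running program actually mapped is to look at `/proc/<pid>/maps` on Linux. This is a different technique from the `strace` approach mentioned earlier in the thread, and the Python sketch below is only an illustration under that assumption; the function names are hypothetical.

```python
def mapped_libraries(maps_text, name_fragment):
    """Return sorted, de-duplicated paths from /proc/<pid>/maps text that contain name_fragment."""
    paths = set()
    for line in maps_text.splitlines():
        # Each line is: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and name_fragment in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def mapped_libraries_for_pid(pid, name_fragment="libamdocl64.so"):
    """Linux only: read /proc/<pid>/maps for a running process and filter it."""
    with open(f"/proc/{pid}/maps") as handle:
        return mapped_libraries(handle.read(), name_fragment)
```

Run `mapped_libraries_for_pid` against the PID of your OpenCL program while it is running (after the platform has been initialized). The path it reports is the copy that was really loaded, whatever the install layout suggests.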

0 Likes

I never install the SDK. I just extract the include/CL folder, put it into /usr/include/CL, and that is all.

0 Likes


msoos wrote:

PS: It's really-really not a good idea to have 2 libamdocl64.so. One of them will be loaded, and you won't know which one. I personally would delete all of /opt/AMDAPP + all of /usr/lib/fglrx and reinstall the drivers. That way you'll be sure that next time something gets installed, it'll be at the right path, and it will be the one used. I had a friend who was using a 1 year old compiler and didn't know: it loaded the one from the old location (/opt/AMDAPP) by default and the new drivers put them at /usr/lib/.

Thanks for the advice. Today I experienced crashes whenever I ran a 32-bit OpenCL application with the 14.1 beta driver, and this was due to the presence of 2 libamdocl32.so files (/opt/AMDAPP/lib/x86/libamdocl32.so and /usr/lib32/fglrx/libamdocl32.so). I removed the one in /opt/AMDAPP/... and replaced it with a soft link to the other one. Now it works fine without crashes.

0 Likes

Personally, I don't install the AMD APP SDK on Linux. It has quite a lot of issues. I just manually extract include/CL to /usr/include and leave everything else to the Catalyst driver installation. Putting LD_LIBRARY_PATH into the global profile is IMHO bad practice.

0 Likes
ravkum
Staff

Hi,

This is to confirm that the Catalyst driver version 14.20 has this bug fixed.

Regards,

Ravi

0 Likes