
AMD GPU OpenCL gives wrong results while Nvidia is correct

Question asked by huzhiyuan1994 on Jul 11, 2019
Latest reply on Jul 12, 2019 by kbala

I recently ported a CPU code to OpenCL and have debugged and tested it on a GTX1060.
The code is iterative. Its results are reported as a residual (the difference between the result of the latest iteration step and that of the previous step). A steadily decreasing residual means the iteration is converging.
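To make the residual concrete, it could be computed roughly as follows. This is only a minimal sketch: the helper name `max_residual` is hypothetical, and the max-norm is an assumption (the actual solver may use a different norm).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: max-norm residual between two successive iterates.
   The norm is an assumption; the real solver may measure it differently. */
double max_residual(const double *prev, const double *curr, size_t n)
{
    assert(prev != NULL && curr != NULL);
    double r = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = fabs(curr[i] - prev[i]);
        if (isnan(d))
            return d;      /* propagate NaN so divergence stays visible */
        if (d > r)
            r = d;         /* keep the largest absolute change */
    }
    return r;
}
```

Calling this once per iteration step on the host copy of the result gives one number per step, which is easy to compare between the CPU, the GTX1060, and the AMD cards.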
My environment is VS2015 with fp64 enabled. The kernels are enqueued one after another in the same command queue, and their order is constrained by waiting on cl_event objects.
This works on Nvidia cards but fails on AMD cards.
The symptoms are:
1. Using CUDA 9.2 (OpenCL 1.2) on the GTX1060, the results of each iteration step are almost identical to the CPU results (there is only a slight deviation after the 12th decimal place).
2. On AMD cards, the behavior differs between Debug and Release builds (the only change is switching between Debug and Release in VS):
(a) When the Debug build is run directly, the iteration diverges within a few steps (the result becomes NaN). The values printed for each step are incorrect and appear random (they differ from run to run).
(b) When the Release build is run, the calculation does not diverge, but the value at each iteration step is quite different from the CPU and GTX1060 values.
(c) In the debugger, if I step over a function that contains more than one clEnqueueNDRangeKernel() call, the result is wrong; if I step into that function and execute it line by line, the result is correct.
(d) Changing the AMD driver version, swapping the AMD graphics card (I have two R9 390s and one R9 280X), or adding OpenCL compile options (such as -cl-std=CL1.2 -cl-opt-disable) has no effect.
(e) Although out-of-order execution is not enabled by default, I suspected that the kernels were not executing in the order given in the code. I registered a callback with clSetEventCallback() to monitor when each kernel is triggered, but found that the order is correct.
To sum up, the fact that the results are correct only when I step through the calls one at a time is especially hard to explain.
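One way to quantify the differences described above is to compare each device result with the CPU reference element by element after every step. The following is a rough sketch, not the code from my project: `count_mismatches` is a hypothetical helper, and the relative tolerance (for example 1e-12, suggested by the deviation seen on the GTX1060) is an assumption.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: counts elements of `test` that are NaN or differ
   from `ref` by more than rel_tol * max(1, |ref[i]|). The CPU reference
   `ref` is assumed to be finite. If first_bad is not NULL and at least one
   mismatch exists, the index of the first mismatch is stored there. */
size_t count_mismatches(const double *ref, const double *test, size_t n,
                        double rel_tol, size_t *first_bad)
{
    assert(ref != NULL && test != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        double scale = fabs(ref[i]) > 1.0 ? fabs(ref[i]) : 1.0;
        int mismatch = isnan(test[i]) ||
                       fabs(test[i] - ref[i]) > rel_tol * scale;
        if (mismatch) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = i;   /* remember where the results first differ */
            ++bad;
        }
    }
    return bad;
}
```

Running this after each kernel rather than after each full iteration step would show which kernel's output is the first to differ from the CPU reference.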
