Platform details gathered by my program:
platformVendor=Advanced Micro Devices, Inc., err=0
platformName=AMD Accelerated Parallel Processing, err=0
platformVer=OpenCL 1.1 AMD-APP (898.1), err=0
platformExt=cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing, err=0
OS: Windows 7 Ultimate 64-bit.
I updated my drivers today. The issue also occurred with the previous driver.
I've been trying many different ways to get correct results on AMD GPU OpenCL.
My program works fine on:
- NVidia GPU
- Intel CPU (OpenCL)
- AMD CPU OpenCL
...but fails on AMD GPU OpenCL (HD5770). "Fails" means that it either produces all zeros as the result, or crashes with an access violation (AV) if the kernel uses fma instead of mad.
The kernel source code and host program source are attached. I've also attached a mini-dump of the process at the AV.
If you replace the fma calls with mad calls, the compiler no longer crashes, but all the results are zeros. I've tried using constant memory and pointers, and I've tried copying to local memory and computing from there. In every one of those cases, one of the four implementations did not work correctly. The code as it stands generates inline code.
I know there are faster ways to do DCTs. This is an experimental program for trying out different OpenCL techniques, including multi-GPU execution, overlapping reads/writes/executes, and out-of-order queues. I got blocked before implementing much of this, trying to work around issues I encountered on one platform or another.
Please investigate the instability and bad code generation on AMD GPU.