When I try to compile certain OpenCL code on a machine with an R9 290 installed and the Catalyst 14.12 driver, the program crashes with a stack overflow in amdocl64.dll when calling clBuildProgram.
The same code compiles just fine on a machine with an HD 7970 and the same driver.
Below is the minimal kernel that triggers the crash:
__kernel void test(__global float* srcBase, __global float* dstBase, int width, int height)
{
    int x = get_global_id(0);
    if(x < width)
    {
        int y = get_global_id(1);
        if(y < height)
        {
            __global float* dst = dstBase + x + y * width;
            __global const float* head = srcBase + y * width;
            int offsetLeft = x > 0? -1: 0;
            float leftFactor = x > 0? 1: 0;
            float curAverage = leftFactor * (head[offsetLeft]);
            *dst = curAverage;
            dst += width;
        }
    }
}
Thanks for reporting this and providing the reproducible test case. We'll check and get back to you.
Hi dipak.
Did you manage to reproduce the issue? Is there any news on the problem?
Thanks, Tim
Yes, I'm able to reproduce the issue using a sample test project. However, when I compile the kernel code using CodeXL, it works fine. Another interesting point is that the test code works fine in the following cases:
1) Disabling optimization by passing "-O0" or "-cl-opt-disable" to clBuildProgram()
2) Building as OpenCL 2.0 kernel code by passing "-cl-std=CL2.0" to clBuildProgram() [with or without optimization]
I guess it's a compiler bug. I've asked someone for clarification. If needed, I'll file a bug report against it. Meanwhile, you may try the above workarounds and let me know your findings.
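As a rough sketch of how the two workarounds above could be wired into host code, the helper below assembles the option string that is passed as the fourth argument of clBuildProgram. The function name `make_build_options` and its parameters are hypothetical and exist only for illustration; only the two option strings come from this thread.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): writes the build options for the
   workarounds discussed above into buf.
   disable_opt != 0 adds "-cl-opt-disable", use_cl20 != 0 adds "-cl-std=CL2.0".
   Returns 0 on success, -1 if buf is too small or formatting fails. */
int make_build_options(char *buf, size_t size, int disable_opt, int use_cl20)
{
    int n = snprintf(buf, size, "%s%s%s",
                     disable_opt ? "-cl-opt-disable" : "",
                     (disable_opt && use_cl20) ? " " : "",
                     use_cl20 ? "-cl-std=CL2.0" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Intended use, assuming `program` and `device` were created elsewhere:
 *
 *     char opts[64];
 *     if (make_build_options(opts, sizeof opts, 1, 0) == 0)
 *         err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Since the failure is a crash inside the driver rather than an error code, the options have to be chosen before the call; retrying with different options after a failed build is not possible here.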
Regards,
Compilation works with "-O0", with "-cl-std=CL2.0", or with both. However, in these cases the application still fails with a stack overflow in amdocl64.dll when it calls clBuildProgram after clCreateProgramWithBinary on the binary produced by that compilation. As I have explained in another post, we use offline compilation and do not ship our apps with OpenCL sources.
Also, when the "-fno-bin-llvmir" flag is passed, compilation results in this error:
An error with the ELF object was encountered.
Error while BRIG Codegen phase: compilation error
Merry Christmas, Dipak.
Any progress on the issue?
Merry Christmas.
I've filed an internal bug report against the issue. If I get any update, I'll share with you.
Regards,
Hi Tim,
Good news. I recently got an update that this issue has been fixed in the latest internal driver build. I hope the fix will be available in a public release soon.
Regards,
Thanks Dipak.
Hopefully the other problems I reported recently will be fixed soon too.
We have just found out that the same kernel also crashes the compiler with the Catalyst 14.9 driver when GPU_FORCE_64BIT_PTR is set to 1, on both machines (with R9 290 and HD 7970).
Timofey Chistyakov - checking in, relative to "We have just found out that the same kernel causes compilation to crash with Catalyst 14.9 driver too when GPU_FORCE_64BIT_PTR is set to 1 on both machines (with R9 290 and HD 7970)."
Although there are no specific reproducibility instructions, is this the same problem that started this thread, or something different? If it's different, I'll split this into a new topic. If it's an instance of the same issue, I'll leave it alone, since Dipak reported that the issue is fixed and the fix should see the light of day soon.
Hi jtrudeau,
the reproducibility instructions are pretty similar to those in the original post: just try to compile the kernel (the source is posted above) on Catalyst 14.9 with GPU_FORCE_64BIT_PTR set to 1 on a machine with a Radeon R9 290 or HD 7970 installed (I did not try other cards). Doing that results in a stack overflow in amdocl64.dll.
I don't know whether this behaviour is caused by the same error that is presumably fixed in an internal version of the driver, as I don't know what the error was, nor do I have access to that driver version to test and verify directly.
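For a repro harness, one way to apply that setting is to set the variable from the host program before any OpenCL call is made. This is only a sketch: the function name `force_64bit_pointers` is hypothetical, and it assumes the runtime reads GPU_FORCE_64BIT_PTR from the process environment when it initializes. Setting the variable in the shell or system settings before launching the program is the safer option.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() on POSIX systems */
#include <stdlib.h>

/* Hypothetical helper: sets GPU_FORCE_64BIT_PTR=1 for the current process.
   Call it before the first OpenCL API call (assumption: the runtime reads
   the variable at initialization). Returns 0 on success, nonzero on failure. */
int force_64bit_pointers(void)
{
#ifdef _WIN32
    return _putenv_s("GPU_FORCE_64BIT_PTR", "1");
#else
    return setenv("GPU_FORCE_64BIT_PTR", "1", 1);
#endif
}
```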
OK. I'll leave that report in this thread, and pass it on to the team. Thanks for the feedback.
As of Catalyst 15.4 beta, the error still persists.
Thanks timchist, very useful thread. Thank you for sharing the information.
Catalyst 15.5 beta: the error is still reproducible.
The issue is no longer reproducible in Catalyst 15.7.
Regards,