I have an OpenCL kernel that consists of deeply nested for-loops interleaved with if statements; #ifdefs in the kernel control how many levels of for-loops are compiled in. On an AMD Radeon HD 7970 under CentOS 6.5, the program segfaults when I compile and execute the kernel with 8 nested for-loops, each contained inside an if statement.
I've looked around on the AMD developer forums and elsewhere online, and older posts say that the AMD OpenCL compiler has a limit on the number of nested for-loops with conditionals because it performs loop unswitching. Is this still the case, and is it what is causing the problem on my machine? If so, is there a way to disable loop unswitching so that my kernel might compile and execute? Or is it simply not possible to nest loops this deeply in a kernel?
I really just want to find out more about the limits of the AMD OpenCL compiler with regard to deeply nested for-loops and conditionals. Unfortunately I cannot provide any of the code that I am working on (it's source code for my company's product), but I can give more details of my setup.
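To illustrate the general shape without sharing the real code, here is a hypothetical sketch only. It is written as plain C so it compiles with a host compiler; in the actual kernel this would be a `__kernel` function. The names `NEST_DEPTH`, `nested_count`, `limits` and `flags` are made up for illustration, and only three levels are shown, whereas the real kernel continues the same if/for pattern down to eight.

```c
#include <stddef.h>

/* Hypothetical depth switch; the real kernel uses #ifdefs to go up to 8 levels. */
#ifndef NEST_DEPTH
#define NEST_DEPTH 3
#endif

/* Plain C stand-in for the kernel body (assumption: the real code is an
   OpenCL __kernel). Each loop level is wrapped in its own conditional,
   and the preprocessor decides how many levels are compiled in. */
int nested_count(const int *limits, const int *flags)
{
    int count = 0;
    if (flags[0]) {
        for (int a = 0; a < limits[0]; ++a) {
#if NEST_DEPTH >= 2
            if (flags[1]) {
                for (int b = 0; b < limits[1]; ++b) {
#if NEST_DEPTH >= 3
                    if (flags[2]) {
                        for (int c = 0; c < limits[2]; ++c) {
                            ++count;   /* innermost work at depth 3 */
                        }
                    }
#else
                    ++count;           /* innermost work at depth 2 */
#endif
                }
            }
#else
            ++count;                   /* innermost work at depth 1 */
#endif
        }
    }
    return count;
}
```

Because each condition guards a loop and does not change inside it, this is the kind of pattern that loop unswitching targets, which is why I suspect the depth matters.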