bergmann
Adept I

Segfault and incorrect values in deeply nested for-loop kernel

Hi,

I have an OpenCL kernel that consists of deeply nested for-loops interleaved with if statements. The kernel uses #ifdefs to control how many levels of for-loops are compiled in. When I compile and run it on an AMD Radeon HD 7970 under CentOS 6.5, the program segfaults as soon as I enable 8 nested for-loops, each contained inside an if statement.

I've looked around the AMD developer forums and elsewhere online, and older posts say that the AMD OpenCL compiler has a limit on the number of nested for-loops with conditionals because it performs loop unswitching. Is this still the case, and is it what is causing the problems on my machine? If so, is there a way to disable loop unswitching so that my kernel can compile and execute? Or can I simply not nest a kernel this deeply?
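For readers unfamiliar with the term, here is a minimal sketch of what loop unswitching does, written in plain C. It is only an illustration of the general transformation, not the AMD compiler's actual output, and the function names are hypothetical. Because each unswitched condition duplicates the loop body, a nest of 8 loops with one invariant condition per level could in principle grow to as many as 2^8 = 256 copies, which is one plausible reason a compiler might struggle with such code.

```c
#include <stddef.h>

/* Original form: the test on `flag` does not change inside the loop,
   yet it is evaluated on every iteration. */
static long sum_original(const int *a, size_t n, int flag)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i) {
        if (flag)
            s += a[i];
        else
            s -= a[i];
    }
    return s;
}

/* Unswitched form: the invariant test is hoisted out of the loop and
   the loop is duplicated, one copy per branch. */
static long sum_unswitched(const int *a, size_t n, int flag)
{
    long s = 0;
    if (flag) {
        for (size_t i = 0; i < n; ++i)
            s += a[i];
    } else {
        for (size_t i = 0; i < n; ++i)
            s -= a[i];
    }
    return s;
}
```

Both functions compute the same result; only the amount of generated code differs.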

I really just want to find out more about the limits of the AMD OpenCL compiler with regard to deeply nested for-loops and conditionals. Unfortunately, I cannot provide any of the code I am working on (it's source code for my company's product), but I can give more details of my setup.

Thanks!

1 Solution

Hi,

Is there any update? Meanwhile, I did some experiments, and here are my observations:

I added the following line above the comments shown below:

lnKsum[7]= lnKsum[6] + lnKsum[7];

// cant access lnKsum[7] here

// for some reason...


Note: To avoid the segfault, I changed the size of the "char opt[200]" array.


OS: Ubuntu 14.04LTS (64bit), Driver: fglrx 14.40

1) Selected device (CPU): AMD A10-6800K APU

Kernel compilation and running was successful.

2) Selected device (GPU): Devastator AMD Radeon HD 8670D

Kernel compilation and running was successful.

3) Selected device (GPU): Capeverde AMD Radeon HD 7770

Segfault during kernel compilation [regardless of whether the line above was added]

However, when the optimization flag "-O0" or "-O1" was passed to clBuildProgram [i.e. -O0 or -O1 was added to the "opt" string], the kernel compiled and ran successfully.
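As a rough sketch of how the flag could be added without overflowing a fixed-size buffer such as "opt", the helper below builds the options string with a bounds check. The helper name make_build_options and the macro NEST_DEPTH are hypothetical and not taken from the original code. The clBuildProgram call is shown only in a comment, because it needs the OpenCL headers and a valid program and device.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<base> <opt_flag>" into buf.
   Returns 0 on success, -1 if the result would not fit in cap bytes. */
static int make_build_options(char *buf, size_t cap,
                              const char *base, const char *opt_flag)
{
    int n = snprintf(buf, cap, "%s %s", base, opt_flag);
    if (n < 0 || (size_t)n >= cap)
        return -1;  /* truncated or failed: do not pass a partial string on */
    return 0;
}

/* Usage sketch (assumes existing `program`, `device` and `err` variables):
 *
 *   char opt[512];
 *   if (make_build_options(opt, sizeof opt, "-DNEST_DEPTH=8", "-O1") == 0)
 *       err = clBuildProgram(program, 1, &device, opt, NULL, NULL);
 */
```

The standard OpenCL build option "-cl-opt-disable" turns off all optimizations and may be worth trying as well, although it was not tested in the experiments above.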

Have you tried these optimization flags? If not, please try them and share your findings.

I'll try to get hold of an HD 7970 to run the same test on that card. Please let me know which driver version you used.

Regards,


11 Replies