
bergmann
Adept I

Segfault and incorrect values in deeply nested for-loop kernel

Hi,

I have an OpenCL kernel that consists of deeply nested for-loops interleaved with if statements. The kernel uses #ifdefs to control how many levels of for-loops to descend. When I compile and execute the kernel on an AMD Radeon HD 7970 under CentOS 6.5, the program segfaults once the kernel has 8 nested for-loops, each contained inside a conditional if statement.
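For readers who want to picture the structure described above, here is a minimal sketch in plain C (OpenCL C shares the same preprocessor and loop syntax). It is not the kernel from this thread: the macro NEST_DEPTH, the function count_innermost, and the even-index conditions are all hypothetical, chosen only to show how a preprocessor switch can control how many guarded loop levels are compiled in.

```c
#include <assert.h>

/* Hypothetical macro: in an OpenCL build it could be supplied through the
   build options (e.g. "-D NEST_DEPTH=8"); here it defaults to 3.
   This sketch only distinguishes depths 1, 2, and 3 or more. */
#ifndef NEST_DEPTH
#define NEST_DEPTH 3
#endif

/* Counts how many times the innermost body is reached when every compiled-in
   level runs n iterations and is guarded by an "index is even" condition. */
static int count_innermost(int n)
{
    assert(n >= 0);
    int count = 0;
    for (int a = 0; a < n; ++a) {
        if (a % 2 == 0) {
#if NEST_DEPTH >= 2
            for (int b = 0; b < n; ++b) {
                if (b % 2 == 0) {
#if NEST_DEPTH >= 3
                    for (int c = 0; c < n; ++c) {
                        if (c % 2 == 0) {
                            ++count;
                        }
                    }
#else
                    ++count;
#endif
                }
            }
#else
            ++count;
#endif
        }
    }
    return count;
}
```

Extending the same pattern to 8 levels gives the shape described in the post: each extra level adds one loop and one conditional that the compiler's optimizer has to analyze.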

I've looked around the AMD developer forums and elsewhere online, and older posts say that the AMD OpenCL compiler limits the number of nested for-loops with conditionals because it performs loop unswitching. Is this still the case, and is it what is causing the problems on my machine? If so, is there a way to disable loop unswitching so that my kernel can compile and execute? Or is it simply not possible to nest a kernel this deeply?

I really just want to find out more about the limits of the AMD OpenCL compiler with regard to deeply nested for-loops and conditionals. Unfortunately I cannot provide any of the code that I am working on (it's source code for my company's product), but I can give more details of my setup.

Thanks!

0 Likes

0 Likes
11 Replies
dipak
Big Boss

Hi bergmann,

Thanks for reporting this. Is the issue reproducible under other setups, say with other GPUs, drivers, or OSs?

As you've said, some older threads suggest that it may be a compiler limitation. Could you please share those links? In that case, I need to consult with the compiler team.

Regards,

0 Likes

Hi dipak,

The kernel works correctly on Ubuntu 12.04 with an NVIDIA Tesla K20c. It is also used on Linux machines with NVIDIA GTX 560s. I had a look for the links I had read, and this was the only one I could find:

nested-if performance penalty if no else

I've spoken with my manager and I can provide the kernel in which the problem occurs.

Volume AB

Regards,

Dan

0 Likes

Thanks for posting the kernel code. The kernel is very long and it also depends on other files. If possible, please try to post a similar but simpler test case that manifests the same problem. I'll try to reproduce it at my end.

Regards,

0 Likes
bergmann
Adept I

I don't think I can post much more code-wise, but I will check with my manager tomorrow. However, from looking at the kernel, specifically the number of nested loops, do you think this would cause the AMD compiler a problem? I have other kernels which are just as long (but none with such deeply nested loops) and which don't cause any problems.

Perhaps I should make a sample kernel and program to test which I can fully provide for you?

Regards

Dan

0 Likes

Yes, a sample test case will be very helpful so that I can test it myself and, if needed, forward the problem to the concerned team. Since, as you said, your program compiles fine but crashes during execution, it will be difficult for me to debug until I'm able to run the code.

Note: I checked with someone from the compiler team, who knew of no such limitation regarding nested loops or conditions.

Regards,

0 Likes
bergmann
Adept I

Hi,

I am working on an example program which should hopefully replicate the problem. I hope to provide it tomorrow for you.

Regards,

Dan

0 Likes
bergmann
Adept I

Hi,

I have constructed an example which uses the original kernel from my code. It supplies the kernel with real data and uses the same parameters as my application. Interestingly enough, this example code segfaults on my NVIDIA machine in exactly the same way as on the AMD machine, even though the real program behaves differently on the two. I have narrowed the segfault down to the access to "lnKsum[7]" between lines 397 and 404 (the very bottom of the nested loops); there doesn't seem to be a problem accessing it anywhere else. I realise this suggests the problem lies in my kernel and code rather than in the AMD compiler, but perhaps you can see something I have missed.

You can download my code using either git or the "Download ZIP" button:

https://github.com/brgmnn/amd-opencl

Regards,

Dan

Hi,

Is there any update? Meanwhile, I did some experiments, and here are my observations.

I added the following line above the comments shown below:

lnKsum[7]= lnKsum[6] + lnKsum[7];

// cant access lnKsum[7] here

// for some reason...


Note: To avoid the segfault, I changed the size of the "char opt[200]" array.


OS: Ubuntu 14.04 LTS (64-bit), Driver: fglrx 14.40

1) Selected device (CPU): AMD A10-6800K APU

Kernel compilation and running was successful.

2) Selected device (GPU): Devastator, AMD Radeon HD 8670D

Kernel compilation and running was successful.

3) Selected device (GPU): Capeverde, AMD Radeon HD 7770

Segfault during kernel compilation [independent of the above line]

Now, if the optimization flag "-O0" or "-O1" was passed to clBuildProgram [i.e. -O0 or -O1 was added to the "opt" string], kernel compilation and running was successful.

Did you try these optimization flags? If not, please try them and share your findings.

I'll try to get hold of an HD 7970 to test the same on that card. Please let me know the driver version you've used.
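Since both the note about the "opt" array and the flag experiment above involve the build-options string, here is a minimal sketch of one way to assemble such a string without overrunning a fixed-size buffer. It assumes, without confirmation from the thread, that the original segfault in the example host code came from the options string outgrowing "char opt[200]". The helper name append_option is hypothetical and is not part of OpenCL or of the example project.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one option to a NUL-terminated options buffer,
   separated from earlier options by a single space.
   Returns 0 on success, or -1 (leaving buf unchanged) if it would not fit. */
static int append_option(char *buf, size_t cap, const char *option)
{
    assert(buf != NULL && option != NULL && cap > 0);
    size_t used = strlen(buf);
    size_t need = strlen(option) + (used > 0 ? 1 : 0);
    if (used + need >= cap) {
        return -1;  /* no room for the option plus the terminating NUL */
    }
    int n = snprintf(buf + used, cap - used, "%s%s",
                     used > 0 ? " " : "", option);
    return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
}
```

A caller would start from an empty buffer, for example char opt[400] = "";, call append_option(opt, sizeof opt, "-O0"), and then pass opt as the options argument of clBuildProgram. The "-O0" and "-O1" flags are the ones reported in this thread for the AMD compiler; the option for disabling optimizations defined by the OpenCL specification itself is "-cl-opt-disable".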

Regards,

0 Likes

Hi,

So I tried the changes you suggested, and they fix the problem with the example project I provided. I changed opt[200] to opt[400] and can confirm that there is no segfault when passing -O0 or -O1. I also tried these changes in the project I'm working on and found that with -O1 it still segfaults, but -O0 seems to fix the segfault! I'm using driver fglrx-14.41 on CentOS 6.5.

However, the downside is that this results in significantly longer execution times.

I will mark the question as answered, as this fixes the problem I asked about, but it's still problematic using AMD.

Thanks for the help!

Dan

0 Likes

Hi Dan,

Thanks for the confirmation. This may be a compiler optimization issue. I know performance will be degraded without optimization, but at this point I think this is the only workaround. I may need to file an internal bug report for this issue. Hopefully this problem will be solved in future drivers.

Regards,

0 Likes

OK thanks again for all the assistance on this. Hopefully we can use higher optimisation settings with future drivers.

Best Regards,

Dan

0 Likes