11 Replies Latest reply on Oct 15, 2014 11:28 AM by bergmann

    Segfault and incorrect values in deeply nested for-loop kernel

    bergmann

      Hi,

       

      I have an OpenCL kernel that consists of deeply nested for-loops interleaved with if statements. The kernel contains #ifdefs to control how many levels of for-loops to go down. When I compile and execute the kernel on an AMD Radeon HD 7970 under CentOS 6.5, the program segfaults if I enable 8 nested for-loops, each contained inside a conditional if statement.

       

      I've looked around the AMD developer forums and elsewhere online, and older posts say that the AMD OpenCL compiler has a limit on the number of nested for-loops with conditionals because it performs loop unswitching. Is this still the case, and is this what is causing the problems on my machine? If so, is there a way to disable loop unswitching, which might allow my kernel to compile and execute? Or can I simply not nest a kernel so deeply?

       

      I really just want to find out more about the limits of the AMD OpenCL compiler with regard to deeply nested for-loops and conditionals. Unfortunately, I cannot provide any of the code that I am working on (it's source code for my company's product), but I can give more details of my setup.

       

      Thanks!

        • Re: Segfault and incorrect values in deeply nested for-loop kernel
          dipak

          Hi bergmann,

          Thanks for reporting this. Is the issue reproducible under other setups, say with other GPUs/drivers/OSs?

          As you've said, some older threads suggest that it may be a compiler limitation. Could you please share those links? If that is the case, I will need to consult the compiler team.

           

          Regards,

          • Re: Segfault and incorrect values in deeply nested for-loop kernel
            bergmann

            I don't think I can post much more code-wise, but I will check with my manager tomorrow. However, from looking at the kernel, specifically the number of nested loops, do you think this would cause the AMD compiler a problem? I have other kernels which are just as long (but none with such deeply nested loops) and which don't cause any problems.

             

            Perhaps I should make a sample kernel and program to test which I can fully provide for you?

             

            Regards

            Dan

              • Re: Segfault and incorrect values in deeply nested for-loop kernel
                dipak

                Yes, a sample test case would be very helpful, so that I can test it myself and, if needed, forward the problem to the concerned team. Since, as you said, your program compiles fine but crashes during execution, it will be difficult for me to debug until I'm able to run the code.

                Note: I checked with someone from the compiler team, who was not aware of any such limitation regarding nested loops/conditions.

                 

                Regards,

              • Re: Segfault and incorrect values in deeply nested for-loop kernel
                bergmann

                Hi,

                 

                I am working on an example program which should hopefully replicate the problem. I hope to provide it tomorrow for you.

                 

                Regards,

                Dan

                • Re: Segfault and incorrect values in deeply nested for-loop kernel
                  bergmann

                  Hi,

                   

                  I have constructed an example which uses the original kernel from my code. It supplies the kernel with real data and uses the same parameters as in my application. Interestingly enough, this example code segfaults on my NVIDIA machine in exactly the same way as on the AMD machine, despite there being a difference with the real program. I have narrowed the segfault down to the access to "lnKsum[7]" between lines 397 and 404 (the very bottom of the nested loops); accessing it anywhere else does not seem to cause a problem. I realise this suggests the problem is not with the AMD compiler but with my kernel and code; however, perhaps you can see something I have missed.

                   

                  You can download my code using either git or the "Download ZIP" button:

                  https://github.com/brgmnn/amd-opencl

                   

                  Regards,

                  Dan

                    • Re: Segfault and incorrect values in deeply nested for-loop kernel
                      dipak

                      Hi,

                      Is there any update? Meanwhile, I did some experiments, and here are my observations:

                       

                      I added the following line above the comments shown below:

                      lnKsum[7]= lnKsum[6] + lnKsum[7];

                      // cant access lnKsum[7] here

                      // for some reason...


                      Note: To avoid the segfault, I changed the size of the "char opt[200]" array.
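A fixed-size buffer such as "char opt[200]" can overflow on the host side if the build options grow past its capacity, which would be consistent with the segfault going away once the array was resized. The following is a minimal sketch, not the code from the repository: the helper name `build_options` and the example fragments are hypothetical. It shows one way to assemble an options string with a bounds check so that an oversized string is reported instead of written past the end of the array.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the original code): joins `n` option
 * fragments into `out`, separated by single spaces. `cap` is the full
 * size of `out` in bytes. Returns 0 on success, or -1 if the joined
 * string (including the terminating NUL) would not fit. */
int build_options(char *out, size_t cap, const char *const *parts, size_t n)
{
    size_t used = 0;

    if (cap == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < n; ++i) {
        /* snprintf never writes more than (cap - used) bytes and returns
         * the length it would have needed, which lets us detect truncation. */
        int w = snprintf(out + used, cap - used, "%s%s",
                         i ? " " : "", parts[i]);
        if (w < 0 || (size_t)w >= cap - used) {
            return -1;
        }
        used += (size_t)w;
    }
    return 0;
}
```

With a buffer declared as `char opt[200]`, the call would be `build_options(opt, sizeof opt, parts, count)`; a return value of -1 means the buffer is too small and the string should not be handed to the compiler.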


                      OS: Ubuntu 14.04LTS (64bit), Driver: fglrx 14.40

                       

                      1) Selected device(CPU): AMD A10-6800K APU

                      Kernel compilation and execution were successful.

                       

                      2) Selected device (GPU):  Devastator  AMD Radeon HD 8670D

                      Kernel compilation and execution were successful.

                       

                      3) Selected device (GPU):  Capeverde  AMD Radeon HD 7770

                      Segfault during kernel compilation (independent of the added line above).

                       

                      However, when the optimization flag "-O0" or "-O1" was passed to clBuildProgram (i.e. -O0 or -O1 was added to the "opt" string), kernel compilation and execution were successful.
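Because a crash inside the compiler cannot be caught through a return code, a workaround based on these results has to choose the lower optimization level before the build starts. The sketch below is one possible way to do that and rests on assumptions: the function name `opt_flag_for_device` and the lookup table are hypothetical, and it assumes the device name string obtained from `clGetDeviceInfo` with `CL_DEVICE_NAME` contains the code name as listed above. Only "Capeverde" is taken from the test results in this post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical workaround table: devices on which the default
 * optimization level crashed the compiler in the tests reported here.
 * Extend it only with devices that have actually been verified. */
static const char *const k_problem_devices[] = { "Capeverde" };

/* Returns the extra build option to use for the given device name:
 * "-O1" if the name contains a listed code name, "" otherwise. */
const char *opt_flag_for_device(const char *device_name)
{
    size_t n = sizeof k_problem_devices / sizeof k_problem_devices[0];

    for (size_t i = 0; i < n; ++i) {
        if (strstr(device_name, k_problem_devices[i]) != NULL) {
            return "-O1";
        }
    }
    return "";
}
```

The returned string would be appended to the options string that is passed as the fourth argument of clBuildProgram.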

                       

                      Did you try these optimization flags? If not, please try them and share your findings.

                      I'll try to get hold of an HD 7970 to run the same test on that card. Please let me know the driver version you used.

                       

                      Regards,