3 Replies Latest reply on Sep 6, 2011 8:22 PM by MicahVillmow

    device=GPU compiler hangs

    CaptGreg
      program compiles and runs with device=CPU, but the OpenCL compiler hangs with device=GPU

      A non-trivial 7000-line OpenCL program compiles and executes correctly using the CPU as the target.

       cl::Context context(CL_DEVICE_TYPE_CPU, ...
      (everything OK, program builds and executes correctly)

      The OpenCL compiler hangs when we try compiling the same code specifying the GPU as the target device

      cl::Context context(CL_DEVICE_TYPE_GPU, ...

      The primary environment is

         Intel i7 980 CPU (Dell XPS 9100)
         Sapphire HD 6970
         either Ubuntu 10.04 LTS AMD64 or WIN7 VS 2008
         Catalyst 11_8
         AMD APP 2.5

      The choice of OS makes no difference: device=CPU works, but with device=GPU the OpenCL compiler hangs.

      We have tried Catalyst 11_5, 11_6, 11_7, 11_8 and APP 2.4 as well as 2.5 with identical results in both the Ubuntu and WIN7 environments.

      We have also tried compiling the code with GNU gcc in C99 mode (-std=c99) to search for questionable or invalid syntax, and found none.

      Suggestions?

        • device=GPU compiler hangs
          MicahVillmow
          This is a known issue. Large OpenCL programs on the GPU can cause exponential increase in compilation time. The only known work-around is to use smaller kernels.
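          A minimal sketch of the "smaller kernels" work-around: split one monolithic kernel into two small kernels that run back to back, passing the intermediate result through a global buffer. The kernel names, the split point, and the host stubs are hypothetical; the stubs only let the OpenCL C compile and run as plain C for illustration.

          ```c
          #include <stdio.h>

          /* Plain-C stubs so this sketch compiles and runs on the host;
             in a real .cl file the qualifiers and get_global_id() come
             from OpenCL itself. */
          #define __kernel
          #define __global
          static int g_id;
          static int get_global_id(int dim) { (void)dim; return g_id; }

          /* Hypothetical split: each kernel does half of the work, so each
             one's fully-inlined body is far smaller than the original. */
          __kernel void stage_a(__global const float *in, __global float *tmp)
          {
              int i = get_global_id(0);
              tmp[i] = in[i] * 2.0f;        /* first half of the work */
          }

          __kernel void stage_b(__global const float *tmp, __global float *out)
          {
              int i = get_global_id(0);
              out[i] = tmp[i] + 1.0f;       /* second half of the work */
          }

          int main(void)
          {
              float in[4] = {1, 2, 3, 4}, tmp[4], out[4];
              /* Host side: two clEnqueueNDRangeKernel calls instead of one */
              for (g_id = 0; g_id < 4; g_id++) stage_a(in, tmp);
              for (g_id = 0; g_id < 4; g_id++) stage_b(tmp, out);
              for (int i = 0; i < 4; i++) printf("%g ", out[i]);
              printf("\n");
              return 0;
          }
          ```

          The cost of the split is an extra kernel launch and a global-memory round trip for the intermediate buffer, traded against a compiler that actually finishes.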
            • device=GPU compiler hangs
              CaptGreg

              What is the recommended work around?  How do we minimize compile time?

              We need to understand the guidelines we can follow when we have lengthy pieces of code that need to be executed.

              Are there compile-time trade-offs between many short subroutines versus fewer, longer subroutines?

              Should we favor storing local variables in a structure and passing that structure to subroutines as a single argument, or passing the variables directly as multiple subroutine parameters?  Which compiles faster?

            • device=GPU compiler hangs
              MicahVillmow
              On the GPU, function calls are not supported, so everything gets inlined, which causes the problems. The problem isn't how things are coded, but the fact that after everything gets inlined, the program itself can be extremely large. While our compiler pushes the inlining as far back as possible, there are still cases that will cause exponential increase in compile time, which is what you are seeing. Usually the increase is caused by the compiler using all of the memory and swapping to the hard drive.
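              The exponential growth described above can be modeled: if each function calls the next level twice and the compiler must inline every call site (since the GPU supports no real calls), the inlined body doubles at every level. A small C model of that growth, assuming the doubling call pattern purely for illustration:

              ```c
              #include <stdio.h>

              /* Each level calls the next level at two call sites; with no
                 real function calls available, both sites must be inlined,
                 so the inlined body size is 2^depth "units" of code. */
              static long inlined_size(int depth)
              {
                  if (depth == 0)
                      return 1;                      /* leaf: one unit of code */
                  return inlined_size(depth - 1)     /* first call site, inlined */
                       + inlined_size(depth - 1);    /* second call site, inlined */
              }

              int main(void)
              {
                  for (int d = 0; d <= 20; d += 5)
                      printf("call depth %2d -> inlined size %ld\n",
                             d, inlined_size(d));
                  return 0;
              }
              ```

              A call chain only 20 levels deep already yields over a million units of inlined code, which is why a 7000-line source file can overwhelm the compiler's memory and push it into swap.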