10 Replies Latest reply on Sep 21, 2012 2:33 PM by yurtesen

    Tahiti compiler crash

    inducer77

      Hi there,

       

      the attached PyOpenCL script (really, just the embedded CL C) causes the 12.8 OpenCL compiler to segfault when targeting Tahiti.

       

      Any help would be much appreciated.

       

      Thanks!

      Andreas

        • Re: Tahiti compiler crash
          Wenju

          Hi Andreas,

          Compilation can fail if the kernel is too complex. I compiled the following kernel using AMD APP KernelAnalyzer:

          #define myfloat4 float4
          #define myfloat float
          #define NqP 64
          #define PADP 64
          #define BSIZEP 32


          __kernel void code(const int K,
                                     __global const int      * restrict galnums,
                                     __global const myfloat4 * restrict u
                                     )
          {
            /* advects (u.grad)(u,v,w,T) */
            /* shared register for 'r,s' plane */
            __local myfloat LPVL[NqP][NqP+PADP];

            volatile __local myfloat Lw[NqP];

            // u[:][j][i] -> uk[:]

            __local myfloat4  Lu1[NqP][NqP][NqP+PADP]; // use shared ?

            const unsigned int e = get_group_id(0);
            const unsigned int i = get_local_id(0);
            const unsigned int j = get_local_id(1);
            unsigned int k;

            int m;


            for(k=0;k<NqP;++k) {

              const int id = e*BSIZEP+k*NqP*NqP+j*NqP+i;

              const int gid = galnums[id];

              const myfloat sc = (Lw[i]*Lw[j]*Lw[k]);

              myfloat4 uk = 0;

              if(gid>=0)
                uk = sc*u[gid];

              for(m=0;m<NqP;++m) Lu1[m][j][i] += LPVL[m][k]*uk;
            }

            barrier(CLK_LOCAL_MEM_FENCE);

            for(k=0;k<NqP;++k) {
              myfloat4 tmp1 = 0;

              for(m=0;m<NqP;++m) tmp1 += LPVL[j][m]*Lu1[k][m][i];

              barrier(CLK_LOCAL_MEM_FENCE);
              Lu1[k][j][i] = tmp1;
            }

          }

           

          The compilation failed, and I don't know whether the kernel itself is correct.

            • Re: Tahiti compiler crash
              inducer77

              Thanks for your answer. That said, to me the only acceptable outcomes for a compiler are 1) a working binary or 2) an intelligible error message. Note how a segmentation fault does not fall into either of these categories.

              • Re: Tahiti compiler crash
                drallan

                Hi inducer77,

                 

                Your kernel failed to compile because it requires more than 2,100,000 bytes of local memory per workgroup.

                 

                GPUs divide their local memory between the compute units (CUs). Tahiti has 64 KB on each of its 32 CUs, for a total of 2,097,152 bytes. Each workgroup can use at most 32,768 bytes; if two workgroups are running on the same CU, all 64 KB can be in use.

                 

                The kernel compiles fine if the local memory is within spec. Using the Kernel Analyzer, I got the error message:

                calclCompile failedError: Creating kernel code failed!
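The local memory requirement can be checked by hand from the kernel's `#define`s and `__local` declarations. The sketch below is only that arithmetic, assuming `float` is 4 bytes and `float4` is 16 bytes; the Tahiti figures are the ones quoted in this thread, and the Python names are illustrative.

```python
# Sizes taken from the kernel's #defines and __local declarations.
NQP = 64            # NqP
PADP = 64           # PADP
SIZEOF_FLOAT = 4    # myfloat  = float
SIZEOF_FLOAT4 = 16  # myfloat4 = float4


def local_bytes():
    """Return the bytes used by each __local array in the kernel."""
    return {
        "LPVL": NQP * (NQP + PADP) * SIZEOF_FLOAT,
        "Lw": NQP * SIZEOF_FLOAT,
        "Lu1": NQP * NQP * (NQP + PADP) * SIZEOF_FLOAT4,
    }


# Tahiti figures as quoted in this thread.
LDS_PER_CU = 64 * 1024
NUM_CUS = 32
MAX_PER_WORKGROUP = 32 * 1024

sizes = local_bytes()
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name:5s} {size:>10,d} bytes")
print(f"total {total:>10,d} bytes ({hex(total)})")
print("whole-chip local memory:", LDS_PER_CU * NUM_CUS)
print("fits in one workgroup:", total <= MAX_PER_WORKGROUP)
```

Almost all of the total comes from `Lu1`. The sum, 0x808100 bytes, is the same figure that the NVIDIA ptxas message further down the thread reports for this kernel.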

              • Re: Tahiti compiler crash
                inducer77

                Dear drallan,

                 

                thanks for your answer. It would still be great if the compiler (called from CL) didn't crash, but just said what its problem is...

                 

                Andreas

                  • Re: Tahiti compiler crash
                    yurtesen

                    Are you sure the segmentation fault happens while compiling the kernel, rather than when your program tries to run a nonexistent program binary (because of a compilation error that probably went undetected)? That might be a downside of using Python, perhaps?
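One way to tell a genuine crash from a reported build failure is to run the build step in a child process and look at how that process ended. The following is a minimal sketch of the idea, assuming a POSIX system; `run_isolated` and the three stand-in snippets are hypothetical, and in real use the snippet would be whatever code performs the OpenCL build.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run `snippet` in a child Python interpreter and classify how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # POSIX: a negative return code is the number of the fatal signal.
        return "crashed", signal.Signals(-proc.returncode).name
    if proc.returncode != 0:
        # The child raised or exited with an error; keep its last stderr line.
        lines = proc.stderr.strip().splitlines()
        return "error", lines[-1] if lines else ""
    return "ok", proc.stdout.strip()


# Hypothetical stand-ins for "build the program" so the sketch runs anywhere.
CRASHING_BUILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
FAILING_BUILD = "raise RuntimeError('build failed')"
WORKING_BUILD = "print('built')"

if __name__ == "__main__":
    for label, snippet in [
        ("crashing", CRASHING_BUILD),
        ("failing", FAILING_BUILD),
        ("working", WORKING_BUILD),
    ]:
        print(label, run_isolated(snippet))
```

A result of "crashed" with SIGSEGV means the process died inside the build step itself, whereas "error" means the failure was detected and reported normally.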

                     

                    In the past I had a kernel that caused an actual crash at the program build stage, and it also crashed the Kernel Analyzer. (I believe the same code is used for building programs in both cases, even across Linux and Windows.)

                     

                    When I copied and pasted the kernel into one of my C/C++ programs, I got the same error that drallan logged from the build process:

                     

                    calclCompile failedError: Creating kernel code failed!

                     

                    The program build also returns the error code CL_BUILD_PROGRAM_FAILURE, and no segmentation fault occurs, since I have an if statement that exits the program if the kernel can't be built.

                     

                    Admittedly, the error message could have been more informative...

                      • Re: Tahiti compiler crash
                        inducer77

                        I am quite sure that the Python wrapper checks for all possible error conditions (and reports them). I wrote the thing, after all.

                         

                        Now this may come down to a difference in driver versions--I'm using version 12.8 of the driver package as shipped by Debian, here:

                         

                        http://packages.debian.org/experimental/amd-opencl-icd

                         

                        Andreas

                          • Re: Tahiti compiler crash
                            yurtesen

                            To be honest, I don't trust Python that much; when there is a problem, Python is often involved too, hehe.

                             

                            I would think that compilation is independent of the driver. Did you not install amd-app-sdk 2.7?

                             

                            Try the attached sample, which contains test_kern.cpp and kernel.cl:

                             

                            $ g++ test_kern.cpp -lOpenCL

                            $ ./a.out

                            ...

                            blah blah

                            ...

                            Building OpenCL executable...

                            Error: Failed to build program executable!

                            -11

                            calclCompile failedError: Creating kernel code failed!

                             

                            $

                             

                            If your platform and device are not the first ones listed, you might have to edit some variables in the code (it's a quick hack, after all).

                              • Re: Tahiti compiler crash
                                yurtesen

                                Are you able to see the error message with my code?

                                  • Re: Tahiti compiler crash
                                    inducer77

                                    Thanks for your help, but given that you're name-calling other people's code for causing trouble, I'm surprised by your code's brittleness. Here's how it fails:

                                     

                                    Available platforms: 2

                                    -------------------------------------------------------------------------------

                                    Platform ID - Name : 0* - Intel(R) OpenCL

                                    Platform Profile: FULL_PROFILE

                                    Platform Version: OpenCL 1.1 LINUX

                                    Platform Vendor: Intel(R) Corporation

                                    Platform Extensions: Intel(R) Corporation

                                    Platform ID - Name : 1  - AMD Accelerated Parallel Processing

                                    Platform Profile: FULL_PROFILE

                                    Platform Version: OpenCL 1.2 AMD-APP (938.2)

                                    Platform Vendor: Advanced Micro Devices, Inc.

                                    Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

                                    -------------------------------------------------------------------------------

                                    * - Selected

                                    Error: Failed to get device IDs!

                                     

                                    (and even when convinced to pick the AMD platform, it would die with the same error, and even then it would likely pick just any GPU, and thereby not the right one--the machine in question has a Tahiti and a Cayman, with the Cayman as device 0.)

                                      • Re: Tahiti compiler crash
                                        yurtesen

                                        Like I said, you should select the right platform by changing some variables in the code. (If you did this, I don't understand why you posted output with the wrong platform selected. Did you forget to recompile?) You should have changed the platform ID to 1 at the beginning of the code and then recompiled:

                                        #define PLATFORM_ID 1

                                        As you can see from the output, it chose the platform with ID 0 (marked with *), which is Intel OpenCL and therefore does not contain any GPUs. As a result, it can't get device IDs, and it correctly exits the program without a segmentation fault.
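Hard-coded platform and device indices are fragile on a machine like the one described above, where the wanted platform is not the first one and the wanted GPU is not device 0. A name-based lookup avoids that. Below is a minimal sketch of the idea; `pick_device` is a hypothetical helper, and the platform and device entries are stand-in data modeled on the output in this thread, not a real OpenCL query.

```python
from types import SimpleNamespace


def pick_device(platforms, platform_substr, device_substr):
    """Return the first (platform, device) pair whose names contain the substrings.

    `platforms` is any iterable of objects with a `.name` attribute and a
    `.devices` list whose items also have a `.name` attribute.
    """
    for platform in platforms:
        if platform_substr.lower() not in platform.name.lower():
            continue
        for device in platform.devices:
            if device_substr.lower() in device.name.lower():
                return platform, device
    raise LookupError(
        f"no device matching {device_substr!r} on a platform matching {platform_substr!r}"
    )


# Stand-in data: two platforms, with the Cayman listed before the Tahiti.
PLATFORMS = [
    SimpleNamespace(
        name="Intel(R) OpenCL",
        devices=[SimpleNamespace(name="CPU (stand-in name)")],
    ),
    SimpleNamespace(
        name="AMD Accelerated Parallel Processing",
        devices=[SimpleNamespace(name="Cayman"), SimpleNamespace(name="Tahiti")],
    ),
]

platform, device = pick_device(PLATFORMS, "AMD", "Tahiti")
print(platform.name, "/", device.name)
```

With PyOpenCL, the input could be built from `pyopencl.get_platforms()` and each platform's `get_devices()`, wrapped so that the devices are available as a `.devices` list.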

                                         

                                        Here is an example test case with NVIDIA and AMD platforms (from a machine with NVIDIA GPUs only):

                                         

                                        Available platforms: 2
                                        -------------------------------------------------------------------------------
                                        Platform ID - Name : 0* - NVIDIA CUDA
                                        Platform Profile: FULL_PROFILE
                                        Platform Version: OpenCL 1.1 CUDA 4.2.1
                                        Platform Vendor: NVIDIA Corporation
                                        Platform Extensions: NVIDIA Corporation
                                        Platform ID - Name : 1  - AMD Accelerated Parallel Processing
                                        Platform Profile: FULL_PROFILE
                                        Platform Version: OpenCL 1.2 AMD-APP (923.1)
                                        Platform Vendor: Advanced Micro Devices, Inc.
                                        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
                                        -------------------------------------------------------------------------------
                                        * - Selected
                                        
                                        Available devices: 2 (can use only 1 for now!)
                                        -------------------------------------------------------------------------------
                                        1* - NVIDIA Corporation Tesla M2050...
                                        2  - NVIDIA Corporation Tesla M2050...
                                        -------------------------------------------------------------------------------
                                        * - Selected
                                        Device preferred float vector size 1
                                        
                                        Trying to use OpenCL source file kernel.cl
                                        Mapped the source file (1163 bytes) to 0x7f0ce0e0c000
                                        Building OpenCL executable...
                                        Error: Failed to build program executable!
                                        -42
                                        ptxas error   : Entry function 'code' uses too much shared data (0x808100 bytes, 0xc000 max)
                                        

                                         

                                        I see now that NVIDIA's error message is much nicer...

                                         

                                        On this specific machine there are no AMD GPUs so I set:

                                        #define PLATFORM_ID 1
                                        #define DEV_TYPE "CPU"
                                        

                                         

                                        Available platforms: 2
                                        -------------------------------------------------------------------------------
                                        Platform ID - Name : 0  - NVIDIA CUDA
                                        Platform Profile: FULL_PROFILE
                                        Platform Version: OpenCL 1.1 CUDA 4.2.1
                                        Platform Vendor: NVIDIA Corporation
                                        Platform Extensions: NVIDIA Corporation
                                        Platform ID - Name : 1* - AMD Accelerated Parallel Processing
                                        Platform Profile: FULL_PROFILE
                                        Platform Version: OpenCL 1.2 AMD-APP (923.1)
                                        Platform Vendor: Advanced Micro Devices, Inc.
                                        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
                                        -------------------------------------------------------------------------------
                                        * - Selected
                                        Error: Failed to get device IDs!
                                        -bash-4.1$ g++ test_kern.cpp -lOpenCL -I$OPENCL_INCLUDE
                                        -bash-4.1$ ./a.out
                                        
                                        Available platforms: 2
                                        -------------------------------------------------------------------------------
                                        Platform ID - Name : 0  - NVIDIA CUDA
                                        Platform Profile: FULL_PROFILE
                                        Platform Version: OpenCL 1.1 CUDA 4.2.1
                                        Platform Vendor: NVIDIA Corporation
                                        Platform Extensions: NVIDIA Corporation
                                        Platform ID - Name : 1* - AMD Accelerated Parallel Processing
                                        Platform Profile: FULL_PROFILE
                                        Platform Version: OpenCL 1.2 AMD-APP (923.1)
                                        Platform Vendor: Advanced Micro Devices, Inc.
                                        Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
                                        -------------------------------------------------------------------------------
                                        * - Selected
                                        
                                        Available devices: 1 (can use only 1 for now!)
                                        -------------------------------------------------------------------------------
                                        1* - GenuineIntel Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz...
                                        -------------------------------------------------------------------------------
                                        * - Selected
                                        Device preferred float vector size 4
                                        
                                        Trying to use OpenCL source file kernel.cl
                                        Mapped the source file (1163 bytes) to 0x7f1411a2c000
                                        Building OpenCL executable...
                                        Error: Failed to create compute kernel!
                                        -46
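The bare numbers printed in these logs (-11, -42, -46) are OpenCL status codes. The sketch below maps them to their symbolic names; the table is a small subset of the values defined in the Khronos `cl.h` header, and `describe` is a hypothetical helper name.

```python
# Subset of OpenCL status codes, as defined in the Khronos cl.h header.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -42: "CL_INVALID_BINARY",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -46: "CL_INVALID_KERNEL_NAME",
}


def describe(code):
    """Return the symbolic name for an OpenCL status code, if it is in the table."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL error ({code})")


# The three codes that appear in the logs in this thread.
for code in (-11, -42, -46):
    print(code, describe(code))
```

Printing the name next to the number in a test program makes outputs like the ones above easier to compare across vendors.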