7 Replies Latest reply on Aug 21, 2014 1:38 PM by tmaconfire

    clBuildProgram segmentation fault

    yurtesen

      __kernel void testcrash(__global double *diout)
      {
          double *d_XP = diout;
          d_XP[0] = 10.0;
      }

       

      I have a very simple kernel. It crashes the compiler for GPU devices only; it builds fine for CPU devices. I can see from the OpenCL reference manual that the pointer d_XP should also be declared __global (and doing so does fix the segmentation fault), but forgetting the qualifier should not cause a segmentation fault. Shouldn't the compiler report an error instead?
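
      A minimal sketch of the corrected kernel, assuming the only change needed is to give d_XP the same __global qualifier as diout. The #ifndef blocks and the host_check helper are hypothetical additions, not part of the original program: they let the same source be compiled and checked with an ordinary C compiler, and an OpenCL compiler skips them because it defines __OPENCL_VERSION__.

```c
/* Host-only stubs (an assumption for illustration): a plain C compiler does
   not know the OpenCL qualifiers, so they are defined away here. OpenCL
   compilers define __OPENCL_VERSION__ and never see this block. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

__kernel void testcrash(__global double *diout)
{
    /* The local pointer names the same address space as the argument,
       so no conversion across address spaces takes place. */
    __global double *d_XP = diout;
    d_XP[0] = 10.0;
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host-side check: run the kernel body once as a normal
   C function and return the value it wrote. */
static double host_check(void)
{
    double out[1] = { 0.0 };
    testcrash(out);
    return out[0];
}
#endif
```

      Only the declaration of d_XP differs from the kernel above; the stubs exist purely so the logic can be exercised without an OpenCL device.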

        • Re: clBuildProgram segmentation fault
          sudarshan

          Hi,

          The kernel implicitly converts a pointer across address spaces (from global to private memory), which is currently illegal.

           

          It is true that the compiler should emit an error message rather than segfault when an implicit or explicit pointer cast across address spaces is attempted. Note, however, that OpenCL is an evolving standard. Future versions of OpenCL are already moving toward a unified address space, so such casts may not remain illegal. From a compiler designer's point of view, it makes little sense to check for and report an error on a restriction that may not exist in the future.

            • Re: clBuildProgram segmentation fault
              yurtesen

              Hi,

               

              I am sorry, but that is unacceptable and gives AMD a very bad image. If you cared about the company you work for, you would do something about it.

               

              I can't waste my time hunting for random segmentation faults in a program I am developing. I may have to use Intel or NVIDIA devices to be able to develop and optimize a program properly. I told my colleagues about this first, and they said not to use AMD software because it is buggy. You are losing developers, who are very important for the success of your products. Please forward this problem to the appropriate people who can fix it. It would take seconds to forward it or to open an internal bug report.

               

              Even more strangely, it compiles for CPU devices without an error. Who knows whether the result even works properly.

               

              Even NVIDIA's old implementation detects this properly:

              Device 0 log:
              :38:16: error: illegal implicit conversion between two pointers with different address spaces
              double *d_XM = di->XM;
                             ^~~~~~
              :38:9: warning: initializing 'double *' with an expression of type 'double __attribute__((address_space(1)))[1920000]' discards qualifiers
              double *d_XM = di->XM;
                      ^      ~~~~~~

               

              and also Intel's implementation:

              Device 0 log:
              Compilation started
              :38:16: error: illegal implicit conversion between two pointers with different address spaces
              :38:9: error: initializing 'double *' with an expression of type '__global double *' changes address space of pointer
              Compilation failed

              Here is the AMD OpenCL SDK response:

              Segmentation fault (core dumped)

               

               

              Thanks,

              Evren

                • Re: clBuildProgram segmentation fault
                  pinform

                  Hi Evren,

                   

                  I agree that until a unified address space becomes a reality, it probably makes sense to output a useful error message; otherwise developers are left in the dark. Sudarshan identified the potential issue and even noted that the compiler should probably output an error message. So while opinions across the OpenCL community vary (as does Sudarshan's w.r.t. the evolving standard), we are essentially in agreement.

                   

                  From our side, we have already taken the action within our power: we have reported this issue to the OpenCL team within AMD.

                   

                  --Prasad

              • Re: clBuildProgram segmentation fault
                tmaconfire

                Can you please show me how you compiled the code to generate this "segmentation fault" error?

                  • Re: clBuildProgram segmentation fault
                    tmaconfire

                    Basically, I need to know which OpenCL version you were using: OpenCL 1.2 or OpenCL 2.0? If the program was compiled on Linux, what commands did you use? As OpenCL 2.0 already supports the generic address space, I cannot reproduce the issue using the up-to-date internal compiler.

                      • Re: clBuildProgram segmentation fault
                        yurtesen

                        Hi tmaconfire, sorry for the delayed answer (holidays and all the other distractions). I used OpenCL 1.2. We do not have access to your internal compiler, do we?

                        Even then, 'diout' is in the global address space. About pointers to the generic address space, the OpenCL 2.0 pages say "Before referring to the region pointed to, the pointer must be associated with a named address space." I am not sure whether the above code is correct in that case.
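
                        To make the question concrete, here is a sketch of one reading of the OpenCL 2.0 rule, assuming the program is built with -cl-std=CL2.0: the unqualified d_XP is then a generic pointer, and assigning diout to it is what associates it with the global address space. to_global is the OpenCL C 2.0 built-in that returns a null pointer when its argument does not point into global memory. The kernel name testgeneric, the host-only stubs and host_check_generic are hypothetical additions so the source can also be checked with a plain C compiler; this is not confirmed against AMD's compiler.

```c
/* Host-only stubs (an assumption for illustration). OpenCL compilers define
   __OPENCL_VERSION__ and use the real qualifiers and the real to_global. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#define to_global(p) (p)
#endif

__kernel void testgeneric(__global double *diout)
{
    /* Under OpenCL C 2.0 an unqualified pointer is a generic pointer;
       this assignment associates it with the global address space. */
    double *d_XP = diout;

    /* Convert back to a named address space before use; to_global
       yields a null pointer if d_XP does not refer to global memory. */
    __global double *g = to_global(d_XP);
    if (g != 0) {
        g[0] = 10.0;
    }
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host-side check: run the kernel body once as a normal
   C function and return the value it wrote. */
static double host_check_generic(void)
{
    double out[1] = { 0.0 };
    testgeneric(out);
    return out[0];
}
#endif
```

                        Under OpenCL 1.2 the same declaration of d_XP would still be a private-address-space pointer, which is the case the NVIDIA and Intel logs reject.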