4 Replies Latest reply on May 2, 2015 6:45 AM by Meteorhead

    Kernel launch is silently omitted on CPU




      I have an application that uses the C++ bindings and reports errors through exceptions. It is a simple N-body simulation with two kernels: one for the interaction and one for the forward Euler step. The latter runs fine, but the interaction kernel is silently omitted when I target the CPU. On the GPU it (more or less) runs fine, but on the CPU it is effectively never launched: no printf has any effect, the kernel seems to finish instantly, and the associated event holds garbage info.


      My entire initialization sequence is one giant try block, and nothing throws. I also tried regular error codes (just to make sure), and all initialization succeeds. I initially used the cl::make_kernel helper, but dropped it to be able to throw when creating kernels. (It is a huge setback that cl::make_kernel has no default constructor. As a result, the objects cannot be initialized in try blocks (because they are destroyed on scope exit), and if I initialize them through pointers, I lose the nice operator()(...) syntax, the very reason for using it in the first place.) Bottom line: things work on the GPU, and they don't on the CPU.


      I read the CL_EVENT_COMMAND_EXECUTION_STATUS of the interaction kernel after waiting on the command_queue on which I enqueued it. The value is UINT_MAX (4294967295).

      The hpp, cpp and cl files are as simple as they can get. If anybody wants to test, just skip the "read_particle_file()" function and use a vector of default-initialized structs instead. The point is to get the kernel running. I tried both Catalyst 14.12 and 15.4-beta.


      Either I am doing something noobish and I'm up for a facepalm, or there is a major issue that the runtime fails to report. Please, gimme some ideas, because I ran out of them.

        • Re: Kernel launch is silently omitted on CPU

          I changed the line where you create the buffer to buffer = cl::Buffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(particle)*particles.size(), &particles[0]);


          And now I get a segmentation fault. This whole default context thing in the C++ bindings is such an abomination. I got bitten by it myself previously. It can cause such arcane errors.

            • Re: Kernel launch is silently omitted on CPU

              Thanks, nou. I have not found the source of the segmentation fault yet. This is what I see:


              • There is a vector of structs on host, sizeof = 104.
              • The struct on device is sizeof = 104, thanks to the attribute.
              • I launch as many threads as there are structs in the buffer.
              • Every thread indexes into the buffer using the thread id only. No fancy arithmetic.
              • The only pointers I'm using are the addresses of variables on the stack.


              I have absolutely no idea where things go wrong.


            • Re: Kernel launch is silently omitted on CPU

              I have found the source of the problem, and I must say, if it weren't for the joy of finding the issue, I would explode with anger.


              typedef struct __attribute__ ((packed))
              {
                   double mass;
                   double3 pos;
                   double3 v;
                   double3 f;
              } particle;


              Causes the kernel to behave in an inexplicable way. Creating a default-initialized particle


              particle my_particle;


              Causes the application to blow up with "Access violation reading location 0xFFFFFFFF". The workaround is to not use the attribute at all, but that loses all vector types along the way:


              typedef struct
              {
                   double mass;
                   double posX;
                   double posY;
                   double posZ;
                   double posW;
                   double vX;
                   double vY;
                   double vZ;
                   double vW;
                   double fX;
                   double fY;
                   double fZ;
                   double fW;
              } particle;


              I hope I need not say why this is ugly as hell. I was not expecting to find a bug of this magnitude after this many years of OpenCL in the wild. Or is this a non-conformant way to introduce a struct type?