I have an application that uses the C++ bindings and reports errors through exceptions. The application is a simple N-body simulation with two kernels: one for interaction and one for forward Euler integration. The latter runs fine, but the interaction kernel is silently skipped when I target the CPU. On the GPU it (more or less) runs fine, but on the CPU it is effectively never launched: no printf has any effect, the kernel seems to finish instantly, and the associated event holds garbage info.
My entire initialization sequence is one giant try block, and nothing throws. I also tried using regular error codes (just to make sure), but all initialization goes fine. I initially used the cl::make_kernel helper, but dropped it so that kernel creation could happen inside the try block. (It is a huge setback that cl::make_kernel has no default constructor. As a result, the functors cannot be initialized in try blocks (because they are destroyed on scope exit), and if I initialize them through pointers, I lose the nice syntax of operator()(...), the very reason for using it in the first place.) Bottom line: things work on the GPU, and they don't on the CPU.
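One possible way around the missing default constructor, sketched under assumptions: build the functor inside the try block, then hand it to a std::function that shares ownership, so the call syntax survives the scope exit. KernelFunctor and make_callable are hypothetical stand-ins made up for this illustration; they are not part of the OpenCL C++ bindings, and the real cl::make_kernel takes different arguments.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for a functor type without a default constructor
// (in the spirit of cl::make_kernel). It is NOT the real OpenCL class.
struct KernelFunctor {
    explicit KernelFunctor(int scale) : scale_(scale) {
        if (scale == 0) throw std::runtime_error("kernel creation failed");
    }
    int operator()(int x) const { return scale_ * x; }
private:
    int scale_;
};

// Constructs the functor inside a try block and returns a callable that
// outlives that block. Returns an empty std::function if construction throws.
std::function<int(int)> make_callable(int scale) {
    std::function<int(int)> call;
    try {
        auto functor = std::make_shared<KernelFunctor>(scale);
        // The lambda copies the shared_ptr, so the functor stays alive
        // after the try scope ends.
        call = [functor](int x) { return (*functor)(x); };
    } catch (const std::exception&) {
        // Report or rethrow here; this sketch just leaves `call` empty.
    }
    return call;
}

// Small usage demo: the call site keeps the operator() style syntax.
void demo() {
    auto interaction = make_callable(2);
    assert(interaction);            // construction succeeded
    assert(interaction(21) == 42);  // called like the original functor
}
```

The cost is one extra indirection per call through std::function, which is usually negligible next to a kernel enqueue.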
I read the CL_EVENT_COMMAND_EXECUTION_STATUS of the interaction kernel after waiting on the command_queue on which I enqueued it. The value is UINT_MAX (4294967295).
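For what it is worth, the execution status is a signed cl_int, and the OpenCL specification says a negative value there means the command terminated abnormally. Read as a signed 32-bit integer, 4294967295 is -1. A minimal sketch of that interpretation follows; the constants are redefined locally (mirroring the CL_COMPLETE, CL_RUNNING, CL_SUBMITTED and CL_QUEUED values in CL/cl.h) so it compiles without the OpenCL headers, and describe_status and describe_raw are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Local copies of the execution-status values from CL/cl.h, redefined only
// so that this sketch builds without the OpenCL headers.
constexpr std::int32_t kComplete  = 0;  // CL_COMPLETE
constexpr std::int32_t kRunning   = 1;  // CL_RUNNING
constexpr std::int32_t kSubmitted = 2;  // CL_SUBMITTED
constexpr std::int32_t kQueued    = 3;  // CL_QUEUED

// Turns a signed execution status into a readable string.
// Negative values are error codes (abnormal termination).
std::string describe_status(std::int32_t status) {
    if (status < 0) return "error " + std::to_string(status);
    switch (status) {
        case kComplete:  return "complete";
        case kRunning:   return "running";
        case kSubmitted: return "submitted";
        case kQueued:    return "queued";
        default:         return "unknown";
    }
}

// Reinterprets a value that was printed or stored as unsigned.
// Assumes the usual two's-complement conversion (guaranteed since C++20).
std::string describe_raw(std::uint32_t raw) {
    return describe_status(static_cast<std::int32_t>(raw));
}
```

So the UINT_MAX reading is probably not garbage: it looks like the runtime flagging the command as failed without any API call returning an error.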
The hpp, cpp and cl files are as simple as they can get. If anybody wants to test, just skip the "read_particle_file()" function and use a vector of default-initialized structs. The point is to get the kernel running. I tried both Catalyst 14.12 and 15.4-beta.
Either I am doing something noobish and I'm up for a facepalm, or there is a major issue that the runtime fails to report. Please, gimme some ideas, because I ran out of them.
I changed the line where you create the buffer to buffer = cl::Buffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(particle)*particles.size(), &particles);
And now I get a segmentation fault. This whole default context thing in the C++ bindings is such an abomination. I got bitten by it myself previously. It can cause such arcane errors.
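A side note on the buffer line, in case it matters: if particles is a std::vector (an assumption based on the "vector of default initialized structs" mentioned earlier), then &particles is the address of the vector object itself, not of its elements, and a CL_MEM_COPY_HOST_PTR copy of sizeof(particle)*particles.size() bytes from there would read past the vector object. That may or may not be related to the crash. The sketch below only illustrates the address difference; the particle layout and helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle layout, only for illustration.
struct particle {
    float pos[4];
    float vel[4];
};

// Address of the std::vector object (its internal bookkeeping pointers).
const void* object_address(const std::vector<particle>& v) {
    return &v;
}

// Address of the first element, which is what a host pointer should reference.
const void* element_address(const std::vector<particle>& v) {
    return v.data();
}
```

For a non-empty vector the two addresses differ, so passing particles.data() is the form that points at the actual particle storage.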
Thanks, nou. I have not found the source of the segmentation fault yet. This is what I see:
I have absolutely no idea where things go wrong.
sizeof(particle) on host is 128
sizeof(particle) on the device is 104. If you remove __attribute__ ((packed)), it is 128, and then it works properly.
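To see how a 128 versus 104 mismatch can come about, here is a host-side sketch. The layout is an assumption picked only because it reproduces those two numbers (three 32-byte-aligned 4-component double vectors plus one double); the real particle struct from the thread is not shown and may look different. The double4 type below is a local imitation of the OpenCL type's alignment, and __attribute__((packed)) is GCC/Clang syntax.

```cpp
#include <cassert>
#include <cstddef>

// Local imitation of OpenCL's double4: 32 bytes in size, 32-byte aligned.
struct alignas(32) double4 {
    double s[4];
};

// Hypothetical layout, natural alignment: 3 * 32 + 8 = 104 bytes of data,
// padded up to the next multiple of 32.
struct particle_aligned {
    double4 pos, vel, acc;
    double mass;
};

// Same members, packed: alignment drops to 1 and the tail padding disappears.
struct __attribute__((packed)) particle_packed {
    double4 pos, vel, acc;
    double mass;
};

static_assert(sizeof(particle_aligned) == 128, "padded to a multiple of 32");
static_assert(sizeof(particle_packed) == 104, "no tail padding when packed");

// Byte offset of element i in an array of each variant.
constexpr std::size_t aligned_offset(std::size_t i) { return i * sizeof(particle_aligned); }
constexpr std::size_t packed_offset(std::size_t i)  { return i * sizeof(particle_packed); }
```

With strides of 128 on one side and 104 on the other, every element after the first is read from the wrong offset, and a buffer sized for one layout is too small or too large for the other. Passing sizeof(particle) to a tiny test kernel and comparing it with the host value is a cheap way to catch this.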
I have found the source of the problem, and I must say, if it weren't for the joy of finding the issue, I would explode with anger.
typedef struct __attribute__ ((packed))
Causes the kernel to behave in an unexplainable way. Creating a default initialized particle
Causes the application to blow up with Access violation reading location 0xFFFFFFFF. The workaround is not using the attribute at all, but losing all vector types along the way:
I hope I need not say why this is ugly as hell. I was not expecting to find a bug of this magnitude after this many years of OpenCL in the wild. Or is this a non-conformant way to introduce a struct type?