If that kernel crashes, it's because of errors in your host code.
Some possible errors:
The matrix array is too small for the arguments you pass to the kernel.
p might not be initialized, or might be initialized to the wrong value. (The same could apply to q, if you actually used it for anything beyond repeatedly assigning 1.0 to the same memory location.)
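p was initialized correctly, but you forgot to unmap the buffer that contains it after initializing it, so the correct value only exists on the host side and the GPU never sees it (writes to a mapped region are only guaranteed to be visible to the device after clEnqueueUnmapMemObject).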
And of course, any of the kernel arguments could simply be set incorrectly, so check each clSetKernelArg call, including its return value.
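Since the kernel source isn't shown here, this is only a minimal sketch of the first check, assuming a hypothetical kernel that reads or writes a rows x cols row-major matrix of floats; the function name matrix_fits and that layout are assumptions. You would call something like it on the host before clEnqueueNDRangeKernel, with the same dimensions you pass to clSetKernelArg and the size you gave clCreateBuffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side check (assumes a row-major float matrix):
 * returns 1 if a buffer of buf_bytes bytes can hold rows x cols floats,
 * 0 otherwise. Both multiplications are guarded against size_t overflow. */
int matrix_fits(size_t rows, size_t cols, size_t buf_bytes)
{
    if (rows != 0 && cols > SIZE_MAX / rows)
        return 0; /* rows * cols would overflow */

    size_t elems = rows * cols;
    if (elems > SIZE_MAX / sizeof(float))
        return 0; /* elems * sizeof(float) would overflow */

    return elems * sizeof(float) <= buf_bytes;
}
```

For example, matrix_fits(64, 64, 64 * 63 * sizeof(float)) returns 0, because that buffer is one row short of what the kernel arguments claim.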
Anyway, if none of the suggestions above help, you should probably post the part of your host code that sets the kernel arguments, as well as the code that creates and initializes the OpenCL buffers you pass as those arguments.