I see that you initialize bufX from the vector X. Why not set element 0 of X to zero before the initialization? Like this:
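tmp = X[0];   /* save the original first element */
X[0] = 0;     /* zero it so the device copy starts at zero */
err = clEnqueueWriteBuffer(queue, bufX, CL_TRUE, 0,
M * sizeof(cl_float), X, 0, NULL, NULL);
X[0] = tmp;   /* restore the host copy after the blocking write */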
No way around this.
A less expensive way would be to use clEnqueueMapBuffer to get a pointer corresponding to the buffer. If the buffer is already in CPU memory, no copying will be necessary.
So there is no way to access the elements of a buffer directly (e.g. bufX on the device, as in the case above). Am I right?
How could the host access something directly if it resides on the device? You have to pay the cost of a data transfer. However, you do not need to transfer a big buffer just to update a small part of it; sub-buffers can be used in such cases. Also, OpenCL 2.0 is out and may have something for you, but it will take quite some time for vendors to support OpenCL 2.0.