You can't expect anyone to figure out why your program is slow when we know nothing about it. Post the code. OpenCL on a CPU will typically run slower than a native C/C++ implementation because of runtime overhead and less mature compilers. Ideally, you want to use fewer work items on a CPU than on a GPU. It's also always possible that a parallel version is simply less efficient than serial code, depending on the task, but that's a separate issue.
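To illustrate the point about work items, here is a rough host-side sketch in plain C of coarse-grained partitioning, where each work item handles a contiguous chunk of the input instead of a single element. The helper names `chunk_size` and `sum_chunk` are made up for this example. In a real kernel, `gid` would come from `get_global_id(0)` and `work_items` from `get_global_size(0)`. Choosing a count close to the device's `CL_DEVICE_MAX_COMPUTE_UNITS` is an assumption that may be a reasonable starting point, not a rule.

```c
#include <stddef.h>

/* Hypothetical helper: how many elements each work item handles when
   n elements are spread over work_items items (rounded up). */
size_t chunk_size(size_t n, size_t work_items)
{
    return (n + work_items - 1) / work_items;
}

/* Emulates on the host what one coarse-grained work item would do:
   sum its own contiguous chunk of the input. */
float sum_chunk(const float *data, size_t n, size_t work_items, size_t gid)
{
    size_t chunk = chunk_size(n, work_items);
    size_t begin = gid * chunk;
    size_t end = (begin + chunk < n) ? begin + chunk : n; /* clamp last chunk */
    float acc = 0.0f;

    for (size_t i = begin; i < end; ++i)
        acc += data[i];
    return acc;
}
```

With a handful of work items like this, a CPU spends its time in the inner loop rather than in per-item scheduling overhead. On a GPU you would usually go the other way and launch many more, smaller work items.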
Also, use host-pinned memory when running on the CPU. It *should* spare you from copying between buffers, since the host and the device share the same memory.