Using a single OpenCL thread on an AMD CPU runs slower than sequential code

Hi, I was trying to parallelize the optical flow algorithm using OpenCL.

I have a sequential version running on the CPU, and an OpenCL version that uses only one thread, also running on an AMD CPU.

However, the time spent in the OpenCL kernel is about 10 times longer than the time taken by the sequential version, even though both do exactly the same thing.

Can anyone tell me why the OpenCL version is so slow?