Hi, I am trying to parallelize an optical flow algorithm using OpenCL.
I have a sequential version running on the CPU, and an OpenCL version that uses only one thread and runs on an AMD CPU.
However, the time spent in the OpenCL kernel is about 10 times longer than the sequential version, even though both do exactly the same thing.
Can anyone tell me why the OpenCL version is so slow?