Why is OpenCL performance on the CPU so bad?

Discussion created by shingoxlf on Sep 16, 2011
Latest reply on Sep 17, 2011 by antzrhere

Hi all, I have an optical flow algorithm that was originally sequential, and I recently parallelized it using OpenCL. When I run the code on an NVIDIA GPU, the speedup is promising. But when I run it on an AMD or Intel CPU, it is slower than the sequential algorithm on the same CPU. Can anyone give me an idea of what causes this?

By the way, I profiled the memory copy time, and it takes up a large portion of the total time. If the program runs on the CPU, the data should already be in CPU memory, right? Then why does the copy take so long?