Archives Discussions

shingoxlf · ‎09-16-2011

Hi all, I have an optical flow algorithm that was originally sequential, and I recently parallelized it using OpenCL. When I run the code on an NVIDIA GPU, the speedup is promising. But when I run it on an AMD or Intel CPU, it is slower than the sequential algorithm on the same CPU. Can anyone give me an idea of what causes this?

By the way, I profiled the memory copy time, and it takes a large portion of the total time. If the program runs on the CPU, the data should already be in CPU memory, right? Then why does the copy take so long?

antzrhere · ‎09-17-2011

You can't expect anyone to figure out why your program is slow when we know nothing about it. Post code. Typically OpenCL on a CPU will run slower than a native C/C++ implementation because of some runtime overhead and less mature compilers. Ideally on a CPU you want to use fewer work items than on a GPU, with each one doing more work. There's always the possibility that making something parallel is actually less efficient than serial code, depending on the task, but that's something else.
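To make the "fewer work items on a CPU" point concrete, here is a minimal sketch in C. It assumes a per-pixel kernel; the helper names (`cpu_global_size`, `chunk_len`), the kernel `scale_chunked`, and the figure of a few work items per compute unit are illustrative assumptions, not something taken from the original poster's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel source: each work item walks a contiguous chunk of
 * pixels instead of handling a single pixel, so a CPU device needs only a
 * handful of work items per core. */
static const char *scale_chunked_src =
    "__kernel void scale_chunked(__global float *img,\n"
    "                            const uint n, const uint chunk)\n"
    "{\n"
    "    uint begin = get_global_id(0) * chunk;\n"
    "    uint end   = min(begin + chunk, n);\n"
    "    for (uint i = begin; i < end; ++i)\n"
    "        img[i] *= 0.5f;\n"
    "}\n";

/* Pick a global work size for a CPU device: a few work items per compute
 * unit, never more than there are pixels, and at least one. */
size_t cpu_global_size(size_t n_pixels, size_t compute_units, size_t items_per_unit)
{
    size_t target = compute_units * items_per_unit;
    if (target > n_pixels)
        target = n_pixels;
    if (target == 0)
        target = 1;
    return target;
}

/* Number of pixels each work item must cover (rounded up). */
size_t chunk_len(size_t n_pixels, size_t global_size)
{
    assert(global_size > 0);
    return (n_pixels + global_size - 1) / global_size;
}
```

On the host, `compute_units` would normally come from `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS`, and the two results would be passed as the global size to `clEnqueueNDRangeKernel` and as the `chunk` kernel argument. Whether this beats one work item per pixel depends on the runtime and the kernel, so it is worth measuring both.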

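Secondly, use host-pinned memory on the CPU (for example, create the buffers with CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR and access them with map/unmap rather than clEnqueueReadBuffer/clEnqueueWriteBuffer). It *should* prevent you from having to copy between buffers on the CPU.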

Why OpenCL on CPU performance really bad