AESEncryptDecrypt sample kernel, slow on CPU

Discussion created by eklund.n on Oct 22, 2010
Latest reply on Nov 13, 2010 by himanshu.gautam
compared to standard C implementation

Hi. I am puzzled by how slowly OpenCL kernels execute on the CPU.

I took the kernel from the AESEncryptDecrypt sample and modified it to take an input char string instead of a 2D image, so it now runs in a 1D index space with a work-group size of 64. I also have a regular C file that performs AES on a single block at a time, http://dl.dropbox.com/u/4230568/c_aes.c, and I loop over the input buffer with it.
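To make the serial baseline concrete, here is a minimal sketch of what "loop over the input buffer" looks like. The name `encrypt_block` is a hypothetical stand-in for the single-block routine in c_aes.c (whose real name and signature may differ), and its body is only an XOR so that the sketch compiles; it is not AES. The 16-byte block size is the standard AES block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_BYTES 16  /* AES always works on 128-bit blocks */

/* Hypothetical placeholder for the real single-block AES routine.
   It XORs the block with the key so the sketch compiles; NOT real AES. */
static void encrypt_block(const uint8_t *in, uint8_t *out, const uint8_t *key)
{
    for (size_t i = 0; i < AES_BLOCK_BYTES; ++i)
        out[i] = in[i] ^ key[i];
}

/* Serial path: one core walks the buffer one 16-byte block at a time. */
static void encrypt_buffer_serial(const uint8_t *in, uint8_t *out,
                                  size_t len, const uint8_t *key)
{
    assert(len % AES_BLOCK_BYTES == 0);  /* assumes input is already padded */
    for (size_t off = 0; off < len; off += AES_BLOCK_BYTES)
        encrypt_block(in + off, out + off, key);
}
```

Each iteration is independent of the others here, which is why the same work can in principle be spread across work-items in the OpenCL version.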

The regular C version maxes out one CPU core for the entire encryption, while the OpenCL kernel maxes out all 8 logical cores (4 physical cores with hyperthreading). Yet both take roughly the same amount of time to finish, regardless of input size. Why is that?

How can 8 cores (or 4) get only as much work done with OpenCL as 1 core does with ordinary C? Is it local memory latency? Thread context switching at barriers (local_mem_barrier)?

I only compare the actual calculation time, not host setup.
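For reference, a minimal sketch of how the computation alone might be timed with a wall clock, assuming a POSIX system with `clock_gettime`. The helper names `wall_seconds`, `time_compute`, and `demo_work` are made up for illustration and are not taken from the sample or from c_aes.c. Wall-clock time is the relevant measure when comparing one busy core against eight. On the OpenCL side, `clEnqueueNDRangeKernel` returns before the kernel has finished, so the timed region would need to extend through `clFinish` (or an equivalent blocking call).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

typedef void (*work_fn)(void *ctx);

/* Times only the call to work(ctx); buffer and context setup stay outside. */
static double time_compute(work_fn work, void *ctx)
{
    double t0 = wall_seconds();
    work(ctx);
    double t1 = wall_seconds();
    return t1 - t0;
}

/* Hypothetical workload for demonstration: adds 0..999999 into *ctx. */
static void demo_work(void *ctx)
{
    volatile uint64_t *acc = ctx;
    for (uint64_t i = 0; i < 1000000u; ++i)
        *acc += i;
}
```

In use, the serial encryption loop or the enqueue-plus-`clFinish` pair would be wrapped in a function matching `work_fn` and passed to `time_compute`, so both paths are measured the same way.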