6 Replies Latest reply on Nov 13, 2010 6:30 PM by himanshu.gautam

    AESEncryptDecrypt sample kernel, slow on CPU

      compared to standard C implementation

      Hi. I am puzzled by how slowly OpenCL kernels execute on the CPU.

      I took the kernel from the AESEncryptDecrypt sample and modified it to take an input char string instead of a 2D image. It now runs in a 1D index space with a work-group size of 64. I also have a regular C file that performs AES on a single block at a time, http://dl.dropbox.com/u/4230568/c_aes.c, which I call in a loop over the input buffer.
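For readers without access to c_aes.c, the serial baseline described above can be sketched roughly as follows. This is only an illustration of the loop structure: `encrypt_block_placeholder` and `encrypt_buffer_serial` are hypothetical names, the placeholder merely XORs each block with the key (it is not AES), and the input is assumed to be padded to a multiple of 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_BYTES 16  /* AES operates on 128-bit blocks */

/* Hypothetical stand-in for the per-block routine in c_aes.c.
   It only XORs the block with the key, so it is NOT real AES. */
static void encrypt_block_placeholder(uint8_t block[AES_BLOCK_BYTES],
                                      const uint8_t key[AES_BLOCK_BYTES])
{
    for (size_t i = 0; i < AES_BLOCK_BYTES; ++i)
        block[i] ^= key[i];
}

/* Serial baseline: process the buffer in place, one block at a time.
   Returns the number of blocks processed. */
static size_t encrypt_buffer_serial(uint8_t *buf, size_t len,
                                    const uint8_t key[AES_BLOCK_BYTES])
{
    assert(len % AES_BLOCK_BYTES == 0);  /* sketch assumes padded input */
    size_t blocks = len / AES_BLOCK_BYTES;
    for (size_t b = 0; b < blocks; ++b)
        encrypt_block_placeholder(buf + b * AES_BLOCK_BYTES, key);
    return blocks;
}
```

Because every block is independent in this loop, the whole pass runs on one thread with no synchronization, which is the behaviour the single-core measurement reflects.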

      The regular C version maxes out one CPU core during the entire encryption, while the OpenCL kernel maxes out all 8 logical CPU cores (4 physical, with hyperthreading). But they take roughly the same amount of time to finish, no matter the input size. Why is that?

      How can 8 cores (or 4) get as little work done with OpenCL as 1 core does with ordinary C? Is it local memory latency? Thread context switching at barriers (local_mem_barrier)?

      I only compare the actual calculation time, not host setup.
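To make "calculation time only" concrete, here is a minimal sketch of one way to time just the compute step on the C side. It assumes a POSIX system with `clock_gettime`; the helper names `elapsed_seconds` and `time_compute_only` are made up for this example and do not come from the sample or from c_aes.c.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict C compilers */
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Times only the call to work(arg). Buffer allocation, key setup and
   similar host work are expected to happen before this is called. */
static double time_compute_only(void (*work)(void *), void *arg)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}
```

For a like-for-like comparison on the OpenCL side, the timed region would presumably wrap the kernel enqueue followed by `clFinish`, or use event profiling through `clGetEventProfilingInfo` on a queue created with `CL_QUEUE_PROFILING_ENABLE`. Which of these was used here is not stated in the post.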