Is the AMD APP SDK OpenCL CPU compiler optimized for the x64 platform?

Discussion created by zhuzxy on Jun 23, 2011
Latest reply on Jun 27, 2011 by MicahVillmow


    I have a question about OpenCL on the CPU with AMD APP.

    I wrote a simple OpenCL test kernel that performs a large number of additions in a loop to measure OpenCL performance. The kernel runs with 1 work-group containing 1 work-item, and I also wrote a C reference implementation that does the same work on the CPU. I expected the two to perform similarly. However, when I build the reference code for the x64 platform, it is much faster (about 50%) than OpenCL on the CPU. When I build the reference code for the Win32 platform, its performance is roughly the same as OpenCL on the CPU.
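For context, the C reference loop is along the lines of the sketch below. This is a minimal illustration rather than my exact benchmark: `char16_ref`, `char16_add`, and `add_loop` are hypothetical names, and the 16 byte lanes are stored as `uint8_t` so that wraparound is well defined (the resulting bit pattern is the same as for a signed byte add).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's char16: 16 byte-wide lanes. */
typedef struct { uint8_t lane[16]; } char16_ref;

/* Adds a and b lane by lane; each lane wraps modulo 256. */
static char16_ref char16_add(char16_ref a, char16_ref b) {
    char16_ref r;
    for (int i = 0; i < 16; ++i) {
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    }
    return r;
}

/* Reference loop: c[i] = a[i] + b[i] for n 16-byte vectors. */
static void add_loop(const char16_ref *a, const char16_ref *b,
                     char16_ref *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        c[i] = char16_add(a[i], b[i]);
    }
}
```

Whether the compiler turns the inner lane loop into a single packed add depends on the optimization settings and the target platform, which is part of what I am trying to understand.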

  My question: is the AMD APP OpenCL compiler optimized for the x64 platform, or only for the Win32 platform? And how can I verify this? Can anyone offer instructions or suggestions?

I used the AMD Kernel Analyzer to analyze my CL code and generate the x86 assembly shown below.

// Only the vector-add part of the code is pasted here.

// the CL source code is vec[c] = vec[a] + vec[b]

// where vec is of the char16 data type

 movdqa (%ebx), %xmm0
 addl $16, %ebx
 paddb (%esi), %xmm0
 addl $16, %esi
 movdqa %xmm0, (%ebp)
 addl $16, %ebp
 decl %edx
 jne LBB0_5
 jmp LBB0_3