3 Replies Latest reply on Jun 27, 2011 8:49 AM by MicahVillmow

    Is the AMD APP SDK OpenCL CPU compiler optimized for the x64 platform?



          I have a question about OpenCL on the CPU with the AMD APP SDK.

          I wrote a simple OpenCL test that performs a large number of additions in a loop to measure OpenCL performance. The kernel runs with only 1 work-group and 1 work-item, and I also wrote a C reference implementation that does the same thing on the CPU. I expected the performance to be similar. However, when I build the reference code for the x64 platform, it runs much faster (about 50%) than OpenCL on the CPU. When I build the reference code for the Win32 platform, its performance is roughly the same as OpenCL on the CPU.
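For context, the C reference side of such a test could look roughly like the sketch below. It is not my exact code: the function names `add_bytes_repeated` and `time_add_bytes`, the use of `uint8_t` buffers, and the `clock()`-based timing are all assumptions made for illustration. The same source would be built once as Win32 and once as x64 and the two timings compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical reference loop: repeat an element-wise byte add
   `iterations` times over buffers of `n` bytes. */
void add_bytes_repeated(const uint8_t *a, const uint8_t *b, uint8_t *c,
                        size_t n, size_t iterations)
{
    for (size_t it = 0; it < iterations; ++it) {
        for (size_t i = 0; i < n; ++i) {
            /* The cast makes the 8-bit wrap-around explicit. */
            c[i] = (uint8_t)(a[i] + b[i]);
        }
    }
}

/* Hypothetical timing helper: returns the CPU time, in seconds,
   spent inside add_bytes_repeated, measured with clock(). */
double time_add_bytes(const uint8_t *a, const uint8_t *b, uint8_t *c,
                      size_t n, size_t iterations)
{
    clock_t start = clock();
    add_bytes_repeated(a, b, c, n, iterations);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

An optimizing compiler may vectorize or partly eliminate a loop like this, so timings from such a test reflect the host compiler's optimizer as much as the platform.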

        My question: is the AMD APP OpenCL compiler optimized for the x64 platform, or only for the Win32 platform? And how can I verify this? Can anyone give me some instructions or suggestions?

      I used the AMD Kernel Analyzer to analyze my CL code and generate the x86 assembly instructions shown below.

      // Only the vector-add part of the code is pasted here.

      // the CL source code is vec[c] = vec[a] + vec[b]

      // where vec is char16 data type

       movdqa (%ebx), %xmm0
       addl $16, %ebx
       paddb (%esi), %xmm0
       addl $16, %esi
       movdqa %xmm0, (%ebp)
       addl $16, %ebp
       decl %edx
       jne LBB0_5
       jmp LBB0_3
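
Read as plain C, the loop above does roughly the following. This is only a sketch of my reading of the listing: the function name `add_char16_blocks` and the parameter names are made up for illustration, and the inner 16-lane loop stands in for the single `paddb` instruction. Note that the register names (`%ebx`, `%esi`, `%ebp`, `%edx`) and the `l`-suffixed `addl`/`decl` instructions are 32-bit forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C equivalent of the pasted loop. Each iteration handles
   one 16-byte block (one char16 value): load from a, add b, store to c. */
void add_char16_blocks(const uint8_t *a, const uint8_t *b, uint8_t *c,
                       size_t blocks)
{
    while (blocks != 0) {
        /* movdqa + paddb + movdqa: 16 independent byte adds.
           paddb wraps modulo 256, which the uint8_t cast reproduces. */
        for (int lane = 0; lane < 16; ++lane) {
            c[lane] = (uint8_t)(a[lane] + b[lane]);
        }
        a += 16;  /* addl $16, %ebx */
        b += 16;  /* addl $16, %esi */
        c += 16;  /* addl $16, %ebp */
        --blocks; /* decl %edx, then jne back to the loop head */
    }
}
```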