10 Replies Latest reply on Aug 23, 2011 10:05 AM by akhal

    OpenCL Performance on CPUs

    akhal

      Hello

      I have implemented a few ordinary loop-based applications in OpenMP, TBB and OpenCL. In all of them, OpenCL gives far better performance than the others, even when I run it only on the CPU with no specific optimizations in the kernels. OpenMP and TBB perform well too, but far worse than OpenCL. What could be the reason? Both of them are CPU-specialized frameworks, so I would expect OpenCL to perform at best equal to OpenMP/TBB, or worse, since it is more GPU oriented.

      My second concern is that OpenMP always outperforms TBB in my implementations, which I haven't tuned much since I am not an expert. Is there a reason that OpenMP normally performs better than TBB? I think both of them, and even OpenCL, use the same kind of thread pooling at the low level.... Any expert opinions? Thanks

        • OpenCL Performance on CPUs
          rick.weber

          OpenCL has optimizations turned on by default. To disable them, you have to pass -cl-no-optimizations to the compiler (or something to that effect). If you don't have optimizations turned on in your TBB and OpenMP tests, then you're comparing optimized OpenCL code to unoptimized OpenMP code. That would account for the discrepancy.

            • OpenCL Performance on CPUs
              nareshsankapelly

              Hi Akhal,

              I agree with Rick.Weber as far as OpenCL is concerned. The performance of TBB and/or OpenMP depends purely on your implementation. You can't generalize that OpenMP always outperforms TBB. The load-balancing overhead in TBB may be one of the reasons here. Selecting a proper chunk size also affects performance.

              It would be easier for anyone to analyze this if you could post your code snippets for both cases.

                • OpenCL Performance on CPUs
                  akhal

                  Thanks Mr nareshsankapelly

                  Yes, you are right about chunk-size tuning in OpenMP and TBB, but my code doesn't specify a chunk size in either case: it uses "schedule(static)" in OpenMP, and in TBB it leaves the chunk-size estimation to TBB by not specifying any size and using "auto_partitioner". In this case all of my OpenMP implementations outperform TBB. Does that mean the runtime scheduler of OpenMP is better?

                    • OpenCL Performance on CPUs
                      nareshsankapelly

                      You have to use the "-cl-no-optimizations" flag in the clBuildProgram function.

                        • OpenCL Performance on CPUs
                          akhal

                          I searched the OpenCL specification for clBuildProgram. Its 4th argument takes the build options, including the "-cl-opt-disable" flag to turn off all optimizations. But when I use this flag, I get an "undefined -cl-opt-disable" error. Does this mean my OpenCL SDK doesn't support it yet? I am using the latest AMD SDK...
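
For reference, the build options are a single space-separated string handed to clBuildProgram as its 4th argument at run time, so they are separate from whatever flags the host compiler receives on its command line. A minimal sketch of assembling that string follows. The helper name `join_build_options` is hypothetical, and the clBuildProgram call appears only as a comment because it assumes an already created program and device.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins individual options into the single
// space-separated string that clBuildProgram takes as its options argument.
std::string join_build_options(const std::vector<std::string>& options) {
    std::string joined;
    for (const std::string& opt : options) {
        if (!joined.empty()) joined += ' ';
        joined += opt;
    }
    return joined;
}

// With an existing cl_program `program` and cl_device_id `device`
// (both assumed here, not created), the call would look like:
//
//   std::string opts = join_build_options({"-cl-opt-disable"});
//   cl_int err = clBuildProgram(program, 1, &device, opts.c_str(),
//                               nullptr, nullptr);
```

If the build fails, querying clGetProgramBuildInfo with CL_PROGRAM_BUILD_LOG should return the runtime compiler's own message, which may help narrow down an error like the one above.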

                            • OpenCL Performance on CPUs
                              genaganna

                               

                              Originally posted by: akhal I searched the OpenCL specification for clBuildProgram. Its 4th argument takes the build options, including the "-cl-opt-disable" flag to turn off all optimizations. But when I use this flag, I get an "undefined -cl-opt-disable" error. Does this mean my OpenCL SDK doesn't support it yet? I am using the latest AMD SDK...

                               

                              I am able to use this flag without any problem. Could you please give us the following information:

                              OS, CPU, GPU, SDK version and driver version.

                              • OpenCL Performance on CPUs
                                nareshsankapelly

                                 

                                Originally posted by: akhal I searched the OpenCL specification for clBuildProgram. Its 4th argument takes the build options, including the "-cl-opt-disable" flag to turn off all optimizations. But when I use this flag, I get an "undefined -cl-opt-disable" error. Does this mean my OpenCL SDK doesn't support it yet? I am using the latest AMD SDK...

                                 

                                AFAIK, it should work with the latest SDK. I tried the same flag on my end, and it works fine.

                                 

                            • OpenCL Performance on CPUs
                              nareshsankapelly

                               

                              Originally posted by: akhal Thanks Mr nareshsankapelly

                               

                              Yes, you are right about chunk-size tuning in OpenMP and TBB, but my code doesn't specify a chunk size in either case: it uses "schedule(static)" in OpenMP, and in TBB it leaves the chunk-size estimation to TBB by not specifying any size and using "auto_partitioner". In this case all of my OpenMP implementations outperform TBB. Does that mean the runtime scheduler of OpenMP is better?

                               

                              AFAIK, schedule(static) assigns chunks to threads statically, whereas TBB still does load balancing at run time, even with auto_partitioner.
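
To make "assigns chunks statically" concrete, here is a small sketch of one way a static schedule without a chunk size can divide a loop: each thread gets one contiguous, near-equal block, decided before the loop runs. The function name `static_blocks` is hypothetical, and the exact split an OpenMP runtime uses is implementation-defined, so treat this as an illustration rather than what any particular runtime does.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the half-open [begin, end) range each of
// `threads` workers would own if `n` iterations were split up front into
// contiguous blocks whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
static_blocks(std::size_t n, std::size_t threads) {
    std::vector<std::pair<std::size_t, std::size_t>> blocks;
    if (threads == 0) return blocks;
    const std::size_t base = n / threads;   // minimum block size
    const std::size_t extra = n % threads;  // first `extra` blocks get one more
    std::size_t begin = 0;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t len = base + (t < extra ? 1 : 0);
        blocks.emplace_back(begin, begin + len);
        begin += len;
    }
    return blocks;
}
```

Because the ranges are fixed in advance, there is no scheduling work during the loop, but a thread that finishes early cannot take over iterations from a slower one. A work-stealing scheduler trades some run-time overhead for that ability, which may matter little when every iteration costs about the same.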

                               

                          • OpenCL Performance on CPUs
                            akhal

                            Thanks for the hints, but I actually compiled all my applications with the Intel compiler and passed the -O0 option to turn off automatic optimizations, as in "icc -O0 -g .....", and I thought that was enough to stop the compiler from optimizing on its own. If that's not enough, how do I use "-cl-no-optimizations" while compiling my code?