1 Reply Latest reply on Aug 31, 2011 6:58 PM by rick.weber

    OpenCL performance vs OpenMP



      Have there been any studies comparing OpenCL performance to OpenMP?  Specifically, I am interested in the overhead cost of launching threads with OpenCL. For example, one could decompose the domain into a very large number of individual work items, each run by a thread doing a small job, versus using heavier-weight threads in OpenMP, where the domain is decomposed into subdomains whose number equals the number of cores.
      It seems that the OpenCL programming model is targeted more towards massively parallel chips (GPUs, for instance) than towards CPUs, which have fewer but more powerful cores.
      Can OpenCL be an effective replacement for OpenMP?

        • OpenCL performance vs OpenMP

          It's generally better to think of work items as loop iterations than as actual threads. On a GPU, they do in fact map to threads, but on the CPU a task pool just pulls work items. In the applications I've developed, OpenCL running on a CPU has had more than acceptable performance, and I've seen results competitive with OpenMP.
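
          To make the "work items as loop iterations" picture concrete, here is a rough plain-C analogy, not how any particular OpenCL runtime is actually implemented. The names scale_item, run_chunk, and run_all are made up for illustration; gid stands in for what get_global_id(0) would return inside a real kernel, and the chunks run serially here where a real CPU runtime would presumably hand them to worker threads.

```c
#include <stddef.h>

/* Hypothetical "kernel" body: the work done by one work item.
   gid plays the role of OpenCL's get_global_id(0). */
static void scale_item(size_t gid, const float *in, float *out, float factor)
{
    out[gid] = in[gid] * factor;
}

/* Hypothetical stand-in for one task pulled from a pool:
   a contiguous range of work items walked by an ordinary loop. */
static void run_chunk(size_t begin, size_t end,
                      const float *in, float *out, float factor)
{
    for (size_t gid = begin; gid < end; ++gid)
        scale_item(gid, in, out, factor);
}

/* Split n work items into num_chunks contiguous ranges and run them.
   The chunks run one after another here; this only illustrates that
   work items need not each cost a thread launch. */
void run_all(size_t n, size_t num_chunks,
             const float *in, float *out, float factor)
{
    if (num_chunks == 0)
        num_chunks = 1;
    size_t chunk = (n + num_chunks - 1) / num_chunks; /* ceiling division */
    for (size_t c = 0; c < num_chunks; ++c) {
        size_t begin = c * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        if (begin < end)
            run_chunk(begin, end, in, out, factor);
    }
}
```

          With num_chunks set to the number of cores, this is essentially the same decomposition as the "one heavy thread per subdomain" approach from the question, which is why the per-work-item overhead on a CPU can stay small.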

          The drawback with OpenCL is that you have a more restrictive memory and programming model. OpenMP allows you to do fairly arbitrary things in comparison and works in a variety of different languages. For example, recursion isn't allowed in OpenCL, and true functions may or may not exist on the target device, whereas OpenMP has a much looser programming model that lets you do anything you can do sequentially, plus a few parallel constructs.

          All in all, I think if you're coding an application from the ground up, OpenCL is a good way to go, but if you need to apply parallelism to existing applications, OpenMP can oftentimes yield a solution as simple as adding a #pragma omp parallel for.
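
          As a minimal sketch of that last point, here is an existing sequential loop with the single OpenMP line added. The function name saxpy and its signature are just an illustrative choice. This assumes the compiler has OpenMP enabled (for example -fopenmp on GCC or Clang); without it the pragma is simply ignored and the loop runs sequentially with the same result.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
   The pragma is the only change from the sequential version:
   with OpenMP enabled, the iterations are divided among threads. */
void saxpy(long n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

          This works here because each iteration touches only its own y[i]; loops with dependencies between iterations would need more than the one pragma.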