2 Replies Latest reply on Jun 27, 2010 10:54 AM by Illusio

    Good reference for non-blocking algorithms



      From what I understand, locks/mutexes are not available in OpenCL. That necessitates converting solutions into lock-free/non-blocking form. Is there a general reference or book that gives up-to-date approaches, preferably with good examples, to achieve the same, or to do so using stream computing? I have been able to find only papers, and those mostly discuss stack/queue kinds of implementations.



        • Good reference for non-blocking algorithms

          No suggested references?

            • Good reference for non-blocking algorithms


              Originally posted by: sine No suggested references?


              Your request might be a bit general. I interpreted your opening post along the lines of looking for the moral equivalent of:

              "Accounting - a non-blocking approach using GPUs"

              for whatever area you plan to develop algorithms in (because you dismiss papers about basic data structures as uninteresting). However, I think that may be asking a bit much, except possibly in certain areas of computer graphics. If you listed a few concrete problems that need solving, maybe it would be easier for people to offer suggestions?

              I suppose you're aware that there are atomic function extensions to OpenCL? Also, if what you're looking for is general advice on exploiting parallelism at all levels, there must be some books about SIMD programming which would transfer directly to OpenCL. I'm a bit confused that you appear to be looking for some "general method" here, though. The only general principles I can think of are that parallelization tends to require refactoring data structures to allow the use of vector instructions and simple identification of work units. Functional design and a single work queue per processing element can also be really good ideas for removing or minimizing synchronization needs.
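              To make the atomic-function suggestion concrete, here is a minimal sketch of two lock-free building blocks. It is written in host-side C11 (`<stdatomic.h>`) so it can be compiled and run anywhere; it is not OpenCL kernel code. The function names `claim_unit` and `atomic_store_max` are made up for illustration. In OpenCL the corresponding operations would be the `atom_inc`/`atom_cmpxchg` extension functions (or `atomic_inc`/`atomic_cmpxchg` in OpenCL 1.1), working on `__global` or `__local` integers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Claim the next unclaimed work unit without taking a lock.
 * `next` is a shared counter (hypothetical name) that starts at 0.
 * Returns the claimed index, or `total` once all units are handed out. */
size_t claim_unit(atomic_size_t *next, size_t total)
{
    assert(next != NULL);
    /* fetch_add returns the old value, so every caller gets a unique index. */
    size_t idx = atomic_fetch_add(next, 1);
    return idx < total ? idx : total;
}

/* Raise *target to `value` if `value` is larger, using a
 * compare-and-swap retry loop instead of a mutex. */
void atomic_store_max(atomic_int *target, int value)
{
    assert(target != NULL);
    int seen = atomic_load(target);
    while (seen < value &&
           !atomic_compare_exchange_weak(target, &seen, value)) {
        /* CAS failed: `seen` now holds the current value; loop re-checks. */
    }
}
```

              The first function is the "simple identification of work units" idea: workers pull indices from one counter and never block each other. The second shows the usual retry-loop shape that most non-blocking updates reduce to.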

              In general, Amdahl's law tends to brutally murder dreams of massive speedups of problems that inherently require large amounts of complex synchronization, so it's a pretty serious warning sign that this appears to be your first area of interest.
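              For reference, Amdahl's law says that if a fraction p of the run time can be parallelized across n processing units, the overall speedup is 1 / ((1 - p) + p / n). A small C sketch (the function name `amdahl_speedup` is just an illustrative choice) makes it easy to plug in numbers:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `parallel_fraction`
 * of the work is spread over `n_units` processing units and the
 * remainder (e.g. synchronization-bound code) stays serial. */
double amdahl_speedup(double parallel_fraction, double n_units)
{
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(n_units >= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_units);
}
```

              With p = 0.9 and n = 1000 this gives a speedup of roughly 9.9, and no number of units can push it past 1 / (1 - p) = 10. That serial 10% is where heavy synchronization tends to end up.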

              Anyway, if your problem needs synchronization but has an existing parallel implementation with large work chunks, you could consider keeping your synchronization on the host side and using OpenCL only to accelerate the computational core of your application.
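              A rough sketch of that split, in plain C so it runs without an OpenCL setup. Both function names are hypothetical, and `sum_of_squares` merely stands in for whatever the real computational core would be. The point is that the core is a pure function of its chunk and touches no shared state, while all merging of results happens in one place on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the accelerated core: pure, no shared state, so it
 * could be moved into a kernel without needing any synchronization. */
double sum_of_squares(const double *chunk, size_t len)
{
    double acc = 0.0;
    for (size_t i = 0; i < len; ++i)
        acc += chunk[i] * chunk[i];
    return acc;
}

/* Host-side driver: splits the input into large chunks, hands each to
 * the core, and merges the partial results itself. */
double process_in_chunks(const double *data, size_t n, size_t chunk_len)
{
    assert(chunk_len > 0);
    double total = 0.0;
    for (size_t start = 0; start < n; start += chunk_len) {
        size_t len = (n - start < chunk_len) ? n - start : chunk_len;
        total += sum_of_squares(data + start, len); /* merge on the host */
    }
    return total;
}
```

              In a real application the loop body would enqueue a kernel and read back one partial result per chunk. Whatever locking the existing implementation uses between chunks can then stay as ordinary host code.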