5 Replies Latest reply on Dec 17, 2011 4:38 PM by thesmileman

    how to Parallelize recursive code

    meetajinkya88
      Parallelize recursive code

      I have one serial code having function which calls itself and other function recursively.I want to parallelize it in task parallel manner.so can i parallelize this recursive code on CPU device type, if not then plz suggest the alternative for the same.

        • how to Parallelize recursive code
          antzrhere

          you cannot generate new threads within an OpenCL kernel itself, so to truly parallise your code you would have to come up with a completely different approach to solving your problem. OpenCL does not support recursion either. Basically any fuction you call within a kernel, virtual or non-recursive runs on the same hw thread and will not take advantage of parallel resources.

          If your just want to perform recursive functions (in a serial manner) (which OpenCL does NOT allow) the only way round this would be to emulate a function call. Basically have a loop within the kernel and a stack that holds function arguments - when you want to call the function write all arguments to the private stack, push (increment stack index) and loop to beginning of function using a continue statement. When your stack index is zero you can exit the loop.

           

            • how to Parallelize recursive code
              thesmileman

               

              Originally posted by: antzrhere you cannot generate new threads within an OpenCL kernel itself, so to truly parallise your code you would have to come up with a completely different approach to solving your problem.


              You aren't looking at this from the right angle. Sure you can't launch new threads directly in OpenCL kernels but the kernel is only one piece of OpenCL. The OpenCL api supports event callbacks so you can easily create a recursive C function that executes an OpenCL Kernel and has itself as the return event. Image processing roteens do this all the time. In fact I have a framework I am developing that you could use to do exactly this.

                • how to Parallelize recursive code
                  antzrhere

                  To be fair, we cannot be sure how to look at it without some code or info on the problem.

                  I agree with you, some problems can be solved like this (this may be one of them), but the problem may end up being slower on the GPU. Your solution is certainly an option, but it's worth keeping in mind that on AMD GPUs there is a significant overhead associated with multiple kernel calls. In your case this is fine, but we don't know how many times recursion will be invoked with this problem . If its frequent, most time will be spent on kernel overhead instead of number crunching, hence why keeping the same kernel running may be an suitable alternative.

                  The real question is, What is the problem and will it actually benefit from being ported to OpenCL without a redesign?

                  • how to Parallelize recursive code
                    meetajinkya88

                    Sir,

                    Thanx for the suggestion.Do you have nay sample code for the same you are talking about,because I am doing my work on image processing only.

                     

                      • how to Parallelize recursive code
                        thesmileman

                         

                        Originally posted by: meetajinkya88 Sir,

                         

                        Thanx for the suggestion.Do you have nay sample code for the same you are talking about,because I am doing my work on image processing only.

                         

                         

                        If you program is doing a divide and conquer sort of problem where you input and output will change you will have a lot more trouble but you can still do it but it will be different. Someone has a recurssive gausion blur example (I don't remember if it is nvidia or amd) . I have seen the source but I have seen the demo so you could look at that.

                        It really isn't any different than anything any other recursive function but here is a guess as to how you might want to implement it.

                        You should be able to get any code using opencl for image processing from lots of tutorials. There are also three good opencl books which have sample code showing how to use events. One is OpenCL in Action.

                        I am only going to describe the c code because the kernel should be self explanitory or go look at tutorials or books because they will explain OpenCL you better than me in a forum post.

                        1.)First copy your inputs to the gpu in your main function.

                        2.) call your recursive c function using the on device references.

                        Note the next part is inside your c function

                        2.1) set your kernel arguments to the references

                        2.2) adjust sizes for workgroups etc if the input size of data has changed

                        2.3) call enqueueNdRangeKernel

                        2.3) wait for kernel to complete (you could use events like I mentioned but if you are new to openCL just do it this way)

                        2.4) check if you are done if so copy the data back from the gpu (if you need to group returned data do that here) and return

                        2.4) if you aren't done call yourself again using the references.