6 Replies Latest reply on Dec 5, 2010 6:21 PM by nou

    clFlush and clFinish

      device level parallel


        Hi all,

         I have two GPUs, and I have assigned each its own context and command queue.

      I want them to run in parallel, i.e. device-level parallelism. Will





      give me parallel execution across the two GPUs? Or will GPU 1 start only after GPU 0 has finished?


      Thanks in advance.

        • clFlush and clFinish


          If you call clFlush, the commands in that command queue are submitted to the device right away, and control returns to the main program without waiting for them to complete.

          So I think clFlush is enough, and you do not need clFinish to run the two GPUs in parallel.

          Anyhow, I think you are using clFinish as a barrier to make sure the command queues have finished executing before you move on. But that blocks the host thread, so the CPU cannot do other work in the meantime.

            • clFlush and clFinish

               Hi Himanshu:

                  Thanks, I'll use that to see if I can speed up my program with two GPUs.



                  • clFlush and clFinish

                    From the ATI Stream SDK OpenCL Programming Guide (rev. 1.05), page 4-44:

                     The AMD OpenCL implementation spawns a new thread to manage each
                    command queue. Thus, the OpenCL host code is free to manage multiple
                    devices from a single host thread. Note that clFinish is a blocking operation;
                    the thread that calls clFinish blocks until all commands in the specified
                    command-queue have been processed and completed. If the host thread is
                    managing multiple devices, it is important to call clFlush for each command-
                    queue before calling clFinish, so that the commands are flushed and execute in
                    parallel on the devices. Otherwise, the first call to clFinish blocks, the
                    commands on the other devices are not flushed, and the devices appear to
                    execute serially rather than in parallel.
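To make the guide's point concrete, here is a minimal sketch of a toy timing model in plain C. It is not OpenCL code: ToyQueue, toy_flush, toy_finish and the two host_time_* helpers are made-up names, and the model assumes that a queue's work only starts on the device when it is flushed (explicitly, or implicitly by a finish), which is the behaviour the guide describes for the AMD implementation. Other implementations may submit work earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a command queue. This is NOT the OpenCL API. */
typedef struct {
    double work;      /* time units the queued commands need on the device */
    bool   submitted; /* has the work been handed to the device yet? */
    double start;     /* host time at which the device began the work */
} ToyQueue;

/* Models clFlush: hand pending work to the device, return without waiting. */
void toy_flush(ToyQueue *q, double *now) {
    assert(q != NULL && now != NULL);
    if (!q->submitted) {
        q->submitted = true;
        q->start = *now;
    }
}

/* Models clFinish: submit if needed, then block the host until the work is done. */
void toy_finish(ToyQueue *q, double *now) {
    toy_flush(q, now);
    double done = q->start + q->work;
    if (done > *now) {
        *now = done; /* the host thread waits here */
    }
}

/* Pattern 1: finish each queue in turn, with no flush beforehand. */
double host_time_finish_only(ToyQueue *queues, size_t n) {
    double now = 0.0;
    for (size_t i = 0; i < n; ++i) {
        toy_finish(&queues[i], &now);
    }
    return now;
}

/* Pattern 2: flush every queue first, then finish every queue. */
double host_time_flush_then_finish(ToyQueue *queues, size_t n) {
    double now = 0.0;
    for (size_t i = 0; i < n; ++i) {
        toy_flush(&queues[i], &now);
    }
    for (size_t i = 0; i < n; ++i) {
        toy_finish(&queues[i], &now);
    }
    return now;
}
```

In this model, two queues with 3 and 5 time units of work take 8 units of host time with the first pattern (the second device only starts once the first clFinish returns) and 5 units with the second (both devices start at time 0). In real host code the second pattern would correspond to calling clFlush on every command queue before the first clFinish.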

                    However, the standard is rather unclear on whether this is necessarily the behaviour. It only states that clFlush guarantees that all queued commands are issued to the device. It does not guarantee that clFlush will not block (as clFinish does).

                    The standard also states that blocking commands, such as clEnqueueWriteBuffer and similar functions with the blocking parameter set to true, perform an implicit flush of the command queue. However, it seems to me that what they really do is the equivalent of clFinish, since they block until the command has completed, not just until it has been issued to the device. This seems a bit inconsistent to me.

                    Also, my experience with clFlush on a previous SDK was that it took just as long to return as clFinish (i.e. clFlush seemed to be blocking). I haven't tried this on the current SDK though, so perhaps the behaviour has changed (or something funky was happening on my system).
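One way to check this on a given SDK is to time the call itself on the host. The following is only a sketch of such a timing helper in standard C11; seconds_in_call, timed_fn and count_call are hypothetical names, and count_call is a stand-in workload so that the code runs without an OpenCL installation. It uses the wall clock via timespec_get, which is good enough for a rough comparison but is not a monotonic clock.

```c
#include <assert.h>
#include <time.h>

/* Signature of the call being timed; arg is passed through untouched. */
typedef void (*timed_fn)(void *arg);

/* Returns the wall-clock seconds spent inside fn(arg). */
double seconds_in_call(timed_fn fn, void *arg) {
    struct timespec t0, t1;
    assert(fn != NULL);
    timespec_get(&t0, TIME_UTC);
    fn(arg);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Stand-in workload (an assumption for this sketch): counts invocations.
 * In a real test this would be a small wrapper that casts arg to a
 * cl_command_queue and calls clFlush or clFinish on it. */
void count_call(void *arg) {
    int *counter = arg;
    ++*counter;
}
```

If the time measured around clFlush is close to the time measured around clFinish for the same amount of queued work, clFlush is effectively blocking on that setup; if it is much shorter, it is returning before the commands complete.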

                    If clFlush does work as expected for you, please let us know.