3 Replies Latest reply on Sep 26, 2012 12:19 PM by realhet

    Is compute + data movement overlap possible with clEnqueueRead/Write() calls?

    thejascr

      I have an AMD FirePro V4800 discrete graphics card and am using the AMD APP SDK 2.6 on a Linux box running the 3.0.0-16-generic kernel.

      I want to overlap some kernel execution with some data transfer to the GPU.

      I see that the SDK samples all use clEnqueueMapBuffer() in the TransferOverlap example.

      I tried to do the same with clEnqueueReadBuffer()/clEnqueueWriteBuffer() on an out-of-order queue, but this is not working.

      So is it even possible to do this with clEnqueueReadBuffer()/clEnqueueWriteBuffer()?

       

      [I couldn't find anything in the OpenCL 1.1 spec that says it should not work. I pass CL_FALSE as the blocking flag and synchronize the data movement with events. When I profile and look at the order of command execution, I don't see any out-of-order behavior, even though the enqueue timestamp from profiling shows a time before the kernel's completion time.]
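
      An enqueue timestamp that precedes the kernel's completion only shows that the command was handed to the runtime early; it says nothing about when the command ran. Whether two commands really overlapped has to be read from their execution intervals, which clGetEventProfilingInfo() reports as CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. A minimal sketch of that check follows. The exec_interval type and overlap_ns() function are hypothetical helpers, and uint64_t stands in for cl_ulong so the snippet builds without the OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Execution interval of one command, in nanoseconds.
 * In a real program the two fields would be filled from
 * clGetEventProfilingInfo() with CL_PROFILING_COMMAND_START and
 * CL_PROFILING_COMMAND_END. uint64_t stands in for cl_ulong here
 * (an assumption made so the sketch compiles without OpenCL). */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} exec_interval; /* hypothetical helper type, not part of OpenCL */

/* Number of nanoseconds during which both commands were executing.
 * Returns 0 when the intervals are disjoint or merely touch. */
uint64_t overlap_ns(exec_interval a, exec_interval b)
{
    assert(a.end_ns >= a.start_ns && b.end_ns >= b.start_ns);
    uint64_t lo = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t hi = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;
    return hi > lo ? hi - lo : 0;
}
```

      If overlap_ns() returns 0 for the kernel event and the write event, the transfer was serialized behind (or ahead of) the kernel, no matter how early it was enqueued.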

       

      Thanks

      -Thejas

        • Re: Is compute + data movement overlap possible with clEnqueueRead/Write() calls?
          realhet

          Hi,

          Create multiple contexts on the same device; those overlap well (at least on the HD 5xxx, HD 6xxx, and HD 7970).

          Some months ago I tried the out-of-order flag (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE), but it did nothing; everything in a single context ran sequentially.

            • Re: Is compute + data movement overlap possible with clEnqueueRead/Write() calls?
              thejascr

              Hi,

              Thanks!

              I tried the solution you suggested, but it's still not working.

              Now I have compute_context and data_context. My clCreateBuffer() calls are made in data_context.

              With two contexts, I am seeing a strange problem. If I register a callback with clSetEventCallback() on the events returned by clEnqueueReadBuffer()/clEnqueueWriteBuffer() (in data_context), the callback is never invoked.

              The callbacks I register on the events returned by clEnqueueNDRangeKernel() (in compute_context) are invoked as expected.

              Any idea why the event callback is not being invoked for events in data_context?

               

              -Thejas

                • Re: Is compute + data movement overlap possible with clEnqueueRead/Write() calls?
                  realhet

                  Hi,

                  I guess you are trying to share data between the kernels, so OpenCL synchronizes them.

                   

                  My suggestion worked in the following scenario:

                  There was a kernel that ran for 0.5 seconds, with a small upload at the start and a small download at the end (a few megabytes only).

                  So I ran 2 kernels on 2 contexts, and the data was also allocated separately on the 2 contexts (no interference). Unfortunately, this needs twice the parallelism.

                   

                  I had a 20 ms timer function that polled the two kernels and controlled them: when one of the kernels was about to finish, it launched the other one (on the other context).

                   

                  This way, with 2 totally independent contexts, I got a 2-3% speedup and 99% ALU utilization.
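
                  A speedup of a few percent is about what a simple model predicts when the kernel dominates the runtime. The sketch below compares back-to-back execution with an idealized overlap in which the transfer of one batch is fully hidden behind the kernel of another. The function names are hypothetical, and the 12.5 ms transfer figure in the note after the code is an assumed value chosen for illustration, not a measurement from this thread.

```c
#include <assert.h>

/* Wall time per batch when transfer and kernel run back to back. */
double serial_ms(double kernel_ms, double transfer_ms)
{
    assert(kernel_ms > 0.0 && transfer_ms >= 0.0);
    return kernel_ms + transfer_ms;
}

/* Steady-state wall time per batch when the transfer is fully hidden
 * behind another batch's kernel. Idealized: assumes no contention
 * between the copy and the kernel, and no launch overhead. */
double overlapped_ms(double kernel_ms, double transfer_ms)
{
    assert(kernel_ms > 0.0 && transfer_ms >= 0.0);
    return kernel_ms > transfer_ms ? kernel_ms : transfer_ms;
}

/* Throughput gain from overlapping, in percent. */
double speedup_percent(double kernel_ms, double transfer_ms)
{
    double serial = serial_ms(kernel_ms, transfer_ms);
    double overlapped = overlapped_ms(kernel_ms, transfer_ms);
    return (serial / overlapped - 1.0) * 100.0;
}
```

                  With a 500 ms kernel and an assumed 12.5 ms of upload plus download per batch, speedup_percent(500.0, 12.5) comes out at about 2.5, in line with the 2-3% above. The model also shows why the gain is capped: once the transfer is hidden, the kernel time is all that remains.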

                   

                  If your kernel uses RAM more frequently, the situation is a lot harder.