6 Replies Latest reply on Dec 30, 2008 2:59 AM by gaurav.garg

    Some questions on CAL optimisation

    jean-claude

      Hi guys,

      Hope you had a very nice Xmas day.



      Here is a batch of basic questions that call for clarification. Thanks for your hints.

      Jean-Claude


      Performance and sync considerations for CAL kernels
      (all of the following assumes operating on a single context)

      (1) On the use of command queue flush :
      ------------------------------------------------------
      Assume there are several kernels to be executed one after the other.
      What is the tradeoff between:
      - issuing a flush after each execution call in the command queue
      - or issuing a single flush after all execution calls have been issued?


      (2) On the use of several sequential kernels :
      ------------------------------------------------------------
      For kernels to be executed sequentially, what is the tradeoff between:
      - calling each kernel separately
      - or merging them into a single kernel?


      (3) On the use of multi-output kernels :
      -----------------------------------------------------
      Which is more efficient in terms of performance:

      kernel K_1 (out float4 C<>, float4 A<>, float4 B<>);
      kernel K_2 (out float4 D<>, float4 A<>, float4 B<>);

      or

      kernel K_3 (out float4 C<>, out float4 D<>, float4 A<>, float4 B<>);


      (4) On kernel execution :
      -----------------------------------
      Is it safe to assume that:
      (1) the order of execution of the kernels will be the same as the order of
      execution calls in the command queue?

      (2) no kernels are run concurrently? For example, given
      kernel K_1 (out float4 C<>, float4 A<>, float4 B<>);
      kernel K_2 (out float4 D<>, float4 A<>, float4 B<>);
      kernel K_3 (out float4 E<>, float4 C<>, float4 B<>);

      could K_2 and K_1, or K_2 and K_3, run concurrently, while
      K_1 and K_3 are expected to run sequentially (K_3 reads C, which K_1 writes)?
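
      The dependency described in question (4) can be sketched in plain C. This is only an illustration of the reasoning, not something CAL exposes: KernelDesc and must_serialize are hypothetical names, and each stream is reduced to its one-letter name. Two kernels must stay ordered if one reads or overwrites a stream the other writes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a kernel (not a CAL type): the single
   stream it writes and the streams it reads, by one-letter name. */
typedef struct {
    const char *name;    /* e.g. "K_1" */
    char        output;  /* stream written by the kernel, e.g. 'C' */
    const char *inputs;  /* streams read by the kernel, e.g. "AB" */
} KernelDesc;

/* Returns 1 if `second` must wait for `first`, 0 if the two kernels
   touch the streams in a way that does not constrain their order. */
int must_serialize(const KernelDesc *first, const KernelDesc *second)
{
    /* read-after-write: second reads what first writes */
    int raw = strchr(second->inputs, first->output) != NULL;
    /* write-after-read: second overwrites what first reads */
    int war = strchr(first->inputs, second->output) != NULL;
    /* write-after-write: both write the same stream */
    int waw = first->output == second->output;
    return raw || war || waw;
}
```

      With K_1 = {"K_1", 'C', "AB"}, K_2 = {"K_2", 'D', "AB"} and K_3 = {"K_3", 'E', "CB"}, only the pair (K_1, K_3) is flagged, because K_3 reads C.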


      (5) On the binding of kernel I/Os, i.e. input, output, constant :
      --------------------------------------------------------------------------------
      BTW, what is the cost of calCtxSetMem(ctx, inname, inputmem)?

      Is it safe to assume that the I/O bindings declared for a kernel
      are kept alive in the context, or do they have to be issued again
      each time the kernel is executed?


      (6) On calMemCopy :
      -----------------------------
      CALresult calMemCopy(CALevent* event, CALcontext ctx,
      CALmem srcMem, CALmem dstMem, CALuint flags);

      What are the options for the flags parameter?

        • Some questions on CAL optimisation
          jean-claude

          Additional question:

          (7) Concurrent work in memory while a kernel is running

          Is it possible for the CPU to issue a calResMap and work on a memory resource C while a kernel is active on different memory resources (say A and B) and calCtxIsEventDone still returns CAL_RESULT_PENDING?

          Again here the kernel operates on resources different from C.

          Thanks

            • Some questions on CAL optimisation
              gaurav.garg

              Hi Jean-Claude,


              1. A command queue flush works on a specific context. All the commands associated with a context are kept in a queue to avoid CPU-GPU transfer overhead each time a new command is invoked. I am not sure how effective this technique is with CAL, though.


              2. Performance gain on GPUs depends on the memory/ALU ratio. Merging multiple kernels into a single kernel should usually improve ALU usage, reduce memory fetches and increase memory reuse.


              3. Same answer as 2. With K_3, you would reduce memory fetches (A and B are read once instead of once per kernel) and increase the number of ALU operations per fetch.


              4. The order of execution will be the same as the calling order. GPUs can't run multiple kernels concurrently.


              5. They are kept alive. No need to bind them again.


              6. The flags parameter is currently not used. Pass 0.


              7. I haven't tested it, but it should be possible.


              Hope this answers at least some of your questions.
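
              As a rough illustration of answers 2 and 3, here is a CPU-side sketch that counts input fetches for the separate kernels K_1 and K_2 versus the merged K_3. It is not CAL or Brook+ code. The kernel bodies are not given in the thread, so C = A + B and D = A * B are assumptions, float4 is simplified to float, and fetch, run_K1, run_K2, run_K3, reset_fetches and fetch_count are hypothetical helper names.

```c
#include <stddef.h>

/* Counts every read from an input stream. */
static size_t g_fetches = 0;

static float fetch(const float *stream, size_t i)
{
    g_fetches++;
    return stream[i];
}

void reset_fetches(void) { g_fetches = 0; }
size_t fetch_count(void) { return g_fetches; }

/* K_1: assumed body C = A + B (the thread does not give the body). */
void run_K1(float *c, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = fetch(a, i) + fetch(b, i);
}

/* K_2: assumed body D = A * B. */
void run_K2(float *d, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = fetch(a, i) * fetch(b, i);
}

/* K_3: merged kernel, each input element is fetched once and
   reused for both outputs. */
void run_K3(float *c, float *d, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float x = fetch(a, i);
        float y = fetch(b, i);
        c[i] = x + y;
        d[i] = x * y;
    }
}
```

              Under these assumptions, running K_1 then K_2 over n elements performs 4n fetches, while K_3 performs 2n for the same results. Whether this translates into a real speedup on the GPU depends on caching and on how ALU-bound the kernels are, so it is worth measuring.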