10 Replies Latest reply on Dec 2, 2010 5:45 AM by ravikeshri

    OpenCL Profiler End times

    perhaad
      Kernel calls without flush and finish between them lead to same end time

      Hello All,

      For the OpenCL profiliing calls to clGetEventProfilingInfo()  the end times of multiple kernels enqueud after each other seems to be the same.

      Basically for multiple kernels enqueued on a device

      <--K0--><--K1--><--K2-->

      K0-K2 all report the same end time (I record a list of events and query at the end)

       

      I remember that this was discussed in the forums some time back but I couldn't find the thread, to check for an update I wanted to check if this still exists even with the newer drivers like 10.10 and with the OpenCL 2.2 SDK implementation.

      I still see this issue on Linux 64 bit Ubuntu and 10.10 Catalyst with 2.2 SDK

        • OpenCL Profiler End times
          d.a.a.

          It seems that this isn't wrong. If I understand correctly, without explicit call to clFinish (or explicit event dependency) the implementation is free to start and finish kernels when it thinks is more appropriate.

            • OpenCL Profiler End times
              perhaad

               

              Originally posted by: d.a.a. It seems that this isn't wrong. If I understand correctly, without explicit call to clFinish (or explicit event dependency) the implementation is free to start and finish kernels when it thinks is more appropriate.

               

              I dont know if that is the case. The implementation should only be able to do that if the  CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE flag is provided, otherwise the queue is supposed to execute in order.

              Either way, multiple kernel executions shouldn't have the same end time because that would *appear* like we are executing multiple kernels at a time which isn't the case

               

                • OpenCL Profiler End times
                  nou

                  yes i read specification carefully and without out-of-order flag implies in order execution. and every command in queue is implicit synchronized with previous command that it do not execute before previous end.

                    • OpenCL Profiler End times
                      perhaad

                      I understand about the commands on a queue occurring serially. But shouldnt the end times of each command be distinct from each other ?

                       

                        • OpenCL Profiler End times
                          himanshu.gautam

                          I checked the situation with internal SDK by enqueueing 25 clEnqueueWriteBuffer commands. They all execute sequentially and no two commands end at the same time.

                          It would be nice if you share your code.

                            • OpenCL Profiler End times
                              ravikeshri

                              perhaad, We have experienced the same scenario. The reason is explained in the ATI_Stream_SDK_OpenCL_Programming_Guide section 4.3.1. It says that "To reduce the launch overhead, the AMD OpenCL runtime combines several command submissions into a batch. Commands submitted as batch report similar start times and the same end time."  Your commands are run as batch so the end times are same.

                              I don't understand why the start time is different though. My trials also showed a few commands enqueued later started before other commnds but they all ended at the same time.

                                • OpenCL Profiler End times
                                  ravikeshri

                                  @Himanshu, Did you use the SDK 2.2? We could observe this scenario in SDK 2.2.

                                    • OpenCL Profiler End times
                                      perhaad

                                      @Himanshu I see it with EnqueNDRange Kernels, I dont do any device IO / synchronization in between. I dont know if that matters ?

                                      I have a pretty bulky app that depends on opencv. I will try and put together a simpler test case 

                                      @Ravikeshri,  Even with the batching I would have thought that the counters might output separate values for OpenCL which would be for timing a kernel. I dont know if information about each Enqueue or its individual timing is lost once the *batching* is done. I am not super-familiar with things at the CAL-level

                                      On a related thing, it is fine on Windows. 

                                      For linux I do a hacky work around: Enqueue a simple do nothing kernel at the end of each real kernel, so end time of real kernel  ~ start of dummy kernel

                                        • OpenCL Profiler End times
                                          himanshu.gautam

                                          ravikeshri,

                                          my apologies. the clenqueuewrirebuffer was calling a clfinish after every time.I modified my code and checked with some simple kernel. This time i get very similar end times for all the kernels as expected according to openCL programming guide. 

                                           I think you can write a test querying the profiling info of each ndrange call after enqueueing a large number of such calls and deduce the batch size after analyzing the profiling info of each kernel.

                                          perhaad,

                                          I think to query the correct profiling of individual kernels we can run them individually by calling a clFinish after every kernel.After performance optimizations you can club them together by removing the clFinish().