9 Replies Latest reply on Mar 1, 2010 12:14 PM by ryta1203

    Time measuring for OpenCL kernels

    Hill_Groove
      clock_t problem

      Hello everyone,

      I am using the clock_t type to measure my code's run time. The attached code always reports a kernel time of zero on Ubuntu, although on Win7 everything seems to be in order.

      Is there any method to avoid this trouble and get the correct time? Thank You.

      i7-860, 5870

      clock_t start = clock();
      ....
      gpu.SetKernelArgs();
      gpu.RunKernel();
      gpu.ReadKernelArg(6, RES, K*K*4);
      ....
      float Gworktime = (float)(clock() - start) / CLOCKS_PER_SEC;

        • Time measuring for OpenCL kernels
          Fr4nz

           

          Originally posted by: Hill_Groove
          I am using the clock_t type to measure my code's run time. The attached code always reports a kernel time of zero on Ubuntu, although on Win7 everything seems to be in order. Is there any method to avoid this trouble and get the correct time?

          Assuming the kernels are executed on the GPU: while a kernel runs on the GPU, no CPU time is consumed by your process, which is why clock() reports 0.

          The solution is to use OpenCL profiling through events; see here:

          http://www.khronos.org/opencl/sdk/1.0/docs/man/xhtml/clGetEventProfilingInfo.html

          and search this forum for examples.

            • Time measuring for OpenCL kernels
              Hill_Groove

              Fr4nz,

              thank you.

              • Time measuring for OpenCL kernels
                ryta1203

                 

                 Originally posted by: Fr4nz
                 Assuming the kernels are executed on the GPU: while a kernel runs on the GPU, no CPU time is consumed by your process, which is why clock() reports 0.

                 Is this correct?

                 The fact that the CPU might not be doing direct work at that moment doesn't necessarily mean it stops counting clock ticks.

                 I haven't tried to time only a kernel this way, but it seems to me it should work fine.

                  • Time measuring for OpenCL kernels
                    gapon

                     

                    Originally posted by: ryta1203
                    Is this correct? The fact that the CPU might not be doing direct work at that moment doesn't necessarily mean it stops counting clock ticks. I haven't tried to time only a kernel this way, but it seems to me it should work fine.

                     

                    Fr4nz was right: the clock() function returns an approximation of the processor time used by the program. See the details in:

                    http://linux.die.net/man/3/clock

                    To measure wall-clock time intervals (on UNIX) one can use the following POSIX functions:

                      #include <time.h>
                      ...

                      struct timespec begin;
                      clock_gettime( CLOCK_REALTIME, &begin );
                      ...
                      struct timespec end;
                      clock_gettime( CLOCK_REALTIME, &end );

                      double seconds = (end.tv_sec - begin.tv_sec)
                                     + (end.tv_nsec - begin.tv_nsec) / 1e9;
                      cout << "Elapsed: " << seconds << " s" << endl;

                    Also, be careful about the resolution of this method! I think, on some systems, it can be as coarse as 0.01 s. Run the same kernel a sufficient number of times to get reliable measurements.

                     

                     

                     

                      • Time measuring for OpenCL kernels
                        ryta1203

                        There is a slightly different definition here:

                        http://www.cplusplus.com/reference/clibrary/ctime/clock/

                        "Returns the number of clock ticks elapsed since the program was launched."

                        This implies elapsed ticks, not ticks used?

                        I'm not arguing because I disagree; I'm simply confused and trying to get a more accurate answer.

                        I only mention it because, using the timing above, I don't get zero for the kernel timings. For example, if I run the kernel 1000 times and then 10000 times, I get totally different results, and I'm fairly certain the difference is not attributable to the CPU loop alone.

                          • Time measuring for OpenCL kernels
                            genaganna

                            clock() alone is not suitable for measuring short execution times.

                            To measure smaller times, do as follows:

                            clock_t start = clock() * CLOCKS_PER_SEC;
                            ....
                            gpu.SetKernelArgs();
                            gpu.RunKernel();
                            gpu.ReadKernelArg(6, RES, K*K*4);
                            ....
                            float Gworktime = (float)(clock() * CLOCKS_PER_SEC - start)
                                              / (CLOCKS_PER_SEC * CLOCKS_PER_SEC);

                              • Time measuring for OpenCL kernels
                                ryta1203

                                genaganna,

                                 Yes, but that is not the argument. The argument is that Fr4nz and gapon are claiming that clock() does not increment while the GPU is running, which I believe to be incorrect. That might be true if the entire program could run strictly on the GPU, since the clock is based on the CPU; but the main program (not the kernel) is still being executed on the CPU, so I believe, from the few tests I've done with clock() since this thread started, that clock() keeps incrementing even during kernel calls.

                                  Also, you can always increase the number of kernel-call iterations so that you can determine gain/loss more accurately.

                        • Time measuring for OpenCL kernels
                          n0thing

                           

                          Originally posted by: Fr4nz
                          Assuming the kernels are executed on the GPU: while a kernel runs on the GPU, no CPU time is consumed by your process, which is why clock() reports 0.

                          This is incorrect: clFinish() (or clWaitForEvents() on the kernel's event) blocks until the kernel finishes execution, so the CPU clocks elapsed across that wait do represent the kernel execution time.

                          All the SDK samples use CPU timers to measure kernel time, which also includes the device<->host transfer time. You can run the samples for many iterations, and they display the average time per kernel.

                            • Time measuring for OpenCL kernels
                              ryta1203

                               

                              Originally posted by: n0thing
                              This is incorrect: clFinish() (or clWaitForEvents() on the kernel's event) blocks until the kernel finishes execution, so the CPU clocks elapsed across that wait do represent the kernel execution time.

                              Thank you, as I expected.