
Hill_Groove
Journeyman III

Measuring time for OpenCL kernels

clock_t problem

Hello everyone,

I am using clock_t to measure my code's working time. The attached code always reports zero time for the kernel on Ubuntu, although on Win7 everything seems to be in order.

Is there any way to avoid this problem and get the correct time? Thank you.

i7-860, 5870

    clock_t start = clock();
    ....
    gpu.SetKernelArgs();
    gpu.RunKernel();
    gpu.ReadKernelArg(6, RES, K*K*4);
    ....
    float Gworktime = (float) (clock() - start) / CLOCKS_PER_SEC;

Fr4nz
Journeyman III

Originally posted by: Hill_Groove
The attached code always reports zero time for the kernel on Ubuntu, although on Win7 everything seems to be in order. Is there any way to avoid this problem and get the correct time?

Assuming that your kernels are executed on the GPU: while a kernel runs on the GPU, no CPU clock time is spent during the execution. That's why you get 0.

The solution is to use OpenCL profiling through events; see:

http://www.khronos.org/opencl/sdk/1.0/docs/man/xhtml/clGetEventProfilingInfo.html

and search this forum.
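
A minimal sketch of what event-based profiling looks like (queue, kernel and global here are stand-ins for your own objects, and error checking is omitted):

    #include <CL/cl.h>

    // The queue must be created with profiling enabled, e.g.:
    //   queue = clCreateCommandQueue(context, device,
    //                                CL_QUEUE_PROFILING_ENABLE, &err);

    size_t global = 1024;                   // your global work size
    cl_event evt;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                           0, NULL, &evt);
    clWaitForEvents(1, &evt);               // block until the kernel is done

    cl_ulong t_start, t_end;                // device timestamps, nanoseconds
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                            sizeof(t_start), &t_start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                            sizeof(t_end), &t_end, NULL);

    double kernel_ms = (double)(t_end - t_start) * 1e-6;
    clReleaseEvent(evt);

Note that these timestamps come from the device, so they measure only the kernel itself, not host-side overhead or transfers.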


Fr4nz,

thank you.


Originally posted by: Fr4nz
Assuming that your kernels are executed on the GPU: while a kernel runs on the GPU, no CPU clock time is spent during the execution. That's why you get 0.

Is this correct?

The fact that the CPU might not be doing direct work at that time doesn't necessarily mean it isn't counting clock ticks, does it?

I haven't tried to time just a kernel this way, but it seems to me it should work fine.


Originally posted by: ryta1203
Is this correct? The fact that the CPU might not be doing direct work at that time doesn't necessarily mean it isn't counting clock ticks, does it?


Fr4nz was right: the clock() function returns an approximation of the processor time used by the program. See details in:

http://linux.die.net/man/3/clock

To measure wall-clock time intervals on UNIX, one can use the POSIX clock_gettime() function:

 

    #include <time.h>
    #include <iostream>
    #include <iomanip>
    // link with -lrt on older glibc (clock_gettime lives in librt)

    ...

    struct timespec begin;
    clock_gettime( CLOCK_REALTIME, &begin );

    ...

    struct timespec end;
    clock_gettime( CLOCK_REALTIME, &end );

    // tv_nsec must be zero-padded to nine digits to read as a fraction
    std::cout << "BEGIN: " << begin.tv_sec << "."
              << std::setw(9) << std::setfill('0') << begin.tv_nsec << "\n"
              << "END:   " << end.tv_sec << "."
              << std::setw(9) << std::setfill('0') << end.tv_nsec << std::endl;


Also, be careful about the resolution of this method! On some systems, I think, it can be as coarse as 0.01 s. Run the same kernel a sufficient number of times to get reliable measurements.
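
If you only need the interval itself, the subtraction can be wrapped in a small helper (a sketch; the name elapsed_seconds is made up here, and doing the arithmetic in double avoids an explicit nanosecond borrow):

    #include <time.h>

    // Elapsed wall-clock time in seconds between two timespecs.
    static double elapsed_seconds(const struct timespec &begin,
                                  const struct timespec &end)
    {
        return (double)(end.tv_sec - begin.tv_sec)
             + (double)(end.tv_nsec - begin.tv_nsec) / 1e9;
    }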

 

 

 


There is a slightly different definition here:

http://www.cplusplus.com/reference/clibrary/ctime/clock/

"Returns the number of clock ticks elapsed since the program was launched."

This implies elapsed ticks, not ticks used?

I'm not arguing because I disagree; I'm simply confused and trying to get a more accurate answer.

I only say this because if I use the timing above I don't get ZERO for the kernels. For example, if I run the kernel 1000 times and then 10000 times, I get totally different results, and I'm fairly certain the difference is not attributable to the CPU loop alone.
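
One way to see the "elapsed vs. used" distinction concretely (a minimal sketch: on glibc, clock() counts CPU time, so a second spent blocked in sleep() barely registers while the wall clock advances normally):

    #include <time.h>
    #include <unistd.h>
    #include <iostream>

    int main()
    {
        clock_t c0 = clock();
        struct timespec t0, t1;
        clock_gettime(CLOCK_REALTIME, &t0);

        sleep(1);   // CPU sits idle (a stand-in for time spent blocked)

        clock_t c1 = clock();
        clock_gettime(CLOCK_REALTIME, &t1);

        std::cout << "CPU time:  "
                  << (double)(c1 - c0) / CLOCKS_PER_SEC << " s\n"
                  << "Wall time: "
                  << ((t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9) << " s\n";
        return 0;
    }

On Linux this prints roughly 0 s of CPU time against 1 s of wall time; on Windows, clock() measures wall time since process start, which would explain why the same code behaves differently on Win7 and Ubuntu.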


clock() is not suitable for measuring small execution times.

To measure smaller times, do as follows:

 

 

    clock_t start = clock() * CLOCKS_PER_SEC;
    ....
    gpu.SetKernelArgs();
    gpu.RunKernel();
    gpu.ReadKernelArg(6, RES, K*K*4);
    ....
    float Gworktime = (float) (clock() * CLOCKS_PER_SEC - start)
                    / (CLOCKS_PER_SEC * CLOCKS_PER_SEC);


genaganna,

Yes, but that is not the argument. The argument is that Fr4nz and gapon are claiming that while the GPU is running, "clock()" is not incrementing, which I believe to be incorrect. That might be true if you could run the entire program strictly on the GPU, since the clock is based on the CPU; but the main program (not the kernel) is still being executed on the CPU, and from the few tests I've done with clock() since this thread started, I believe "clock()" keeps incrementing even during kernel calls.

Also, you can always increase the number of iterations of the kernel call so that you can more accurately determine gain/loss.


Originally posted by: Fr4nz
Assuming that your kernels are executed on the GPU: while a kernel runs on the GPU, no CPU clock time is spent during the execution. That's why you get 0.

This is incorrect: clFinish (or clWaitForEvents on the particular kernel's event) does wait for the kernel to finish execution, so the CPU clock time that elapses across the call represents the kernel execution time.

All the SDK samples use CPU timers to measure the kernel time, which also includes the device<->host transfer time. You can run the samples for many iterations, and they display the average kernel time.
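
In other words, a host-side wall-clock measurement is valid as long as the end timestamp is taken only after the queue has drained (a sketch, again with hypothetical queue, kernel and global objects and no error checking):

    #include <CL/cl.h>
    #include <time.h>

    struct timespec t0, t1;
    clock_gettime(CLOCK_REALTIME, &t0);

    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                           0, NULL, NULL);
    clFinish(queue);   // block until the kernel has actually finished

    clock_gettime(CLOCK_REALTIME, &t1);
    double ms = ((t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1e9) * 1e3;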


Originally posted by: n0thing
This is incorrect: clFinish (or clWaitForEvents on the particular kernel's event) does wait for the kernel to finish execution, so the CPU clock time that elapses across the call represents the kernel execution time.

Thank you, that's what I expected.
