Hello everyone,
I am using clock_t to measure my code's execution time. The attached code always reports zero time for the kernel section on Ubuntu, although on Win7 everything seems to be in order.
Is there any way to avoid this problem and get the correct time? Thank you.
i7-860, 5870
clock_t start = clock();
....
gpu.SetKernelArgs();
gpu.RunKernel();
gpu.ReadKernelArg(6, RES, K*K*4);
....
float Gworktime = (float)(clock() - start) / CLOCKS_PER_SEC;
Originally posted by: Hill_Groove
Assuming the kernels are executed on the GPU: while your kernels run on the GPU, no CPU time is spent during their execution. That's why you get 0.
The solution is to use OpenCL profiling through events; see here:
http://www.khronos.org/opencl/sdk/1.0/docs/man/xhtml/clGetEventProfilingInfo.html
and search this forum for examples.
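A minimal sketch of that event-based approach (not from the original posts; it assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE, and queue, kernel, globalSize and localSize are placeholder names for objects that already exist in your host code):
#include <CL/cl.h>
/* queue, kernel, globalSize, localSize are assumed to exist already */
cl_event evt;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, &localSize, 0, NULL, &evt);
clWaitForEvents(1, &evt);                        /* block until the kernel has finished */
cl_ulong t_start = 0, t_end = 0;
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &t_start, NULL);
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,   sizeof(cl_ulong), &t_end,   NULL);
double kernelSec = (t_end - t_start) * 1e-9;     /* profiling counters are in nanoseconds */
clReleaseEvent(evt);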
Fr4nz,
thank you.
Originally posted by: Fr4nz Assuming the kernels are executed on the GPU: while your kernels run on the GPU, no CPU time is spent during their execution. That's why you get 0.
Is this correct?
Even if the CPU isn't doing any direct work at that time, I'm not sure that means it stops counting clock ticks.
I haven't tried to time only a kernel in this way, but it seems to me it should work fine.
Originally posted by: ryta1203 Is this correct?
Fr4nz was right: the clock() function returns an approximation of the processor time used by the program. See the details in:
http://linux.die.net/man/3/clock
To measure wall-clock time intervals on UNIX, one can use the POSIX clock_gettime() function:
#include <time.h>
...
struct timespec begin, end;
clock_gettime(CLOCK_REALTIME, &begin);
/* ... enqueue the kernel and wait for it to finish ... */
clock_gettime(CLOCK_REALTIME, &end);
double seconds = (end.tv_sec - begin.tv_sec) + (end.tv_nsec - begin.tv_nsec) / 1e9;
Also, be careful about the resolution of this method! On some systems it can be as bad as 0.01 s. Run the same kernel a sufficient number of times to get reliable measurements.
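A minimal sketch of that averaging idea, using clock_gettime() around the original poster's wrapper calls (gpu.Finish() here is a hypothetical stand-in for whatever call your wrapper provides to wait for the queue to drain, e.g. clFinish()):
#include <time.h>
const int N = 1000;                              /* number of timed launches */
struct timespec begin, end;
clock_gettime(CLOCK_REALTIME, &begin);
for (int i = 0; i < N; ++i)
    gpu.RunKernel();                             /* launch the kernel N times */
gpu.Finish();                                    /* hypothetical: wait until all launches complete */
clock_gettime(CLOCK_REALTIME, &end);
double total = (end.tv_sec - begin.tv_sec) + (end.tv_nsec - begin.tv_nsec) / 1e9;
double perLaunch = total / N;                    /* average wall-clock time per launch */
On older glibc you may need to link with -lrt for clock_gettime().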
There is a slightly different definition here:
http://www.cplusplus.com/reference/clibrary/ctime/clock/
"Returns the number of clock ticks elapsed since the program was launched."
This implies elapsed ticks, not ticks used?
I'm not arguing because I disagree, I'm simply confused and trying to get a more accurate answer.
I only mention this because, when I use the timing above, I don't get ZERO for the kernel timing. For example, if I run the kernel 1000 times and then 10000 times, I get totally different results. I'm fairly certain the difference is not attributable to the CPU loop alone.
clock() is not suitable for measuring small execution times.
To measure smaller times, do as follows:
clock_t start = clock() * CLOCKS_PER_SEC;
....
gpu.SetKernelArgs();
gpu.RunKernel();
gpu.ReadKernelArg(6, RES, K*K*4);
....
float Gworktime = (float)(clock() * CLOCKS_PER_SEC - start) / (CLOCKS_PER_SEC * CLOCKS_PER_SEC);
genaganna,
Yes, but that's not the point. The point is that Fr4nz and gapon are claiming that while the GPU is running, clock() is not incrementing, which I believe is incorrect. That might be true if you could run the entire program strictly on the GPU, since the clock is based on the CPU; but since the host program is executed on the CPU, I believe (and the few tests I've done with clock() since this thread started suggest) that clock() keeps incrementing even during kernel calls, because the main program, not the kernel, is still executing on the CPU.
Also, you can always increase the number of kernel iterations so that you can determine gain/loss more accurately.
Originally posted by: Fr4nz
Assuming the kernels are executed on the GPU: while your kernels run on the GPU, no CPU time is spent during their execution. That's why you get 0.
This is incorrect: clFinish() or clWaitForEvents() (on the event for the particular kernel) waits for the kernel to finish executing, so the CPU time elapsed across that call represents the kernel execution time.
All the SDK samples use CPU timers to measure kernel time, which also includes the device<->host transfer time. You can run the samples for many iterations, and they display the average kernel time.
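A minimal sketch of that pattern (this is not the SDK samples' own timer code; queue, kernel, globalSize and localSize are placeholder names assumed to exist, and gettimeofday() stands in for whatever host timer you prefer):
#include <sys/time.h>
#include <CL/cl.h>
struct timeval t0, t1;
gettimeofday(&t0, NULL);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, &localSize, 0, NULL, NULL);
clFinish(queue);                                 /* block until the kernel has actually finished */
gettimeofday(&t1, NULL);
double seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;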
Originally posted by: n0thing
Thank you, as I expected.