I am trying to benchmark clAmdFft on an AMD Radeon 7470M. I would appreciate comments on my benchmarking method.
I enqueue the FFT like this, where event is a cl_event:
err = clAmdFftEnqueueTransform(planHandle, CLFFT_BACKWARD, 1, &queue, 0, NULL, &event,
buffersIn, buffersOut, tmpBuffer);
With profiling enabled on the command queue (CL_QUEUE_PROFILING_ENABLE), I read the elapsed time like this:
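clWaitForEvents(1, &event);  /* profiling info is only available once the command has completed */
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
total_time = time_end - time_start;            /* cl_ulong values, in nanoseconds */
printf("%0.6f \t", (total_time / 1000000.0));  /* ns -> ms */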
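A single profiled call is a noisy measurement, and early runs may be slower (for example while the GPU ramps up its clocks). One common approach is to discard a few warm-up runs and report the median of several timed runs. The sketch below is plain C so it can stand alone: uint64_t stands in for cl_ulong, and run_once_fn, median_time_ms and demo_fake_run are hypothetical names, not part of clAmdFft. In real code the callback would enqueue the transform, wait on its event, and read the two profiling timestamps exactly as above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback: performs one transform and reports its start/end
 * timestamps in nanoseconds. With OpenCL this would enqueue the transform,
 * call clWaitForEvents, and read CL_PROFILING_COMMAND_START/END. */
typedef void (*run_once_fn)(void *ctx, uint64_t *start_ns, uint64_t *end_ns);

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Runs `warmup` untimed iterations, then `iters` timed ones, and returns the
 * median elapsed time in milliseconds, or -1.0 if iters <= 0 or on
 * allocation failure. */
double median_time_ms(run_once_fn run, void *ctx, int warmup, int iters)
{
    uint64_t s, e;
    double *ms;
    double median;
    int i;

    assert(run != NULL);
    if (iters <= 0)
        return -1.0;
    ms = malloc((size_t)iters * sizeof *ms);
    if (ms == NULL)
        return -1.0;

    for (i = 0; i < warmup; i++)
        run(ctx, &s, &e);               /* warm-up: timings discarded */

    for (i = 0; i < iters; i++) {
        run(ctx, &s, &e);
        ms[i] = (double)(e - s) / 1e6;  /* ns -> ms */
    }

    qsort(ms, (size_t)iters, sizeof *ms, cmp_double);
    median = (iters % 2 != 0) ? ms[iters / 2]
                              : 0.5 * (ms[iters / 2 - 1] + ms[iters / 2]);
    free(ms);
    return median;
}

/* Stand-in for a real transform, for trying the harness without a GPU:
 * the k-th call "takes" k milliseconds. */
static void demo_fake_run(void *ctx, uint64_t *start_ns, uint64_t *end_ns)
{
    int *calls = ctx;
    (*calls)++;
    *start_ns = 1000;
    *end_ns = 1000 + (uint64_t)(*calls) * 1000000u;
}
```

For example, with int calls = 0;, median_time_ms(demo_fake_run, &calls, 2, 5) discards the first two calls (1 ms and 2 ms) and returns the median of 3, 4, 5, 6 and 7 ms, i.e. 5.0. The median is used rather than the mean so that an occasional outlier run does not dominate; reporting the minimum and maximum alongside it is another common choice.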
Is this the right way to benchmark clAmdFft? My results are as follows
(time in milliseconds for FFT lengths that are powers of 2, starting from 2^1):