I need to do some fine-grained profiling of huge kernels. NVIDIA exposes its clock register to device code via inline PTX assembly:
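// Read the 32-bit cycle counter (%clock) from device code.
__device__ unsigned int clock_time()
{
    unsigned int t;
    // volatile keeps the compiler from caching or merging repeated reads
    asm volatile("mov.u32 %0, %%clock;" : "=r"(t));
    return t;
}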
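To make the intent concrete, here is a minimal sketch of how such a function might be used to time a region inside a kernel. The kernel name `timed_kernel`, the dummy loop, and the buffer names are hypothetical and only for illustration; the counter is 32 bits wide, so only short intervals can be measured this way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the 32-bit cycle counter (%clock) through inline PTX.
__device__ unsigned int clock_time()
{
    unsigned int t;
    asm volatile("mov.u32 %0, %%clock;" : "=r"(t));
    return t;
}

// Hypothetical kernel: times a dummy loop and stores the elapsed cycles per thread.
__global__ void timed_kernel(unsigned int *cycles, float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    unsigned int start = clock_time();
    float acc = 0.0f;
    for (int i = 0; i < 1000; ++i)
        acc += sqrtf((float)i);          // region being profiled
    unsigned int stop = clock_time();

    out[tid] = acc;                      // keep the loop from being optimized away
    cycles[tid] = stop - start;          // unsigned subtraction tolerates one wrap-around
}

int main()
{
    const int n = 32;
    unsigned int *d_cycles = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_cycles, n * sizeof(unsigned int));
    cudaMalloc(&d_out, n * sizeof(float));

    timed_kernel<<<1, n>>>(d_cycles, d_out);

    unsigned int h_cycles[n];
    // cudaMemcpy on the default stream waits for the kernel to finish
    cudaMemcpy(h_cycles, d_cycles, sizeof(h_cycles), cudaMemcpyDeviceToHost);
    printf("thread 0: %u cycles\n", h_cycles[0]);

    cudaFree(d_cycles);
    cudaFree(d_out);
    return 0;
}
```

The measured value is in clock cycles of the multiprocessor running the thread, not in seconds, so it is mostly useful for comparing regions within the same kernel.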
Is there a similar way to read the current time or clock counter from within a kernel on AMD/ATI GPUs?