I see strange behavior on my GPU. My setup:
- AMD Fusion Processor (E350 + integrated HD 6310)
- Ubuntu Linux
- OpenCL 1.2 AMD-APP (923.1)
When I run a kernel on the GPU, the kernel execution time reported by clGetEventProfilingInfo is always half of the execution time I measure on the host with gettimeofday. It can't be a fixed setup cost, because it is always half of the host-measured time rather than a constant offset. For the rest of the time the GPU seems to be idle.
I transfer the data before measuring (with a blocking clEnqueueWriteBuffer), so data transfer shouldn't be the reason. I tried several kernels, and all of them show the same strange behavior. When I run the kernel on the CPU, the profiling time equals the gettimeofday time (no problem there). I tried clFinish, clWaitForEvents, and polling on the host with clGetEventInfo, all with the same result.
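To make the comparison described above concrete, here is a minimal sketch in plain C of how the host-side and device-side durations could be put next to each other. The helper names host_elapsed_ns and unaccounted_ns are made up for illustration. The sketch assumes the two gettimeofday samples are taken right before the enqueue and right after the wait returns, and that start_ns and end_ns are the values returned by clGetEventProfilingInfo for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END (device timestamps in nanoseconds). Only durations are compared, since the host and device clocks need not share a time base.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Elapsed host time between two gettimeofday() samples, in nanoseconds.
 * Hypothetical helper, not part of OpenCL. */
static uint64_t host_elapsed_ns(struct timeval t0, struct timeval t1)
{
    int64_t sec  = (int64_t)t1.tv_sec  - (int64_t)t0.tv_sec;
    int64_t usec = (int64_t)t1.tv_usec - (int64_t)t0.tv_usec;
    int64_t ns   = sec * 1000000000LL + usec * 1000LL;

    assert(ns >= 0); /* t1 must not be earlier than t0 */
    return (uint64_t)ns;
}

/* Host time that is NOT covered by the kernel's START..END interval.
 * start_ns and end_ns are assumed to come from clGetEventProfilingInfo. */
static uint64_t unaccounted_ns(uint64_t host_ns, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    uint64_t device_ns = end_ns - start_ns;

    /* Clamp at zero in case the host measurement is the shorter one. */
    return host_ns > device_ns ? host_ns - device_ns : 0;
}
```

In the situation described here, unaccounted_ns would come out roughly equal to the device time itself, which is the "half of the host time" that goes missing.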
I'll attach a screenshot of my recent tests with CodeXL. The kernel is called "nest". What is very interesting is that the idle time always comes before the actual execution, and it always lasts as long as the execution itself. That is really strange.
I hope someone knows a solution or a workaround.
Try enqueuing two kernel invocations. Even though it is not shown, there is data transfer and/or allocation before the first kernel execution.
I tried that, with some strange results.
I attached two screenshots: one where I enqueue the same kernel twice before calling clWaitForEvents, and one where I enqueue the kernel, call clWaitForEvents, enqueue the kernel again and wait again.
The first case is especially strange: the idle time before execution is exactly as long as the two kernel executions that follow it.
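One way to check this observation without relying on the CodeXL timeline is to look at all four profiling timestamps of each event (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) and compare the gap before the first kernel starts with the summed execution time. The following is only a sketch in plain C: the struct event_times and both helpers are hypothetical, and the fields are assumed to be filled from clGetEventProfilingInfo on a queue created with CL_QUEUE_PROFILING_ENABLE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four device timestamps of one command, in nanoseconds.
 * Hypothetical container; the values are assumed to be read with
 * clGetEventProfilingInfo. */
struct event_times {
    uint64_t queued; /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit; /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;  /* CL_PROFILING_COMMAND_START  */
    uint64_t end;    /* CL_PROFILING_COMMAND_END    */
};

/* Sum of END - START over all n events: the time the kernels actually ran. */
static uint64_t total_exec_ns(const struct event_times *ev, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(ev[i].end >= ev[i].start);
        sum += ev[i].end - ev[i].start;
    }
    return sum;
}

/* Time from queueing a command until it starts executing on the device. */
static uint64_t lead_in_ns(const struct event_times *first)
{
    assert(first->start >= first->queued);
    return first->start - first->queued;
}
```

If lead_in_ns for the first event comes out about equal to total_exec_ns for both events, the delay sits between QUEUED and START. Comparing the SUBMIT timestamp as well would show whether the command was waiting on the host side or was already submitted to the device.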
Now I remember that there is a bug in CodeXL. For example, it shows kernels from two threads as executing overlapped even when in reality they do not.
But I don't have two threads on the host. I enqueue the two kernels one after another.
And I don't think it's a bug in CodeXL (at least not only there), because I see the same behavior in the times I get from clGetEventProfilingInfo.