Originally posted by: ryta1203 bpurnomo,
I also have a question: why does the kernel timing reported by the profiler vary so much?
For example, sometimes I get 14ms for a run and other times 18ms.
Currently, I am just taking the mean over 10 or so runs to get a more stable timing, but is this fluctuation normal?
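The averaging described above can be sketched in C as follows. This is only an illustration: the `timing_stats` struct and the `summarize_timings` function are hypothetical names, not part of the profiler or the SDK, and the timings are assumed to have been copied by hand from the profiler output. Reporting the minimum and maximum next to the mean makes the spread between runs visible.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated timing measurements, in milliseconds. */
typedef struct {
    double mean;
    double min;
    double max;
} timing_stats;

/* Compute the mean, minimum and maximum of n timings (n must be > 0). */
static timing_stats summarize_timings(const double *ms, size_t n)
{
    assert(ms != NULL && n > 0);

    timing_stats s = { 0.0, ms[0], ms[0] };
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i) {
        sum += ms[i];
        if (ms[i] < s.min) s.min = ms[i];
        if (ms[i] > s.max) s.max = ms[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

A large gap between `min` and `max` relative to `mean` suggests that something other than the kernel itself is contributing to the measured time.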
After switching back to SDK 2.1, I'm not having this problem anymore.
Originally posted by: ryta1203 Just a guess, but I believe the profiler times just the transfers and the kernel. Are you adding these up all together?
Plus, I'm sure there is some overhead associated with the OpenCL API calls that's probably not included in the profiler timings.
Yes, I'm adding the Map & Kernel times together. Adds up to about 38ms. I'm not eager to move back to SDK 2.1 because it doesn't support OpenCL 1.1.
Naturally, the GPU time reported in the profiler doesn't include the run-time or driver overhead time (the difference between submitted and start timestamp).
We may report the run-time/driver overheads in some form in a future version of the tool.
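As a rough sketch of the overhead mentioned above (the gap between the submitted and start timestamps), the breakdown for one command could be computed like this. The struct and function names are hypothetical. In an OpenCL host program the four values would normally be read with `clGetEventProfilingInfo` using `CL_PROFILING_COMMAND_QUEUED`, `CL_PROFILING_COMMAND_SUBMIT`, `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` on a queue created with `CL_QUEUE_PROFILING_ENABLE`; here they are plain nanosecond counters so that the example needs no OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one command's profiling timestamps (ns). */
typedef struct {
    uint64_t queued;
    uint64_t submitted;
    uint64_t started;
    uint64_t ended;
} command_times_ns;

/* Time spent in each phase, in milliseconds. */
typedef struct {
    double queue_wait_ms;      /* queued    -> submitted */
    double launch_overhead_ms; /* submitted -> started (not in the GPU time) */
    double exec_ms;            /* started   -> ended (device execution time) */
} command_breakdown;

static command_breakdown break_down_command(command_times_ns t)
{
    /* The timestamps are expected to be in chronological order. */
    assert(t.queued <= t.submitted);
    assert(t.submitted <= t.started);
    assert(t.started <= t.ended);

    command_breakdown b;
    b.queue_wait_ms      = (double)(t.submitted - t.queued)  / 1.0e6;
    b.launch_overhead_ms = (double)(t.started - t.submitted) / 1.0e6;
    b.exec_ms            = (double)(t.ended - t.started)     / 1.0e6;
    return b;
}
```

Summing `launch_overhead_ms` over the Map and kernel commands gives an estimate of how much time falls outside the execution times alone.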
ryta1203,
My guess is that the timing fluctuations occur because memory transfers are running concurrently with the kernel execution.
Originally posted by: bpurnomo
ryta1203,
My guess is that the timing fluctuations occur because memory transfers are running concurrently with the kernel execution.
If this were true, one would expect it to be consistent across different versions of the SDK; however, I'm only seeing this with SDK 2.2 and not SDK 2.1.
Does SDK 2.1 not support concurrent memory transfers and kernel execution?
Also, I'm calling clFinish() between each action.
Originally posted by: bpurnomo Naturally, the GPU time reported in the profiler doesn't include the run-time or driver overhead time (the difference between submitted and start timestamp).
We may report the run-time/driver overheads in some form in a future version of the tool.
I would be very surprised if the overhead of 3 API calls (Map, Enqueue, Map) were costing 70ms.
I'm having issues with file associations with Visual Studio 2008 after I installed the stream profiler.
When I try to open any file that is associated with Visual Studio, I get a pop-up saying:
There was a problem sending the command to the program
The only other information provided was the path of the file in the message box displaying the error. The file is not opened once the message box is closed.
Interestingly enough, this only occurs when opening the file requires launching Visual Studio. If I start Visual Studio first and then open the file, either from the Visual Studio open dialog or from Windows Explorer, there is no problem.
I've only had this problem with non-project/solution files.
Is there a way around this without just uninstalling the stream profiler (perhaps a setting somewhere)?
Specs:
ATI 5650HD, Windows 7 64-bit Home Premium, Intel Core i5 mobile processor, ATI Stream SDK 2.2 with Stream Profiler 1.4
Thank you for the report. We will investigate this problem.
Originally posted by: bpurnomo Naturally, the GPU time reported in the profiler doesn't include the run-time or driver overhead time (the difference between submitted and start timestamp).
We may report the run-time/driver overheads in some form in a future version of the tool.
ryta1203,
My guess is that the timing fluctuations occur because memory transfers are running concurrently with the kernel execution.
Why are there concurrent memory transfers?
Each call is waiting on some event, and I have clFinish(commandQueue) wrapped around the kernel. Or am I misunderstanding that function call's purpose?
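For comparison with the profiler numbers, a host-side wall-clock measurement of one blocking step could look roughly like the sketch below. All names here are hypothetical, and `example_work` is only a stand-in: in a real host program the callback would enqueue the kernel and then call `clFinish(commandQueue)`, which blocks until every previously queued command has completed. The timer uses C11 `timespec_get`, which reads the system clock and is therefore not guaranteed to be monotonic.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: the blocking work to be timed. */
typedef void (*timed_fn)(void *user_data);

/* Milliseconds elapsed from a to b. */
static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1000.0
         + (double)(b.tv_nsec - a.tv_nsec) / 1.0e6;
}

/* Run fn(user_data) once and return the wall-clock time it took, in ms. */
static double time_call_ms(timed_fn fn, void *user_data)
{
    assert(fn != NULL);

    struct timespec t0, t1;
    int ok0 = timespec_get(&t0, TIME_UTC);
    fn(user_data);
    int ok1 = timespec_get(&t1, TIME_UTC);

    /* timespec_get returns the time base on success, 0 on failure. */
    assert(ok0 == TIME_UTC && ok1 == TIME_UTC);
    (void)ok0;
    (void)ok1;
    return elapsed_ms(t0, t1);
}

/* Stand-in for "enqueue kernel, then clFinish": just counts invocations. */
static void example_work(void *user_data)
{
    int *calls = user_data;
    ++*calls;
}
```

A measurement taken this way includes the run-time and driver overhead, so it is expected to be larger than the GPU time shown by the profiler.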
Originally posted by: bpurnomo Naturally, the GPU time reported in the profiler doesn't include the run-time or driver overhead time (the difference between submitted and start timestamp).
We may report the run-time/driver overheads in some form in a future version of the tool.
ryta1203,
My guess is that the timing fluctuations occur because memory transfers are running concurrently with the kernel execution.
So can you please tell me how to get accurate timings without the memory transfers?
This happens even when I use blocking writes and reads.
When you use the profiler, do you see one or more CreateBuffer (or CreateImage) API calls with N/A timings?