GPU Developer Tools

bpurnomo
Staff

New! ATI Stream Profiler version 1.4 is now available

We are pleased to announce the release of a new version of ATI Stream Profiler, version 1.4.

ATI Stream Profiler is a Microsoft® Visual Studio® integrated runtime profiler that gathers performance data from the GPU as your OpenCL™ application runs. This information can then be used by developers to discover where the bottlenecks are in their OpenCL™ application and find ways to optimize their application's performance.


New updates in this version include:

  • Support for Stream SDK v2.2.
  • Support for OpenCL™ 1.1.
  • Support for Microsoft® Visual Studio® 2010.
  • Support for a command-line interface.
  • Added a check for whether the installed version is up to date.
  • Fixed the data transfer size reported for image objects.
  • Updated counter names and descriptions.

 

Please post your feedback here.

21 Replies
tomhammo
Journeyman III

Thanks! The additional performance counters are useful.

However, there is one performance counter I think a few of us would love to see in v1.4 of the profiler: the number of concurrent workgroups per SIMD.

For example, say each workgroup uses 8 KB of local memory. Ideally four would run concurrently on a SIMD (there is not enough local memory for five or more). At the moment there is no way to verify that this has actually occurred, other than in roundabout ways such as measuring execution time.

While it is possible to estimate this by checking the resource usage of each work item (number of GPRs, amount of local memory) and comparing it to the maximum available per SIMD, it would be great to have a performance counter that verifies the exact amount of parallelism that actually ends up being exploited.
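The manual check described above can be sketched as a small calculation. This is only a rough estimate under stated assumptions: the struct, the function name, and the limit values used in the usage note (32 KB of local memory per SIMD, 256 registers per lane, 64 work-items per wavefront) are hypothetical placeholders, not figures taken from the profiler or from this thread, and real hardware may apply further limits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-SIMD limits; look up the real values for your device. */
typedef struct {
    size_t local_mem_bytes; /* local memory available per SIMD */
    size_t gprs_per_lane;   /* registers each lane can split among resident wavefronts */
    size_t wavefront_size;  /* work-items per wavefront */
} simd_limits;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Estimate how many workgroups fit on one SIMD at the same time,
   considering only GPR usage and local memory usage. */
size_t estimate_groups_per_simd(simd_limits lim, size_t group_size,
                                size_t gprs_per_item,
                                size_t local_bytes_per_group)
{
    assert(lim.wavefront_size > 0 && group_size > 0 && gprs_per_item > 0);

    /* A workgroup occupies a whole number of wavefronts (round up). */
    size_t waves_per_group =
        (group_size + lim.wavefront_size - 1) / lim.wavefront_size;

    /* Wavefronts that fit in the register budget, then whole workgroups. */
    size_t resident_waves = lim.gprs_per_lane / gprs_per_item;
    size_t by_gprs = resident_waves / waves_per_group;

    /* Workgroups that fit in local memory; no limit if none is used. */
    size_t by_local = (local_bytes_per_group == 0)
                          ? SIZE_MAX
                          : lim.local_mem_bytes / local_bytes_per_group;

    return min_size(by_gprs, by_local);
}
```

With the placeholder limits above, a 64-work-item workgroup using 8 KB of local memory and 32 GPRs per work-item gives an estimate of 4, while the same workgroup at 100 GPRs per work-item gives 2. A gap like that between the estimate and the expected value is the kind of thing a hardware counter would confirm directly.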

For example, right now I am pretty sure that a kernel I am working on only interleaves two workgroups per SIMD at a time, even though I have enqueued more than enough workgroups and adjusted resource usage so that at least four workgroups should be running in parallel per SIMD. Performance says otherwise. The performance counter would save me a lot of time tracking this issue down.

regards,

- Tom Hammond

bpurnomo
Staff

Thanks for the feedback, this is a great suggestion. This performance counter is not possible with our current hardware architecture; however, we'll consider it for future generations.

ryta1203
Journeyman III

I have an issue with the profiler.

It profiles my kernel fine at smaller dimensions (i.e. 256^2 or 1024^2), but at larger dimensions (2048^2) I get no output from the profiler, even though the code runs fine (checked against a CPU reference version).

Any ideas as to why this might be happening? Could it have something to do with the kernel size?

bpurnomo
Staff

Would you be able to send us a test case so we can reproduce it in house?  Please send it to gputools.support@amd.com.

ryta1203
Journeyman III

Originally posted by: bpurnomo Would you be able to send us a test case so we can reproduce it in house?  Please send it to gputools.support@amd.com.

It now runs fine. Side note: my GPR usage has increased significantly without my changing any code, which is odd.

ryta1203
Journeyman III

Originally posted by: bpurnomo Would you be able to send us a test case so we can reproduce it in house?  Please send it to gputools.support@amd.com.

bpurnomo,

I'm now able to profile 2048*2048 with the new SDK, but when I go to 3072*3072 I get the same problem as before (i.e. no profiling information, even though the code verifies against the CPU reference just fine).

ryta1203
Journeyman III

bpurnomo,

I also have a question: why does the kernel timing reported by the profiler vary so much?

For example, sometimes I get 14 ms for a run and other times 18 ms.

Currently, I am just taking the mean over 10 or so runs to get a more stable timing, but is this fluctuation normal?
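One way to look at run-to-run variation is to summarize a batch of timings instead of relying on the mean alone. The following is a minimal sketch, assuming the kernel times have already been copied out of the profiler into an array by hand. The struct and function names are illustrative and are not part of any profiler API.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a batch of kernel timings, all in milliseconds. */
typedef struct {
    double mean;
    double min;
    double max;
    double rel_spread; /* (max - min) / mean, as a fraction */
} timing_stats;

/* Compute mean, min, max and relative spread over n timings. */
timing_stats summarize_timings(const double *ms, size_t n)
{
    assert(ms != NULL && n > 0);

    timing_stats s = { 0.0, ms[0], ms[0], 0.0 };
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i) {
        sum += ms[i];
        if (ms[i] < s.min) s.min = ms[i];
        if (ms[i] > s.max) s.max = ms[i];
    }

    s.mean = sum / (double)n;
    /* Guard against division by zero if every timing is 0. */
    s.rel_spread = (s.mean > 0.0) ? (s.max - s.min) / s.mean : 0.0;
    return s;
}
```

Reporting the minimum alongside the mean can be useful, since the fastest run is usually the one least disturbed by other activity on the system.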

ngaloppo
Journeyman III

I am using the new version (1.4).

I am seeing an inconsistency between my CPU timings and what ATI Stream Profiler is reporting. In pseudocode:

timer.start();
clEnqueueMap*();
etc...
clEnqueueNDRangeKernel();
clFinish();
timer.stop();

 

Total GPU time as reported by ATI Stream profiler 1.4: ~ 38 ms

Total CPU time as reported by timer class (simply using QueryPerformanceCounter): ~ 110 ms

 

Is there any reason for this inconsistency that I'm not understanding? Thanks!

 

 

 

ryta1203
Journeyman III

Just a guess, but I believe the profiler times only the transfers and the kernel execution. Are you adding these all together?

Also, I'm sure there is some overhead associated with the OpenCL API calls that is probably not included in the profiler timings.
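One way to check where the extra wall-clock time goes is OpenCL's own event profiling. On a command queue created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo returns four timestamps in nanoseconds for each command: CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The sketch below only does the arithmetic on those four values. The struct and function names are hypothetical, and whether the profiler's GPU time corresponds exactly to the START-to-END interval is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* The four per-command timestamps, in nanoseconds, as obtained from
   clGetEventProfilingInfo (the caller is assumed to have queried them). */
typedef struct {
    uint64_t queued; /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit; /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;  /* CL_PROFILING_COMMAND_START  */
    uint64_t end;    /* CL_PROFILING_COMMAND_END    */
} event_times_ns;

/* Time spent in each phase, in milliseconds. */
typedef struct {
    double queue_ms;  /* waiting in the host-side queue */
    double submit_ms; /* submitted, waiting to start on the device */
    double exec_ms;   /* executing on the device */
} event_breakdown;

event_breakdown break_down_event(event_times_ns t)
{
    /* Timestamps are expected to be in non-decreasing order. */
    assert(t.queued <= t.submit && t.submit <= t.start && t.start <= t.end);

    event_breakdown b;
    b.queue_ms  = (double)(t.submit - t.queued) / 1e6;
    b.submit_ms = (double)(t.start - t.submit) / 1e6;
    b.exec_ms   = (double)(t.end - t.start) / 1e6;
    return b;
}
```

If the execution phases of all commands add up to roughly the profiler's total while the queue and submit phases are large, the gap to the host timer is likely API and scheduling overhead rather than GPU work.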
