bpurnomo
Staff

ATI Stream Profiler is now available!

The next generation GPU Performance Analysis Tool for Stream Computing

We are pleased to announce the release of ATI Stream Profiler 1.0, a performance analysis tool for OpenCL programs running on ATI Radeon graphics cards. The tool is fully integrated into Microsoft Visual Studio 2008.

Features of the tool include:

  • Measure the execution time of an OpenCL kernel
  • Query the hardware performance counters on ATI Radeon graphics cards
  • Display the memory traffic to and from the GPU
  • Compare multiple runs (sessions) of the same or different programs
  • Store the profile data for each run in a csv file
  • Display the IL and ISA (hardware disassembly) code of the OpenCL kernel

You can download this tool from the AMD developer website here:

http://developer.amd.com/gpu/StreamProfiler/Pages/default.aspx

Please post your feedback here.

0 Likes
28 Replies
schrotti007
Journeyman III

Hi all

I have a problem with the Stream Analyser. The csv view seems to use a semicolon ';' as the field delimiter instead of a comma ','. That prevents the table from being displayed correctly, and it also prevents the ISA for the kernel from being displayed. Using Process Monitor, I found out that it tries to open something like "nbody_sim_cool, 10, {2048.il", although in the correct path.

I'm not sure, but maybe it's a localisation issue: I'm using a German Windows XP and a German Visual Studio.
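
For anyone post-processing the exported csv files outside Visual Studio, the delimiter mismatch described above can be worked around by detecting the delimiter instead of hard-coding a comma. The following is only a minimal sketch: the column names in the sample are hypothetical, and the real layout of the profiler's csv output is not shown in this thread.

```python
import csv
import io

def read_profile_rows(text, candidates=";,\t"):
    """Parse delimited text, choosing the candidate delimiter
    that occurs most often in the first (header) line."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    # max() with key=header.count picks the most frequent candidate;
    # on a tie, the earliest candidate in the string wins.
    delimiter = max(candidates, key=header.count)
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Hypothetical sample as it might be written on a German-locale system.
sample = "Method;Time;ALU\nnbody_sim;1.25;6426\n"
rows = read_profile_rows(sample)
```

Counting delimiters in the header line is a deliberately simple heuristic; it would guess wrong if a header field itself contained many commas.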

 

0 Likes

Thank you for the report.  We will investigate this issue.

0 Likes

Thanks.

Any idea if it will work on a 4830? I cannot establish whether the 4830 is OpenCL / ATI Stream ready or not. GPU-Z reports it is not, but another post says it is.

0 Likes

ATI Radeon HD 4830 is in the ATI Stream supported list (with Beta support).

 

0 Likes

We have just released ATI Stream Profiler v1.1 with various bug fixes.

0 Likes

Can we PLEASE have cache hit ratio included in the profiler!?

0 Likes

Thank you for your feedback.  We are working on adding more performance counters to the profiler.

 

0 Likes

Is it even possible to make a full profiler for the GPU?

0 Likes

If you are asking whether there are other hardware performance counters supported by the GPU, yes there are.  Whether they make sense to stream developers is another matter. 

 

0 Likes

Sadly, no cache counters with new profiler?

CAL has had cache counters for a while now... would this simply not be useful in OpenCL? From the posts by Micah and others, it seems that OpenCL is still using the cache...

0 Likes

Cache counters are in the works.

 

0 Likes

Just a quick question regarding the output of Stream Profiler.

The ALU instructions display is total ALU instructions?  Not cycles?

So to estimate the cycle runtime we would need the packing percentage average times the ALU counter?

 

Just looking for a gross estimate as we tweak things.

Thanks! Chris

0 Likes

Sorry if that wasn't entirely clear, just realized I didn't mention assuming a 5-wide ALU path.  So ALU instructions / (5*alu saturation)...

Something along those lines roughly accurate?

0 Likes

Hi,

The ALU counter displays the total vector ALU instructions.

To estimate the ALU cycle time, you can also use the ALU counter value (1 cycle per 1 vector ALU instruction).

 

0 Likes

Perfect, that's what I figured.  Thanks!

0 Likes

Hi Chrisjp,

Please see my updated reply above. 

 

0 Likes

Originally posted by: bpurnomo

Hi,

The ALU counter displays the total vector ALU instructions.

To estimate the ALU cycle time, you can also use the ALU counter value (1 cycle per 1 vector ALU instruction).

 

So the profiler reports the number of ALU Bundles, not the actual number of ALU operations?

0 Likes

Correct.  The ALU counter reported in the profiler is the number of vector ALU operations.
To get the number of scalar ALU operations (each vector ALU instruction consists of up to five scalar ALU operations), use the following equation:
Scalar ALU = ALU * 5 * ALUPacking.
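
As a worked example of the equation above, here is a small sketch. It assumes that the ALUPacking counter is reported as a percentage (so it is divided by 100 before use); the function name and the input values are illustrative only and do not come from a real profile.

```python
def scalar_alu_ops(alu, alu_packing_percent, lanes=5):
    """Estimate scalar ALU operations from the profiler counters.

    alu                 -- ALU counter (vector ALU instructions, averaged per wavefront)
    alu_packing_percent -- ALUPacking counter, assumed to be a percentage (0-100)
    lanes               -- scalar slots per vector ALU instruction (5 per the post above)
    """
    return alu * lanes * alu_packing_percent / 100.0

# Illustrative values: 1000 vector instructions at 60% packing.
estimate = scalar_alu_ops(1000, 60)   # 1000 * 5 * 0.60 = 3000.0
```

If the counter is exported as a fraction between 0 and 1 instead, drop the division by 100.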

0 Likes

And this is verified 100% accurate?

0 Likes

The ALU and ALUPacking counters are averages over all the wavefronts.

 

0 Likes

Yeah, it being a vector average makes more sense and agrees with our data.

For example, one of our tests runs a kernel with a reported ALU count of 6426 on a 4850 @ 650 MHz, with constant input data to minimize branching etc.

Predicted best-case throughput is (650M / ALU * 160) = ~16,200,000 data points per second.

Observed run time = ~67 ms for just over 1M data points = 15,728,640 data points per second.

So it's giving a fairly good approximation in our case.
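
The arithmetic in this post can be reproduced with a short sketch. The clock, the ALU count, the factor of 160 and the observed rate are taken from the post; reading 160 as the number of 5-wide ALU units on the HD 4850 (800 stream processors / 5) is an assumption made here, and the function name is illustrative.

```python
def predicted_items_per_second(clock_hz, alu_per_item, parallel_units):
    """Best-case throughput, assuming 1 cycle per vector ALU instruction
    and parallel_units work items executing at the same time."""
    return clock_hz / alu_per_item * parallel_units

# Values from the post: 650 MHz clock, ALU counter of 6426, factor of 160.
predicted = predicted_items_per_second(650e6, 6426, 160)

# Observed rate quoted in the post (data points per second).
observed = 15_728_640

# Fraction of the predicted best case actually achieved (roughly 0.97).
efficiency = observed / predicted
print(f"predicted ~{predicted:,.0f}/s, observed {observed:,}/s, ratio {efficiency:.2f}")
```

This ignores memory fetches, branching and launch overhead, so it is only a gross upper bound of the kind discussed above.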

0 Likes

Hi, is there a way to get Stream Profiler to work with Visual Studio 2010?

thanks

0 Likes

We do not currently support Visual Studio 2010 but plan to include support in the very near future.

0 Likes

thanks for the answer

0 Likes

Oh well. What does "very near future" mean? Can we expect it before the end of July?

Sorry for being impatient, but I need this information for a project.

0 Likes
mux85
Journeyman III

Will the next release support OpenCL 1.1? Is there an approximate release date? Thanks

0 Likes

You can expect the next release of the profiler to support OpenCL 1.1.  We can’t comment on a future release date.  Please check developer.amd.com for an update of the releases for the tools.

 

0 Likes

thanks

0 Likes