

eduardoschardong
Journeyman III

OpenCL profiler - Questions and minor bug report

Bug report: The .csv output isn't displayed correctly for some locales.

Latin locales, for example, use "," as the decimal separator and "." only as a digit-grouping separator. When the profiler displays the output, the values are converted according to the culture set on the machine, which causes them to be displayed incorrectly: percentage values are displayed as, for example, 8107, and KernelTime is displayed as 2299992 when it should be 22,99992 (or 22.99992 if culture-invariant), and so on.
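A minimal sketch of how this kind of misreading can arise, assuming the .csv stores values with "." as the decimal separator and the display code parses them with the machine's culture. The helper names `parse_number` and `parse_invariant` are hypothetical and are not part of the profiler; the input "81.07" is an assumed original for the displayed 8107.

```python
def parse_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Parse text with explicit separators (a stand-in for a culture setting)."""
    # Grouping separators carry no value, so drop them first,
    # then normalise the decimal separator to "." for float().
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


def parse_invariant(text: str) -> float:
    """Parse text the culture-invariant way: "." is the decimal separator."""
    return float(text)


raw_kernel_time = "22.99992"  # value as assumed to be stored in the .csv
raw_percent = "81.07"         # assumed original of the displayed 8107

# Culture-invariant reading keeps the intended value.
print(parse_invariant(raw_kernel_time))          # 22.99992

# Reading the same text with "," as decimal and "." as grouping
# silently drops the dot and inflates the value.
print(parse_number(raw_kernel_time, ",", "."))   # 2299992.0
print(parse_number(raw_percent, ",", "."))       # 8107.0
```

Parsing the file with a fixed, culture-invariant format and applying the machine's culture only when formatting for display would avoid the mismatch.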

 

Questions:

1) Is FetchUnitStalled the percent of KernelTime spent on fetch units while they are doing nothing?

2) Is FetchUnitBusy - FetchUnitStalled the percent of KernelTime spent on fetch units while they are actually doing something useful?

3) What is the GPU doing when FetchUnitStalled is high? I mean, what causes stalls? If I have 94,07 for FetchUnitBusy and 90,74 for FetchUnitStalled, what is the likely problem with my kernel?

 

Note: WriteUnitStalled is always 0. This looks like a bug, since kernels with too many writes are slow even when all the counters are low.

 

0 Likes
9 Replies
ryta1203
Journeyman III

Why doesn't the Profiler say how many GPRs are being used? This could be relevant.

That would be great, thanks... unless AMD is planning on releasing a new SKA with OpenCL support (but it doesn't look like they are).

0 Likes

Originally posted by: ryta1203 Why doesn't the Profiler say how many GPRs are being used? This could be relevant.

 

That would be great, thanks... unless AMD is planning on releasing a new SKA with OpenCL support (but it doesn't look like they are).

 

If you look into the ISA output, near the end there is a report for the number of GPRs used.

Hey, you asked for a run-time profiler when we released SKA a while back.  Now, are you asking for SKA? 

0 Likes

LOL, no, not if the Profiler reports GPR, which apparently it does, so that is good.

Thanks for the quick reply.

I would still like a profiler for Brook+....

Is there any way to profile Brook+ programs with the current Profiler (via CAL/IL or something)?

0 Likes

Originally posted by: ryta1203 LOL, no, not if the Profiler reports GPR, which apparently it does, so that is good.

 

Thanks for the quick reply.

 

I would still like a profiler for Brook+....

 

Is there any way to profile Brook+ programs with the current Profiler (via CAL/IL or something)?

 

Unfortunately, no.

0 Likes

Originally posted by: bpurnomo Hey, you asked for a run-time profiler when we released SKA a while back.  Now, are you asking for SKA? 

But I like SKA, I want it back

 

The run-time profiler is a good tool, precise and so on, but SKA was good too and more "agile": looking at the generated ISA and statistics after typing each line was very useful.

 

Checking to see if it compiles after each change was useful too

 

I liked the profiler too, but it is more a complement to SKA than a replacement. As for SKA, if it were also integrated into Visual Studio, it would be perfect.

 

0 Likes

What he said above!

0 Likes

Good News!  We have just released SKA v1.4 with OpenCL support.

 

0 Likes

Thanks for updating this, SKA is a really great tool for fast kernel analysis.

 

However, the output window is limited to 1M symbols. With the current compiler behavior ("just unroll everything"), we reach this limit in no time.

 

0 Likes
bpurnomo
Staff

Originally posted by: eduardoschardong Bug report: The .csv output isn't displayed correctly for some locales.

 

Latin locales, for example, use "," as the decimal separator and "." only as a digit-grouping separator. When the profiler displays the output, the values are converted according to the culture set on the machine, which causes them to be displayed incorrectly: percentage values are displayed as, for example, 8107, and KernelTime is displayed as 2299992 when it should be 22,99992 (or 22.99992 if culture-invariant), and so on.

 

Thank you for the bug report.  We will investigate this localization issue.

 

1) Is FetchUnitStalled the percent of KernelTime spent on fetch units while they are doing nothing?


It is the percentage of GPU time that the fetch units are waiting for results (not doing anything).

 

2) Is FetchUnitBusy - FetchUnitStalled the percent of KernelTime spent on fetch units while they are actually doing something useful?


Yes
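
As a small worked example of that subtraction, here is a sketch using the figures from question 3 of the original post. It assumes the 94,07 value is FetchUnitBusy and the 90,74 value is FetchUnitStalled; the function name `useful_fetch_percent` is hypothetical and not a profiler counter.

```python
def useful_fetch_percent(fetch_unit_busy: float, fetch_unit_stalled: float) -> float:
    """Percent of GPU time the fetch units are busy and not stalled."""
    # Stalled time is counted as part of busy time, so it cannot exceed it.
    if not 0.0 <= fetch_unit_stalled <= fetch_unit_busy <= 100.0:
        raise ValueError("expected 0 <= FetchUnitStalled <= FetchUnitBusy <= 100")
    return fetch_unit_busy - fetch_unit_stalled


# Assumed reading of the figures in the original post.
useful = useful_fetch_percent(fetch_unit_busy=94.07, fetch_unit_stalled=90.74)
print(f"{useful:.2f}")  # 3.33
```

Under that reading, the fetch units are active almost the whole time but do useful work for only about 3% of it.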

 

3) What is the GPU doing when FetchUnitStalled is high? I mean, what causes stalls? If I have 94,07 for FetchUnitBusy and 90,74 for FetchUnitStalled, what is the likely problem with my kernel?


When fetch units are stalled, it is possible the other units are doing something useful.  It is also possible that other units are stalled waiting for the results from the fetch units.

Your problem is likely that you have too many fetches and not enough wavefronts in flight to hide the fetch latency.
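
To illustrate the latency-hiding point, here is a rough back-of-the-envelope sketch. It is a simplified model, not a description of how the hardware schedules work: it assumes that while one wavefront waits on a fetch, the other wavefronts each contribute a fixed amount of ALU work. The function name and all cycle counts are hypothetical placeholders, not measured values for any GPU.

```python
import math


def wavefronts_to_hide_latency(fetch_latency_cycles: int, alu_cycles_per_fetch: int) -> int:
    """Rough number of wavefronts needed in flight to cover one fetch wait.

    Simplified model: one wavefront is waiting, and each additional wavefront
    supplies alu_cycles_per_fetch cycles of ALU work before it also fetches.
    """
    if fetch_latency_cycles < 0 or alu_cycles_per_fetch <= 0:
        raise ValueError("latency must be >= 0 and ALU cycles per fetch must be > 0")
    return 1 + math.ceil(fetch_latency_cycles / alu_cycles_per_fetch)


# Hypothetical numbers: the same fetch latency, two different kernels.
print(wavefronts_to_hide_latency(400, 40))  # 11 (more ALU work per fetch)
print(wavefronts_to_hide_latency(400, 10))  # 41 (fetch-heavy kernel)
```

In this model, a fetch-heavy kernel needs many more wavefronts in flight, so either reducing the number of fetches or raising the wavefront count moves in the right direction.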

 

Note: WriteUnitStalled is always 0. This looks like a bug, since kernels with too many writes are slow even when all the counters are low.


Thank you for reporting this bug.

 

0 Likes