
viscocoa
Adept I

ALUBusy question

What does ALUBusy in APP profiler really mean?

If there is branching in a kernel, the SIMD unit will run multiple passes to cover all cases. The ALUs are actually used less efficiently. Does ALUBusy reflect this?

__kernel void MyKernel()
{
    float x = 0, y = 0;

    /* Even and odd work-items take different branches, so every wavefront diverges. */
    if (get_global_id(0) % 2 == 0)
        x++;
    else
        y++;
}

Should APP Profiler report ALUBusy as 100% or 50% for this kernel?

Thank you!

1 Solution
lbin
Staff

ALUBusy measures the percentage of GPU time during which ALU instructions are processed. There are many possible reasons for a low ALUBusy number, for example too few active wavefronts to hide instruction latency, or heavy memory access.

Code divergence can be measured with the VALUUtilization counter if you have SI hardware.
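
To make the divergence part concrete, here is a rough C sketch of a toy model, not of how APP Profiler actually computes any counter. It assumes a 64-wide wavefront, that the two sides of an if/else each take one pass of equal length, and that lanes on the other side are masked off during a pass. The names `count_active_lanes` and `lane_utilization_percent` are made up for this illustration.

```c
#include <stdint.h>

#define WAVEFRONT_SIZE 64  /* assumed wavefront width in work-items */

/* Count the lanes whose bit is set in a 64-bit execution mask. */
static int count_active_lanes(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Toy model of one wavefront running an if/else (an assumption, not the
 * profiler's formula). Bit i of taken_mask is set when lane i takes the
 * "if" side. Each side with at least one lane costs one pass, and lanes
 * on the other side sit idle during that pass.
 * Returns the average percentage of lanes doing useful work per pass.
 */
static double lane_utilization_percent(uint64_t taken_mask)
{
    int if_lanes = count_active_lanes(taken_mask);
    int else_lanes = WAVEFRONT_SIZE - if_lanes;
    int passes = (if_lanes > 0) + (else_lanes > 0);

    return 100.0 * (if_lanes + else_lanes) / (passes * WAVEFRONT_SIZE);
}
```

For the kernel in the question, even lanes take the "if" side, so the mask is `0x5555555555555555` and this model gives 50% lane utilization, even though the ALU is issuing instructions during both passes. A wavefront where every lane takes the same side gives 100%.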

6 Replies
viscocoa
Adept I

Hi,

ALUBusy is a very important factor for program optimization.

When a kernel branches, some ALUs are "busy" with useless work. I wonder whether that useless work is counted in ALUBusy.

Some branching is inevitable. If a low ALUBusy value is caused by such branching, I will not try to optimize the program any further.

Any suggestions will be appreciated.

Vis Cocoa

Vis,

       Did you find the answer? I am wondering about the same thing.

NURBS.

Hi NURBS,

I think Lihan Bin gave us the correct answer.

BTW: I once worked on NURBS for a while 🙂 Defining a beautiful surface with just a few control points is really cool!

Vis Cocoa

Hi Lihan Bin,

Thank you very much for your answer.

I need a clear definition of ALUBusy. According to your comment, inefficiency caused by branching is not counted: even if only one thread in a group is doing useful work, the ALU is 100% busy (at that moment).

I think divergence is a very important factor for kernel tuning.
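
To spell out the distinction I mean, here is a small C sketch with two hypothetical metrics computed over a made-up trace of cycles. It assumes a 64-wide wavefront, and the struct and function names are mine. It is not the real definition of ALUBusy or VALUUtilization, only the time-based versus lane-based idea.

```c
#include <stddef.h>

#define WAVEFRONT_SIZE 64  /* assumed wavefront width in work-items */

/* One cycle of a hypothetical trace. active_lanes == 0 means no ALU
 * instruction was issued in that cycle (for example, a memory stall). */
struct cycle_sample {
    int active_lanes;
};

/* Time-based view: percentage of cycles in which an ALU instruction
 * was issued, no matter how many lanes did useful work. */
static double busy_percent(const struct cycle_sample *trace, size_t n)
{
    size_t busy = 0;
    for (size_t i = 0; i < n; i++)
        if (trace[i].active_lanes > 0)
            busy++;
    return n ? 100.0 * busy / n : 0.0;
}

/* Lane-based view: percentage of lanes active, averaged over the
 * cycles that issued an ALU instruction. */
static double lane_percent(const struct cycle_sample *trace, size_t n)
{
    size_t busy = 0, lanes = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i].active_lanes > 0) {
            busy++;
            lanes += (size_t)trace[i].active_lanes;
        }
    }
    return busy ? 100.0 * lanes / (busy * WAVEFRONT_SIZE) : 0.0;
}
```

With a trace in which every cycle issues an instruction but only one lane is active, `busy_percent` is 100 while `lane_percent` is 1.5625 (1/64). That is the case I am worried about: the time-based number looks perfect even though almost all lanes are wasted.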

I did not find VALUUtilization in App Profiler. What is SI hardware?

Thank you again and have a great weekend!

Vis Cocoa

SI is short for Southern Islands, AMD's newest GPU architecture. On the desktop, SI corresponds to the HD79x0, HD78x0 and HD77x0 series. For the mobile chips you will have to look it up.
