6 Replies Latest reply on Apr 1, 2012 3:50 AM by Skysnake

    ALUBusy question


      What does ALUBusy in APP profiler really mean?


      If there is branching in a kernel, the SIMD unit will run multiple passes to cover all cases. The ALUs are actually used less efficiently. Does ALUBusy reflect this?


      __kernel void MyKernel()
      {
          float x = 0, y = 0;

          /* Even and odd work-items take different paths, so the branch diverges. */
          if (get_global_id(0) % 2 == 0)
             x++;
          else
             y++;
      }
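To make the "multiple passes" idea concrete, here is a minimal host-side sketch in plain C of how a divergent branch might be modeled. It assumes a 64-lane wavefront and a per-lane execution mask. The names `even_lane_mask`, `branch_passes` and `lane_utilization_percent` are made up for illustration and are not part of OpenCL or APP Profiler. It only counts lanes; it does not claim to reproduce how any profiler counter is computed.

```c
#include <stdint.h>

#define WAVEFRONT_SIZE 64 /* assumption: 64 lanes per wavefront */

/* Count the lanes whose bit is set in a 64-bit execution mask. */
static int count_active_lanes(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Mask of lanes that take the "if" side of (global_id % 2 == 0).
   Lane i of the wavefront is assumed to have global id base_id + i. */
static uint64_t even_lane_mask(unsigned base_id)
{
    uint64_t mask = 0;
    for (int lane = 0; lane < WAVEFRONT_SIZE; lane++) {
        if ((base_id + (unsigned)lane) % 2 == 0)
            mask |= (uint64_t)1 << lane;
    }
    return mask;
}

/* One pass is needed for each side of the branch that has at least one lane. */
static int branch_passes(uint64_t taken_mask)
{
    int passes = 0;
    if (taken_mask != 0)
        passes++; /* some lanes run the "if" side */
    if (~taken_mask != 0)
        passes++; /* some lanes run the "else" side */
    return passes;
}

/* Share of lane slots doing useful work, averaged over all passes. */
static double lane_utilization_percent(uint64_t taken_mask)
{
    int passes = branch_passes(taken_mask);
    int useful = count_active_lanes(taken_mask) + count_active_lanes(~taken_mask);
    return 100.0 * useful / (passes * WAVEFRONT_SIZE);
}
```

Under these assumptions, `lane_utilization_percent(even_lane_mask(0))` models the kernel above: both sides of the branch are populated, so two passes run and each pass keeps only half of the lanes active. A uniform branch (mask all zeros or all ones) needs a single pass with every lane active.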



      Should APP Profiler report ALUBusy as 100% or 50%?


      Thank you!

        • Re: ALUBusy question



          ALUBusy is a very important factor for program optimization.


          When a kernel branches, some ALUs are "busy" with work whose results are thrown away. I wonder whether that useless work is counted in ALUBusy.


          Some branches are unavoidable. If a low ALUBusy is caused by such branches, I will not try to optimize the program any further.


          Any suggestions will be appreciated.


          Vis Cocoa

          • Re: ALUBusy question

            ALUBusy measures the percentage of GPU time during which ALU instructions are being processed. There are many reasons for a low ALUBusy number, for example, not enough active wavefronts to hide instruction latency, or heavy memory access.

            Code divergence can be measured with the VALUUtilization counter if you have SI hardware.
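The difference between a time-based counter and a lane-based counter can be sketched with a toy model in plain C. This is only an illustration under stated assumptions: the `ToyCounters` struct, its fields, and both formulas are hypothetical simplifications, not the profiler's actual implementation. The model assumes 64 lanes per wavefront.

```c
#define WAVEFRONT_SIZE 64 /* assumption: 64 lanes per wavefront */

/* Hypothetical raw counts for one kernel run (names invented for this sketch). */
typedef struct {
    unsigned long total_cycles;      /* cycles the GPU spent on the kernel */
    unsigned long alu_cycles;        /* cycles in which an ALU instruction was processed */
    unsigned long valu_instructions; /* vector ALU instructions issued */
    unsigned long active_lane_ops;   /* active lanes summed over those instructions */
} ToyCounters;

/* Time-based view: what share of the run was spent processing ALU instructions? */
static double toy_alu_busy(const ToyCounters *c)
{
    if (c->total_cycles == 0)
        return 0.0;
    return 100.0 * (double)c->alu_cycles / (double)c->total_cycles;
}

/* Lane-based view: what share of lanes was active while ALU instructions ran? */
static double toy_valu_utilization(const ToyCounters *c)
{
    if (c->valu_instructions == 0)
        return 0.0;
    return 100.0 * (double)c->active_lane_ops
           / ((double)c->valu_instructions * WAVEFRONT_SIZE);
}
```

In this model a divergent but compute-bound kernel, such as `{1000, 1000, 200, 200 * 32}`, is busy for the whole run but uses only half of its lanes. A memory-bound kernel without divergence, such as `{1000, 250, 50, 50 * 64}`, shows the opposite pattern: a low busy percentage with all lanes active.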

              • Re: ALUBusy question

                Hi Lihan Bin,


                Thank you very much for your answer.


                I need a clear definition of ALUBusy. According to your comment, inefficiency caused by branching is not counted. Even if only one thread in a group is doing useful work, the ALU counts as 100% busy (at that moment).


                I think divergence is a very important factor in kernel tuning.


                I did not find VALUUtilization in APP Profiler. What is SI hardware?


                Thank you again and have a great weekend!


                Vis Cocoa