Hi, I am working on a program written in OpenCL that runs on a Fusion APU (CPU + GPU on one die). I want to collect some performance counters, such as the number of instructions, the number of branches, and so on. I have two tools on hand: AMD APP Profiler and CodeAnalyst. With APP Profiler, I found that it seems to provide instruction counters only for the GPU, not for the CPU. So I then tried CodeAnalyst, but that left me with three points of confusion.
- APP Profiler reports ALUInsts (i.e. the number of ALU instructions executed per work-item) as about 70000. The whole thread space on the GPU has 8192 threads, so intuitively I think the GPU executes about 70000 * 8192 ALU instructions in total. Is that right?
- When I use CodeAnalyst to measure instructions for the same program on the CPU side, it only gives counters such as "Ret inst" and "Ret branch". I am not sure about one thing: since this program runs on both the CPU and the GPU at the same time, what do these counters cover? The CPU only, the GPU only, or the sum of both?
- Whatever these counters cover, the value of Ret Inst (i.e. retired instructions) is about 40000, which seems too small for the whole program. I would expect the instruction count for a program to be on the order of billions, so how could it be only 40000? The attached picture shows the results.
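To make the estimate in my first question concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes that ALUInsts really is a per-work-item figure and that all 8192 work-items execute the kernel; the function name is my own and is not part of APP Profiler or CodeAnalyst.

```python
def estimate_total_alu_insts(alu_insts_per_work_item: int, global_work_size: int) -> int:
    """Estimate total ALU instructions as (per-work-item count) * (number of work-items).

    Assumption: the profiler's ALUInsts value applies uniformly to every work-item.
    """
    return alu_insts_per_work_item * global_work_size


# Numbers taken from the post: ~70000 ALUInsts, 8192 work-items.
gpu_total = estimate_total_alu_insts(70_000, 8192)
print(gpu_total)  # 573440000

# For comparison, the Ret Inst value reported by CodeAnalyst (~40000).
ret_inst = 40_000
print(gpu_total // ret_inst)  # 14336
```

If this estimate holds, the GPU side alone would account for several hundred million ALU instructions, roughly four orders of magnitude more than the Ret Inst value, which is why the 40000 figure puzzles me.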
Can anyone help me resolve these confusions? I am just a beginner here and would appreciate any help. Thanks!