I am trying to improve the scaling of an application on a 4-processor machine (48 cores). Currently the maximum speed is reached with around 32-36 active cores. The application does no I/O apart from main memory accesses, and the task is well suited to scaling.
Since nothing else is involved, the only plausible reason for the limited scaling is contention on accesses to main memory.
My question now is: how can I best use CodeAnalyst to find the critical locations in the application that are responsible for the contention problems?
I would recommend using the "Instruction-based sampling" profile. When the profile data shows up, change the view from "All Data" to "IBS MEM data cache" and look at the areas of code with the highest data cache miss rates or ratios. You might also be interested in the "IBS NB local/remote access" view, which shows information about memory requests that go to remote processors.
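If it helps to compare many functions at once, here is a minimal Python sketch of how you might rank them by miss ratio. It assumes you have copied per-function counts out of the two views by hand. The `IbsSample` record, its field names, and the `min_accesses` cutoff are my own assumptions for illustration, not a CodeAnalyst export format or API.

```python
from typing import NamedTuple


class IbsSample(NamedTuple):
    """Hypothetical per-function counts copied from the IBS views."""
    function: str     # symbol name as shown in the profile
    dc_accesses: int  # sampled loads/stores that accessed the data cache
    dc_misses: int    # of those, how many missed the data cache
    remote: int       # of the misses, how many were served by a remote node


def rank_by_miss_ratio(samples, min_accesses=100):
    """Return (function, miss_ratio, remote_share) sorted by miss ratio.

    Functions with fewer than min_accesses samples are skipped because
    a ratio built on a handful of samples is mostly noise.
    """
    rows = []
    for s in samples:
        if s.dc_accesses < min_accesses:
            continue
        miss_ratio = s.dc_misses / s.dc_accesses
        # Share of misses that had to go to another processor's memory.
        remote_share = s.remote / s.dc_misses if s.dc_misses else 0.0
        rows.append((s.function, miss_ratio, remote_share))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A function with both a high miss ratio and a high remote share is a good first candidate to inspect, since its misses are also the expensive ones.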
I hope that helps.