I am working on performance analysis of an application running on a Family 17h processor, and I want to characterize some threads as memory-bound, L3-cache-bound, or L2-cache-bound. To do this, I need to read hardware performance counters, in particular NBPMCx0E0 (DRAM Accesses) from the Family 15h BKDG.
The more recent Family 17h does not have this register; as far as I can tell, it has Data Fabric performance counters instead. I am referring to the NDA version of the PPR for Family 17h.
Could you please let me know how to access such performance counters on the newer family of processors? I am currently using Linux perf to read raw counters with the rNNN syntax, and I would like to know whether the counts I get can be trusted, given that AMD has not contributed much Family 17h code to perf.
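For reference, this is roughly how I build the rNNN value for a core counter. It is only a sketch, and it assumes the usual AMD PERF_CTL layout (EventSelect[7:0] in bits 7:0, UnitMask in bits 15:8, EventSelect[11:8] in bits 35:32). The event and unit mask values below are hypothetical placeholders, not taken from the PPR, and the function names are my own. I do not know whether the Data Fabric counters use the same encoding, so this covers core events only.

```python
def amd_core_raw_config(event_select: int, unit_mask: int) -> int:
    """Pack a 12-bit event select and 8-bit unit mask into a raw config value.

    Assumed layout (AMD core PERF_CTL): EventSelect[7:0] -> bits 7:0,
    UnitMask -> bits 15:8, EventSelect[11:8] -> bits 35:32.
    """
    if not 0 <= event_select <= 0xFFF:
        raise ValueError("event_select must fit in 12 bits")
    if not 0 <= unit_mask <= 0xFF:
        raise ValueError("unit_mask must fit in 8 bits")
    low = event_select & 0xFF          # EventSelect[7:0]
    high = (event_select >> 8) & 0xF   # EventSelect[11:8]
    return low | (unit_mask << 8) | (high << 32)


def perf_raw_event(event_select: int, unit_mask: int) -> str:
    """Format the packed value in perf's rNNN raw-event syntax (hex, no 0x)."""
    return f"r{amd_core_raw_config(event_select, unit_mask):x}"


if __name__ == "__main__":
    # Hypothetical event 0x1AB with unit mask 0x03, for illustration only.
    print(perf_raw_event(0x1AB, 0x03))
```

The resulting string can be passed to perf with -e, for example perf stat -e r1000003ab for the placeholder values above.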
PS: The threads are currently pinned to cores, so reading a per-core event should in effect give me per-thread counts.
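To make the per-core setup concrete, here is a small sketch of how I assemble the perf invocation for one pinned CPU. The helper name, the CPU number, and the raw event string are placeholders for illustration. Note that perf stat -C counts everything that runs on that logical CPU, so the count only equals the thread's count if nothing else is scheduled there; perf stat -t with a thread ID would be the alternative.

```python
from typing import List


def perf_stat_command(cpu: int, raw_event: str, seconds: int) -> List[str]:
    """Build a perf stat command that counts one raw event on one logical CPU.

    -C restricts counting to the given CPU, -e selects the event, and the
    trailing 'sleep' just bounds the measurement window.
    """
    return [
        "perf", "stat",
        "-C", str(cpu),
        "-e", raw_event,
        "--", "sleep", str(seconds),
    ]


if __name__ == "__main__":
    # Placeholder values: CPU 3, a made-up raw event, 1 second window.
    print(" ".join(perf_stat_command(3, "r1000003ab", 1)))
```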
Update 1:
I would be glad if someone could point me in the right direction!