Even AMD is still measuring and learning. In lieu of documentation...
Maybe a tool like AIDA can list instruction latencies?
(links to old data from an early engineering sample at http://users.atw.hu/instlatx64/AuthenticAMD0800F00_K17_Zen_InstLatX64.txt)
(But there have been reports/rumors that AIDA was not measuring cache timing correctly, perhaps because latency varies with where a cache line sits relative to the CCX or core, or because of cache-line clearing optimizations.)
For optimizing code, see the ideas from the GDC 2017 talk on Optimizing for Ryzen, summarized below
(note that the experiments may have been performed and the slides written before final silicon was available).
However, the output says it comes from an aida_bench64.dll library version dated February 9,
so it might not include AIDA's updates for Ryzen. AIDA summarized forthcoming Ryzen changes as of March 4.
(Update: the output now reports an aida_bench64.dll library version dated March 17.)
On March 28, AIDA announced version 5.90, claiming benchmarks optimized for Ryzen, among other improvements.
The 7-cpu site offers some cache latency estimates for Ryzen, perhaps deduced from runs of their LZMA benchmark.
(It doesn't give measurements for different stride lengths as the AIDA memory test data does.) (Via an RWT forum post.)
- Microarchitecture [slides give an overview of caches, data paths, new instructions, and performance counter domains]
- Power Management [settings that affect boost: Windows 10 power management settings, profiling-tool interval settings, BIOS settings]
- Profiler: CodeXL [traditional sampling or Instruction-Based Sampling (fetch, or micro-op); PowerProfiler: frequency profiling and/or energy profiling]
- Compiler [overview of MS Visual Studio compiler changes in recent years]
- Concurrency [profile to decide where to use SMT; recommendation for the number of contexts for D3D12 multithreading]
- Shader Compiling [not during gameplay; use many threads; use shader caches]
- Prefetch [avoid software prefetch; it prevents loop unrolling in MSVS 2015]
- Data Cache [use structure of arrays, rather than array of structures, for denser data in cache and better prefetching]
I understand what you are asking about AMD Ryzen processors: you are interested in the changes to the instruction set architecture and want to know how to optimise code to fully utilise the underlying microarchitecture. At the moment, however, nothing like that has been published, and no compiler has been designed or optimised exclusively for the AMD Ryzen processor. I am also eager to see detailed information about the microarchitecture of this new processor.
Project Mercury: thread affinities to CCXs, SMT, etc. optimizations. Very lightweight/efficient.
AMD Ryzen Processor Optimization added to Cacheman 10.10:
Bitsum's Process Lasso: Optimize and automate process CPU affinities:
i.e.: I don't know a damn thing about coding, but the above coders do, so they, and what they have done, may hold some pointers for everyone.
The rules seem to be as follows:
- Keep threads from hopping from one CCX to the other, and try to keep Windows/OS processes on one CCX and the app or game on the other(s).
- Keep to one thread per physical core until you run out of cores on a CCX, i.e. avoid SMT until you need to run more than 4 threads per CCX/app.
- Disable core parking. (Part of AMD's balanced power plan?)