Processors

zinnia
Adept I

Large L3 cache for a single thread run(s)

 
 
 
 
Hi!
I'm looking for the CPU best suited to our scientific computing needs. We are running a large MCMC simulation model. The MCMC doesn't vectorise, so we are essentially talking about optimising single-thread performance. In practice, though, we need to run at least 2 parallel chains in separate threads. The amount of L3 cache that a thread can use is the most crucial factor in determining run speed (given that clock rates tend to be high across the board). I've only recently understood that AMD CPUs are built from core complexes, so that, e.g., the 64MB in the Ryzen Threadripper Pro 3945WX cannot be fully used by a single thread, but could be used to run 4 threads, each with 16MB of L3. Is this correct?
Furthermore, the upcoming Zen3 / Ryzen 5xxx parts take the decision making to the next level. Which of the versions would maximise the L3 per thread in practice? What CPU would you recommend for us?
Any other points and recommendations are welcome! Thank you in advance!

(P.S. Sorry if this is a bit off topic for the Server Gurus site. I was directed here by AMD's tech support. I hope you can guide me anyway.)

5 Replies
Anonymous
Not applicable

Hello zinnia‌,

Thanks for reaching out with your question.  For specific details on the Ryzen product line, I can move the thread to that CPU support forum.  In general, though, for the workload you describe (single-threaded, wanting as much cache as possible per thread), you are correct that an EPYC 7002 series processor (also based on the Zen2 architecture) has 16 MB of L3 per core complex (CCX).

If you were looking at an EPYC part, for the scenario you describe I would recommend the EPYC 7F52.  This is a 16-core part that already has 16 MB of L3 cache per core.  As you also alluded, with this part you can run 16 threads in parallel to get the most throughput.

Thanks mbaker_amd!
I need to clarify that we unfortunately don't have the budget for server CPUs; we are looking for a cost-efficient way to maximise L3 cache in a desktop. That is why the Ryzen 3000/5000 series seem like a good option for us.
Thanks for the confirmation that with the Zen2 architecture we can reach 16MB per CCX. How is this in the Zen3 architecture: will the L3 per CCX double to 32MB in all new Ryzen versions (i.e. those mentioned in https://www.techradar.com/news/amd-zen-3), or does the L3 per CCX depend somehow on the CPU version and the number of cores?
Anonymous
Not applicable

Hello zinnia‌,

I cannot comment on future EPYC processors, and I don't know the details of the recently announced Ryzen processors well enough to be 100% accurate in my response.  I will move this thread to the desktop CPU support forum so that members there can respond to your questions.

On the upcoming desktop CPUs (Ryzen 5xxx, Zen3), each CCX will have 32MB of L3, and each individual core has full access to all 32MB. So, as I understand it, a single thread has full access to all 32MB, and 2 threads running in parallel on the same CCX would in theory have about 16MB each under full load. The processor manages the sharing itself, though, and I'm not sure how closely it holds to that split. If you go to a second CCX, as in the 12- or 16-core processors, that CCX also has its own dedicated 32MB. So there is a total of 64MB on the chip, but the 64MB is not shared between the CCXs. You could send your question to AMD support, and maybe they can get it into the hands of someone at AMD who can answer it definitively. Contact them here: https://www.amd.com/en/support/contact-email-form
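As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name `l3_per_thread_mb` is made up for this example, and the even split between threads is an assumption: the L3 inside a CCX is shared dynamically, not partitioned, so real per-thread usage depends on each thread's access pattern.

```python
def l3_per_thread_mb(l3_per_ccx_mb, busy_threads_on_ccx):
    """Idealised L3 share per thread, ASSUMING an even split within one CCX.

    This is only a back-of-the-envelope model; the hardware does not
    statically partition the cache between threads.
    """
    if busy_threads_on_ccx < 1:
        raise ValueError("need at least one busy thread on the CCX")
    return l3_per_ccx_mb / busy_threads_on_ccx


# Figures taken from this thread: 16MB per CCX on Zen2, 32MB per CCX on Zen3.
print(l3_per_thread_mb(16, 1))  # Zen2, one chain alone on a CCX   -> 16.0
print(l3_per_thread_mb(32, 1))  # Zen3, one chain alone on a CCX   -> 32.0
print(l3_per_thread_mb(32, 2))  # Zen3, two chains on the same CCX -> 16.0
```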

Also the AMD developer forum has folks way above my knowledge level that may be able to help, a link to that forum here:

Devgurus 


Thank you for the answer and the links! It seems that with 64MB we could run two parallel threads on separate CCXs, each with 32MB of cache, which would be wonderful. I will contact AMD support to get confirmation of this, and to find out the exact options available.
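For anyone wanting to try the "one chain per CCX" idea on Linux, a minimal Python sketch follows. It assumes the kernel exposes cache topology under `/sys/devices/system/cpu/cpuN/cache/indexM/` (the `level` and `shared_cpu_list` files); the helper names `parse_cpu_list`, `l3_domains` and `pick_one_cpu_per_domain` are made up for this example. It only finds which logical CPUs share an L3 and picks one CPU from each group; whether this actually speeds up a given MCMC model would need to be measured.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-2,12-14' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one L3 cache.

    Assumes the Linux sysfs cache layout; returns an empty list if no
    level-3 cache entries are found.
    """
    domains = set()
    for level_file in Path(sysfs_root).glob("cpu[0-9]*/cache/index*/level"):
        if level_file.read_text().strip() == "3":
            shared = (level_file.parent / "shared_cpu_list").read_text()
            domains.add(tuple(parse_cpu_list(shared)))
    return sorted(domains)


def pick_one_cpu_per_domain(domains, n_chains):
    """Pick one logical CPU from each of the first n_chains L3 groups."""
    if n_chains > len(domains):
        raise ValueError("fewer L3 domains than chains")
    return [domain[0] for domain in domains[:n_chains]]
```

A possible use: call `pick_one_cpu_per_domain(l3_domains(), 2)` once, then have each chain's process pin itself with `os.sched_setaffinity(0, {cpu})` (Linux only) before it starts sampling, so the two chains do not end up competing for the same L3.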
