Recently I queried the device info of an AMD A10-7850K APU in OpenCL with "clinfo". According to the results, both the integrated CPU and the GPU have a "Cache size" of 16384 bytes. I am confused by this for two reasons:
1) The CPU has 4 MB of L2 data cache in total and 4 x 16 KB = 64 KB of L1 cache in total. Why does the result show 16384 B = 16 KB? Could it be the L1 cache size of each CPU core? That does not make sense, because the OpenCL spec describes this value as "Size of global memory cache in bytes", which I would expect to be the L2 cache size.
2) What does the value mean for the GPU? Is it the GPU's L1 cache, or is it the L2 cache shared with the CPU?
I am really confused by these issues. Maybe I am just misunderstanding something? Thanks!
If you stop and think about it for a moment, you'll realize that nobody cares about the total amount of cache, but rather about the amount of cache per core. That is the size your working set has to fit into for best performance. You can also reconstruct the total size by multiplying by the core count.
Nonetheless, since there's no fast L1-to-L1 sharing, you have to think of each cache as an independent piece of memory.
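To make the arithmetic concrete, here is a minimal Python sketch using the figures quoted in this thread (16384 bytes reported, 4 CPU cores). The helper names are made up for illustration, and the sketch assumes the reported value really is a per-core figure, which is my reading rather than something the spec guarantees. It also ignores associativity and other data competing for the cache, so treat the element counts as upper bounds.

```python
# Figures taken from this thread; helper names are hypothetical.
REPORTED_CACHE_BYTES = 16384  # "Cache size" reported by clinfo
CPU_CORES = 4                 # the question quotes 4 x 16 KB of L1


def total_cache_bytes(per_core_bytes, core_count):
    """Rebuild the aggregate figure from the per-core value."""
    return per_core_bytes * core_count


def max_elements(per_core_bytes, element_bytes, arrays=1):
    """Upper bound on elements per array when `arrays` equally sized
    arrays must sit in one core's cache at the same time."""
    return per_core_bytes // (element_bytes * arrays)


total = total_cache_bytes(REPORTED_CACHE_BYTES, CPU_CORES)
print(total // 1024, "KiB in total")                    # 64 KiB in total
print(max_elements(REPORTED_CACHE_BYTES, 4))            # 4096 floats
print(max_elements(REPORTED_CACHE_BYTES, 4, arrays=3))  # 1365 floats each
```

So a single array of 4096 floats would just fill the reported size, while something like c = a + b with three arrays leaves room for roughly 1365 floats each.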
We could go a long way discussing whether that should be the L1 size or the L2 size. Sure, the wording is a bit loose, but perhaps this is better left to implementors. Is L2 per core? Is it shared across several cores? On the GPU, it's a per-memory-channel buffer.
For the GPU it's very likely the GCN L1 size. Again, that's 16 KiB per core (except that one GCN core is a fairly different thing from an x86 core). That's standard on all currently selling Radeons from the 7000 series up, as well as on APUs such as yours, except for some low-end or mobile products.
While we're at it, you might read up on local memory. You can think of it as a faster and more efficient cache that you manage manually (as opposed to a regular cache, which is managed automatically).
Thanks for your kind reply. I understand that fitting the data into L1 cache should improve performance, and that keeping all the caches coherent inevitably introduces overhead. If your answer is right, then I think it would be better for the OpenCL spec to clarify that the size refers to the "dedicated-to-that-core" cache or the "performance-optimal" cache. Thanks!