I have an AMD Radeon Mobility 5870 and I am using OpenCL with it.
I believe that the standard desktop Radeon 5870 has 256K of registers per compute unit.
Is the same true for the Mobility 5870? If not, what is the register capacity?
As an aside, this information is one of the few items not in the clinfo data provided by the OpenCL API.
The register capacity is the same, 256 KB per compute unit, for both the desktop and Mobility versions. The difference is in the number of compute units.
The desktop 5870 has 20 CUs, whereas the Mobility 5870 (5870M) has 10 CUs. (Essentially, the 5870M is the same as Juniper and does not support double precision (DP), whereas the desktop 5870 does.)
You can find more information in Appendix D of the AMD OpenCL Programming Guide.
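To put those figures side by side, here is a quick back-of-the-envelope sketch. It only multiplies the numbers quoted in this thread (256 KB per CU, 20 versus 10 CUs); the function and dictionary names are made up for illustration and nothing here is queried from the hardware.

```python
# Figures quoted in this thread; they are the thread's claims, not measured values.
REGISTER_KB_PER_CU = 256
COMPUTE_UNITS = {
    "Radeon HD 5870 (desktop)": 20,
    "Mobility Radeon HD 5870": 10,
}


def total_register_kb(num_cus, kb_per_cu=REGISTER_KB_PER_CU):
    """Total register file size in KB summed over all compute units."""
    return num_cus * kb_per_cu


for name, cus in COMPUTE_UNITS.items():
    total = total_register_kb(cus)
    print(f"{name}: {cus} CUs x {REGISTER_KB_PER_CU} KB = {total} KB")
```

So the per-CU budget a kernel sees is identical on both parts; only the chip-wide total (5120 KB versus 2560 KB) differs.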
Thanks, that helps very much.
I had another question about caching.
In general, I have heard that there is very little caching of global memory on the 5870M. However, I noticed in the AMD OpenCL Programming Guide that there are L1 and L2 caches on the card. Do these ever get used for caching global data?
Yes, the L1 and L2 caches do get used, and they behave much like normal CPU caches. A consecutive access pattern is good because many fetches can be satisfied by the same cache line. Try profiling with CodeXL to see what percentage of your fetches are cache hits.
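To illustrate why a consecutive access pattern helps, here is a toy model, not a description of the real 5870M cache. It assumes a 64-byte cache line and 4-byte elements (both hypothetical values chosen for the example) and an unlimited cache, so a line only misses the first time it is touched. The function name is invented for this sketch.

```python
def cache_hit_rate(num_accesses, stride_elems, elem_bytes=4, line_bytes=64):
    """Fraction of accesses that land in an already-fetched cache line.

    Toy model: the cache never evicts, so each line misses exactly once.
    line_bytes and elem_bytes are assumed values, not 5870M specifications.
    """
    seen_lines = set()
    hits = 0
    for i in range(num_accesses):
        byte_address = i * stride_elems * elem_bytes
        line = byte_address // line_bytes
        if line in seen_lines:
            hits += 1
        else:
            seen_lines.add(line)
    return hits / num_accesses


# Consecutive 4-byte reads: 16 elements share each 64-byte line.
print(cache_hit_rate(1024, stride_elems=1))   # 0.9375
# Stride of 16 elements: every read touches a new line.
print(cache_hit_rate(1024, stride_elems=16))  # 0.0
```

In this model the consecutive pattern gets 15 hits out of every 16 reads, while the strided pattern gets none. The real hit rate depends on the actual line size and cache capacity, which is why profiling is the way to get the true number.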
Hmm, that's interesting. Is it true that the caches on the 5870M are smaller than a CPU cache? Does that mean less of the global memory can be cached than on a typical CPU?
Yes, GPUs have much less cache than CPUs. They rely on many threads running concurrently to hide the data fetch time. On CPUs, caches are critical for performance, so a large part of the die area is devoted to them. Check some basic resources on GPGPU programming and the AMD OpenCL Programming Guide to find the cache sizes of your GPU. clinfo should also print useful information.
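As a rough picture of how running many threads hides fetch time, here is a simplified model. It assumes each wavefront alternates a fixed number of ALU cycles with one memory fetch of fixed latency, and that the scheduler switches wavefronts for free. The cycle counts below are hypothetical, not 5870M figures, and the function name is made up for this sketch.

```python
def wavefronts_to_hide_latency(fetch_latency_cycles, alu_cycles_per_fetch):
    """Minimum wavefronts so the ALUs never idle while a fetch is in flight.

    Toy model: while one wavefront waits on memory, the other (n - 1)
    wavefronts each contribute alu_cycles_per_fetch cycles of work, so we
    need (n - 1) * alu_cycles_per_fetch >= fetch_latency_cycles.
    """
    # Ceiling division without floating point.
    others_needed = -(-fetch_latency_cycles // alu_cycles_per_fetch)
    return others_needed + 1


# Hypothetical numbers: 400-cycle fetch, 40 ALU cycles between fetches.
print(wavefronts_to_hide_latency(400, 40))  # 11
```

The takeaway is only qualitative: the less arithmetic a kernel does per fetch, the more wavefronts must be resident to cover the latency, and a cache hit lowers the latency that has to be covered.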
For constant memory, it depends on the access pattern. Check the __constant memory section in chapter 4 of the OpenCL Programming Guide. http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-...