Archives Discussions

zenome · ‎02-27-2013

Hi All,

I was wondering if the cache hit statistics output by the AMD APP Profiler (v2.5) are reliable.

I was experimenting with a very simple kernel:

int iGID = get_global_id(0);

c[iGID] = a[iGID];

When I run the above kernel on AMD 7970, I get the CacheHit as low as 0.4%.

Now, since in this case the global memory accesses are coalesced, I was assuming that I should be getting a very high Cache Hit rate given the global memory accesses are cached.

When I run the following kernel:

int iGID = get_global_id(0);

c[iGID] = a[iGID] + a[iGID+1] + a[iGID+2] + a[iGID+3] +a[iGID+4]; // I am taking care of not overshooting the buffer size by last threads

I get CacheHit as 12%.

Again, I was expecting a very large CacheHit here.

I have played around these around with 4096 Threads and 256 WG Size. The array 'a' should easily fit inside the 16KB L1 cache per CU>

Are these numbers very low because:

The Profiler only samples a particular block, which if happens to be the first one, Cache Hits are low due to cold-misses (I know, it really can not account for these numbers but that is my only guess) ?

Archives Discussions

Unexpected Cache Hit statistics