Hey guys, I ran some tests using the benchmarks in AMD APP SDK v2.4. One of them, 'GlobalMemoryBandwidth', measures global memory bandwidth under different memory access patterns. It covers four patterns: read linear, read linear uncached, read single, and write linear.
With 'read linear', I found that float1 performs much better than float4: 263 GB/s vs. 155 GB/s. float1 must be hitting the cache, since its bandwidth is well above the theoretical peak of 153.6 GB/s. But why can't float4 exploit the cache in the same way? Can anybody explain the reason? Thanks a lot.
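For reference, here is a rough sketch of how I understand the theoretical peak is derived, and how far each measurement sits from it. The clock, transfer rate, and bus width below are my assumptions based on the HD 5870 reference specs (1200 MHz GDDR5, 256-bit bus), not values reported by the benchmark, and the function name is just made up for illustration.

```python
# Assumed HD 5870 reference specs (not reported by the benchmark itself).
MEMORY_CLOCK_HZ = 1200e6   # GDDR5 memory clock
TRANSFERS_PER_CLOCK = 4    # GDDR5 moves 4 data transfers per memory clock
BUS_WIDTH_BITS = 256       # memory bus width


def peak_bandwidth_gb_per_s(clock_hz, transfers_per_clock, bus_width_bits):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_hz * transfers_per_clock * bytes_per_transfer / 1e9


peak = peak_bandwidth_gb_per_s(MEMORY_CLOCK_HZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BITS)

# Measured 'read linear' results from my runs, in GB/s.
measured = {"float1": 263.0, "float4": 155.0}

print(f"theoretical peak: {peak:.1f} GB/s")
for name, gb_per_s in measured.items():
    # A ratio above 1.0 means some of the reads cannot be coming from DRAM.
    print(f"{name}: {gb_per_s:.0f} GB/s, {gb_per_s / peak:.2f}x peak")
```

So float1 is far above what DRAM alone can deliver, while float4 sits almost exactly at the peak, which is what makes me think float4 is getting little or no benefit from the cache.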
I am using an HD 5870 with AMD APP SDK v2.4 on Ubuntu 10.04.