Hello, I thought caching of global memory fetches was not possible at this time. How is it possible that gDEBugger and the AMD APP Profiler report a cache hit rate of nearly fifty percent for a simple 7-point stencil kernel?
Caching has been supported in various forms since SDK 2.3. It started out as explicitly specified, then moved to auto-detection of cacheable accesses, and in SDK 2.6 moved to caching by default with auto-detection of uncached accesses.
Thanks for the answer, I didn't know that. The following kernel has an execution time of approx. 250,000 ns (250 µs) for a 128x128x128 problem size on my Radeon 5870.
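// Linear index into an X*Y*Z volume; X and Y are the locals defined in the kernel.
#define IX33(i,j,k) ((i) + ((X)*(j)) + ((X)*(Y)*(k)))

__kernel void jacobi(__global float * x,
                     __global float * x0,
                     __global float * aux,
                     const float a)
{
    int i = get_global_id(0);
    int j = get_global_id(1);
    int k = get_global_id(2);
    int X = get_global_size(0);
    int Y = get_global_size(1);
    int Z = get_global_size(2);   // unused in this kernel

    // Neighbour indices are clamped to [minCoord, maxCoord] at the borders.
    // maxCoord is derived from X only, so the domain is assumed to be cubic.
    int minCoord = 0;
    int maxCoord = X-1;

    float out;
    // Six reads from x, one read from x0, one write to aux: 8 floats per work-item.
    // Note: `ac` is not declared anywhere in the snippet as posted.
    out = ( (x[IX33(max(i-1,minCoord),j,k)] + x[IX33(min(i+1,maxCoord),j,k)] +
             x[IX33(i,max(j-1,minCoord),k)] + x[IX33(i,min(j+1,maxCoord),k)] +
             x[IX33(i,j,max(k-1,minCoord))] + x[IX33(i,j,min(k+1,maxCoord))]) +
            ac.x * x0[IX33(i,j,k)]) * a;
    aux[IX33(i,j,k)] = out;
}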
So when I calculate the memory throughput as 32 bytes * 128 * 128 * 128 / 250,000 ns = 268 GB/s, there has to be something wrong with my calculation, right?
No, your calculation is fine. It means you got some cache hits.