Archives Discussions

jai_cool · ‎09-11-2012

Hi there,

I would like to know how to write L1-cache-friendly code. Assume an L1 cache line is X bytes long. If my wavefront needs X bytes from device memory, and other wavefronts in the same block (or wavefronts from other blocks running in parallel on the same execution unit) need the same X bytes, I expect to see a high L1 cache hit rate.
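To make that expectation concrete, here is a rough host-side sketch in plain C (not OpenCL). It is only a toy: the single-line cache, the function names, and the idea that each "request" stands for one wavefront-level load are all assumptions for illustration, and none of it models a real AMD L1 cache or what the profiler actually counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy cache (hypothetical): remembers only the most recently loaded line. */
typedef struct {
    size_t line_bytes;    /* assumed line size X, in bytes */
    int valid;            /* 0 until the first line is loaded */
    uint64_t tag;         /* index of the line currently held */
    unsigned long hits;
    unsigned long misses;
} toy_cache;

void toy_cache_init(toy_cache *c, size_t line_bytes)
{
    assert(line_bytes > 0);
    c->line_bytes = line_bytes;
    c->valid = 0;
    c->tag = 0;
    c->hits = 0;
    c->misses = 0;
}

/* Record one request for the byte at byte_addr. */
void toy_cache_read(toy_cache *c, uint64_t byte_addr)
{
    uint64_t tag = byte_addr / c->line_bytes;
    if (c->valid && c->tag == tag) {
        c->hits++;
    } else {
        c->misses++;      /* cold or conflicting access: load the line */
        c->valid = 1;
        c->tag = tag;
    }
}

/* Hit rate when request i reads byte address i * stride_bytes.
   stride_bytes == 0 means every request targets the same X bytes. */
double toy_hit_rate(size_t line_bytes, unsigned long n_requests,
                    size_t stride_bytes)
{
    toy_cache c;
    toy_cache_init(&c, line_bytes);
    for (unsigned long i = 0; i < n_requests; i++)
        toy_cache_read(&c, (uint64_t)i * stride_bytes);
    return n_requests ? (double)c.hits / (double)n_requests : 0.0;
}
```

In this toy model, four requests for the same line give one cold miss and three hits, so toy_hit_rate(64, 4, 0) is 0.75, and the rate approaches 1 as more wavefronts share the line. If every request lands on a different line (stride equal to the line size), the rate is 0.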

Consider the following code:

__kernel void testKernel(__global float *A, __global float *B)
{
    // Testing if this hits L1 cache: every work-item reads the same element.
    float val = B[0];

    // Only the first work-group writes the value back out.
    if (get_group_id(0) == 0)
        A[get_local_id(0)] = val;
}

Is this L1-cache-friendly code? Should I expect a high L1 cache hit rate when I profile it with the AMD OpenCL Profiler, launching hundreds of blocks with 256 or 512 threads each?

L1 cache hit rate in profiler