L1 Cache friendly code

Question asked by jai_cool on Sep 10, 2012
Latest reply on Sep 12, 2012 by mikewolf_gkd

Hi there,

I would like to know how to write L1-cache-friendly code. Assume an L1 cache line is X bytes long. Suppose my wavefront needs X bytes from device memory, and the same X bytes are also needed by other wavefronts in the same block or by wavefronts from other blocks running in parallel on the same execution unit. In that case I would expect a high L1 cache hit rate.

Consider the following code:

<code>
__kernel void testKernel(__global float *A, __global float *B)
{
    // Testing whether this load hits the L1 cache:
    // every work-item of every work-group reads the same element, B[0].
    float val = B[0];

    // Only work-group 0 stores the value, one element per work-item.
    if (get_group_id(0) == 0) A[get_local_id(0)] = val;
}
</code>

Is this L1-cache-friendly code? Should I expect a high L1 cache hit rate if I run it through the AMD OpenCL Profiler with hundreds of blocks, each with 256 or 512 threads?