Any data you read is automatically pulled into the L1 cache (and usually the outer levels too) -- the hardware does this for you.
After all, caches are built on the principle of locality.
Apart from that, x86 CPUs offer explicit cache-control instructions -- prefetching lines, flushing them, invalidating them (e.g. PREFETCHT0, CLFLUSH, WBINVD).
You need to drop down to assembly or compiler intrinsics to use them.
SSE (as realhet pointed out) exposes some of these as intrinsics, e.g. _mm_prefetch and _mm_clflush.
If you are worried about cache performance -- first look at your memory access pattern.
If your access is linear, like for(int i=0; i<BIG_N-1; i++) { ARRAY[i] += ARRAY[i+1]; }
then you are already cache-friendly. The next win is vectorization (which again is the SSE route).
However, if your access is strided, you are in for trouble.
Classic case is matrix multiplication. While A's rows are accessed linearly, B's columns are completely strided -- i.e. successive fetches are many bytes apart. When the stride is a large power of two, those fetches can map to the same cache set, in which case the associativity of your cache matters.
In such cases, tiling helps: the C matrix is computed in smaller blocks whose working set fits in cache, so each tile of B gets reused before it is evicted.
Hope this helped...