Appendix D of this file had enough info for my HD 7000 series card: http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programm...
Now I have an RX 550. Where can I find similar info for it?
Here is a screenshot which was useful for me to check which part of an OpenCL kernel was bottlenecking:
The architecture has not changed much, since it is still GCN. The L2 cache size has increased, but I'm pretty sure L2 bandwidth is still 1 cache line (64 B) per clock per channel (there are 4 channels on Polaris). L1 bandwidth is 1 cache line per CU per clock.
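
To turn those per-clock figures into peak numbers you can compare against a kernel's measured throughput, here is a rough sketch of the arithmetic. The 64 B line size and the 4 channels come from the figures above. The CU count (8) and the clock (1183 MHz) are my assumptions for an RX 550, so replace them with what `clGetDeviceInfo` reports for `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY` on your card. The helper name `peak_bandwidth_gbs` is just made up for this example.

```python
def peak_bandwidth_gbs(bytes_per_clock: int, units: int, clock_mhz: float) -> float:
    """Theoretical peak in GB/s when each of `units` moves `bytes_per_clock` per cycle."""
    return bytes_per_clock * units * clock_mhz * 1e6 / 1e9


CACHE_LINE_BYTES = 64  # one cache line, as stated above
L2_CHANNELS = 4        # Polaris channel count, as stated above
NUM_CUS = 8            # ASSUMPTION: check CL_DEVICE_MAX_COMPUTE_UNITS
CLOCK_MHZ = 1183.0     # ASSUMPTION: check CL_DEVICE_MAX_CLOCK_FREQUENCY

# L2: one cache line per clock per channel
l2_gbs = peak_bandwidth_gbs(CACHE_LINE_BYTES, L2_CHANNELS, CLOCK_MHZ)

# L1: one cache line per clock per CU, summed over all CUs
l1_gbs = peak_bandwidth_gbs(CACHE_LINE_BYTES, NUM_CUS, CLOCK_MHZ)

print(f"L2 peak: {l2_gbs:.1f} GB/s")
print(f"L1 peak (all CUs): {l1_gbs:.1f} GB/s")
```

These are upper bounds only. If a kernel's effective bandwidth sits close to one of them, that level of the memory hierarchy is the likely bottleneck.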