Achieving optimal performance is not always straightforward; it depends on various factors. There are guidelines that offer tips for improving performance in certain scenarios on certain platforms. However, you should profile your application to measure its actual performance, and this helps greatly in reaching the optimum. So I would suggest profiling your application with various settings before making any final decision. AMD's CodeXL tool can be used for this purpose.
Now, coming to your questions.
1) One direct impact of the amount of LDS each work-group allocates is that it limits the number of work-groups that can be active on a compute unit (CU) at the same time. For more details, see the section "Local Memory (LDS) Size" in AMD's OpenCL optimization guide, where Table 2.2 shows the effect of LDS usage on the number of wavefronts per CU.
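As a rough illustration of how LDS usage caps occupancy, here is a small Python sketch. It assumes GCN-class (HD 7XXX-style) figures: 64 KB of LDS per CU, 64-wide wavefronts, and at most 40 wavefronts resident per CU. These numbers are assumptions for illustration, it ignores other limits such as register usage, and `lds_limited_occupancy` is a hypothetical helper. Table 2.2 in the guide remains the authoritative reference.

```python
import math

def lds_limited_occupancy(lds_bytes_per_work_group, work_group_size,
                          lds_bytes_per_cu=64 * 1024,   # assumed LDS per CU
                          wavefront_size=64,            # assumed wavefront width
                          max_wavefronts_per_cu=40):    # assumed per-CU wavefront cap
    """Return (resident work-groups, resident wavefronts) per CU when only
    LDS and the wavefront cap are considered."""
    if lds_bytes_per_work_group <= 0:
        raise ValueError("LDS usage per work-group must be positive")
    waves_per_group = math.ceil(work_group_size / wavefront_size)
    # How many work-groups' LDS allocations fit in the CU at once.
    groups_by_lds = lds_bytes_per_cu // lds_bytes_per_work_group
    # The wavefront cap can also limit the number of resident work-groups.
    groups_by_waves = max_wavefronts_per_cu // waves_per_group
    groups = min(groups_by_lds, groups_by_waves)
    return groups, groups * waves_per_group

# 256 work-items per group (4 wavefronts) with increasing LDS usage:
print(lds_limited_occupancy(4 * 1024, 256))   # (10, 40): wavefront cap is the limit
print(lds_limited_occupancy(16 * 1024, 256))  # (4, 16): LDS is now the limit
print(lds_limited_occupancy(32 * 1024, 256))  # (2, 8)
```

Fewer resident wavefronts means less latency hiding, which is why trimming LDS usage (or the work-group size) can pay off. Whether it actually does for your kernel is something profiling should confirm.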
2) When accessing the LDS, one of the main considerations is avoiding (or at least minimizing) bank conflicts. The optimization guide says:
"A simple sequential address pattern, where each work-item reads a float2 value from LDS, generates a conflict-free access pattern on the AMD Radeon HD 7XXX GPU. Note that a sequential access pattern, where each work-item reads a float4 value from LDS, uses only half the banks on each cycle on the AMD Radeon HD 7XXX GPU and delivers half the performance of the float access pattern."
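To see why access patterns matter, here is a simplified Python model of LDS bank conflicts. It assumes 32 banks, each 4 bytes wide, a bank index of (byte address / 4) mod 32, 32 work-items serviced together, and a broadcast (no conflict) when several work-items read the same word. These parameters are assumptions for illustration. The model only covers 32-bit accesses, so it does not reproduce how the hardware schedules the float2/float4 reads described in the quote. `conflict_degree` and `column_read_addresses` are hypothetical helpers.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumed number of LDS banks
BANK_WIDTH = 4   # assumed bank width in bytes

def conflict_degree(byte_addresses, num_banks=NUM_BANKS, bank_width=BANK_WIDTH):
    """Worst-case number of distinct 4-byte words mapped to a single bank.
    1 means conflict-free; k means the access is serialized into ~k passes.
    Identical words are counted once (assumed broadcast)."""
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width
        words_per_bank[word % num_banks].add(word)
    return max(len(words) for words in words_per_bank.values())

def column_read_addresses(row_width_floats, column=0, work_items=32):
    """Byte addresses when work-item i reads tile[i][column] from a row-major float tile."""
    return [(i * row_width_floats + column) * 4 for i in range(work_items)]

# Sequential float reads: work-item i reads word i, so every bank is hit once.
print(conflict_degree([i * 4 for i in range(32)]))  # 1
# Column read from a 32-float-wide tile: every work-item lands in the same bank.
print(conflict_degree(column_read_addresses(32)))   # 32
# Padding each row to 33 floats spreads the column across all 32 banks.
print(conflict_degree(column_read_addresses(33)))   # 1
```

The padding trick shown in the last line (declaring a tile one element wider than needed, e.g. `__local float tile[32][33]`) is a common way to remove column-access conflicts. As with the other suggestions, profile to confirm it helps in your kernel.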