I want to write a kernel that uses 4 KB of local memory per work-item. Since only 32 KB of LDS is available per work group, the work-group size can be at most 8. If I want to use the full 64 KB of LDS, I have to schedule 2 work groups per CU. If I want to eliminate or reduce LDS bank conflicts, do I have to consider both work groups while writing the code? Or are the work groups' memory accesses independent of each other, i.e. does the local memory access pattern of one work group have no effect on the other work group on the same CU?
Also note that one work group will use only 8 of the 32 available channels. Will the second work group on the CU use the same set of 32 channels as the first work group, or an entirely different set of 32 channels?
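
For reference, here is the arithmetic behind the numbers above as a small C sketch. The constants are just the figures I quoted (64 KB LDS per CU, 32 KB limit per work group, 4 KB per work-item), and the names are made up for illustration; they are not part of any OpenCL API.

```c
#include <assert.h>

/* Figures taken from the question; assumptions about the target GPU,
   not values queried from a device. Names are hypothetical. */
enum {
    LDS_PER_CU_BYTES    = 64 * 1024, /* total LDS on one compute unit */
    LDS_PER_GROUP_LIMIT = 32 * 1024, /* most LDS one work group may allocate */
    LDS_PER_ITEM_BYTES  = 4 * 1024   /* LDS wanted by each work-item */
};

/* Largest work-group size whose LDS usage fits under the per-group limit. */
unsigned max_work_group_size(unsigned group_limit, unsigned per_item)
{
    assert(per_item > 0);
    return group_limit / per_item;
}

/* Number of such work groups that fit in one CU, counting LDS only. */
unsigned groups_per_cu(unsigned cu_bytes, unsigned group_size, unsigned per_item)
{
    unsigned group_bytes = group_size * per_item;
    assert(group_bytes > 0);
    return cu_bytes / group_bytes;
}
```

With the constants above, `max_work_group_size(LDS_PER_GROUP_LIMIT, LDS_PER_ITEM_BYTES)` gives 8 and `groups_per_cu(LDS_PER_CU_BYTES, 8, LDS_PER_ITEM_BYTES)` gives 2. This only counts LDS capacity; registers or other limits may allow fewer resident work groups.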