See section 6.2 of the AMD APP OpenCL Programming Guide v2.1a; it has all the details.
By my reading, the granularity seems to be 32 threads, with up to a float2 access per thread.
(I'd paste an excerpt, but I'm reading it on another machine.)
When some threads access different addresses in the same bank at the same time, this generates bank conflicts. All AMD Evergreen GPUs contain a 32 KB LDS for each compute unit. On high-end GPUs, the LDS contains 32 banks, each bank is four bytes wide, and the bank is determined by bits 6:2 of the address. On lower-end GPUs, the LDS contains 16 banks, each bank is still four bytes wide, and the bank is determined by bits 5:2 of the address.
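
To make the bit ranges concrete, here is a rough host-side C sketch of the bank mapping described above. It is not taken from the guide: the helper names `lds_bank` and `max_conflict_degree` are made up for illustration, and the conflict count assumes that threads reading the very same 4-byte word do not conflict with each other, which is only my reading of "different addresses in the same bank".

```c
#include <assert.h>
#include <stdint.h>

/* Bank index for an LDS byte address, assuming 4-byte-wide banks.
   num_banks == 32 selects address bits 6:2, num_banks == 16 selects bits 5:2. */
static unsigned lds_bank(uint32_t byte_addr, unsigned num_banks)
{
    assert(num_banks == 16 || num_banks == 32);
    return (byte_addr >> 2) & (num_banks - 1);
}

/* Largest number of distinct 4-byte words that land in a single bank
   for one group of simultaneous accesses (1 means conflict-free).
   Hypothetical helper: duplicate words are counted once (assumption). */
static unsigned max_conflict_degree(const uint32_t *addrs, unsigned n,
                                    unsigned num_banks)
{
    unsigned count[32] = {0};
    unsigned worst = 0;

    for (unsigned i = 0; i < n; ++i) {
        int duplicate = 0;

        /* Skip this access if an earlier thread touched the same word. */
        for (unsigned j = 0; j < i; ++j) {
            if ((addrs[j] >> 2) == (addrs[i] >> 2)) {
                duplicate = 1;
                break;
            }
        }
        if (!duplicate) {
            unsigned c = ++count[lds_bank(addrs[i], num_banks)];
            if (c > worst)
                worst = c;
        }
    }
    return worst;
}
```

For example, with 32 banks, threads reading consecutive floats (byte addresses 0, 4, 8, ...) each hit a different bank, while a stride of 32 floats (byte addresses 0, 128, 256, ...) puts every access on bank 0.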