I can't seem to find a discussion about local memory bank conflict behavior on AMD.
I saw in the documentation that the GPU I'm currently working with has 32 local memory banks and a wavefront size of 64 threads. Based on these numbers, I'm assuming that half a wavefront issues its local memory accesses together, and that those threads can cause bank conflicts if they access the same bank. Is that correct, or is the local memory access granularity different?
See section 6.2 of the AMD APP OpenCL Programming Guide v2.1a; it has all the details.
Seems to be a granularity of 32 threads by my reading, up to float2 access per thread.
(I'd paste an excerpt, but I'm reading it on another machine.)
A bank conflict occurs when several threads access different addresses in the same bank at the same time. All AMD Evergreen GPUs contain a 32 KB LDS for each compute unit. On high-end GPUs, the LDS has 32 banks, each bank is four bytes wide, and the bank is selected by bits 6:2 of the address. On lower-end GPUs, the LDS has 16 banks, each bank is still four bytes wide, and the bank is selected by bits 5:2 of the address.
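To make the bit-field rule concrete, here is a rough sketch in plain C (host-side, not OpenCL kernel code) of how a byte address maps to a bank, plus a helper that counts how many threads land in the same bank for a strided float access. The function names `lds_bank` and `max_conflict_degree` are made up for illustration, and the number of threads that access the LDS together is left as a parameter, since the 32-thread figure above is only one reading of the guide.

```c
#include <assert.h>
#include <stdint.h>

/* Bank index for a byte address, assuming 4-byte-wide banks and a
 * power-of-two bank count: 32 banks -> address bits 6:2,
 * 16 banks -> address bits 5:2. */
static unsigned lds_bank(uint32_t byte_addr, unsigned num_banks)
{
    assert(num_banks != 0 && (num_banks & (num_banks - 1)) == 0);
    return (byte_addr >> 2) & (num_banks - 1);
}

/* Largest number of threads hitting one bank when thread `tid` reads the
 * float at index tid * stride_floats. A result of 1 means conflict-free.
 * Hypothetical helper; supports at most 64 banks. */
static unsigned max_conflict_degree(unsigned num_threads,
                                    unsigned stride_floats,
                                    unsigned num_banks)
{
    unsigned hits[64] = {0};
    unsigned worst = 0;

    assert(num_banks <= 64);
    if (stride_floats == 0)
        return 1; /* every thread reads the same address: not a conflict */

    for (unsigned tid = 0; tid < num_threads; ++tid) {
        uint32_t byte_addr = (uint32_t)tid * stride_floats * 4u;
        unsigned bank = lds_bank(byte_addr, num_banks);
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

With 32 banks and 32 threads, `max_conflict_degree(32, 1, 32)` gives 1 (consecutive floats, no conflict), a stride of 2 gives 2, and a stride of 32 puts every thread in bank 0. Padding the stride to 33 brings it back to 1, which is the usual trick for column accesses into a 32-wide tile.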