A year and a half ago I asked a question here on the forum about the need for local memory barriers for
work-groups with local size <= 64 (the wavefront size).
The answer at the time was that the compiler would remove these barriers anyway, so they didn't need
to be added to the code, even though the spec requires them.
I can now confirm that on the Polaris architecture they are needed. This change in behavior has been causing a hard crash in my application
when targeting Polaris.
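
For anyone unsure which pattern this is about, here is a minimal sketch of a local-memory reduction with the barriers left in. It is not the code from my application: the kernel name reduce_sum, its arguments, and the assumption of a power-of-two work-group size (e.g. 64) with one input element per work-item are made up for illustration.

```c
// Hypothetical kernel: sums each work-group's inputs through local memory.
// Assumes a power-of-two work-group size and one input element per work-item.
__kernel void reduce_sum(__global const float *in,
                         __global float *out,
                         __local float *scratch)
{
    const size_t lid = get_local_id(0);
    const size_t lsz = get_local_size(0);

    scratch[lid] = in[get_global_id(0)];
    // Required by the spec even when the whole group fits in one wavefront.
    barrier(CLK_LOCAL_MEM_FENCE);

    for (size_t stride = lsz / 2; stride > 0; stride >>= 1) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        // Kept outside the conditional so every work-item reaches it.
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 writes the group's total.
    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}
```

The old advice amounted to deleting both barrier() calls when the local size is <= 64 and relying on the work-items running in lockstep within one wavefront.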
The lesson is to follow the spec, or to re-validate assumptions with every new architecture.