If I am running a kernel on a GCN device, and my workgroup size is <= 64, do all work items need to encounter
async_work_group_copy? I've asked a similar question in the past about work items encountering memory barriers,
and I was told that barriers are not needed in this case.