Hello everyone,
I have a problem with async_work_group_copy: if I use it twice in a kernel, it somehow does not work.
The task is to create an MxN matrix where every field is calculated from 2 x 45 float3 values (the 48 in the copy is there for alignment). Without local memory it works fine.
The fields are calculated independently, so I chose a 2-dimensional 16x16 NDRange.
So I thought I could copy the 2 x 16 x 48 (45 used) float3 values to local memory, since they are accessed multiple times.
My caching works if I only cache 16 x 48 (45 used) float3 values.
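
One thing that may be worth checking here is how much local memory the two variants need. The sketch below is plain host-side C, not kernel code, and the helper name `local_bytes` is made up for illustration. It assumes each float3 takes 16 bytes in local memory, since OpenCL C gives float3 the same size and alignment as float4.

```c
#include <stddef.h>

/* Assumption: an OpenCL C float3 is stored like a float4, i.e. 16 bytes. */
#define FLOAT3_BYTES ((size_t)16)

/* Hypothetical helper: bytes of local memory needed to cache `blocks`
 * tiles of `rows` rows, each row padded to `padded` float3 elements. */
static size_t local_bytes(size_t blocks, size_t rows, size_t padded)
{
    return blocks * rows * padded * FLOAT3_BYTES;
}

/* Hypothetical helper: returns 1 if the cache fits into the local
 * memory size reported by the device, 0 otherwise. */
static int fits_in_local(size_t needed, size_t device_local_mem)
{
    return needed <= device_local_mem;
}
```

With these numbers, `local_bytes(1, 16, 48)` is 12288 bytes and `local_bytes(2, 16, 48)` is 24576 bytes. The single block fits into 16 KB of local memory but the double block does not, so on a device that reports only 16 KB (the minimum in OpenCL 1.0; later versions require at least 32 KB) the second copy could be what pushes the kernel over the limit. The actual value can be queried with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE. This is only one possible cause, not a confirmed diagnosis.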