Very sorry for posting this twice; my browser crashed while I was posting.
It looks like the issue has not been explained completely. Can you provide your system configuration (CPU, GPU, SDK, driver, OS) and a small test case?
Originally posted by: matze_de
Hello everyone,
I have a problem with async_work_group_copy: if I use it twice in a kernel, it somehow does not work.
The task is to create an MxN matrix where every field is calculated from 2 x 45 x float3 (the 48 in the copy is because of alignment). Without using local memory it works fine.
The fields are calculated independently; I chose a 2-dimensional 16x16 NDRange.
So I thought I could copy the 2x16x48(45) float3 values to local memory, because they are accessed multiple times.
My caching works if I only cache 16x48(45).
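One thing worth checking for the sizes mentioned above is the local memory footprint. This is only a rough sketch in plain C, not the poster's code. It assumes the OpenCL rule that a float3 takes the same storage as a float4 (16 bytes); the helper names local_cache_bytes and cache_fits are made up for illustration. The real limit has to be queried on the device with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE.

```c
#include <stddef.h>

/* Assumption: in OpenCL C a float3 is stored like a float4, i.e. 16 bytes. */
#define FLOAT3_BYTES ((size_t)16)

/* Hypothetical helper: bytes of local memory needed to cache
 * blocks x rows x padded_elems float3 values. */
size_t local_cache_bytes(size_t blocks, size_t rows, size_t padded_elems)
{
    return blocks * rows * padded_elems * FLOAT3_BYTES;
}

/* Hypothetical helper: returns 1 if the cache fits into the local memory
 * size reported by the device (CL_DEVICE_LOCAL_MEM_SIZE), 0 otherwise. */
int cache_fits(size_t needed_bytes, size_t local_mem_size)
{
    return needed_bytes <= local_mem_size ? 1 : 0;
}
```

With these assumptions, local_cache_bytes(1, 16, 48) gives 12288 bytes for the working 16x48 case and local_cache_bytes(2, 16, 48) gives 24576 bytes for the 2x16x48 case. Whether that is related to the problem depends on the device and on the kernel code, which has not been posted yet.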
Could you paste your kernel code here? It would be good if you pasted the runtime code as well.