Hello everyone,
I have a problem with async_work_group_copy: if I use it twice in a kernel, it somehow does not work.
The task is to create an MxN matrix where every field is calculated from 2 x 45 float3 values (the 48 in the copy is there for alignment). Without using local memory it works fine.
The fields are calculated independently, and I chose a 2-dimensional 16x16 NDRange.
So I thought I could copy the 2x16x48(45) float3 values to local memory, because they are accessed multiple times.
My caching works if I only cache 16x48(45).
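For reference, the sizes mentioned above can be turned into a rough local-memory estimate. The sketch below is plain C, not the poster's kernel. It assumes that each float3 occupies 16 bytes (in OpenCL a 3-component vector has the size of the 4-component one) and that the two cached blocks are laid out back to back in one local buffer. The names `float3_slot`, `block_elems`, `block_offset` and `local_bytes` are made up for illustration.

```c
#include <stddef.h>

/* Stand-in for an OpenCL float3: it takes the same 16 bytes as a float4. */
typedef struct { float x, y, z, pad; } float3_slot;

enum {
    TILE   = 16, /* edge of the 16x16 range from the post */
    PADDED = 48  /* 45 values per field, padded to 48 for alignment */
};

/* Number of float3 elements in one cached block (16 x 48). */
static size_t block_elems(void) {
    return (size_t)TILE * PADDED;
}

/* Element offset of block `block`, assuming blocks are stored back to back. */
static size_t block_offset(int block) {
    return (size_t)block * block_elems();
}

/* Bytes of local memory needed to hold `blocks` cached blocks. */
static size_t local_bytes(int blocks) {
    return (size_t)blocks * block_elems() * sizeof(float3_slot);
}
```

Under these assumptions, `local_bytes(1)` is 12288 bytes for the working 16x48 case and `local_bytes(2)` is 24576 bytes for the 2x16x48 case, with the second block starting at element 768. Whether that fits depends on the device, which the post does not name; the limit can be queried with `clGetDeviceInfo` and `CL_DEVICE_LOCAL_MEM_SIZE`.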
Very sorry for posting this twice; my browser crashed during posting.
It looks like the issue has not been explained completely. Can you provide your system configuration (CPU, GPU, SDK, driver, OS) and a small test case?
Thanks
Originally posted by: matze_de
Could you paste your kernel code here? It would be good if you pasted the runtime code as well.