The A8-3850 has 400 SIMD engines in total and 5 CUs, so I guessed the best number of work-items per work-group would be 80*N (400/5). But the profiler suggests N*64. Can anyone help me see where my understanding is wrong?
Each CU has 16 SIMD lanes. Each work-group is assigned to one CU, and the CU executes it in wavefronts. One wavefront takes four clock ticks to execute, during which it processes 4*16 = 64 work-items.
So one CU operates on multiples of 64.
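To make the multiple-of-64 point concrete, here is a minimal C sketch of the rounding arithmetic. The helper names `round_up` and `wavefronts_needed` are made up for illustration and are not part of OpenCL; the value 64 is just the 4*16 figure from above.

```c
#include <assert.h>
#include <stddef.h>

/* Wavefront size from the explanation above: 16 lanes x 4 ticks. */
#define WAVEFRONT_SIZE 64

/* Hypothetical helper: round n up to the next multiple of `multiple`. */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Hypothetical helper: wavefronts a work-group of local_size occupies. */
static size_t wavefronts_needed(size_t local_size)
{
    return round_up(local_size, WAVEFRONT_SIZE) / WAVEFRONT_SIZE;
}
```

With these, a work-group of 80 work-items occupies `wavefronts_needed(80)` = 2 wavefronts, and the second one has only 16 of its 64 slots filled. That is why N*64 is the better choice than 80*N.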
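I agree with nou. And the 80 here is 16x5, where 5 is the width of the VLIW unit. Each work-item runs on one of the 16 lanes, and the 5 VLIW slots are filled with instructions from that same work-item, not with 5 different work-items.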
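As a rough sketch, the numbers in this thread break down as follows. The constant and function names are invented for illustration; the values are only the ones quoted above.

```c
#include <assert.h>

enum {
    NUM_CUS = 5,              /* compute units on the A8-3850 */
    LANES_PER_CU = 16,        /* VLIW units (lanes) per CU */
    VLIW_WIDTH = 5,           /* ALUs per VLIW unit */
    TICKS_PER_WAVEFRONT = 4   /* ticks to execute one wavefront */
};

/* 5 CUs x 16 lanes x 5 ALUs gives the advertised 400. */
static_assert(NUM_CUS * LANES_PER_CU * VLIW_WIDTH == 400,
              "ALU count should match the 400 quoted in the question");

/* ALUs in one CU: this is where the 80 comes from. */
static int alus_per_cu(void)
{
    return LANES_PER_CU * VLIW_WIDTH;
}

/* Work-items in one wavefront: this is where the 64 comes from. */
static int work_items_per_wavefront(void)
{
    return LANES_PER_CU * TICKS_PER_WAVEFRONT;
}
```

So 80 counts ALUs per CU, while 64 counts work-items per wavefront. Only the second number matters when picking a work-group size.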