zhuzxy
Journeyman III

atom_add() operation is expensive on the A8-3850

Hello,

When my kernel calls atom_add(), it takes about 16 ms; without the call, it takes about 1.6 ms. atom_add() is executed only about 1200 times in total, so this seems far too expensive. Are there any tricks to make atom_add() faster? Alternatively, is there a way to gather scattered data into one contiguous array without using atomic operations?

Thanks

genaganna
Journeyman III

Global atomic operations are very expensive. Atomic counters are much faster, but they are not supported on integrated GPUs. See the AtomicCounters sample that ships with the SDK.

If possible, use atom_add on local memory instead and do the final summation on the CPU.
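As a rough illustration of the local-atomics idea, here is a minimal sketch of a kernel that gathers the non-zero elements of an input buffer into a contiguous output array. The kernel name, the buffer names, and the "non-zero" predicate are all made up for this example and are not from the original kernel. It assumes OpenCL 1.1, where the 32-bit atomic_inc/atomic_add built-ins are core; on OpenCL 1.0 the equivalent atom_inc/atom_add calls need the cl_khr_local_int32_base_atomics and cl_khr_global_int32_base_atomics extensions. Each work-item takes its slot from a counter in local memory, and only one work-item per work-group touches the global counter.

```opencl
// Hypothetical kernel: compact the non-zero elements of `in` into `out`.
// Assumption: the host sets *out_count to 0 before the launch and reads it
// back afterwards to learn how many elements were written.
__kernel void compact_nonzero(__global const int *in,
                              __global int *out,
                              volatile __global int *out_count,
                              const int n)
{
    __local int group_count;  // matches found by this work-group
    __local int group_base;   // start of this group's range in `out`

    const int gid = get_global_id(0);
    const int lid = get_local_id(0);

    if (lid == 0)
        group_count = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Reserve a slot inside the work-group with a cheap local atomic.
    int value = 0;
    int slot = -1;
    if (gid < n) {
        value = in[gid];
        if (value != 0)
            slot = atomic_inc(&group_count);
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    // One global atomic per work-group reserves a contiguous output range.
    if (lid == 0)
        group_base = atomic_add(out_count, group_count);
    barrier(CLK_LOCAL_MEM_FENCE);

    if (slot >= 0)
        out[group_base + slot] = value;
}
```

With this layout the number of global atomics drops from one per matching element to one per work-group. The output order is not preserved. To avoid global atomics entirely, each work-group could instead write group_count to a per-group array and leave the final summation to the CPU.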
