Latest reply on Aug 15, 2011 8:28 AM by genaganna

    atom_add() operation is expensive on A8-3850

    zhuzxy

      Hello,

      When I use atom_add(), the kernel takes about 16 ms; without it, the same kernel takes about 1.6 ms. atom_add() is executed only about 1200 times in total, so this seems far too expensive. Are there any tricks to make atom_add() faster? Or is there a way to pack scattered data into a contiguous array, one element after another, without using atomic operations?

      Thanks

          genaganna

           


           

          Global atomic operations are very expensive. Atomic counters are very fast, but they are not supported on integrated GPUs. See the AtomicCounters sample that ships with the SDK.

          If possible, use atom_add() on local memory instead, and sum the per-work-group results on the CPU.