On the GCN architecture we have:
ds_cmpst_f32 (compare+swap), ds_min_f32, ds_max_f32 for the Local/Global Data Share (LDS/GDS).
There are variants of those where the previous value is returned: ds_cmpst_rtn_f32, ds_min_rtn_f32, ds_max_rtn_f32.
All of those also have _f64 versions.
For memory there are some float32 atomics:
buffer_atomic_fcmpswap, buffer_atomic_fmax, image_atomic_fmin
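It's possible to return the previous value (by setting the GLC bit on the instruction). The _x2 variants do not schedule two atomic operations at once; they operate on 64-bit (double) values: buffer_atomic_fcmpswap_x2, buffer_atomic_fmax_x2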
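To make the semantics concrete, here is a minimal CPU-side sketch in C++ of what a returning float max does when it is built from compare+swap. This is not GCN code: atomic_fmax_rtn is a hypothetical helper name I made up for illustration, and std::atomic only stands in for the memory location the hardware instruction would operate on.

```cpp
#include <atomic>

// Hypothetical helper, not an AMD API: models a float atomic max that
// returns the value stored before the operation (like the *_rtn_* forms).
float atomic_fmax_rtn(std::atomic<float>& target, float value) {
    float previous = target.load();
    // On failure compare_exchange_weak reloads `previous` with the current
    // contents, so the loop retries until the swap succeeds or the stored
    // value is already at least `value`.
    while (previous < value &&
           !target.compare_exchange_weak(previous, value)) {
    }
    return previous;
}
```

The native min/max instructions do this in a single operation, so a retry loop like this is only needed where all you have is compare+swap.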
Feel free to check http://developer.amd.com.php53-23.ord1-1.websitetestlink.com/wordpress/media/2012/10/AMD_Southern_Is... for details.