OpenCL

victzhang
Adept I

Atomic floating point addition in local memory via DS_ADD_F32

One of my GPU kernels relies heavily on atomic floating-point addition in local memory. My current implementation uses a loop of atom_cmpxchg() calls.
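For readers unfamiliar with the compare-and-swap approach, here is a minimal host-side sketch of the same idea in C11, not the actual kernel code. It assumes the float is stored as raw 32-bit data and uses atomic_compare_exchange_weak() in the role that atom_cmpxchg() plays in an OpenCL kernel. The names float_to_bits, bits_to_float and atomic_add_float_cas are hypothetical helpers made up for this illustration.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret the bits of a float as a uint32_t and
   back, using memcpy to stay within the aliasing rules. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: atomically add `value` to the float whose raw bits
   live in *addr, and return the value that was stored before the add.
   This mirrors the usual atom_cmpxchg() retry loop. */
static float atomic_add_float_cas(_Atomic uint32_t *addr, float value)
{
    uint32_t expected = atomic_load(addr);
    uint32_t desired;
    do {
        /* Compute the new sum from the last value we observed. */
        desired = float_to_bits(bits_to_float(expected) + value);
        /* If another thread changed *addr in the meantime, the exchange
           fails, `expected` is refreshed with the current contents, and
           the loop retries with the new value. */
    } while (!atomic_compare_exchange_weak(addr, &expected, desired));
    return bits_to_float(expected);
}
```

The comparison is done on the integer bit pattern rather than on the float value, so the loop still terminates when the stored value is NaN. Under heavy contention on a single address each failed exchange costs another round trip, which is the overhead a native atomic add would be expected to avoid.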

I noticed that the GCN3 instruction set manual lists a DS_ADD_F32 instruction, but it gives very little detail. Is it the correct instruction to use for atomic floating-point addition in the local data share? Are there any special requirements or caveats for using this instruction? How does its performance compare to a loop of DS_CMPST_RTN_B32?

My initial test on an RX 480 shows that DS_ADD_F32 performs the atomic add correctly, but it is quite slow (it can be a few times slower than a loop of atom_cmpxchg() calls). I am not sure whether it would be faster on Fiji or Vega.
