
Good way to share the carry flag across lanes?

I need to add two ulongs.

The first half (low 32 bits) of each number is stored in a lane with an even ID, and the second half (high 32 bits) is stored in the lane right after it (the next odd ID).

So one lane adds the low 32 bits, and the other lane adds the high 32 bits.

The problem is that if the sum of the low 32 bits produces a carry, I need to add that carry to the sum of the high 32 bits.
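To make the problem concrete, here is a minimal Python sketch that models this lane layout on the CPU. It is an illustration only: `split_u64`, `join_u64` and `add_without_carry` are hypothetical helpers, not part of any GPU API. Adding each lane's halves independently drops the low-half carry, which is exactly what goes wrong for 0xFFFFFFFF + 1:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split_u64(values):
    """Pack 64-bit values into 32-bit lanes: low half in lane 2k, high in 2k + 1."""
    lanes = []
    for v in values:
        lanes.append(v & MASK32)          # even lane: low 32 bits
        lanes.append((v >> 32) & MASK32)  # odd lane: high 32 bits
    return lanes


def join_u64(lanes):
    """Inverse of split_u64."""
    return [lanes[i] | (lanes[i + 1] << 32) for i in range(0, len(lanes), 2)]


def add_without_carry(a_lanes, b_lanes):
    """Each lane adds its own halves; the carry out of the low half is lost."""
    return [(x + y) & MASK32 for x, y in zip(a_lanes, b_lanes)]


a = [0x00000000FFFFFFFF, 0x0000000100000002]
b = [0x0000000000000001, 0x0000000000000003]
naive = join_u64(add_without_carry(split_u64(a), split_u64(b)))
expected = [(x + y) & MASK64 for x, y in zip(a, b)]
print([hex(v) for v in naive])     # ['0x0', '0x100000005']
print([hex(v) for v in expected])  # ['0x100000000', '0x100000005']
```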

One way could look like this (for a += b):

v_add_co_u32 %, vcc, %, %
v_addc_co_u32 %[carry], vcc, 0, 0, vcc
v_add_co_u32_dpp %, vcc, %[carry], % quad_perm:[1,0,3,2]

The second line is needed because I have to convert the carry flag into a number so that the DPP instruction in line 3 can share it across lanes.
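The three-instruction sequence above can be checked with a small CPU model. This is a Python sketch under stated assumptions, not the hardware: VCC is modelled as one carry bit per lane, `quad_perm:[1,0,3,2]` is taken to make each lane read src0 from lane `i ^ 1`, and the third step is restricted to odd (high-half) lanes. That last assumption matters: as written, the DPP add also runs in the even lanes, where it would add the high half's carry-out into the low half whenever the high-half addition overflows, so those lanes need to be masked off (for example via EXEC). The helper names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
WAVE = 64  # lanes per wavefront (wave64)


def split_u64(values):
    """Low half of value k goes to lane 2k, high half to lane 2k + 1."""
    lanes = []
    for v in values:
        lanes += [v & MASK32, (v >> 32) & MASK32]
    return lanes


def join_u64(lanes):
    """Inverse of split_u64."""
    return [lanes[i] | (lanes[i + 1] << 32) for i in range(0, len(lanes), 2)]


def quad_perm_1032(src):
    """DPP quad_perm:[1,0,3,2]: lane i reads src from lane i ^ 1."""
    return [src[i ^ 1] for i in range(len(src))]


def add_u64_pairs(a, b, mask_even_lanes_in_step3=True):
    # Step 1: v_add_co_u32 a, vcc, a, b (every lane; carry-out goes to vcc)
    sums = [x + y for x, y in zip(a, b)]
    a = [s & MASK32 for s in sums]
    vcc = [s >> 32 for s in sums]  # 0 or 1 per lane
    # Step 2: v_addc_co_u32 carry, vcc, 0, 0, vcc (carry bit -> VGPR value)
    carry = list(vcc)
    # Step 3: v_add_co_u32_dpp a, vcc, carry, a quad_perm:[1,0,3,2]
    swapped = quad_perm_1032(carry)
    out = []
    for lane, (x, c) in enumerate(zip(a, swapped)):
        if mask_even_lanes_in_step3 and lane % 2 == 0:
            out.append(x)  # low half: keep the step-1 result
        else:
            out.append((x + c) & MASK32)  # add the neighbour's carry
    return out


random.seed(0)
xs = [random.getrandbits(64) for _ in range(WAVE // 2)]
ys = [random.getrandbits(64) for _ in range(WAVE // 2)]
got = join_u64(add_u64_pairs(split_u64(xs), split_u64(ys)))
print(got == [(x + y) & MASK64 for x, y in zip(xs, ys)])  # True

# High-half overflow: unmasked, lane 0 picks up lane 1's carry-out.
hi_a = split_u64([0xFFFFFFFF00000000])
hi_b = split_u64([0x0000000100000000])
print(hex(join_u64(add_u64_pairs(hi_a, hi_b))[0]))  # 0x0
print(hex(join_u64(add_u64_pairs(hi_a, hi_b, False))[0]))  # 0x1
```

The second call shows why the even lanes should not execute the DPP add: 0xFFFFFFFF00000000 + 0x0000000100000000 wraps to 0, but the unmasked model returns 1.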

Is there a better way to do this? Thanks in advance.

DPP is a good idea here, but it moves data between vector lanes, not between the bits of a scalar register like VCC. However, the DPP bank mask looks like a better option than setting EXEC with SALU ops.

v_add_co_u32    v0, vcc, v0, v1 bank_mask:0x5    //do the addition on even lanes
s_lshl_b64      vcc, vcc, 1    //shift the carry to the next lanes
v_addc_co_u32   v0, vcc, v0, v1, vcc bank_mask:0xA  //odd lanes
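The add / shift-VCC / add-with-carry sequence in this reply can be sanity-checked with a small CPU model. The Python sketch below is a model, not a hardware test, and it assumes the DPP bank_mask semantics described in the GCN3/Vega ISA manuals: each bit enables the VGPR write for one bank of four lanes (bit b covers lanes with `(lane >> 2) & 3 == b`, so bit 0 is lanes 0-3, 16-19, ...), while VCC is still written by every active lane. Under that assumption 0x5/0xA select alternating groups of four lanes rather than even/odd lanes, so the model pairs bank_mask with a hypothetical layout that places each high half four lanes after its low half and shifts VCC by 4. It also runs the even/odd version with a plain per-lane mask and a shift of 1. All helper names (`bank_enabled`, `vcc_shift_add`, `pack`, `unpack`, `run`) are illustrative.

```python
import random

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
WAVE = 64  # lanes per wavefront (wave64)


def bank_enabled(lane, bank_mask):
    """Assumed bank_mask rule: bit b enables lanes with (lane >> 2) & 3 == b."""
    return (bank_mask >> ((lane >> 2) & 3)) & 1 == 1


def vcc_shift_add(a, b, low_mask, high_mask, shift):
    """add (low_mask) / s_lshl_b64 vcc, vcc, shift / addc (high_mask).

    The masks gate only the VGPR write; VCC is written by every lane.
    """
    # v_add_co_u32 v0, vcc, v0, v1
    out, vcc = list(a), 0
    for lane in range(WAVE):
        s = a[lane] + b[lane]
        vcc |= (s >> 32) << lane
        if low_mask(lane):
            out[lane] = s & MASK32
    # s_lshl_b64 vcc, vcc, shift: move each carry up to its high-half lane
    vcc = (vcc << shift) & MASK64
    # v_addc_co_u32 v0, vcc, v0, v1, vcc
    for lane in range(WAVE):
        if high_mask(lane):
            out[lane] = (out[lane] + b[lane] + ((vcc >> lane) & 1)) & MASK32
    return out


def pack(values, lo, hi):
    lanes = [0] * WAVE
    for k, v in enumerate(values):
        lanes[lo(k)], lanes[hi(k)] = v & MASK32, (v >> 32) & MASK32
    return lanes


def unpack(lanes, lo, hi, n):
    return [lanes[lo(k)] | (lanes[hi(k)] << 32) for k in range(n)]


def run(lo, hi, low_mask, high_mask, shift, a_vals, b_vals):
    out = vcc_shift_add(pack(a_vals, lo, hi), pack(b_vals, lo, hi),
                        low_mask, high_mask, shift)
    return unpack(out, lo, hi, len(a_vals))


# Adjacent layout from the question: low half in lane 2k, high in 2k + 1.
adj_lo, adj_hi = (lambda k: 2 * k), (lambda k: 2 * k + 1)
# Hypothetical bank-friendly layout: low halves in banks 0/2, high halves
# in banks 1/3, four lanes after the matching low half.
bank_lo = lambda k: 8 * (k // 4) + k % 4
bank_hi = lambda k: bank_lo(k) + 4
mask5 = lambda lane: bank_enabled(lane, 0x5)
maskA = lambda lane: bank_enabled(lane, 0xA)

random.seed(1)
n = WAVE // 2
xs = [random.getrandbits(64) for _ in range(n)]
ys = [random.getrandbits(64) for _ in range(n)]
ref = [(x + y) & MASK64 for x, y in zip(xs, ys)]

even_odd = run(adj_lo, adj_hi, lambda lane: lane % 2 == 0,
               lambda lane: lane % 2 == 1, 1, xs, ys)
banked = run(bank_lo, bank_hi, mask5, maskA, 4, xs, ys)
print(even_odd == ref, banked == ref)  # True True

# bank_mask 0x5/0xA with the adjacent layout and a shift of 1 goes wrong:
mixed = run(adj_lo, adj_hi, mask5, maskA, 1, [MASK32] * n, [1] * n)
print(hex(mixed[0]))  # 0x0 instead of 0x100000000
```

Under these assumptions, the shift-VCC idea itself works; what has to agree is the lane mask, the data layout and the shift distance. With the question's adjacent layout, an alternating per-lane mask (EXEC, for example) is still needed, whereas the bank-mask variant fits a layout in which each high half sits four lanes above its low half.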