I need to add two ulong (64-bit unsigned) values.
The first half (low 32 bits) of each number is stored in lanes with even id, and the second half (high 32 bits) is stored in the lane next to it (next odd id).
So one lane adds the low 32 bits, and the other lane adds the high 32 bits.
The problem is that if the low 32-bit sum produces a carry, I need to add that carry to the high 32-bit sum.
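To make the arithmetic concrete, here is a minimal Python sketch of the split addition described above. It only models the math, not the hardware; the helper names (`add_u64_split`, `split`, `join`) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def split(x):
    """Split a 64-bit value into (low 32 bits, high 32 bits)."""
    return x & MASK32, (x >> 32) & MASK32


def join(lo, hi):
    """Rebuild a 64-bit value from its two 32-bit halves."""
    return (hi << 32) | lo


def add_u64_split(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as 32-bit halves, wrapping at 64 bits."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                 # 1 if the low add overflowed, else 0
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # carry out of the high half is dropped
    return lo, hi
```

For example, adding 1 to 0x00000001FFFFFFFF overflows the low half, so the carry has to bump the high half from 1 to 2.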
One way may be like this (for a += b):
    v_add_co_u32 %[a], vcc, %[b], %[a]
    v_addc_co_u32 %[carry], vcc, 0, 0, vcc
    v_add_co_u32_dpp %[a], vcc, %[carry], %[a] quad_perm:[1,0,3,2]
The second line is needed because I have to turn the carry flag into a number so that I can share it across lanes using the DPP instruction in the third line.
Is there a better way to do this? Thanks in advance.
DPP is a good idea here, but it operates on vector lanes, not on scalar bits such as the carry mask in VCC. The DPP bank mask, however, looks like a better option than setting EXEC with scalar ALU ops.
    v_add_co_u32 v0, vcc, v0, v1 bank_mask:0x5        // do the addition on even lanes
    s_lshl_b64 vcc, vcc, 1                            // shift the carry to the next lanes
    v_addc_co_u32 v0, vcc, v0, v1, vcc bank_mask:0xA  // odd lanes
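
The three steps above can be checked with a small Python model. This is only a sketch of the intended data flow: it assumes a 64-lane wave, treats VCC as a 64-bit mask with one carry bit per lane, and assumes the first and third instructions affect exactly the even and odd lanes as the comments say. Whether `bank_mask` selects those lanes on a given hardware generation should be checked against the ISA manual. All function names here are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
WAVE_SIZE = 64                      # assumption: wave64
MASK64 = (1 << WAVE_SIZE) - 1


def pack(values):
    """Lay out 64-bit values as lanes: even lane = low half, odd lane = high half."""
    lanes = []
    for v in values:
        lanes += [v & MASK32, (v >> 32) & MASK32]
    return lanes


def unpack(lanes):
    """Inverse of pack: join each (even, odd) lane pair into one 64-bit value."""
    return [lanes[i] | (lanes[i + 1] << 32) for i in range(0, len(lanes), 2)]


def wave_add_u64(a, b):
    """Model of the add / shift-VCC / add-with-carry sequence on one wave."""
    out = list(a)
    vcc = 0
    # Step 1: even lanes add the low halves; each writes its carry-out to its VCC bit.
    for lane in range(0, WAVE_SIZE, 2):
        s = a[lane] + b[lane]
        out[lane] = s & MASK32
        vcc |= (s >> 32) << lane
    # Step 2: shift VCC left by one so each carry lines up with the next (odd) lane.
    vcc = (vcc << 1) & MASK64
    # Step 3: odd lanes add the high halves plus the carry-in read from their VCC bit.
    for lane in range(1, WAVE_SIZE, 2):
        carry_in = (vcc >> lane) & 1
        out[lane] = (a[lane] + b[lane] + carry_in) & MASK32
    return out
```

With 32 pairs of 64-bit inputs packed by `pack`, `unpack(wave_add_u64(...))` should match ordinary 64-bit wrapping addition for every pair.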