Here's my code:
s_mov_b32 s60,0 //debug counter
s_mov_b32 s70,10-1 //outer loop counter
s_add_i32 s60,s60,1 //increase debug counter
s_mov_b32 s50,5 //inner loop counter
s_sub_u32 s50,s50,1 //decrease inner loop counter
s_cbranch_scc0 @loop20 //!!!!!!!!!
s_subb_u32 s70,s70,1 //decrease outer loop counter
uavWrite(0,grpid,s60) //write debug counter to mem, (this works for sure)
The outer loop (s70) should execute 10 times (the counter starts at 9, and the loop stops when it underflows below zero).
Then there is a debug counter (s60) which is incremented every time the outer loop body executes, and I write its value to memory so I can check it afterwards.
The inner loop (s50) is just an empty loop, and yet it causes the problem:
Normally the debug counter should report the value 10, but with the inner loop present it only counts to half of that: 5.
Methods to eliminate the bug:
a) Comment out the line marked with !!!!!!. That breaks the inner loop, and the debug counter reports 10 as it should.
b) Move the "increase debug counter" line to right after the inner loop (below the line marked with !!!!!!).
I don't understand why. Can anyone explain, please? Maybe there is a GCN rule that I forgot to apply?
(The weirdest thing is that when it bugs, it still works at 50%, not 0%. I could even make a shift-right-by-1 out of it without using s_lshr_b32. o.O)
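For anyone who wants to reproduce the numbers without hardware, here is a rough Python model of the scalar instructions in the listing. It is only a sketch under assumptions, not a confirmed diagnosis: the labels and the outer branch are not shown in the listing, so the model assumes @loop20 sits just before the inner s_sub_u32 and that the outer loop also branches back on SCC == 0. It also assumes s_sub_u32 sets SCC to the unsigned borrow, s_add_i32 sets SCC to signed overflow, s_mov_b32 leaves SCC alone, and s_subb_u32 subtracts SCC in addition to its operand. The function names are made up for this model.

```python
MASK = 0xFFFFFFFF  # 32-bit scalar registers


def s_add_i32(a, b):
    """Assumed semantics: 32-bit add, SCC = signed overflow."""
    def signed(x):
        return x - (1 << 32) if x & 0x80000000 else x
    total = signed(a) + signed(b)
    scc = 0 if -(1 << 31) <= total < (1 << 31) else 1
    return total & MASK, scc


def s_sub_u32(a, b):
    """Assumed semantics: 32-bit subtract, SCC = unsigned borrow."""
    diff = a - b
    return diff & MASK, 1 if diff < 0 else 0


def s_subb_u32(a, b, scc_in):
    """Assumed semantics: subtract b AND the incoming SCC, SCC = unsigned borrow."""
    diff = a - b - scc_in
    return diff & MASK, 1 if diff < 0 else 0


def run(outer_uses_borrow):
    """Model the listing; returns the final debug counter (s60)."""
    s60 = 0          # debug counter
    s70 = 10 - 1     # outer loop counter
    scc = 0
    while True:                      # assumed outer loop label here
        s60, scc = s_add_i32(s60, 1)
        s50 = 5                      # s_mov_b32 does not touch SCC
        while True:                  # assumed position of @loop20
            s50, scc = s_sub_u32(s50, 1)
            if scc != 0:             # s_cbranch_scc0 falls through on borrow
                break
        if outer_uses_borrow:
            s70, scc = s_subb_u32(s70, 1, scc)   # as in the listing
        else:
            s70, scc = s_sub_u32(s70, 1)         # hypothetical variant
        if scc != 0:                 # assumed outer s_cbranch_scc0
            break
    return s60


print(run(True), run(False))
```

Under these assumptions the model gives 5 with s_subb_u32 and 10 with a plain s_sub_u32 for the outer counter. The inner loop always exits with SCC = 1, so s_subb_u32 would take 2 off s70 per pass, which would match the halved count and also both workarounds a) and b), since in those cases SCC is 0 when the outer decrement runs.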