After hours of debugging I found the reason my sorting algorithm didn't work.

Given the input A[] = 1, 2, ..., N, the following kernel code produces these results for N=16:

sum_total1 = 136; // correct sum with faulty loop

sum_total2 = 135; // incorrect sum with correct loop

The error persists for larger N, but it does not occur for N=8, for example.

All tested on Juniper.

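__private uint sum_total1 = 0ul;
__private uint sum_total2 = 0ul;

// faulty loop: runs down to i = -1, so it also reads A[-1] (out of bounds)
for( int i=N-1; i>=-1; --i ){
    sum_total1 += A[ i ];
}

// correct loop: runs down to i = 0
for( int i=N-1; i>= 0; --i ){
    sum_total2 += A[ i ];
}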
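For comparison, here is a minimal host-side sketch in plain C that computes the expected value on the CPU. It is not part of the original kernel, and the names `fill_sequence`, `reference_sum` and `closed_form_sum` are hypothetical helpers made up for illustration. For A[] = 1, 2, ..., N the sum is N(N+1)/2, which is 136 for N=16. The reported 135 is short by exactly A[0] = 1, which would be consistent with the last iteration (i = 0) being skipped, though that is only a guess.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fills A with 1, 2, ..., n. */
void fill_sequence(uint32_t *A, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        A[i] = (uint32_t)(i + 1);
    }
}

/* Host-side reference: sums A[0..n-1] with the same descending loop
   as the correct kernel loop (i from n-1 down to 0). */
uint32_t reference_sum(const uint32_t *A, int n)
{
    assert(n >= 0);
    uint32_t sum = 0u;
    for (int i = n - 1; i >= 0; --i) {
        sum += A[i];
    }
    return sum;
}

/* Closed form for A[] = 1, 2, ..., n: n(n+1)/2. */
uint32_t closed_form_sum(uint32_t n)
{
    return n * (n + 1u) / 2u;
}
```

Calling `fill_sequence(A, 16)` and then `reference_sum(A, 16)` should agree with `closed_form_sum(16)`. Any kernel result that differs from this value for the correct loop points at the device compiler or runtime rather than at the algorithm.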

That's strange. Do you also have problems with the >= comparison outside of a for loop's test clause?