Hi,

I have a kernel that has some conditionals nested in a loop. I have been experimenting with ways to optimize the code, and I haven't been able to do better than what I am posting here. I've switched from using an if statement to the ternary operator, and saw a measurable speed up.

I also worked on optimizing the boolean logic to result in only one conditional, and I am not sure if I can do it better. I haven't used bitwise operators prior to this, and I really need every last bit of performance as this app can take days to run on a 5870.

(i,j,k,l,m are all unsigned integers)

In the code, I take the conditional, which checks if m is NOT equal to i,k, and l. I do an XOR in place of the equality check, since if they are the same, XOR will return zero. Then multiplying the results guarantees that if one is zero (if m== __) the expression will result in zero, so I check the oposite (!= 0).

Is there a way to avoid multiplication? I tried to find a way to bitwise and and or instead, but got stumped (been a LONG time since I've done boolean algebra!)

I would also like suggestions on optimizing the 1st nested conditional (l != i && l != k). I could do the same bitwise stuff, but I haven't yet since I want to get a feel if I am doing this correctly.

Thanks for any suggestions

Original Kernel: // kerrnel for 3D NDRange algorithm __kerrnel void calc_ter_i_3DNDRange(int LENGTH, __global double* ter_i, // pointer to global shared ter_i array __local double* contributions) // pointer to num_threads size local array { size_t k = get_global_id(2); size_t j = get_global_id(1); size_t i = get_global_id(0); // Number of threads in thread block (work group) size_t num_threads = get_local_size(2); size_t n = get_local_id(2); int jm; int im; int ik; int il; int ncols = LENGTH+1; double G; double tmp = 0.0; if( k != i){ for(int l = 0; l < ncols; l++){ if( l != i && l != k){ G = 1.0; for(int m = 0; m < ncols; m++){ if( m != i && m != k && m != l){ jm = j-m; im = i-m; G = G * ( (double) jm / (double) im ); } } // end M loop ik = i-k; il = i-l; tmp = tmp + (( (double) 1.0 / (double) ik)/ (double) il)*G; } // end if l != i } // end l loop } // end if k != i } Modified, where ternary_test is a double variable if( k != i){ for(int l = 0; l < ncols; l++){ if( l != i && l != k){ G = 1.0; for(int m = 0; m < ncols; m++){ ternary_test = ( (m^i)*(m^k)*(m^l) != 0 ) ? ( ((double)j-m) / ((double)i-m) ) : 1.0; G = G*ternary_test; } // end M loop ik = i-k; il = i-l; tmp = tmp + (( (double) 1.0 / (double) ik)/ (double) il)*G; } // end if l != i } // end l loop } // end if k != i

As far as i understand you need an OR Operation instead of AND as mentioned in the code. Instead of multiplying to check if any of subexpressions are zero you can AND them rather than multiply. If any one is zero everything will be zeroed out and That should be fast.

Thanks