Otterz

Optimizing Conditionals

Discussion created by Otterz on Feb 20, 2011
Latest reply on Feb 21, 2011 by Otterz

Hi,

 

I have a kernel with some conditionals nested in a loop. I have been experimenting with ways to optimize the code, and I haven't been able to do better than what I am posting here. Switching from an if statement to the ternary operator gave a measurable speedup.

I also worked on optimizing the boolean logic to result in only one conditional, and I am not sure if I can do it better. I haven't used bitwise operators prior to this, and I really need every last bit of performance as this app can take days to run on a 5870.

(i,j,k,l,m are all unsigned integers)

In the code, I take the conditional that checks whether m is NOT equal to i, k, and l, and replace each equality check with an XOR, since XOR returns zero when its operands are the same. Multiplying the three results guarantees that if any one of them is zero (that is, m equals i, k, or l) the whole expression is zero, so I check the opposite (!= 0).
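To make the XOR reasoning above concrete, here is a small host-side C sketch (not kernel code) that compares the plain comparison against the XOR-product form. The function names are made up for illustration, and it assumes all indices are 32-bit unsigned values. One thing it exposes: because unsigned multiplication wraps around, the product of non-zero XOR results can itself become zero once the indices are large enough, so the two tests only agree for small index ranges.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference test, as in the original if statement. */
static bool differs_from_all_compare(uint32_t m, uint32_t i, uint32_t k, uint32_t l)
{
    return m != i && m != k && m != l;
}

/* XOR-product test: a ^ b is zero exactly when a == b. */
static bool differs_from_all_xor_product(uint32_t m, uint32_t i, uint32_t k, uint32_t l)
{
    return (m ^ i) * (m ^ k) * (m ^ l) != 0u;
}

/* Counts the index combinations in [0, limit) where the two tests disagree. */
static unsigned count_mismatches(uint32_t limit)
{
    unsigned mismatches = 0;
    for (uint32_t i = 0; i < limit; i++)
        for (uint32_t k = 0; k < limit; k++)
            for (uint32_t l = 0; l < limit; l++)
                for (uint32_t m = 0; m < limit; m++)
                    if (differs_from_all_compare(m, i, k, l) !=
                        differs_from_all_xor_product(m, i, k, l))
                        mismatches++;
    return mismatches;
}
```

For indices below 16 the two forms agree everywhere. With m = 0x10000, i = 0, k = 0, l = 1 the first two XOR results are both 0x10000, their 32-bit product wraps to zero, and the XOR-product test reports a match that is not there.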

Is there a way to avoid the multiplication? I tried to find a way to use bitwise AND and OR instead, but got stumped (it's been a LONG time since I've done boolean algebra!)
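One possible multiplication-free form, sketched in plain C with a hypothetical function name: each scalar comparison yields 0 or 1, so the three results can be combined with bitwise AND, which does not short-circuit. This assumes 32-bit unsigned indices. Whether it is actually faster than the product on a given GPU is not something the sketch can show, and the compiler may already generate equivalent code, so it would need to be measured.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when m differs from i, k, and l.
 * Each (x != 0u) is 0 or 1, so bitwise & acts as a logical AND
 * without short-circuit evaluation and without any multiplication. */
static bool differs_from_all_bitwise(uint32_t m, uint32_t i, uint32_t k, uint32_t l)
{
    return ((m ^ i) != 0u) & ((m ^ k) != 0u) & ((m ^ l) != 0u);
}
```

In the inner loop this would be used in place of the product, for example `differs_from_all_bitwise(m, i, k, l) ? ratio : 1.0`, where `ratio` stands for the (j-m)/(i-m) factor. Unlike the product, this form cannot wrap around to zero for large indices.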

I would also like suggestions on optimizing the first nested conditional (l != i && l != k). I could apply the same bitwise approach there, but I haven't yet, since I first want to get a feel for whether I am doing this correctly.

Thanks for any suggestions

 

 

Original Kernel:

    // kernel for 3D NDRange algorithm
    __kernel void calc_ter_i_3DNDRange(int LENGTH,
                                       __global double* ter_i,        // pointer to global shared ter_i array
                                       __local double* contributions) // pointer to num_threads size local array
    {
        size_t k = get_global_id(2);
        size_t j = get_global_id(1);
        size_t i = get_global_id(0);

        // Number of threads in thread block (work group)
        size_t num_threads = get_local_size(2);
        size_t n = get_local_id(2);

        int jm;
        int im;
        int ik;
        int il;
        int ncols = LENGTH+1;
        double G;
        double tmp = 0.0;

        if( k != i){
            for(int l = 0; l < ncols; l++){
                if( l != i && l != k){
                    G = 1.0;
                    for(int m = 0; m < ncols; m++){
                        if( m != i && m != k && m != l){
                            jm = j-m;
                            im = i-m;
                            G = G * ( (double) jm / (double) im );
                        }
                    } // end M loop
                    ik = i-k;
                    il = i-l;
                    tmp = tmp + (( (double) 1.0 / (double) ik)/ (double) il)*G;
                } // end if l != i
            } // end l loop
        } // end if k != i
    }

Modified, where ternary_test is a double variable:

    if( k != i){
        for(int l = 0; l < ncols; l++){
            if( l != i && l != k){
                G = 1.0;
                for(int m = 0; m < ncols; m++){
                    ternary_test = ( (m^i)*(m^k)*(m^l) != 0 ) ? ( ((double)j-m) / ((double)i-m) ) : 1.0;
                    G = G*ternary_test;
                } // end M loop
                ik = i-k;
                il = i-l;
                tmp = tmp + (( (double) 1.0 / (double) ik)/ (double) il)*G;
            } // end if l != i
        } // end l loop
    } // end if k != i
