bubu
Adept II

switch statement performance

Is the "switch" statement optimal? I need to concatenate 10 IFs and I was wondering if the "switch" statement can help the performance.

 

thx

0 Likes
7 Replies
greg1232
Journeyman III

Generally, on architectures that support SIMD plus control flow but not indirect divergent branches, a switch statement works like this: the condition for every case is checked, and only those cases with at least one active thread are entered.

 

So if you had this:

 

void some_function(int condition)
{
  switch (condition)
  {
    case 0: function0(); break;
    case 1: function1(); break;
    default: break;
  }
}

 

And called some_function(0)

 

It would check three conditions (0, 1, default), but only execute function0(), not function1(). Note that this is exactly what would happen if you wrote it as a series of if statements. I think the case instructions are only included in the ISA because they make it easier to write assembly.
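
To make the masking behaviour concrete, here is a minimal sketch in plain C that simulates it on the host. Everything in it is an assumption made for illustration: the lane count LANES, the function simulate_switch, and the values 100 and 200 standing in for function0() and function1(). Real hardware does this with execution masks, not loops.

```c
#define LANES 8  /* hypothetical wavefront width, kept small for illustration */

/* Simulates "switch (cond) { case 0: ...; case 1: ...; default: break; }"
 * for one wavefront. Every case condition is tested for all lanes, but a
 * case body only runs when at least one lane matched it.
 * Returns the number of case bodies that were entered. */
int simulate_switch(const int cond[LANES], int out[LANES])
{
    int bodies_entered = 0;

    for (int c = 0; c <= 1; ++c) {
        /* Build the execution mask for this case. */
        unsigned mask = 0;
        for (int lane = 0; lane < LANES; ++lane)
            if (cond[lane] == c)
                mask |= 1u << lane;

        if (mask == 0)
            continue;  /* no active lane: the body is skipped entirely */

        ++bodies_entered;
        for (int lane = 0; lane < LANES; ++lane)
            if (mask & (1u << lane))
                out[lane] = (c == 0) ? 100 : 200;  /* stand-ins for function0/function1 */
    }
    /* default: lanes that matched no case keep their previous out[] value */
    return bodies_entered;
}
```

If all lanes hold 0, only one body is entered. If the lanes hold a mix of 0 and 1, both bodies run one after the other, each with part of the wavefront masked off.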

0 Likes
gapon
Journeyman III

Originally posted by: bubu Is the "switch" statement optimal? I need to concatenate 10 IFs and I was wondering if the "switch" statement can help the performance.

One should understand that "switch" is just syntactic sugar in most programming languages: a way to hide the ugliness of multiple "if else if ..." statements. A compiler can only turn a "switch" into a lookup table when the case values are dense; with sparse values it can't. Consider this:

switch(condition) {
case 0:
case 123:
case 4321:
case 654321:
...
}

Internally the compiler will translate it into a series of "if"s, so its performance should be similar to that of the equivalent "if" statements. No magic.
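
As a rough illustration of that equivalence, the two functions below behave identically. The case values are the sparse ones from the snippet above; the function names and return values (10, 20, 30, 40, -1) are invented for the example, and what a particular compiler actually emits is not shown here.

```c
/* Sparse switch: case values taken from the post, return values hypothetical. */
int classify_switch(int condition)
{
    switch (condition) {
    case 0:      return 10;
    case 123:    return 20;
    case 4321:   return 30;
    case 654321: return 40;
    default:     return -1;
    }
}

/* The same logic spelled out as an if / else-if chain. */
int classify_if(int condition)
{
    if (condition == 0)           return 10;
    else if (condition == 123)    return 20;
    else if (condition == 4321)   return 30;
    else if (condition == 654321) return 40;
    else                          return -1;
}
```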

 

0 Likes

 

My switch is 0,1,2,3,4,5,6,7,8,10,11,12,13,14,15 ... pretty sequential. I wonder if the compiler could optimize it using some lookup table or function pointer.

0 Likes

It could optimize it if this wasn't a GPU.  Or if the GPU supported divergent indirect/unstructured branches, but it doesn't. 
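
For comparison, this is roughly what the lookup-table idea looks like in ordinary host-side C, where indirect calls are available. It is only a sketch: the handler functions, the table, and dispatch() are hypothetical names, and it is not usable inside a kernel, since OpenCL C does not allow function pointers.

```c
/* Host-side C only: a dense range of case values indexes straight into a table. */
typedef int (*handler_fn)(int);

static int handler0(int x) { return x + 1; }
static int handler1(int x) { return x * 2; }
static int handler2(int x) { return x - 3; }

static const handler_fn table[] = { handler0, handler1, handler2 };

int dispatch(int condition, int x)
{
    int count = (int)(sizeof table / sizeof table[0]);

    /* Out-of-range values play the role of the "default" case. */
    if (condition < 0 || condition >= count)
        return x;

    return table[condition](x);  /* one indirect call, no chain of compares */
}
```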

0 Likes

Originally posted by: greg1232 It could optimize it if this wasn't a GPU.  Or if the GPU supported divergent indirect/unstructured branches, but it doesn't. 

 

So how can I efficiently call an OpenCL user-defined shader function in a scene with 200 different materials?

I really don't like the idea of executing 199 IFs before entering the last one for the last material...

0 Likes

It will evaluate 200 conditions, not take 200 branches. You can try building a binary tree out of if() statements, which can have less overhead.

For example, this branch tree for 1-4:

if (i <= 2) {
   if (i == 1) function1();
   else function2();
} else {
   if (i == 3) function3();
   else function4();
}

It should evaluate only 2 conditions, not 4. Of course, you must ensure that every work item takes the same branch.
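
To get a feel for how this scales, here is a small sketch in plain C that counts how many comparisons such a tree needs to isolate one id in a range. The function comparisons_needed is a made-up helper for illustration, not part of any API, and it only counts comparisons; it says nothing about divergence cost on real hardware.

```c
/* Counts the comparisons a balanced if-tree makes to narrow the
 * inclusive range [lo, hi] down to the single value id. */
int comparisons_needed(int id, int lo, int hi)
{
    int n = 0;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        ++n;               /* one "if (id <= mid)" test per tree level */
        if (id <= mid)
            hi = mid;      /* descend into the lower half */
        else
            lo = mid + 1;  /* descend into the upper half */
    }
    return n;
}
```

For the 1-4 tree above this gives 2 comparisons for any id. For 200 materials numbered 1 to 200 it gives at most 8, compared with up to 199 for a flat chain of ifs.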

0 Likes

Get rid of the control flow altogether; read the documentation, please.
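
One possible reading of this advice, sketched in plain C: make the materials differ by data instead of by code, so every work item runs the same instructions and only the parameters change. The struct, its fields, the table contents, and the shading formula are all assumptions made for the example; whether 200 real materials can be folded into one parameterised function depends on the renderer.

```c
/* Hypothetical per-material parameters; real materials would need more. */
typedef struct {
    float diffuse;   /* scales the N.L lighting term */
    float emissive;  /* constant term added regardless of lighting */
} material_params;

/* One table entry per material id (only two shown here). */
static const material_params materials[] = {
    { 1.0f, 0.0f  },   /* material 0: plain diffuse */
    { 0.5f, 0.25f },   /* material 1: darker, slightly emissive */
};

/* No if and no switch: the material id is only used as a table index. */
float shade(int material_id, float n_dot_l)
{
    material_params m = materials[material_id];
    return m.diffuse * n_dot_l + m.emissive;
}
```

In an OpenCL kernel the same table would typically live in a __constant or __global buffer indexed by the material id.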

0 Likes