Generally, on architectures that support SIMD with control flow but not indirect divergent branches, a switch statement works like this: the condition for every case is checked, and only those cases with at least one active thread are entered.
So if you had this:
switch (x) { /* body of some_function(int x) */
case 0: function0(); break;
case 1: function1(); break;
default: break; }
And called some_function(0)
It would check three conditions (0, 1, default), but only execute function0(), not function1(). Note that this is exactly the same as what would happen if you wrote it using a series of if statements. I think that the case instructions are only included in the ISA because it makes it easier to write in assembly.
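To make those semantics concrete, here is a rough host-side C sketch that simulates one wavefront going through that switch. The 4-lane width, the `SwitchStats` struct, and `simulate_switch` are all made up for illustration; real hardware does this with execution masks, not loops.

```c
#include <stdbool.h>

#define LANES 4      /* hypothetical wavefront width, kept tiny for illustration */
#define NUM_CASES 3  /* case 0, case 1, default */

/* What happened when one wavefront ran the switch. */
typedef struct {
    int conditions_checked;  /* every case label is tested */
    int bodies_entered;      /* only cases with at least one active lane run */
    bool entered[NUM_CASES]; /* index 2 stands for the default case */
} SwitchStats;

SwitchStats simulate_switch(const int selector[LANES]) {
    SwitchStats s = {0, 0, {false, false, false}};
    for (int c = 0; c < NUM_CASES; ++c) {
        bool any_active = false;
        for (int lane = 0; lane < LANES; ++lane) {
            /* default matches any value that is neither 0 nor 1 */
            bool match = (c < 2) ? (selector[lane] == c)
                                 : (selector[lane] != 0 && selector[lane] != 1);
            any_active = any_active || match;
        }
        s.conditions_checked++;
        if (any_active) {        /* skip the body if no lane wants it */
            s.entered[c] = true;
            s.bodies_entered++;
        }
    }
    return s;
}
```

If every lane passes 0, all three conditions are checked but only the first body is entered. If the lanes disagree, each case that some lane needs gets entered in turn.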
Originally posted by: bubu Is the "switch" statement optimal? I need to concatenate 10 IFs and I was wondering if the "switch" statement can help the performance.
One should understand that "switch" is just syntactic sugar in most programming languages. It is a way to hide the ugliness of multiple "if else if ..." statements. In most cases compilers won't be able to use lookup tables to optimize "switch". Consider this:
Internally it will be translated by the compiler into many "ifs". So its performance should be similar to that of the equivalent "if" statements. No magic.
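As an illustration of that point, here is a small C sketch with made-up, widely spaced case values (the function names and return values are hypothetical). With labels this sparse a jump table would be mostly empty, so a chain of comparisons is the likely lowering, and the two functions below behave identically.

```c
/* Sparse case values: a lookup table would need mostly empty slots. */
int classify_switch(int x) {
    switch (x) {
        case 3:      return 10;
        case 1000:   return 20;
        case 250000: return 30;
        default:     return 0;
    }
}

/* The same logic written out as the chain of ifs it amounts to. */
int classify_ifs(int x) {
    if (x == 3)           return 10;
    else if (x == 1000)   return 20;
    else if (x == 250000) return 30;
    else                  return 0;
}
```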
My switch is 0,1,2,3,4,5,6,7,8,10,11,12,13,14,15 ... pretty sequential. I wonder if the compiler could optimize it using some lookup table or function pointer.
It could optimize it if this wasn't a GPU. Or if the GPU supported divergent indirect/unstructured branches, but it doesn't.
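For comparison, this is roughly what the lookup-table version looks like on a CPU, where indirect calls are available. It is only a sketch in plain C: the handlers and `dispatch` are hypothetical names, and this is exactly the kind of indirect branch the GPU in question cannot do per work-item.

```c
#define NUM_HANDLERS 4

typedef int (*handler_fn)(int);

/* Hypothetical handlers, one per dense case value 0..3. */
static int handler0(int v) { return v + 1; }
static int handler1(int v) { return v * 2; }
static int handler2(int v) { return v - 3; }
static int handler3(int v) { return -v; }

static const handler_fn handlers[NUM_HANDLERS] = {
    handler0, handler1, handler2, handler3
};

/* One bounds check plus one indirect call, however many cases exist. */
int dispatch(int selector, int v) {
    if (selector < 0 || selector >= NUM_HANDLERS) {
        return -1; /* plays the role of the default case */
    }
    return handlers[selector](v);
}
```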
Originally posted by: greg1232 It could optimize it if this wasn't a GPU. Or if the GPU supported divergent indirect/unstructured branches, but it doesn't.
So how can I efficiently call an OpenCL user-defined shader function in a scene with 200 different materials?
I really don't like the idea of executing 199 IFs before entering the last one for the last material...
It will evaluate 200 conditions, not take 200 branches. You can try building a binary tree out of if() statements, which can have less overhead.
For example, take a branch tree for the values 1-4.
It should evaluate only 2 conditions, not 4. Of course, you must ensure that every work-item takes the same branch.
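A minimal sketch of such a tree for the values 1-4, written in plain C so the comparison counts can be checked. The `check` helper, the function names, and the return values are made up for illustration, and the tree assumes the input really is in 1..4.

```c
#include <stdbool.h>

/* Counts one comparison and passes its result through. */
static bool check(bool cond, int *count) {
    ++*count;
    return cond;
}

/* Linear chain: up to 4 comparisons. */
int select_linear(int m, int *count) {
    if (check(m == 1, count)) return 10;
    if (check(m == 2, count)) return 20;
    if (check(m == 3, count)) return 30;
    if (check(m == 4, count)) return 40;
    return 0;
}

/* Binary tree: exactly 2 comparisons for any m in 1..4.
   Values outside 1..4 are not detected and fall into 20 or 40. */
int select_tree(int m, int *count) {
    if (check(m <= 2, count)) {
        if (check(m == 1, count)) return 10;
        return 20;
    } else {
        if (check(m == 3, count)) return 30;
        return 40;
    }
}
```

With 200 materials the same idea needs about 8 comparisons per lookup instead of up to 200, as long as the work-items in a wavefront agree on the path.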
Get rid of the control flow altogether; read the documentation, please.
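One possible reading of that advice is to turn the materials into data rather than code, so every work-item runs the same instructions and only the parameters differ. The sketch below is plain C under that assumption: the `Material` fields and the `shade` formula are invented for illustration and are not taken from any real renderer or from the OpenCL documentation.

```c
/* Hypothetical per-material parameters, one table entry per material. */
typedef struct {
    float diffuse;
    float specular;
    float emission;
} Material;

/* One shading function for all materials: no switch, no if.
   The material id only selects which parameters are loaded. */
float shade(const Material *table, int id, float n_dot_l, float spec_term) {
    const Material *m = &table[id];
    return m->diffuse * n_dot_l + m->specular * spec_term + m->emission;
}
```

This only works when the materials share one formula. Materials with genuinely different code paths would still need some branching.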