I have a program that does a lot of comparison and boolean operations, e.g. (bool_result = a & b & c; bool_result1 = a || b || c; ...). My question is: is a GPGPU also efficient for such operations? Can I treat these boolean operations like other arithmetic operations such as '+' or '*' and optimize them the same way?
Or is there anything I should pay special attention to when running such a program on a GPGPU?
Pretty much the same as on any CPU: boolean logic is fast, branches are not (and they're much worse than on a CPU, but slow is slow).
So avoid the short-circuit stuff on scalars (like result1 = a || b || c), since it has to be implemented as branches if 'a', 'b', or 'c' have side effects (reading memory counts as a side effect, although sometimes it's better to skip the read anyway).
As you're no doubt aware, bitwise | can be used rather than || if you have well-behaved values: 0 = false, 1 = true, which is what the boolean operators return (but not what they operate on, where 0 = false and anything else = true).
Except for the vector logic operators, which return 0 = false and -1/~0 = true ... sigh, so beware when mixing them. And the short-circuit operators don't short-circuit with vector types.