

bubu
Adept II

bool4 pls!

With a built-in bool4 type I could speed up my raytracer by nearly 5%, pls!

I need that to precompute ray signs < 0

I tried to define my own structure but, unfortunately, the compiler seems to store it in global memory instead of in registers.

0 Likes
14 Replies
kbrafford
Adept II

What about step?  From Page 171 of this spec:

www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf

gentype step (gentype edge, gentype x)
 
Returns 0.0 if x < edge, otherwise it returns 1.0.

Could you do something like:

float4 results = step((float4)(0.0f), ray_vector);

?
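
To make the semantics of step concrete, here is a minimal sketch in plain C that stands in for the OpenCL built-in, one lane at a time. The names step1, step4 and ray_dir are made up for this illustration and are not part of OpenCL. The float4 is modelled as a 4-element array.

```c
#include <stddef.h>

/* Plain-C stand-in for one lane of OpenCL's step(edge, x):
   returns 0.0f if x < edge, otherwise 1.0f. */
float step1(float edge, float x)
{
    return (x < edge) ? 0.0f : 1.0f;
}

/* Apply step lane by lane to a 4-component vector stored as an array.
   ray_dir and out are hypothetical names used only for this sketch. */
void step4(float edge, const float ray_dir[4], float out[4])
{
    for (size_t i = 0; i < 4; ++i)
        out[i] = step1(edge, ray_dir[i]);
}
```

Note that with edge = 0.0f a lane holding a negative direction component gets 0.0f, and a lane holding zero or a positive component gets 1.0f.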

0 Likes

 

Originally posted by: kbrafford What about step? 

 



There's a "sign" built-in function. The problem is that every test based on a float4 still requires a < 0.0f comparison, and I need a direct boolean value to save that comparison. Example:

 

Ideal:

 

bool4 mybool = ....

for ()...
{
    if ( mybool[j] )
    {
    }
    else
    {
    }
}


currently:


float4 signs = sign(rayDir)....

for ()...
{
    if ( signs[j] < 0.0f )
    {
    }
    else
    {
    }
}



0 Likes

How complicated is each path of that if statement?  Can you post a more detailed example?  If it's simple enough, can't you eliminate the if statement altogether and take advantage of the 1.0 and 0.0 given by the step function?

0 Likes

Originally posted by: kbrafford How complicated is each path of that if statement?  Can you post a more detailed example?  If it's simple enough, can't you eliminate the if statement altogether and take advantage of the 1.0 and 0.0 given by the step function?

 

Yes, I could use some kind of trick. But the point of this post is to ask why there is no built-in bool4 type, which would be very useful in my case.

0 Likes

 

Yes, I could use some kind of trick. But the point of this post is to ask why there is no built-in bool4 type, which would be very useful in my case.

 

And my point is that with data parallel programming you are supposed to start thinking differently about how you do things.  It is not a "trick" to replace a branch that is running on hundreds of processors with calculations.

0 Likes

How complicated is each path of that if statement?


I'm afraid the branches there are quite complex.

 

 

0 Likes

Too complicated to post?

0 Likes

Originally posted by: kbrafford Too complicated to post?

 

Yep, it's complicated, plus I don't have the rights to post the code. Just assume the code there is very complex.

 

With a built-in bool4 I could save one floating-point comparison per loop iteration (and the loop is quite large too). I hope bool4 will be fully supported in the CL 1.1 spec.

0 Likes

Well, if one floating point comparison is 5% of your processing, then it can't be too complicated 😉

I understand the proprietariness issue.  Thinking out of the box here, can you move to an algorithm more like this, with no loop, and where you get to keep using vectorized computations?:

 

float4 ifclause_factor = step((float4)(0.0f), ray_vector);
float4 elseclause_factor = (float4)(1.0f) - ifclause_factor;

// do the if clause work first
float4 ifclause_work = some secret work you are doing, assuming
                       all slots in the float4 are going to take the
                       if branch;

// now do the else clause work
float4 elseclause_work = other secret work you are doing, assuming
                         all slots in the float4 are going to take the
                         else branch;

// then commit the results using the step factors
float4 actual_results = ifclause_factor * ifclause_work +
                        elseclause_factor * elseclause_work;
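
As a rough illustration of the blend above, here is a minimal plain-C sketch that models each float4 as a 4-element array. The function name blend_by_step and its parameter names are hypothetical, and the "secret work" is assumed to have been computed already for both branches.

```c
/* Per lane: where dir[i] >= 0 keep if_work[i], otherwise keep else_work[i].
   Both inputs are assumed to be fully computed beforehand, so there is
   no data-dependent branch on the results themselves. */
void blend_by_step(const float dir[4],
                   const float if_work[4],
                   const float else_work[4],
                   float out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* Same value as step(0.0f, dir[i]): 0.0f if negative, else 1.0f. */
        float f = (dir[i] < 0.0f) ? 0.0f : 1.0f;
        out[i] = f * if_work[i] + (1.0f - f) * else_work[i];
    }
}
```

One caveat with the multiply-and-add form: if the branch that is not selected produces a NaN or an infinity, multiplying it by 0.0f still yields NaN, which leaks into the result. In OpenCL C the built-in select, driven by a vector comparison such as ray_vector >= 0.0f, picks lanes without doing arithmetic on the discarded value and so avoids that.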

 

 

 

 

 

0 Likes

Wait! I'm stupid

 

I can precompute the signs of the ray direction as an int4, and then

 

int4 rd = (int4)((rDir.x<0.0f)?1:0,  ................ )

for (...)
{
    if ( rd[j] != 0 )
    {
    }
    else
    {
    }
}


but I still want bool4 as syntax sugar!
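
For reference, a minimal plain-C sketch of this precomputation, with the int4 modelled as an array of four 32-bit integers. The function names ray_sign_mask and count_negative are made up for the example, and the loop body is only a placeholder for the real branches.

```c
#include <stdint.h>

/* Precompute one integer flag per ray-direction component.
   This mirrors what an OpenCL C vector comparison such as (rDir < 0.0f)
   yields: -1 (all bits set) in lanes where it is true, 0 elsewhere. */
void ray_sign_mask(const float r_dir[4], int32_t mask[4])
{
    for (int i = 0; i < 4; ++i)
        mask[i] = (r_dir[i] < 0.0f) ? -1 : 0;
}

/* Placeholder for the hot loop: it only tests the precomputed integer
   flag, so no floating-point comparison is repeated per iteration. */
int count_negative(const int32_t mask[4])
{
    int n = 0;
    for (int j = 0; j < 4; ++j) {
        if (mask[j] != 0)
            ++n;   /* "negative direction" branch would go here */
    }
    return n;
}
```

In OpenCL C itself the whole mask can be produced in one expression, since relational operators on vector types already return a signed integer vector (int4 for float4 operands), which is about as close to a bool4 as the language gets.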

0 Likes

Is the loop over the 4 slots of the float4 data?

I still think it is worth the effort to try and get rid of the loop and branch. Remember, in a GPU all work items end up taking both the if and the else clause anyway. Don't settle for 5% improvement.  Go for 400% 🙂

0 Likes

bubu,
Keep in mind that on many pieces of hardware, a boolean value is represented as an integer.
0 Likes

Originally posted by: MicahVillmow bubu, Keep in mind that on many pieces of hardware, a boolean value is represented as an integer.


 

Uhm, not as 8-bit words (so a byte/char)?

0 Likes

Fr4nz,
On AMD hardware, the smallest data type stored in a register is 32 bits, so bool, char and short are all represented internally as 32-bit integers.
0 Likes