14 Replies Latest reply on Jun 7, 2010 7:21 PM by MicahVillmow

    bool4 pls!

    bubu

      With a built-in bool4 type I could speed up my raytracer by nearly 5%, pls!

      I need it to precompute whether each ray direction component is < 0.

      I tried to define my own structure but, unfortunately, it seems to be stored in global memory by the compiler instead of in registers.

        • bool4 pls!
          kbrafford

          What about step?  From Page 171 of this spec:

          www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf

          gentype step (gentype edge, gentype x)
           
          Returns 0.0 if x < edge, otherwise it returns 1.0.

          Could you do something like:

          float4 results = step((float4)(0.0f), ray_vector);

          ?
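
          A minimal sketch of what step() would give you here, written in plain C rather than OpenCL C so it can be compiled on the host. The float4_t struct and the step4 helper are made-up names for illustration, not part of any OpenCL API; the comparison follows the spec text quoted above.

```c
#include <stddef.h>

/* Plain-C stand-in for OpenCL's float4 (hypothetical name for this sketch). */
typedef struct { float s[4]; } float4_t;

/* Mirrors the spec wording: returns 0.0 if x < edge, otherwise 1.0. */
static float step_scalar(float edge, float x) {
    return (x < edge) ? 0.0f : 1.0f;
}

/* Applies step() to each of the four components. */
static float4_t step4(float edge, float4_t x) {
    float4_t r;
    for (size_t i = 0; i < 4; ++i) {
        r.s[i] = step_scalar(edge, x.s[i]);
    }
    return r;
}
```

          With an edge of 0.0f, components that are negative come back as 0.0 and all others as 1.0, so the result marks the non-negative directions.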

            • bool4 pls!
              bubu

               

               

              Originally posted by: kbrafford What about step? 

               



              There's a "sign" built-in function, but the problem is that everything computed from its float4 result still needs a < 0.0f test, and I need a direct boolean value to save that comparison. Example:

               

              Ideal:

              bool4 mybool4 = ....

              for ()...
              {
                  if ( mybool4[j] )
                  {
                  }
                  else
                  {
                  }
              }

              Currently:

              float4 signs = sign(rayDir)....

              for ()...
              {
                  if ( signs[j] < 0.0f )
                  {
                  }
                  else
                  {
                  }
              }



                • bool4 pls!
                  kbrafford

                  How complicated is each path of that if statement?  Can you post a more detailed example?  If it's simple enough, can't you eliminate the if statement altogether and take advantage of the 1.0 and 0.0 given by the step function?

                    • bool4 pls!
                      bubu

                       

                      Originally posted by: kbrafford How complicated is each path of that if statement?  Can you post a more detailed example?  If it's simple enough, can't you eliminate the if statement altogether and take advantage of the 1.0 and 0.0 given by the step function?

                       

                      Yes, I could use some kind of trick. But the point of this post is to ask why there is no built-in bool4 type, which in my case would be very useful.

                        • bool4 pls!
                          kbrafford

                           

                          Yes, I could use some kind of trick. But the point of this post is to ask why there is no built-in bool4 type, which in my case would be very useful.

                           

                          And my point is that with data parallel programming you are supposed to start thinking differently about how you do things.  It is not a "trick" to replace a branch that is running on hundreds of processors with calculations.

                            • bool4 pls!
                              bubu

                               

                              How complicated is each path of that if statement?


                              I'm afraid the branches there are quite complex.

                               

                               

                                • bool4 pls!
                                  kbrafford

                                  Too complicated to post?

                                    • bool4 pls!
                                      bubu

                                       

                                      Originally posted by: kbrafford Too complicated to post?

                                       

                                      Yep, it's complicated, plus I don't have the rights to post the code. Just assume the code there is very complex.

                                       

                                      With a built-in bool4 I could save one floating-point comparison per loop iteration (and the loop is quite large too). I hope bool4 will be fully supported in the CL 1.1 spec.

                                        • bool4 pls!
                                          kbrafford

                                          Well, if one floating point comparison is 5% of your processing, then it can't be too complicated ;-)

                                          I understand the proprietary-code issue.  Thinking outside the box here, can you move to an algorithm more like this, with no loop, where you get to keep using vectorized computations?

                                           

                                          // 1.0 in slots where ray_vector >= 0, 0.0 where it is negative
                                          float4 ifclause_factor = step((float4)(0.0f), ray_vector);
                                          float4 elseclause_factor = (float4)(1.0f) - ifclause_factor;

                                          // do the if clause work first
                                          float4 ifclause_work = /* some secret work you are doing, assuming
                                                                    all slots in the float4 are going to take
                                                                    the if branch */;

                                          // now do the else clause work
                                          float4 elseclause_work = /* other secret work you are doing, assuming
                                                                      all slots in the float4 are going to take
                                                                      the else branch */;

                                          // then commit the results using the step factors
                                          actual_results = ifclause_factor * ifclause_work +
                                                           elseclause_factor * elseclause_work;
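
                                          To make the blend concrete, here is a small host-side sketch in plain C (not OpenCL C). The float4_t struct and the if_work / else_work bodies are hypothetical placeholders, since the real branch code was never posted; only the step-and-blend pattern comes from the pseudocode above.

```c
#include <stddef.h>

/* Plain-C stand-in for OpenCL's float4 (hypothetical name for this sketch). */
typedef struct { float s[4]; } float4_t;

/* Hypothetical placeholder for whatever the "if" branch computes. */
static float if_work(float x) { return x * 2.0f; }

/* Hypothetical placeholder for whatever the "else" branch computes. */
static float else_work(float x) { return x + 10.0f; }

/* Evaluates both branches in every slot, then blends them with step() factors. */
static float4_t blend_branches(float4_t ray) {
    float4_t out;
    for (size_t i = 0; i < 4; ++i) {
        /* step(0, ray): 0.0 for negative components, 1.0 otherwise */
        float if_factor = (ray.s[i] < 0.0f) ? 0.0f : 1.0f;
        float else_factor = 1.0f - if_factor;
        out.s[i] = if_factor * if_work(ray.s[i])
                 + else_factor * else_work(ray.s[i]);
    }
    return out;
}
```

                                          One caveat with blending by multiplication: a NaN or infinity produced by the branch that is not wanted still poisons the result, because 0 times infinity is NaN. OpenCL's built-in select() picks components by mask instead and may be the safer choice if either branch can produce such values.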

                                           

                                           

                                           

                                           

                                           

                                            • bool4 pls!
                                              bubu

                                              Wait! I'm stupid

                                               

                                              I can precompute the signs of the ray direction as an int4 and then test that inside the loop:

                                              int4 rd = (int4)((rDir.x<0.0f)?1:0,  ................ );

                                              for (...)
                                              {
                                                  if ( rd[j] != 0 )
                                                  {
                                                  }
                                                  else
                                                  {
                                                  }
                                              }


                                              But I still want bool4 as syntactic sugar!
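
                                              A rough host-side sketch of that precompute-once idea, in plain C rather than OpenCL C. The float4_t / int4_t structs and both function names are invented for illustration, and the loop body is only a counter standing in for the real branches. In OpenCL C itself, a vector comparison such as rDir < (float4)(0.0f) already yields an int4 holding -1 (all bits set) for true and 0 for false, so the sketch uses the same convention instead of 1/0.

```c
#include <stdint.h>

/* Plain-C stand-ins for OpenCL's float4 and int4 (hypothetical names). */
typedef struct { float s[4]; } float4_t;
typedef struct { int32_t s[4]; } int4_t;

/* Builds the sign mask once: -1 where the component is negative, else 0. */
static int4_t is_negative4(float4_t v) {
    int4_t m;
    for (int i = 0; i < 4; ++i) {
        m.s[i] = (v.s[i] < 0.0f) ? -1 : 0;
    }
    return m;
}

/* Counts how often the "negative" branch is taken; the counter is a
   placeholder for the real per-branch work. */
static int negative_hits(float4_t dir, int iterations) {
    const int4_t neg = is_negative4(dir); /* float compares happen here only */
    int hits = 0;
    for (int it = 0; it < iterations; ++it) {
        for (int j = 0; j < 4; ++j) {
            if (neg.s[j] != 0) { /* integer test inside the loop */
                ++hits;
            }
        }
    }
    return hits;
}
```

                                              The floating-point comparisons run once before the loop, and each iteration only tests an integer, which is the saving a bool4 would have provided.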

                                                • bool4 pls!
                                                  kbrafford

                                                  Is the loop over the 4 slots of the float4 data?

                                                  I still think it is worth the effort to try to get rid of the loop and branch. Remember, on a GPU, when work items in the same wavefront disagree on a branch, they all end up stepping through both the if and the else clause anyway. Don't settle for a 5% improvement.  Go for 400% :-)

                              • bool4 pls!
                                MicahVillmow
                                bubu,
                                Keep in mind that on many pieces of hardware, a boolean value is represented as an integer.
                                  • bool4 pls!
                                    Fr4nz

                                     

                                    Originally posted by: MicahVillmow bubu, Keep in mind that on many pieces of hardware, a boolean value is represented as an integer.


                                     

                                    Uhm, not as 8-bit words (so a byte/char)?

                                  • bool4 pls!
                                    MicahVillmow
                                    Fr4nz,
                                    On AMD hardware, the smallest data type stored in a register is 32 bits, so bool, char and short are all represented internally as 32-bit integers.