3 Replies Latest reply on Jun 2, 2009 8:25 PM by Ceq

    Short Vector ? Operator

    Methylene
      The right way is?

      I'm trying to get things set up for 1.4.0 and I'm getting the following error:

      ERROR--1: : conditional expression must be a scalar data type.

          Statement: p > float3 (0.0f,0.0f,0.0f) in uCube1 = p > float3 (0.0f,0.0f,0.0f) ? (int3 ) p : (int3 ) p - int3 (1,1,1)
          Expression : p, Type : float3
          Expression : float3 (0.0f,0.0f,0.0f), Type : float3

      What would be the proper way to do this now?

        • Short Vector ? Operator
          Ceq

          Well, that expression probably doesn't do what you expect, since only the first component would be used in the condition. According to point 3.6 of the Brook+ 1.4 release notes (inside the Brook directory), your code would be equivalent to:

          kernel void test1(float3 i< >, float3 p< >, out int3 o< > ) {
              int3 p2i = (int3)p;
              o = (i.x > 0.0f) ? p2i : p2i - int3(1, 1, 1);
          }

          It looks like the compiler's behavior differs from the documentation, which says the expression should compile even if the condition isn't a scalar. That said, I prefer the current behavior, because silently evaluating only the first component of the condition is easy to misread.

          If you want to evaluate the condition component by component and perform the assignment depending on it, you should unroll it:

          kernel void test2(float3 i<>, float3 p<>, out int3 o<> ) {
              int3 p2i = (int3)p;
              if(i.x > 0.0f) o.x = p2i.x; else o.x = p2i.x - 1;
              if(i.y > 0.0f) o.y = p2i.y; else o.y = p2i.y - 1;
              if(i.z > 0.0f) o.z = p2i.z; else o.z = p2i.z - 1;
          }
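To make the difference between the two kernels concrete, here is a host-side reference sketch in plain C (not Brook+ code). The `float3` and `int3` structs and both function names are hypothetical stand-ins, and it assumes the `(int3)` cast truncates toward zero the way a C `(int)` cast does.

```c
/* Hypothetical host-side stand-ins for the Brook+ short vector types. */
typedef struct { float x, y, z; } float3;
typedef struct { int x, y, z; } int3;

/* Mirrors test1: only the first component of the condition is examined,
   and the result is chosen for the whole vector at once. */
static int3 select_first_component(float3 i, float3 p)
{
    int3 o = { (int)p.x, (int)p.y, (int)p.z };
    if (!(i.x > 0.0f)) {
        o.x -= 1;
        o.y -= 1;
        o.z -= 1;
    }
    return o;
}

/* Mirrors test2: each component is tested and assigned independently. */
static int3 select_per_component(float3 i, float3 p)
{
    int3 p2i = { (int)p.x, (int)p.y, (int)p.z };
    int3 o;
    o.x = (i.x > 0.0f) ? p2i.x : p2i.x - 1;
    o.y = (i.y > 0.0f) ? p2i.y : p2i.y - 1;
    o.z = (i.z > 0.0f) ? p2i.z : p2i.z - 1;
    return o;
}
```

With a condition vector whose components disagree in sign, the two functions return different results for the components after the first, which is the confusion the scalar-only rule avoids.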

           

            • Short Vector ? Operator
              Methylene

              I took a good look at the documentation and noticed that I was mistaken, as I never realized it said c.x through that whole second block.

              So your solution is correct. It does raise a question, though: how does this actually behave on the hardware?


              When specifying say:

              a.x < b.x ? a.x : a.x + 1;
              a.y < b.y ? a.y : a.y + 1;
              a.z < b.z ? a.z : a.z + 1;
              a.w < b.w ? a.w : a.w + 1;

              Are these sequential operations handled in parallel?


              If not, shouldn't there be some sort of operator we can use to evaluate this all at once?

              Also if these sorts of symmetric statements do not get executed in parallel, maybe there should be a way to designate that they should be run in parallel.

              Something like

              #SYMMETRICAL_PARALLEL_STATEMENT
              a.x < b.x ? a.x : a.x + 1;
              a.y < b.y ? a.y : a.y + 1;
              a.z < b.z ? a.z : a.z + 1;
              a.w < b.w ? a.w : a.w + 1;
              #END_SYMMETRICAL_PARALLEL_STATEMENT

              As long as each component is operated on separately... It's just little things like these that really break my SIMD train of thought!

                • Short Vector ? Operator
                  Ceq

                  I had the same question when I wrote that answer. To find out how good the generated SIMD code is, you can use Stream KernelAnalyzer.

                  KernelAnalyzer is a useful tool that estimates the performance of your kernels and shows you the generated assembly. Even if you aren't familiar with your GPU's assembly, you can see how well the instructions were vectorized, because it numbers the SIMD (or, more precisely, VLIW) instructions and shows how many of the five processing units (x, y, z, w, t) each instruction uses.

                  For example, the second kernel of my previous post compiles to 2 TEX instructions for fetching data and 5 ALU instructions for the operations. All those conditionals become just 5 VLIW ALU instructions without any branches, using conditional moves instead. The VLIW instructions contain 4, 3, 4, 4 and 3 simple instructions respectively (18 of the 25 available slots), so the vectorization is good.
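As a rough illustration of why those if/else pairs need no branch, here is a plain C sketch (not Brook+ and not the assembly KernelAnalyzer shows). The `F3` and `I3` structs and the function name are hypothetical, and the arithmetic form below is just one way to express the selection; the IL compiler's actual conditional-move output may look quite different.

```c
/* Hypothetical host-side stand-ins for float3 and int3. */
typedef struct { float x, y, z; } F3;
typedef struct { int x, y, z; } I3;

/* Branch-free equivalent of the unrolled kernel: in C, !(cond) is 1 when
   the condition is false and 0 when it is true, so it is exactly the
   amount to subtract from the truncated value. */
static I3 select_without_branches(F3 i, F3 p)
{
    I3 o;
    o.x = (int)p.x - !(i.x > 0.0f);
    o.y = (int)p.y - !(i.y > 0.0f);
    o.z = (int)p.z - !(i.z > 0.0f);
    return o;
}
```

Each of the three lines is independent of the others, which is the property that lets a compiler pack them into the same wide instruction.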

                  In general the IL compiler does a good job minimizing register usage, optimizing SIMD instructions, extracting common expressions, using reciprocals, etc.