1 Reply Latest reply on Apr 29, 2013 12:32 AM by himanshu.gautam

    question: float3 vs float, float, float

    qqchose

      Hi,

      I found a struct like this in our code:

      struct MyStruct
      {
           float x1, x2, x3;
           float y1, y2, y3;

           float PAD1, PAD2;
      };


      And we use it like this somewhere:

      ... = (float4)((myStruct->x1+ myStruct->y1) * 0.5f, (myStruct->x2+ myStruct->y2) * 0.5f, (myStruct->x3+ myStruct->y3) * 0.5f, 0.0f);

      I thought this was bad, so I changed it to this:

      struct MyStruct
      {
           float3 x;
           // No padding needed: a float3 already has the size and alignment of a float4.

           float3 y;
           // No padding needed: a float3 already has the size and alignment of a float4.
      };

      ... = (float4)((myStruct->x + myStruct->y) * 0.5f, 0.0f);

      I compared kernel times, and my new version is slower: the kernel took 585 ms before the change and takes 751 ms now. My question is: why? Maybe the float version benefits from coalesced memory access because there is no padding between x3 and y1. But I thought the GPU would compute a float3 faster than three separate floats. Maybe the compiler is smart enough to use the same registers either way, so converting to float3 gives no gain. But it is not just no gain; it is a loss. If using floats is faster, I will change all our float3 fields to plain floats.

      I also tried float4 to see the result: 728 ms. That is faster than float3 but still much slower than the separate floats.

        • Re: question: float3 vs float, float, float
          himanshu.gautam

          You created a lot of duplicate posts; I have deleted them now.

          Which GPU are you running the code on? Which driver, APP SDK version and operating system are you using?

          Ideally you should not lose performance if the only change is between float3 and three separate floats and the work-items still do the same amount of work. Perhaps you could share a repro case.