1 Reply Latest reply on Apr 29, 2013 12:32 AM by himanshu.gautam

    question: float3 vs float, float, float




      I found a struct like this in our code:


      struct MyStruct
      {
           float x1, x2, x3;
           float y1, y2, y3;

           float PAD1, PAD2;
      };


      and we use this struct somewhere like this:


      ... = (float4)((myStruct->x1+ myStruct->y1) * 0.5f, (myStruct->x2+ myStruct->y2) * 0.5f, (myStruct->x3+ myStruct->y3) * 0.5f, 0.0f);


      I thought this was bad, so I changed it to this:


      struct MyStruct
      {
           float3 x;
           // No need to pad: float3 is already aligned like float4

           float3 y;
           // No need to pad: float3 is already aligned like float4
      };



      ... = (float4)((myStruct->x + myStruct->y) * 0.5f, 0.0f);


      I compared kernel times, and my new version is slower: the kernel took 585 ms before my change and takes 751 ms now. My question is: why? Maybe the original layout helps coalesced memory access because there is no padding between x3 and y1. But I thought the GPU would compute a float3 faster than 3 separate floats. Maybe the compiler is smart enough to use the same registers either way, so there would be no gain from switching to float3. But this is not just no gain, it's a loss. If separate floats are faster, I will change all our float3 members to floats.


      I also tried float4 to see the result: 728 ms. That is faster than float3 but still a lot slower than separate floats.

        • Re: question: float3 vs float, float, float

          You created a lot of duplicates; I have deleted them now.


          Which GPU are you running the code on? Which driver, APP SDK, operating system, etc.?

          Ideally you should not lose performance if the only change is between 3 floats and float3, and the work-items are still doing the same amount of work. Could you share a repro case?