question: float3 vs float, float, float

Question asked by qqchose on Apr 26, 2013
Latest reply on Apr 29, 2013 by himanshu.gautam

Hi,

I found a struct like this in our code:

struct MyStruct
{
     float x1, x2, x3;
     float y1, y2, y3;

     float PAD1, PAD2;
};
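A plain-C mirror of this struct makes its layout explicit. This is a sketch that assumes 4-byte floats, as on mainstream compilers; the static_assert lines refuse to compile if that assumption does not hold. It shows that y1 starts at byte 12, immediately after x3, and that PAD1/PAD2 bring the total to 32 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C mirror of the original kernel struct (4-byte floats assumed). */
struct MyStruct {
    float x1, x2, x3;
    float y1, y2, y3;
    float PAD1, PAD2; /* explicit padding up to 32 bytes */
};

/* y1 directly follows x3; eight floats make 32 bytes in total. */
static_assert(offsetof(struct MyStruct, y1) == 12, "y1 directly follows x3");
static_assert(offsetof(struct MyStruct, PAD1) == 24, "padding starts after y3");
static_assert(sizeof(struct MyStruct) == 32, "8 floats, 32 bytes");
```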


We use this struct somewhere like this:

... = (float4)((myStruct->x1 + myStruct->y1) * 0.5f, (myStruct->x2 + myStruct->y2) * 0.5f, (myStruct->x3 + myStruct->y3) * 0.5f, 0.0f);

I thought this was bad, so I changed it to this:

struct MyStruct
{
     float3 x;
     //No need to pad: float3 is already aligned like float4

     float3 y;
     //No need to pad: float3 is already aligned like float4
};
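For comparison, here is a plain-C sketch of the rewritten layout. The type float3_emu is a hypothetical stand-in for OpenCL C's float3, which the OpenCL specification gives the same size and alignment as float4 (16 bytes); the OpenCL host headers' cl_float3 plays this role in practice, but the sketch avoids that dependency. It shows that y now starts at byte 16 instead of 12, while the struct stays at 32 bytes, the same as the explicitly padded float version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for OpenCL C's float3: a 3-component
   vector has the size and alignment of a 4-component one, so the fourth
   lane is unused padding. */
typedef struct {
    _Alignas(16) float s[4]; /* s[0..2] used, s[3] padding */
} float3_emu;

/* Plain-C mirror of the rewritten kernel struct. */
struct MyStruct {
    float3_emu x;
    float3_emu y;
};

static_assert(sizeof(float3_emu) == 16, "float3 occupies 16 bytes");
static_assert(offsetof(struct MyStruct, y) == 16, "y starts on the next 16-byte boundary");
static_assert(sizeof(struct MyStruct) == 32, "same 32 bytes as the padded float version");
```

In other words, the padding did not go away; it moved into the unused fourth lane of each vector.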

... = (float4)((myStruct.x + myStruct.y) * 0.5f, 0.0f);
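Both expressions compute the same value: the midpoint of x and y in the first three components, with 0.0f in the fourth. The plain-C sketch below checks that equivalence. All of its names (float3_emu, MyStructScalar, MyStructVec, midpoint_scalar, midpoint_vec) are hypothetical and not from the original code. The vector version does the add and scale component-wise, which is what the float3 expression asks the compiler to do.

```c
#include <assert.h>

/* Hypothetical stand-in for OpenCL C's float3: 16 bytes, lane s[3] unused. */
typedef struct {
    _Alignas(16) float s[4];
} float3_emu;
static_assert(sizeof(float3_emu) == 16, "float3 occupies 16 bytes, like float4");

/* Original layout: six scalars plus two pad floats. */
struct MyStructScalar {
    float x1, x2, x3, y1, y2, y3, PAD1, PAD2;
};

/* Rewritten layout: two 3-component vectors. */
struct MyStructVec {
    float3_emu x, y;
};

/* Mirrors (float4)((x1 + y1) * 0.5f, ..., 0.0f); out holds the float4 result. */
void midpoint_scalar(const struct MyStructScalar *m, float out[4]) {
    out[0] = (m->x1 + m->y1) * 0.5f;
    out[1] = (m->x2 + m->y2) * 0.5f;
    out[2] = (m->x3 + m->y3) * 0.5f;
    out[3] = 0.0f;
}

/* Mirrors (float4)((x + y) * 0.5f, 0.0f): component-wise over lanes 0..2. */
void midpoint_vec(const struct MyStructVec *m, float out[4]) {
    for (int i = 0; i < 3; ++i) {
        out[i] = (m->x.s[i] + m->y.s[i]) * 0.5f;
    }
    out[3] = 0.0f;
}
```

With x = (1, 2, 3) and y = (5, 6, 7), both functions fill out with {3, 4, 5, 0}.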

I compared kernel times, and my new version is slower: the kernel took 585 ms before my change and takes 751 ms now. My question is: why? Maybe the old layout gives better coalesced memory access because there is no padding between x3 and y1. But I thought the GPU would be faster computing one float3 than three separate floats. Maybe the compiler is smart enough to use the same registers either way, so there is nothing to gain by converting to float3. But it's not just no gain, it's a loss. If it's faster to use floats, I will change all our float3s to floats.

I also tried float4 to see the result: 728 ms. That is faster than float3 but still a lot slower than the separate floats.