question: float3 vs float, float, float


I found a struct like this in our code:

struct MyStruct
{
     float x1, x2, x3;

     float y1, y2, y3;

     float PAD1, PAD2;
};

and we use it like this somewhere:

... = (float4)((myStruct->x1+ myStruct->y1) * 0.5f, (myStruct->x2+ myStruct->y2) * 0.5f, (myStruct->x3+ myStruct->y3) * 0.5f, 0.0f);

I thought this was bad, so I changed it to this:

struct MyStruct
{
     float3 x;

     // No padding needed: a float3 is already aligned like a float4

     float3 y;

     // No padding needed: a float3 is already aligned like a float4
};

... = (float4)((myStruct->x + myStruct->y) * 0.5f, 0.0f);

I compared kernel times, and my new version is slower: the kernel took 585 ms before my change and takes 751 ms now. My question is: why? Maybe the old layout gets better coalesced memory access because there is no padding between x3 and y1. But I thought the GPU would compute a float3 faster than three separate floats. Maybe the compiler is smart enough to use the same registers either way, so there is nothing to gain from switching to float3. But this is not just no gain, it is a loss. If plain floats are faster, I will change all our float3s to floats.

I also tried float4 to see the result: 728 ms. That is faster than float3, but still a lot slower than plain floats.
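To make the layout question concrete, here is a rough host-side sketch in plain C of the two structs. It assumes the OpenCL rule that a float3 has the size and alignment of a float4 (16 bytes), and it models that with a hypothetical Float3Model type. ScalarStruct, VectorStruct and the helper functions are illustration names, not part of the original code. It only compares sizes and offsets and does not by itself explain the timing difference.

```c
#include <assert.h>
#include <stddef.h>

/* The sketch assumes 4-byte floats, as on typical OpenCL hosts. */
static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

/* Original layout: eight scalar floats, hand-padded to 32 bytes. */
typedef struct {
    float x1, x2, x3;
    float y1, y2, y3;
    float PAD1, PAD2;
} ScalarStruct;

/* Hypothetical stand-in for an OpenCL float3: three used lanes plus
   one unused lane, 16-byte aligned (assumption stated above). */
typedef struct {
    _Alignas(16) float s[4];
} Float3Model;

/* New layout: two float3-like members, no explicit padding. */
typedef struct {
    Float3Model x;
    Float3Model y;
} VectorStruct;

/* Small helpers so the numbers can be checked or printed. */
size_t scalar_size(void)     { return sizeof(ScalarStruct); }
size_t scalar_y_offset(void) { return offsetof(ScalarStruct, y1); }
size_t vector_size(void)     { return sizeof(VectorStruct); }
size_t vector_y_offset(void) { return offsetof(VectorStruct, y); }
```

Under these assumptions both structs occupy 32 bytes, so the total footprint per element should not change. What changes is where the data sits: y starts at byte 12 in the scalar version and at byte 16 in the float3 version, with the padding moved from the end of the struct to the fourth lane of each vector.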

1 Reply

You created a lot of duplicates; I have deleted them now.

Which GPU are you running the code on? Which driver, APP SDK version, operating system, etc.?

Ideally, you should not lose performance if the only change is from three floats to a float3 and the work-items are still doing the same amount of work. Could you share a repro case?