bubu
Adept II

Question about dot prod intrinsic

Does the "dot" intrinsic take into consideration the .w component of a float4 register?

 

For example,

 

const float4 a = (float4)(1.0f,2.0f,3.0f,4.0f);

const float4 b = (float4)(1.0f,1.0f,1.0f,1.0f);

what's the result of dot(a,b)? 10 or 6?

 

And what's more efficient?

1. const float k = dot(a,b);

or

2. const float k = a.x*b.x + a.y*b.y + a.z*b.z;

 

thx

0 Likes
3 Replies
katayama
Journeyman III

Hi bubu,

what's the result of dot(a,b)? 10 or 6?


 It will be 10.

And what's more efficient?


Since a and b are both const, k is computed at compile time, so there is no difference in efficiency.

If either a or b is not const, the second form needs one fewer multiply-add (it skips the .w term), and it can also be written as

float k = dot(a.xyz, b.xyz);

Alternatively, consider using float3 (an OpenCL 1.1 feature).
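
To make the difference between the two forms concrete, here is a minimal host-side sketch in plain C. It only emulates the arithmetic: float4_t, dot4 and dot3 are hypothetical stand-ins I made up for OpenCL's float4, dot(a, b) and dot(a.xyz, b.xyz), not the real built-ins.

```c
/* Hypothetical stand-in for OpenCL's float4; plain C has no built-in vector type. */
typedef struct { float x, y, z, w; } float4_t;

/* Four-component dot product: includes the .w term, like dot(float4, float4). */
static float dot4(float4_t a, float4_t b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Three-component dot product: mirrors dot(a.xyz, b.xyz) and ignores .w. */
static float dot3(float4_t a, float4_t b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

With a = (1, 2, 3, 4) and b = (1, 1, 1, 1) from the question, dot4 gives 10 and dot3 gives 6.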

0 Likes

I've heard float3 is very inefficient.

#2 has fewer operations, but since the Radeon's SIMD units favor float4 over scalar ops, I'm not completely sure...

 

0 Likes

Sorry, I misunderstood something in my previous post.

Since there is a 'DOT4' instruction, dot(float4, float4) can be executed in 1 cycle, but it occupies all of the XYZW pipelines.

On the other hand, dot(float3, float3) needs 3 cycles (MUL, MULADD, MULADD) but occupies only one pipeline. So multiple independent dot(float3, float3) operations can be executed side by side in those 3 cycles (4x on 69xx, 5x on 68xx or older).

So the efficiency depends on your kernel code.
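
As a rough back-of-the-envelope sketch of that trade-off, the cycle counts above can be turned into dot products per cycle. This is plain C, not kernel code. The function names are hypothetical, and it assumes the compiler packs independent dot products perfectly into the available lanes, which real kernels may not achieve.

```c
/* DOT4 path: one dot product per cycle, with all XYZW lanes busy
   (cycle count taken from the post above). */
static double dot4_per_cycle(void) {
    return 1.0;
}

/* Scalar path: each dot(float3, float3) takes 3 cycles (MUL, MULADD, MULADD)
   on a single lane. Up to `lanes` independent dot products can run side by
   side, so at most `lanes` of them finish every 3 cycles.
   Assumption: ideal packing, with no other instructions competing for lanes. */
static double scalar_dot3_per_cycle(int lanes, int independent_dots) {
    int packed = independent_dots < lanes ? independent_dots : lanes;
    return packed / 3.0;
}
```

Under these assumptions, a single isolated dot product gets only 1/3 per cycle on the scalar path versus 1 per cycle with DOT4, while four or more independent dot products reach about 1.33 per cycle on 69xx (4 lanes) and about 1.67 on 68xx or older (5 lanes). Which path wins depends on how much independent work the kernel has.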

0 Likes