What's the result of dot(a,b)? 10 or 6?
It will be 10.
And what's more efficient?
Since a and b are both const, k is computed at compile time, so there is no difference in efficiency.
If either a or b is not const, the second one needs one fewer add operation, and it can also be written as
float k = dot(a.xyz, b.xyz);
or you could consider using float3 (an OpenCL 1.1 feature).
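To make the 10-versus-6 difference concrete, here is a minimal plain-C sketch of the two forms. The vec4 struct and the dot4/dot3 helpers are hypothetical stand-ins for OpenCL's float4 and dot(), not part of any API. The original definitions of a and b are not shown in this thread, so any values used with it are assumptions.

```c
/* Hypothetical plain-C stand-in for an OpenCL float4 (illustration only). */
typedef struct { float x, y, z, w; } vec4;

/* What dot(a, b) computes for float4 operands: all four components. */
static float dot4(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* What dot(a.xyz, b.xyz) computes: the w term is dropped,
   which saves one multiply-add. */
static float dot3(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

With assumed inputs a = (1, 2, 3, 4) and b = (1, 1, 1, 1), dot4 gives 10 and dot3 gives 6, so the full float4 dot() always includes the w component.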
I've heard float3 is very inefficient.
#2 has fewer operations, but since the Radeon's SIMD units favor float4 over scalar ops, I'm not completely sure...
Sorry, I misunderstood something in my previous post.
Because there is a 'DOT4' instruction, dot(float4, float4) can be executed in 1 cycle, and it occupies all of the XYZW pipelines.
On the other hand, dot(float3, float3) needs 3 cycles (MUL, MULADD, MULADD) but occupies only one pipeline. So multiple independent dot(float3, float3) operations can be executed in parallel in those same 3 cycles (4 at a time on 69xx, 5 at a time on 68xx or older).
So efficiency depends on your kernel code.
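As a rough back-of-the-envelope sketch of that trade-off, the helper below (a hypothetical name, plain C) computes the ideal per-dot cost when every pipeline is kept busy with an independent 3-cycle scalar dot product. It assumes the compiler can actually find that many independent dot products to pack together, which real kernels often cannot guarantee.

```c
/* Ideal cycles per dot(float3, float3) when `lanes` independent
   3-cycle sequences (MUL, MULADD, MULADD) run side by side.
   Assumption: perfect packing, no dependencies between them. */
static double cycles_per_dot3(int lanes) {
    return 3.0 / (double)lanes;
}
```

Under that assumption, 4 lanes (69xx) give 0.75 cycles per dot product and 5 lanes (68xx or older) give 0.6, compared with 1 cycle for a single DOT4. With only one dot product available to schedule, the float3 form costs the full 3 cycles.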