

kbrafford
Adept II

packing four identical values into a float4

I am running parallel, independent IIR filters and am packing my samples into float4s in order to make better use of the memory system.

Is there a better way to take a float element of a float4 and expand it into its own float4?

 

    float4 packed_samples;
    float4 *samples_cl;
    float4 samplevec;

    for (loop = 0; loop < MAX; loop++)
    {
        // get our next 4 samples as a 128 bit vector
        packed_samples = samples_cl[loop];

        samplevec = (float4)(packed_samples.s0, packed_samples.s0, packed_samples.s0, packed_samples.s0);
        // ...
        // do stuff with the samplevec
        // ...

        samplevec = (float4)(packed_samples.s1, packed_samples.s1, packed_samples.s1, packed_samples.s1);
        // ...
        // do stuff with the samplevec
        // ...

        samplevec = (float4)(packed_samples.s2, packed_samples.s2, packed_samples.s2, packed_samples.s2);
        // ...
        // do stuff with the samplevec
        // ...

        samplevec = (float4)(packed_samples.s3, packed_samples.s3, packed_samples.s3, packed_samples.s3);
        // ...
        // do stuff with the samplevec
        // ...
    }

3 Replies

samplevec = (float4)(packed_samples.s1); should work
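
Applied to the loop in the question, the broadcast form might look roughly like the sketch below. The kernel name `iir_broadcast_sketch`, the `out` and `num_vectors` arguments, and the `acc += samplevec` lines are assumptions made up for illustration; they only stand in for the "do stuff with the samplevec" parts and are not the original poster's code.

```c
// OpenCL C sketch. Kernel name, arguments, and the accumulate step are
// hypothetical placeholders, not taken from the original post.
__kernel void iir_broadcast_sketch(__global const float4 *samples_cl,
                                   __global float4 *out,
                                   const int num_vectors)
{
    float4 acc = (float4)(0.0f);

    for (int loop = 0; loop < num_vectors; loop++) {
        // get our next 4 samples as a 128 bit vector
        const float4 packed_samples = samples_cl[loop];

        // A single scalar in the vector literal is replicated to all 4 lanes.
        float4 samplevec = (float4)(packed_samples.s0);
        acc += samplevec;   // placeholder for the real per-sample work

        samplevec = (float4)(packed_samples.s1);
        acc += samplevec;

        samplevec = (float4)(packed_samples.s2);
        acc += samplevec;

        samplevec = (float4)(packed_samples.s3);
        acc += samplevec;
    }

    // Each work-item writes its own result slot.
    out[get_global_id(0)] = acc;
}
```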

Awesome, thanks!  That really makes the code more compact and readable.

bubu
Adept II

You do know that

    float4 val = (float4)(1.0f);

replicates the scalar across all the elements, so

    val = (1.0f, 1.0f, 1.0f, 1.0f)

don't you? There's no need to spell out

    val = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

because val = (float4)(1.0f) is equivalent according to the OpenCL spec.
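
When the scalar already lives in a vector, as in the original question, a repeated-component swizzle is another way to write the same broadcast in OpenCL C. A small sketch, where `broadcast_lane1` is a hypothetical helper name used only for illustration:

```c
// OpenCL C sketch; broadcast_lane1 is a hypothetical helper, not a built-in.
float4 broadcast_lane1(float4 v)
{
    float4 a = (float4)(v.s1);                    // scalar replicated to all lanes
    float4 b = (float4)(v.s1, v.s1, v.s1, v.s1);  // every lane spelled out
    float4 c = v.s1111;                           // repeated-component swizzle

    // a, b, and c all hold (v.s1, v.s1, v.s1, v.s1); return any one of them.
    return c;
}
```

Whether any of the three forms generates faster code depends on the compiler and device, so treat the choice as one of readability unless profiling says otherwise.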
