Archives Discussions

kbrafford · ‎06-03-2010

I am running independent IIR filters in parallel and am packing my samples into float4s in order to make better use of the memory system.

Is there a better way to take a float element of a float4 and expand it into its own float4?

    float4 packed_samples;
    float4 *samples_cl;
    float4 samplevec;

    for (loop = 0; loop < MAX; loop++)
    {
        // get our next 4 samples as a 128 bit vector
        packed_samples = samples_cl[loop];

        samplevec = (float4)(packed_samples.s0, packed_samples.s0, packed_samples.s0, packed_samples.s0);
        // ...
        // do stuff with the samplevec
        // ...

        samplevec = (float4)(packed_samples.s1, packed_samples.s1, packed_samples.s1, packed_samples.s1);
        // ...
        // do stuff with the samplevec
        // ...

        samplevec = (float4)(packed_samples.s2, packed_samples.s2, packed_samples.s2, packed_samples.s2);
        // ...
        // do stuff with the samplevec
        // ...

        samplevec = (float4)(packed_samples.s3, packed_samples.s3, packed_samples.s3, packed_samples.s3);
        // ...
        // do stuff with the samplevec
        // ...
    }

MicahVillmow · ‎06-03-2010

samplevec = (float4)(packed_samples.s1); should work

kbrafford · ‎06-03-2010

Awesome, thanks! That really makes the code more compact and readable.

bubu · ‎06-03-2010

You do know that

    float4 val = (float4)(1.0f);

replicates the scalar across all the elements, giving

    val = (1.0f, 1.0f, 1.0f, 1.0f)

don't you? There's no need to spell out

    val = (float4)(1.0f, 1.0f, 1.0f, 1.0f);

because val = (float4)(1.0f) is equivalent according to the OpenCL spec.
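For anyone who wants to check the equivalence on the host without an OpenCL device, here is a minimal sketch in plain C that emulates the two forms. The names float4_t, make4, splat4, equal4 and expand_lanes are made up for this illustration and are not part of OpenCL; in a real kernel you would simply write (float4)(x).

```c
/* Hypothetical host-side stand-in for OpenCL's built-in float4 type. */
typedef struct { float s0, s1, s2, s3; } float4_t;

/* Emulates the four-argument literal (float4)(a, b, c, d). */
float4_t make4(float a, float b, float c, float d)
{
    float4_t v = { a, b, c, d };
    return v;
}

/* Emulates the single-scalar literal (float4)(x):
   the scalar is replicated into every component. */
float4_t splat4(float x)
{
    return make4(x, x, x, x);
}

/* Returns 1 when all four components compare equal, 0 otherwise. */
int equal4(float4_t a, float4_t b)
{
    return a.s0 == b.s0 && a.s1 == b.s1 && a.s2 == b.s2 && a.s3 == b.s3;
}

/* Expands each component of one packed vector into its own broadcast
   vector, mirroring the four steps of the loop body in the question. */
void expand_lanes(float4_t packed, float4_t out[4])
{
    out[0] = splat4(packed.s0);
    out[1] = splat4(packed.s1);
    out[2] = splat4(packed.s2);
    out[3] = splat4(packed.s3);
}
```

Calling expand_lanes on a vector holding 1, 2, 3, 4 should fill out[1] with four copies of 2.0f, which is what the compact form in the kernel produces for packed_samples.s1.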

packing four identical values into a float4