Archives Discussions

doug65536 · ‎12-13-2011

I'm looking for a way to populate a vector with values so I can do one big vector store, instead of several little stores. I'm trying to avoid doing byte stores, which are not always supported (right?).

for (int32_t v = 0; v < 8; ++v)
{
    uchar8 r = 0;

    for (int32_t u = 0; u < 8; ++u)
    {
        float8 t = (float8)(0.0f);

        for (int32_t y = 0; y < 8; ++y)
        {
            t += foo(v, u, y);
        }

        // Horizontal sum: fold the eight lanes down into t.s0
        t.s0123 += t.s4567;
        t.s01 += t.s23;
        t.s0 += t.s1;

        uchar i = convert_uchar(clamp(rint(t.s0), 0.0f, 255.0f));

        // ***this is the part I'm asking about***
        // Here I want to "insert" i (the converted t.s0) into r at vector member u
    }

    // Store whole vector
    p = r;
}

Is there a way to do it that is better than using a long and doing this:

#if __ENDIAN_LITTLE__
r |= convert_long(clamp(rint(t.s0), 0.0f, 255.0f)) << (u * 8);
#else
r |= convert_long(clamp(rint(t.s0), 0.0f, 255.0f)) << (56-(u * 8));
#endif
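
For reference, the shift-and-OR packing above can be sketched in plain host-side C (not OpenCL C) so it can be compiled and checked on its own. The helper names quantize and pack8 are hypothetical and are not from the thread. The endianness check uses the GCC/Clang __BYTE_ORDER__ macro as an assumption, in place of OpenCL's __ENDIAN_LITTLE__. The rounding is a simple round-half-up, so it can differ from rint() on exact .5 ties.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to [0, 255] and round to the nearest integer.
   Stand-in for convert_uchar(clamp(rint(x), 0.0f, 255.0f)); rounds half up,
   whereas rint() in its default mode rounds ties to even. */
static uint8_t quantize(float x)
{
    if (!(x >= 0.0f)) x = 0.0f;   /* also maps NaN to 0 */
    if (x > 255.0f) x = 255.0f;
    return (uint8_t)(x + 0.5f);
}

/* Hypothetical helper: pack eight values into one 64-bit word so that,
   once the word is stored, byte u in memory holds vals[u]. */
static uint64_t pack8(const float vals[8])
{
    uint64_t r = 0;
    for (int u = 0; u < 8; ++u) {
        uint64_t b = quantize(vals[u]);
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        r |= b << (56 - u * 8);   /* big-endian: element 0 is the top byte */
#else
        r |= b << (u * 8);        /* little-endian: element 0 is the low byte */
#endif
    }
    return r;
}
```

The word is built entirely in registers and written out once, which is the "one big store" the question is after. Copying the result into a byte array with memcpy is an easy way to confirm the element order on a given host.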

Thanks!

notzed · ‎12-15-2011

The AMD media ops let you do some things like this, but of course they are not portable. See the programming guide, Appendix A, section A.8.4, e.g. amd_pack(), or perhaps amd_bytealign().

Otherwise ... well, I'd just stick to using longs or ints; doing things in sets of 4 seems fairly optimal ALU-wise on current hardware.

Unless memory is an issue, I tend to just use floats for storage if multiple passes are involved and only convert to byte at the end for display/output, or use images and let the compiler/hardware do the packing to suit the data.
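
As a rough illustration of the float-storage approach, here is a plain C sketch (host-side, not OpenCL C) where the intermediate passes work on floats and the narrowing to bytes happens once at the end. The functions scale_pass and to_bytes are hypothetical names, and the scaling pass is only a placeholder for whatever the real passes do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical intermediate pass: operates on floats, with no clamping
   or rounding between passes. */
static void scale_pass(float *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] *= gain;
}

/* Final pass: clamp to [0, 255], round half up, and narrow to bytes once. */
static void to_bytes(const float *src, uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float x = src[i];
        if (!(x >= 0.0f)) x = 0.0f;   /* also maps NaN to 0 */
        if (x > 255.0f) x = 255.0f;
        dst[i] = (uint8_t)(x + 0.5f);
    }
}
```

One effect of this design is that a value can leave the byte range in one pass and come back in a later one. For example, 200 scaled by 1.5 and then by 0.5 ends up as 150, whereas storing bytes between the passes would have clamped the intermediate 300 to 255 first.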

Is there a way to "insert" a value into a vector with an index?