doing global reads as float8

Discussion created by kbrafford on Jun 4, 2010
Latest reply on Jun 4, 2010 by MicahVillmow

Suppose you have a kernel that operates on a bunch of float4 values, and your GPU has a 256-bit data path. Would it make sense to read the incoming data as float8 values, then access each one as two float4s (via a pointer, perhaps)? Would that successfully hide the memory latency of one of the float4 accesses?
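
To make the idea concrete, here is a minimal sketch of the two access patterns, modeled in plain C so it can be compiled anywhere. Everything in it is an assumption made for illustration: `f4` and `f8` are hypothetical stand-ins for the built-in float4 and float8 types, `scale` is placeholder per-element work, and the loops over `gid` stand in for work-items. It only checks that both variants compute the same values; it says nothing about whether the wide read hides any latency on a given device.

```c
#include <stddef.h>
#include <string.h>

/* Plain-C stand-ins for the OpenCL vector types (assumption: a real
   kernel would use the built-in float4 and float8 instead). */
typedef struct { float s[4]; } f4;
typedef struct { f4 lo, hi; } f8;   /* one f8 covers two adjacent f4 values */

/* Hypothetical per-element work; any float4 operation would do here. */
static f4 scale(f4 v, float k) {
    for (int i = 0; i < 4; ++i) v.s[i] *= k;
    return v;
}

/* Baseline: each "work-item" reads and writes one float4. */
static void kernel_f4(const f4 *in, f4 *out, size_t n4) {
    for (size_t gid = 0; gid < n4; ++gid)
        out[gid] = scale(in[gid], 2.0f);
}

/* Wide variant: each "work-item" does one 256-bit read, then
   processes the low and high halves as two float4 values. */
static void kernel_f8(const f8 *in, f8 *out, size_t n8) {
    for (size_t gid = 0; gid < n8; ++gid) {
        f8 v = in[gid];
        out[gid].lo = scale(v.lo, 2.0f);
        out[gid].hi = scale(v.hi, 2.0f);
    }
}

/* Runs both variants over the same bytes and returns 1 if the
   outputs are bit-for-bit identical, 0 otherwise. */
int wide_read_matches(void) {
    enum { N4 = 8 };                 /* number of float4 elements (even) */
    f4 in4[N4], out4[N4];
    f8 in8[N4 / 2], out8[N4 / 2];

    for (int i = 0; i < N4 * 4; ++i)
        in4[i / 4].s[i % 4] = (float)i * 0.5f;
    memcpy(in8, in4, sizeof in4);    /* same data, viewed as float8 */

    kernel_f4(in4, out4, N4);
    kernel_f8(in8, out8, N4 / 2);
    return memcmp(out4, out8, sizeof out4) == 0;
}
```

In actual OpenCL C the split would not need a pointer cast: a float8 value `v` exposes its halves as `v.lo` and `v.hi`, each a float4. Note that the wide variant launches half as many work-items and needs the element count to be even.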

Assuming that works, what are the ramifications when the same code is compiled for a CPU context? Will it still produce correct results without suffering any degradation?