5 Replies Latest reply on Jun 4, 2010 5:57 PM by MicahVillmow

    Doing global reads as float8

    kbrafford

      If you have a kernel that operates on a bunch of float4s and your GPU has a 256-bit data path, would it make sense to read the incoming data as float8s and then access each one as two float4s (via a pointer, perhaps)? Would that successfully hide the memory latency of one of the float4 accesses?

      Assuming that works, what are the ramifications of compiling that same code for a CPU context? Will it still produce correct results, and will it avoid any performance degradation?
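
      To make the question concrete, here is a minimal sketch of the access pattern I mean, in OpenCL C. The kernel name `scale_pairs`, its arguments, and the multiply by `k` are placeholders for illustration, not taken from real code. It assumes the input holds an even number of float4s and that the buffer is 32-byte aligned so it can be addressed as float8. It uses the built-in `.lo` and `.hi` halves of a float8 instead of a pointer cast. Whether the single wide read actually hides any latency is exactly what I'm asking.

```c
// Sketch only: one work-item reads a float8 and treats it as two float4s.
// Assumption: `in` holds n float8 elements (2*n float4s) and `out` holds 2*n float4s.
__kernel void scale_pairs(__global const float8 *in,
                          __global float4 *out,
                          const float k)
{
    size_t gid = get_global_id(0);

    // One 256-bit global read instead of two 128-bit reads.
    float8 v = in[gid];

    // Split into the two float4 halves: .lo is s0..s3, .hi is s4..s7.
    float4 a = v.lo;
    float4 b = v.hi;

    // Placeholder work on each float4 (the scalar k is applied per component).
    out[2 * gid]     = a * k;
    out[2 * gid + 1] = b * k;
}
```

      The global size would be half the number of float4 elements, since each work-item now handles two of them. If the data can't be guaranteed to be 32-byte aligned, `vload8(gid, (__global const float *)ptr)` on a float pointer would be the alternative, as it only requires float alignment.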