copy float vs float4 on GCN architecture (SDK, MemoryOptimizations)

Question asked by gouse on Dec 4, 2013
Hi, understanding memory-related performance aspects is important but sometimes a bit tricky. They also change from architecture to architecture.


Here are a couple of questions (relating to the MemoryOptimization benchmark in the SDK):

  1. Why does "Copy 1D FastPath" show lower GB/s than "Copy 1D CompletePath" (70 vs 80)? My understanding is that it should be quite the opposite...
  2. Why does using float4 show only about a 5% improvement relative to a single float copy (80 vs 84 GB/s for "Copy 2D", 64x4)? Shouldn't it be 2 or even 4 times faster than a single float?...


My setup: W5000 (Pitcairn), OpenCL 1.2 AMD-APP (1124.2)


Thank you.