copy float vs float4 on GCN architecture (SDK, MemoryOptimizations)

Question asked by gouse on Dec 4, 2013
Latest reply on Dec 6, 2013 by himanshu.gautam

Hi, understanding memory-related performance aspects is important but sometimes a bit tricky. They also change from architecture to architecture.


Here are a couple of questions about the MemoryOptimizations benchmark in the SDK:

  1. Why does "Copy 1D FastPath" show lower GB/s than "Copy 1D CompletePath" (70 vs 80)? My understanding is that it should be the opposite.
  2. Why does using float4 show only about a 5% improvement over a single-float copy (80 vs 84 for "Copy 2D", 64x4)? Shouldn't it be 2 or even 4 times faster than single float?
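
To make question 2 concrete, here is a rough sketch of how the two copy variants compare in terms of work-items and bytes moved. It is only an illustration, not the SDK's code: the function names and the buffer size are made up, and I am assuming the reported GB/s counts both the bytes read and the bytes written.

```python
FLOAT_BYTES = 4  # size of one 32-bit float


def copy_traffic(num_floats, vector_width):
    """Return (work_items, bytes_moved) for copying num_floats floats
    when each work-item copies vector_width floats (1 = float, 4 = float4)."""
    if num_floats % vector_width != 0:
        raise ValueError("num_floats must be a multiple of vector_width")
    work_items = num_floats // vector_width
    # Assumption: every float is read once and written once, and both count.
    bytes_moved = work_items * vector_width * FLOAT_BYTES * 2
    return work_items, bytes_moved


def effective_gbps(bytes_moved, seconds):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


n = 1024 * 1024  # hypothetical buffer size, in floats
items1, bytes1 = copy_traffic(n, 1)  # one float per work-item
items4, bytes4 = copy_traffic(n, 4)  # one float4 per work-item
print(items1, bytes1)
print(items4, bytes4)
```

Both variants move the same number of bytes; the float4 one just uses a quarter of the work-items. So what I am really asking is how much wider per-work-item accesses should matter on GCN.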


My setup: W5000 (Pitcairn),  OpenCL 1.2 AMD-APP (1124.2)


Thank you.