copy float vs float4 on GCN architecture (SDK, MemoryOptimizations)

Hi, understanding the memory-related aspects of performance is important but sometimes a bit tricky. They also change from architecture to architecture.

Here are a couple of questions about the MemoryOptimizations benchmark in the SDK:

  1. Why does "Copy 1D FastPath" show lower GB/s than "Copy 1D CompletePath" (70 vs 80)? My understanding is that it should be the opposite.
  2. Why does using float4 show only about a 5% improvement over a single-float copy (80 vs 84 for "Copy 2D", 64x4)? Shouldn't it be 2 or even 4 times faster than a single float?

My setup: W5000 (Pitcairn),  OpenCL 1.2 AMD-APP (1124.2)

Thank you.

1 Reply


According to my understanding:

1. Copy 1D FastPath does not have any conditions in it, so all the work-items just perform the copy instruction. In Copy 1D CompletePath, every work-item has to check the condition first (whether gid < 0, even though no gid is ever below 0) and only then performs the copy operation. That is why it is slower.

2. float4 is not exactly 2 or 4 times faster than a single float. It again depends on the kernel logic as well.
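
To make the two points concrete, here is a rough sketch in plain C (not the SDK's kernel source) that emulates the three per-work-item bodies discussed above on the CPU. The names `copy_fast`, `copy_checked`, `copy_vec4`, `float4_t` and the helper functions are all made up for illustration, and the body of the never-taken branch is only a placeholder. The one hardware figure used is the GCN wavefront size of 64 work-items. It shows that all three variants move the same total number of bytes, which is one plausible reason (an assumption, not a measurement) why float4 gains little once the copy is limited by memory bandwidth.

```c
#include <stddef.h>
#include <string.h>

#define WAVEFRONT_SIZE 64  /* work-items per wavefront on GCN */

/* Hypothetical stand-in for OpenCL's float4: 16 bytes per element. */
typedef struct { float x, y, z, w; } float4_t;

/* FastPath-style body: unconditional copy, one float per work-item. */
static void copy_fast(const float *in, float *out, long gid) {
    out[gid] = in[gid];
}

/* CompletePath-style body as described in the reply: a gid < 0 check
 * that is never true, followed by the same copy. The branch body here
 * is a placeholder, not the SDK's code. */
static void copy_checked(const float *in, float *out, long gid) {
    if (gid < 0) {
        out[0] += 1.0f;  /* never executed for valid work-item ids */
    }
    out[gid] = in[gid];
}

/* float4 body: each work-item copies four floats at once. */
static void copy_vec4(const float4_t *in, float4_t *out, long gid) {
    out[gid] = in[gid];
}

/* Bytes requested by one wavefront with a single load instruction. */
static size_t bytes_per_wavefront_load(size_t element_size) {
    return WAVEFRONT_SIZE * element_size;
}

/* Effective bandwidth in GB/s (1e9 bytes). Whether bytes_moved counts
 * reads only or reads plus writes is up to the caller. */
static double effective_gbps(size_t bytes_moved, double seconds) {
    return (double)bytes_moved / seconds / 1e9;
}

/* Runs all three variants over the same 256 floats and returns 1 if
 * the outputs are byte-for-byte identical, 0 otherwise. */
static int copies_match(void) {
    enum { N = 256 };
    float in[N], out_fast[N], out_checked[N];
    float4_t in4[N / 4], out4[N / 4];

    for (long i = 0; i < N; ++i) in[i] = (float)i;
    memcpy(in4, in, sizeof in);  /* same data viewed as float4 elements */

    for (long gid = 0; gid < N; ++gid) copy_fast(in, out_fast, gid);
    for (long gid = 0; gid < N; ++gid) copy_checked(in, out_checked, gid);
    /* float4 needs only a quarter of the work-items for the same data. */
    for (long gid = 0; gid < N / 4; ++gid) copy_vec4(in4, out4, gid);

    return memcmp(out_fast, out_checked, sizeof out_fast) == 0 &&
           memcmp(out_fast, out4, sizeof out_fast) == 0;
}
```

With these helpers, `bytes_per_wavefront_load(sizeof(float))` is 256 and `bytes_per_wavefront_load(sizeof(float4_t))` is 1024, so a float4 wavefront asks for four times as much per instruction but there are four times fewer wavefronts. The buffer size passed to `effective_gbps` is the same in both cases, so any difference in GB/s has to come from the measured time alone.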