haibo031031

bandwidth between the host and the device

Discussion created by haibo031031 on Jun 21, 2011
Latest reply on Jun 21, 2011 by haibo031031

I wrote a simple benchmark to measure the bandwidth between the host and the device, running 10 trials for each data size. However, I find that the first trial performs poorly: its bandwidth is lower than that of the other 9 trials. For example, the following is one of the outputs when the data size is around 100MB:

(1) 2844.446506  (2) 5666.673999  (3) 5675.735765  (4) 5704.610900  (5) 5726.594426 

(6) 5726.899885  (7) 5726.594426  (8) 5722.702677  (9) 5727.892852  (10) 5724.304521

The bandwidth in the first trial is only around half that of the other 9 trials. Is there an explanation for this?
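To put a number on "around half", here is a small sketch in C that divides the first trial by the mean of the remaining trials. The function name first_to_rest_ratio is only illustrative and is not part of the benchmark; the array holds the ten values printed above.

```c
#include <stddef.h>

/* The ten bandwidth values from the output above, in trial order. */
static const double trial_bw[10] = {
    2844.446506, 5666.673999, 5675.735765, 5704.610900, 5726.594426,
    5726.899885, 5726.594426, 5722.702677, 5727.892852, 5724.304521
};

/* Ratio of the first trial to the mean of trials 2..n.
 * Returns 0.0 if there are fewer than two trials. */
double first_to_rest_ratio(const double *bw, size_t n)
{
    if (n < 2) {
        return 0.0;
    }
    double sum = 0.0;
    for (size_t i = 1; i < n; ++i) {
        sum += bw[i];
    }
    double mean_rest = sum / (double)(n - 1);
    return bw[0] / mean_rest;
}
```

Calling first_to_rest_ratio(trial_bw, 10) gives a ratio of roughly 0.50, so the first trial reaches about half the bandwidth of the later ones.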

--------------------------------------------------------

The code structure is illustrated as follows:

_clInit();  //create OCL context, build program, etc.

_clMalloc(); //malloc memory on the device

loop 10   //copy data from the device to the host 10 times

--begin

_clMemcpyD2H();

--end

_clRelease();  //release resources

---------------------------------------------------------
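The loop above, with per-trial timing added, might look like the following sketch in C. It is not the actual benchmark code: the copy_fn callback, host_copy, now_seconds and time_trials are all hypothetical names, and a plain memcpy stands in for _clMemcpyD2H() so that the sketch runs without OpenCL. It also assumes the transfer call blocks until the copy is complete, and that bandwidth is reported in MB/s with 1 MB = 10^6 bytes (the post does not state the unit).

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical transfer callback. In the real benchmark this would wrap
 * _clMemcpyD2H(); it must not return before the copy has finished. */
typedef void (*copy_fn)(void *dst, const void *src, size_t bytes);

/* Stand-in transfer (assumption): a host-to-host memcpy. */
static void host_copy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Current time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run `trials` copies of `bytes` bytes and store the bandwidth of each
 * trial (MB/s, 1 MB = 1e6 bytes) in out[0..trials-1]. A trial whose
 * elapsed time is not measurable is recorded as 0.0. */
void time_trials(copy_fn copy, void *dst, const void *src,
                 size_t bytes, int trials, double *out)
{
    for (int i = 0; i < trials; ++i) {
        double t0 = now_seconds();
        copy(dst, src, bytes);
        double elapsed = now_seconds() - t0;
        out[i] = elapsed > 0.0 ? ((double)bytes / 1.0e6) / elapsed : 0.0;
    }
}
```

Keeping one value per trial, as in the output quoted earlier, makes it easy to see whether only the first iteration differs from the rest.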

My testbed is illustrated as follows:

host: Intel920;

device: HD5870;

with AMD APP v2.4.
