2 Replies Latest reply on Jun 21, 2011 4:56 PM by haibo031031

    bandwidth between the host and the device

    haibo031031

      I wrote a simple benchmark to test the bandwidth between the host and the device, running 10 trials for each data size. However, I find that the first trial performs poorly: its bandwidth is lower than that of the other 9 trials. For example, the following is one of the outputs when the data size is around 100MB:

      (1) 2844.446506  (2) 5666.673999  (3) 5675.735765  (4) 5704.610900  (5) 5726.594426 

      (6) 5726.899885  (7) 5726.594426  (8) 5722.702677  (9) 5727.892852  (10) 5724.304521

      The bandwidth in the first trial is only around half that of the other 9 trials. Is there an explanation for this?
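
To put a number on "around half", the ratio of the first trial to the mean of the remaining trials can be computed directly from the values listed above. The following is only a small arithmetic sketch in C; the function names (`mean_excluding_first`, `first_trial_ratio`) and the array name `reported_trials` are hypothetical and are not part of the benchmark itself.

```c
#include <assert.h>
#include <stddef.h>

/* The ten bandwidth values reported above, in trial order. */
static const double reported_trials[10] = {
    2844.446506, 5666.673999, 5675.735765, 5704.610900, 5726.594426,
    5726.899885, 5726.594426, 5722.702677, 5727.892852, 5724.304521
};

/* Mean of trials[1..n-1]; trials[0] is treated as the outlier first trial. */
double mean_excluding_first(const double *trials, size_t n)
{
    assert(trials != NULL && n >= 2);
    double sum = 0.0;
    for (size_t i = 1; i < n; ++i) {
        sum += trials[i];
    }
    return sum / (double)(n - 1);
}

/* Ratio of the first trial to the mean of all later trials. */
double first_trial_ratio(const double *trials, size_t n)
{
    double mean = mean_excluding_first(trials, n);
    return mean != 0.0 ? trials[0] / mean : 0.0;
}
```

Calling `first_trial_ratio(reported_trials, 10)` gives a ratio of about 0.498, against a mean of about 5711.33 for trials 2 to 10.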

      --------------------------------------------------------

      The code structure is as follows:

      _clInit();      // create the OpenCL context, build the program, etc.
      _clMalloc();    // allocate memory on the device
      for (int i = 0; i < 10; i++) {    // copy data from the device to the host, 10 times
          _clMemcpyD2H();
      }
      _clRelease();   // release resources

      ---------------------------------------------------------
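
For readers who want to reproduce the per-trial numbers, the timing loop could look roughly like the sketch below. It rests on several assumptions that are not in the original benchmark: the `transfer_fn` callback type and the `time_transfers` and `host_copy` names are hypothetical, a plain host-side `memcpy` stands in for `_clMemcpyD2H()` so the sketch runs without an OpenCL runtime, and bandwidth is reported in MB/s with 1 MB taken as 10^6 bytes. With a real device-to-host copy, the transfer would need to have completed before the second timestamp is taken (for example by using a blocking read).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback type standing in for _clMemcpyD2H(). */
typedef void (*transfer_fn)(void *dst, const void *src, size_t bytes);

/* Stand-in transfer: a plain host memcpy, NOT an OpenCL device read. */
void host_copy(void *dst, const void *src, size_t bytes)
{
    memcpy(dst, src, bytes);
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times `trials` transfers one by one and stores each trial's bandwidth
 * in out[i], in MB/s (assumption: 1 MB = 1e6 bytes). Every trial is
 * reported separately, so a slow first trial stays visible. */
void time_transfers(transfer_fn copy, void *dst, const void *src,
                    size_t bytes, int trials, double *out)
{
    assert(copy != NULL && out != NULL && trials >= 0);
    for (int i = 0; i < trials; ++i) {
        double t0 = now_seconds();
        copy(dst, src, bytes);   /* must have finished before t1 is taken */
        double t1 = now_seconds();
        double dt = t1 - t0;
        out[i] = dt > 0.0 ? (double)bytes / dt / 1e6 : 0.0;
    }
}
```

A call such as `time_transfers(host_copy, dst, src, bytes, 10, bw)` fills `bw[0..9]` with one value per trial, which matches the shape of the output listed above.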

      My testbed is as follows:

      host: Intel920;

      device: HD5870;

      with AMD APP v2.4.