I wrote a simple benchmark to measure the bandwidth between the host and the device. I ran 10 trials for each data size, and the performance in the first trial is consistently poor, i.e. its bandwidth is lower than that of the other 9 trials. For example, here is one of the outputs when the data size is around 100MB:
(1) 2844.446506 (2) 5666.673999 (3) 5675.735765 (4) 5704.610900 (5) 5726.594426
(6) 5726.899885 (7) 5726.594426 (8) 5722.702677 (9) 5727.892852 (10) 5724.304521
The bandwidth in the first trial is only around half that of the other 9 trials. Is there any explanation for this?
--------------------------------------------------------
The code structure is illustrated as follows:
_clInit(); //create OCL context, build program, etc.
_clMalloc(); //malloc memory on the device
loop 10 //copy data from the device to the host 10 times
--begin
_clMemcpyD2H();
--end
_clRelease(); //release resources
---------------------------------------------------------
My testbed is as follows:
host: Intel Core i7 920;
device: AMD Radeon HD 5870;
with AMD APP SDK v2.4.