Archives Discussions

licoah · ‎03-11-2009

When I use this:

func(float* dataOut)
{
    stream out;
    kernelcalculation
    out.write(dataOut)
}

It took 0.409 s.

When I use CAL:

func(float* dataOut)
{
    CALmem localRes
    CALmem remoteRes
    calculations
    calMemCopy (copy data from localRes to remoteRes)
    memcpy (from remoteRes to dataOut)
}

It took 1.255 s.

I don't know how to reduce this copy time in CAL.

gaurav_garg · ‎03-11-2009

The main bottleneck in your implementation is the CPU memcpy. Brook+ uses cached remote resources to get better CPU memcpy performance.

You can try to do the same. Of course, far less cached resource memory is available than non-cached resource memory. For big sizes, you can try to implement the data transfer in a tile-by-tile manner.

licoah · ‎03-11-2009

What do you mean by a tile-by-tile manner?

gaurav_garg · ‎03-11-2009

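Let's say you have a resource of size 1024x1024 and you are not able to allocate a cached resource of this size. Break it into 16 tiles of 256x256 and use a copy kernel to transfer each tile from device (local) memory to the smaller cached remote resource, one by one, copying each tile out to host memory before reusing the cached resource for the next tile.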
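As a rough illustration of the tiling idea only, here is a minimal sketch in plain C. It does not call CAL at all: the large device resource and the small cached remote resource are stood in for by ordinary heap buffers, and all names (tiled_readback, copy_tile_to_staging, copy_staging_to_host, WIDTH, HEIGHT, TILE) are hypothetical. In a real implementation the first helper would be the copy kernel or calMemCopy step and the second would be the CPU memcpy from the mapped cached resource.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sizes taken from the example above: a 1024x1024 surface
   split into 256x256 tiles gives 4 x 4 = 16 tiles. */
enum { WIDTH = 1024, HEIGHT = 1024, TILE = 256 };

/* Stand-in for the GPU-side copy (copy kernel or calMemCopy): gathers one
   TILE x TILE tile of the large surface into the small staging buffer. */
static void copy_tile_to_staging(const float *src, size_t tx, size_t ty,
                                 float *staging)
{
    for (size_t row = 0; row < TILE; ++row) {
        const float *src_row = src + (ty * TILE + row) * WIDTH + tx * TILE;
        memcpy(staging + row * TILE, src_row, TILE * sizeof(float));
    }
}

/* Stand-in for the CPU memcpy out of the (cached) staging resource into
   the correct position of the caller's output array. */
static void copy_staging_to_host(const float *staging, size_t tx, size_t ty,
                                 float *dst)
{
    for (size_t row = 0; row < TILE; ++row) {
        float *dst_row = dst + (ty * TILE + row) * WIDTH + tx * TILE;
        memcpy(dst_row, staging + row * TILE, TILE * sizeof(float));
    }
}

/* Reads the whole surface back tile by tile through one reusable staging
   buffer. Returns the number of tiles copied, or 0 if allocation fails. */
size_t tiled_readback(const float *device_data, float *data_out)
{
    assert(WIDTH % TILE == 0 && HEIGHT % TILE == 0);

    float *staging = malloc(sizeof(float) * TILE * TILE);
    if (staging == NULL)
        return 0;

    size_t tiles = 0;
    for (size_t ty = 0; ty < HEIGHT / TILE; ++ty) {
        for (size_t tx = 0; tx < WIDTH / TILE; ++tx) {
            copy_tile_to_staging(device_data, tx, ty, staging);
            copy_staging_to_host(staging, tx, ty, data_out);
            ++tiles;
        }
    }

    free(staging);
    return tiles;
}
```

Calling tiled_readback(src, dst) on two WIDTH x HEIGHT float arrays should leave dst identical to src after 16 tile copies. The sketch only shows the loop structure and the addressing of each tile; whether it helps in practice depends on how fast the real cached resource is to read from the CPU.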

licoah · ‎03-12-2009

Thank you.

Do you know where the source code for streamread/write in Brook+ is?

gaurav_garg · ‎03-12-2009

$(BROOKROOT)\platform\runtime\CAL\Managers\CALBufferMgr.cpp

CALBufferMgr::setBufferData

CALBufferMgr::getBufferData

licoah · ‎03-12-2009

Thank you.

cuorematto · ‎04-18-2009

Thank you for the tiled example.


How to increase the speed of data transfer