licoah
Journeyman III

How to increase the speed of data transfer

When I use this (Brook+ streams):

func(float* dataOut) {
    stream out;
    // kernel calculation
    out.write(dataOut);
}

It took 0.409 s.

When I use CAL:

func(float* dataOut)
{
    CALmem localRes;
    CALmem remoteRes;
    // calculations
    calMemCopy(/* copy data from localRes to remoteRes */);
    memcpy(/* from remoteRes to dataOut */);
}

It took 1.255 s.

I don't know how to reduce this copy time in CAL.

7 Replies
gaurav_garg
Adept I

The main bottleneck in your implementation is the CPU memcpy. Brook+ uses cached remote resources for better CPU memcpy performance.

You can try to do the same. Of course, much less cached resource memory is available than non-cached. For large sizes, you can try implementing the data transfer tile by tile.


What do you mean by a tile-by-tile manner?


Let's say you have a resource of size 1024x1024 and you are not able to allocate a cached resource of that size. Break it into 16 tiles of 256x256 and use a copy kernel to transfer each tile from device memory into a tile-sized cached resource, one by one, reading each tile out on the CPU before the resource is reused.
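
To make the tiling idea concrete, here is a minimal CPU-only sketch in plain C. It makes no CAL calls: the `staging` buffer is only a stand-in for a small cached remote resource, and `copy_tile_to_staging` is a hypothetical placeholder for the copy kernel (neither name comes from CAL or Brook+). It assumes the width and height are exact multiples of the tile size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the copy kernel: copies the tile x tile block
   whose top-left corner is (tx, ty) out of a width-wide source image into
   a densely packed staging buffer of tile * tile floats. */
void copy_tile_to_staging(const float *src, size_t width,
                          size_t tx, size_t ty, size_t tile,
                          float *staging)
{
    for (size_t row = 0; row < tile; ++row)
        memcpy(staging + row * tile,
               src + (ty + row) * width + tx,
               tile * sizeof(float));
}

/* Copies a width x height image from src to dst one tile at a time,
   reusing a single staging buffer. Returns the number of tiles moved. */
size_t tiled_copy(const float *src, float *dst,
                  size_t width, size_t height, size_t tile,
                  float *staging)
{
    /* Assumption of this sketch: no partial tiles at the edges. */
    assert(tile > 0 && width % tile == 0 && height % tile == 0);

    size_t tiles = 0;
    for (size_t ty = 0; ty < height; ty += tile) {
        for (size_t tx = 0; tx < width; tx += tile) {
            /* Step 1: "device" memory -> small staging buffer. */
            copy_tile_to_staging(src, width, tx, ty, tile, staging);

            /* Step 2: staging buffer -> final host array, row by row. */
            for (size_t row = 0; row < tile; ++row)
                memcpy(dst + (ty + row) * width + tx,
                       staging + row * tile,
                       tile * sizeof(float));
            ++tiles;
        }
    }
    return tiles;
}
```

For example, calling `tiled_copy(src, dst, 8, 8, 2, staging)` on an 8x8 array with a 4-float staging buffer moves 16 tiles and leaves `dst` identical to `src`. In a real CAL program, step 1 would be done by a kernel or copy on the device and step 2 by the CPU memcpy out of the mapped cached resource.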


Thank you.

Do you know where the source code for streamread/write is in Brook+?


$(BROOKROOT)\platform\runtime\CAL\Managers\CALBufferMgr.cpp

CALBufferMgr::setBufferData

CALBufferMgr::getBufferData


Thank you.


Thank you for the tiled example.
