I have ported my 3D solver to Brook+ and it seems to work now, but unfortunately it is not very fast.
Here you can see a really simple test on a 64x64x64 grid using a Radeon HD 4870.
The main bottleneck seems to be the streamRead() and streamWrite() functions for 3D streams, which seem really slow.
I have run simple benchmarks without any kernel calculation, and a single streamRead() on a 128*128*128 stream needs around 150 ms to complete.
streamWrite() needs approximately another 150 ms. Why is it so slow?
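To put those timings in perspective, here is a rough sketch of the effective transfer rate they imply. It assumes the stream holds 4-byte float elements, which is only an assumption for this example, and the helper name effectiveMBPerSecond is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Effective transfer rate in MB/s (1 MB = 10^6 bytes) for moving
// `elements` items of `bytesPerElement` bytes each in `milliseconds` ms.
double effectiveMBPerSecond(std::size_t elements,
                            std::size_t bytesPerElement,
                            double milliseconds)
{
    assert(milliseconds > 0.0);
    const double bytes = static_cast<double>(elements) *
                         static_cast<double>(bytesPerElement);
    const double seconds = milliseconds * 1.0e-3;
    return bytes / seconds / 1.0e6;
}
```

Calling effectiveMBPerSecond(128 * 128 * 128, sizeof(float), 150.0) gives roughly 56 MB/s under that assumption. A PCI Express 2.0 x16 link has a theoretical peak of about 8 GB/s per direction, so the bus itself is unlikely to be what limits these transfers.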
If I use CodeAnalyst to measure the performance of my test application, the most time-consuming function is memcpy(), followed by brook::CALMem::setDataAT() and brook::CALMem::getDataAT().
A 1D stream seems to be 3 times faster than a 3D one, but this is still not really fast.
Of course it would be really useful if streamRead() and streamWrite() could be called as rarely as possible, but for this purpose a permanent stream would be needed.
Is this a hardware limitation or a software one?
Could I expect a significant speedup if I use CAL only?
Thanks in advance for your answer,
Well, as far as I can see, the problem is that getDataAT() and setDataAT() call memcpy() separately for each of the 2097152 elements, and of course this is really, really slow.
Is this a BUG?
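To illustrate the difference I mean, here is a minimal sketch that copies the same buffer once with one memcpy() call per element and once with a single bulk call. This is not the Brook+ code; copyPerElement, copyBulk and elapsedMs are names I made up for the example, and float elements are assumed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// One memcpy() call per element, the pattern described above.
void copyPerElement(const float* src, float* dst, std::size_t count)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(dst + i, src + i, sizeof(float));
    }
}

// A single memcpy() call for the whole contiguous buffer.
void copyBulk(const float* src, float* dst, std::size_t count)
{
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, count * sizeof(float));
}

// Wall-clock time of a callable in milliseconds.
template <typename F>
double elapsedMs(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

With two std::vector<float> buffers of 128 * 128 * 128 elements, elapsedMs([&] { copyPerElement(src.data(), dst.data(), src.size()); }) can be compared against the same call with copyBulk. The gap depends on the compiler and optimization level, since an optimizer may merge the small copies, so it is best measured in a build similar to the one the library was compiled with.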
By the way, CALMem::getDataAT() seems to copy the data twice if (streamRank == 1), because there is no else branch like the one in setDataAT(). This must be a bug.
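The shape of the problem, as I understand it, can be sketched like this. It is a hypothetical stand-in and not the actual Brook+ source: the function names are invented, and each function simply returns how many times the buffer was copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical read-back with the `else` missing: for rank 1 the fast
// path runs first, and then the general path runs as well.
int readBackMissingElse(const float* src, float* dst,
                        std::size_t count, int streamRank)
{
    assert(src != nullptr && dst != nullptr);
    int copies = 0;
    if (streamRank == 1) {
        std::memcpy(dst, src, count * sizeof(float)); // fast 1D path
        ++copies;
    }
    // No `else` here, so this block is executed for every rank.
    {
        for (std::size_t i = 0; i < count; ++i) {
            std::memcpy(dst + i, src + i, sizeof(float));
        }
        ++copies;
    }
    return copies;
}

// Same logic with the `else` in place: exactly one copy per call.
int readBackWithElse(const float* src, float* dst,
                     std::size_t count, int streamRank)
{
    assert(src != nullptr && dst != nullptr);
    int copies = 0;
    if (streamRank == 1) {
        std::memcpy(dst, src, count * sizeof(float)); // fast 1D path
        ++copies;
    } else {
        for (std::size_t i = 0; i < count; ++i) {
            std::memcpy(dst + i, src + i, sizeof(float));
        }
        ++copies;
    }
    return copies;
}
```

In this sketch the result in dst is the same either way, so the missing else would only cost time and would not corrupt data, which may be why it went unnoticed.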