cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

gaurav_garg
Adept I

Brook pinned memory completely broken ?

Brook+ doesn't expose host buffers as done in CAL. Using "nocopy" option in Brook+ implies that a pinned memory is used for data transfer between host and GPU and that is going to be fast as you would be able to avoid an unnecessary memcpy between host ptr and CAL host buffer.

0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Originally posted by: gaurav.garg Brook+ runtime cannot reuse pinned cahced buffers because these buffers are specific to a pinned pointer passed in streamRead. So, everytime you need a pinned buffer, it has to be created specific to host ptr.

 

Sorry for harping on about this. But I simply can't find any line of code that tries to match source pointers of the buffers to those in the cache. I have tried adding some lines of code to do this, but it hasn't worked very well...

I have attached a patch that 1) speeds up the string matching 2) tries to match pinned buffers with existing ones in the cach. NB This patch is still work in progress, it is only included here to illustrate what I believ to be missing:

 

 

 

 

Index: runtime/CAL/Managers/CALBufferMgr.cpp =================================================================== --- runtime/CAL/Managers/CALBufferMgr.cpp (revision 340) +++ runtime/CAL/Managers/CALBufferMgr.cpp (working copy) @@ -41,6 +41,7 @@ #include "CALDevice.h" #include "StreamImpl.h" #include "Utility.h" +#include <string.h> //////////////////////////////////////////////////////////////////////////////// //! @@ -598,9 +599,7 @@ bool nocopy = false; if(flags) { - std::string flagString(flags); - size_t found = flagString.find("nocopy"); - if(found != std::string::npos) + if(strstr(flags,"nocopy")) { nocopy = true; } @@ -1157,18 +1156,54 @@ { CALBuffer* tmpBuffer = new CALBuffer(rank, dimensions, format, BUFFER_HOST, 0, _device); + // Look into cache if a free resource exist withe the same size and source + std::map<const void*, CALBuffer*>::iterator it = _pinnedBufferCache.find(cpuPtr); + + CALBuffer* sameSizedBuffer = NULL; + while(it!=_pinnedBufferCache.end()) + { + // Is size and format same? + if(*(it->second) == *tmpBuffer) + { + // if size is same look if it is being used for another data transfer + sameSizedBuffer = it->second; + CALevent copyEvent = sameSizedBuffer->getCopyEvent(); + CALevent inputEvent = sameSizedBuffer->getInputEvent(); + CALevent outputEvent = sameSizedBuffer->getOutputEvent(); + unsigned int id = sameSizedBuffer->getWriteThreadID(); + + if(copyEvent == inputEvent == outputEvent == 0 && id == 0) + { + delete tmpBuffer; + return sameSizedBuffer; + } + } else { + /* Buffer has changed, and must be recreated */ + CALBuffer* buffer = it->second; + buffer->waitCopyEvent(); + buffer->waitWriteEvent(); + + delete buffer; + _pinnedBufferCache.erase(it); + } + } + // Ceate if no free buffer in cache and push it in the cache if(!tmpBuffer->initializePinnedBuffer(cpuPtr, funcPtr)) { // If there is no same sized buffer, wait for all the events // associated to host resouces to finish and delete them. - for(unsigned int i = 0; i < _pinnedBufferCache.size(); ++i) + it = _pinnedBufferCache.begin(); + while(it!=_pinnedBufferCache.end()) { - CALBuffer* buffer = _pinnedBufferCache; + /* BUG: How do we know that this buffer isn't finished + with copy, but in use and awaitng kernel invokation ? */ + CALBuffer* buffer = it->second; buffer->waitCopyEvent(); buffer->waitWriteEvent(); delete buffer; + it++; } _pinnedBufferCache.clear(); @@ -1183,7 +1218,7 @@ } // Keep it in a cache after creation - _pinnedBufferCache.push_back(tmpBuffer); + _pinnedBufferCache[cpuPtr] = tmpBuffer; return tmpBuffer; } @@ -1537,9 +1572,11 @@ { // Destroy all the host side, pinned and constant buffers available in cache - for(unsigned int i = 0; i < _pinnedBufferCache.size(); ++i) + std::map<const void*, CALBuffer*>::iterator it = _pinnedBufferCache.begin(); + while(it!=_pinnedBufferCache.end()) { - delete _pinnedBufferCache; + delete it->second; + it++; } for(unsigned int i = 0; i < _hostBufferCache.size(); ++i) Index: runtime/CAL/Managers/CALBufferMgr.h =================================================================== --- runtime/CAL/Managers/CALBufferMgr.h (revision 340) +++ runtime/CAL/Managers/CALBufferMgr.h (working copy) @@ -164,7 +164,7 @@ std::vector<CALBuffer*> _hostPCIeBufferCache; //! \brief Cache for host memory data management. - std::vector<CALBuffer*> _pinnedBufferCache; + std::map<const void*, CALBuffer*> _pinnedBufferCache; //! \brief Cache containg constant buffers std::vector<CALConstBuffer*> _constBufferCache;

0 Likes
youplaboom
Journeyman III

Deadlock? (hang) when reading from pinned memory

I don't quite understand what you're referring by "moving to pinned memory streams"?

As far as I know, brook+ does not allow creation of pinned streams; just pinned streams read/write. In which case, there's not intermediate buffer used for copying between your host pointer and the CAL resource, it's a direct copy between your C pointer and the CAL resource.

I call stream read with the same source pointers every time, but still new temporary buffers are created.

It's because the read/write is not pinned; meaning your C pointer passed as argument to the read/write does not conform to the CAL memory requirements for pinned memory (surface and pitch alignment)



0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Originally posted by: youplaboom

I don't quite understand what you're referring by "moving to pinned memory streams"?

As far as I know, brook+ does not allow creation of pinned streams; just pinned streams read/write. In which case, there's not intermediate buffer used for copying between your host pointer and the CAL resource, it's a direct copy between your C pointer and the CAL resource.

I call stream read with the same source pointers every time, but still new temporary buffers are created.

It's because the read/write is not pinned; meaning your C pointer passed as argument to the read/write does not conform to the CAL memory requirements for pinned memory (surface and pitch alignment)

Moving to is just  figure of speech. My application used regular streams, and I wanted to speed up copying from main memory to GPU memory by using the "nocopy" flag, as described in "Stream Computing User guide" section 2.14

Bit if you study what happens here

http://brookplus.svn.sourceforge.net/viewvc/brookplus/platform/runtime/CAL/Managers/CALBufferMgr.cpp?revision=334&view=markup

Line 1154 - 1189 - You will se that a new CAL resource is created for every transfere - and memory will quickly be exhausted, causing a complete flush which actually is time consuming. The net effect is that "nocopy" copying is most like slower that just ordinary read()

My patch illustrates how this can be solved, by keeping the tmp buffers in a map. and reusing them when size and cpuPtr are the same. This eliminates the need for regularly deleting a large number of buffers, speeding things up.

You may have a point about the pitch, DMA transfer would copy memory raw and perhaps miss out on any subtleties arising from pitch and tiling issues. But about my pointers not conforming to the pinning requirenments, I beg to differ, the call stack in my original post shows that they do, and otherwise I wouldn't have encountered these problems 🙂

F

 

0 Likes
youplaboom
Journeyman III

Deadlock? (hang) when reading from pinned memory

Without knowing exactly how it is implemented in brook+, I would assume that a stream read() or write() with the "pinned" flag would:

1/ create a CAL resource using calResCreate1D/2D()

This resource is pinned to the C pointer provided, so no temporary memory is allocated; it assumes memory is allocated by the host program, with alignment and pitch considerations.

2/ execute a DMA transfer from/to this temporary resource to/from the GPU resource bound to the stream (I assume brook streams are always allocated on GPU?)

3/ delete the temporary pinned resource.

From my tests using CAL, the overhead of creating/deleting pinned CAL resources is minimal.

0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Originally posted by: youplaboom Without knowing exactly how it is implemented in brook+, I would assume...

 

If you had taken the time to read my previous post, you would would see that your assumptions are wrong, and pinned streams just doesn't work as advertised.

Not much help in this forum, I might as well keep my patches to myself. So rather than trying to fix brook, I will use the CAL APIs instead. Thanks for not taking time to read my posts.

Your assumptions fails at 3) The temp stream is not deleted. And a new "temp" stream is created for every read(.., "nocopy") that you make. - Evntually GPU memry runs out. At which point your application hangs. gaurav.garg has been very helpful in fixing the hang. But the problem still remains that temp buffers are not reused as they should be. I am sorry to sound liek a broken record on this matter, but thats what happens when people like you start posting without properly reading the previous thread.

 

Frank

0 Likes
gaurav_garg
Adept I

Deadlock? (hang) when reading from pinned memory

Your assumptions fails at 3) The temp stream is not deleted. And a new "temp" stream is created for every read(.., "nocopy") that you make. - Evntually GPU memry runs out. At which point your application hangs. gaurav.garg has been very helpful in fixing the hang. But the problem still remains that temp buffers are not reused as they should be.


What Brook+ does right now is that if Pinned memory exhaust, it deletes previously allocated memories and tries to allocate the memory again.

Now, if you see application hang at memory exhaust, it is a CAL issue. One way to resolve this is to just delete the temporary stream as I suggested in my previous post. You can also implement caching and reuse pinned memory (based on pointer and CAL buffer size) but, this approach too would cause pinned memory exhaustion in case multiple streamReads are called with different host pointers.

0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Originally posted by: gaurav.garg
Your assumptions fails at 3) The temp stream is not deleted. And a new "temp" stream is created for every read(.., "nocopy") that you make. - Evntually GPU memry runs out. At which point your application hangs. gaurav.garg has been very helpful in fixing the hang. But the problem still remains that temp buffers are not reused as they should be.


 

What Brook+ does right now is that if Pinned memory exhaust, it deletes previously allocated memories and tries to allocate the memory again.

 

Now, if you see application hang at memory exhaust, it is a CAL issue. One way to resolve this is to just delete the temporary stream as I suggested in my previous post. You can also implement caching and reuse pinned memory (based on pointer and CAL buffer size) but, this approach too would cause pinned memory exhaustion in case multiple streamReads are called with different host pointers.

 

My above patch implements such an approach, it will any any case give better performance that the current SVN code. Unfortunatly I must be having some build problems of sorts. I am wondering do I have to recomplie my brook sources when switching from 1.4.0 distribution to SVN head ? (I am getting incorrect reults when using the brrok library that I buildt myself. So I can only see that my patch speeds things up, but I am not able to verify for correctness. )

What is the proper procedure for submitting patches to brook+?

F

 

0 Likes
gaurav_garg
Adept I

Deadlock? (hang) when reading from pinned memory

You can file a bug and attach your patch file in that bug. You need to build only CALRuntime.

0 Likes