
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

I am trying to improve performance on a currently working stream application, by moving to pinned memory streams. But after a short while my thread that handles Brook calls hangs forever in a mutex lock like this:

Thread 2 (Thread 0xb7ab6b90 (LWP 21941)):
#0  0xb7f4642e in __kernel_vsyscall ()
#1  0xb7f22cf9 in __lll_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7f1e129 in _L_lock_89 () from /lib/tls/i686/cmov/libpthread.so.0
#3  0xb7f1da32 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
#4  0xb7268d2b in brook::ThreadLock::lock () from /usr/lib/libbrook.so
#5  0xb72a80c6 in CALBuffer::initializePinnedBuffer () from /usr/lib/libbrook_cal.so
#6  0xb729ac64 in CALBufferMgr::_createPinnedBuffer () from /usr/lib/libbrook_cal.so
#7  0xb729bf07 in CALBufferMgr::setBufferData () from /usr/lib/libbrook_cal.so
#8  0xb725a093 in StreamImpl::read () from /usr/lib/libbrook.so
#9  0xb7c0b20c in brook::StreamData::read () from /usr/lib/libbrook_d.so
#10 0xb7c5dce9 in brook::Stream<uint4>::read (this=0x9e43960, ptr=0x9e54900, flags=0xb7c71c99 "nocopy")
    at /usr/local/atibrook/sdk/include/brook/StreamDef.h:160
#11 0xb7c5b49c in A5Slice::tick (this=0x9b223c8) at A5Slice.cpp:366
#12 0xb7c4b5c2 in BrookA5::process (this=0x9b25870) at A5Brook.cpp:139
#13 0xb7c4b637 in BrookA5::thread_stub (arg=0x9b25870) at A5Brook.cpp:52
#14 0xb7f1c4ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#15 0xb7e5249e in clone () from /lib/tls/i686/cmov/libc.so.6

When this first happened I had issued 18 async read calls. I tried serializing the read operations with isSync calls, but the result is the same. It also does not appear to be a general race condition, as the hang occurs after the exact same number of kernel invocations.

Since this behaviour is highly reproducible, I managed to set a breakpoint in pthread_mutex_lock just prior to the read call that I know will fail (trying to see who else takes the lock). However, what I observe is a large number of buffer destructors being called, like this:

#11 0xb7303da7 in calResFree () from /usr/lib/libaticalrt.so
#12 0xb7344c01 in CALBuffer::~CALBuffer () from /usr/lib/libbrook_cal.so
#13 0xb7337c21 in CALBufferMgr::_createPinnedBuffer () from /usr/lib/libbrook_cal.so
#14 0xb7338f07 in CALBufferMgr::setBufferData () from /usr/lib/libbrook_cal.so
#15 0xb7c97093 in StreamImpl::read () from /usr/lib/libbrook.so
#16 0xb7ca820c in brook::StreamData::read () from /usr/lib/libbrook.so
#17 0xb7cfa929 in brook::Stream<uint4>::read (this=0x9349368, ptr=0x935a300, flags=0xb7d0e8d8 "nocopy")
    at /usr/local/atibrook/sdk/include/brook/StreamDef.h:160
#18 0xb7cf819e in A5Slice::tick (this=0x90283c8) at A5Slice.cpp:369

This seems to indicate that the pinned buffers accumulate in GFX memory and are only occasionally flushed. When this flushing occurs, someone forgets to release the mutex, and the next create call hangs indefinitely.

Where can I find the libbrook sources? I tried installing 1.4.1, but it fails on Ubuntu (one of the legacy samples has a dependency on an old libpthread), and the shared library is the same as the one found in 1.4.0 (checked the md5 sum).

Frank

0 Likes
18 Replies
gaurav_garg
Adept I

Deadlock? (hang) when reading from pinned memory

Yes, I think this was a bug. It is reproducible when multiple streamRead or streamWrite calls with the "nocopy" option cause a pinned memory allocation failure, which in turn triggers one particular code path where the pthread lock is not released.

I have just checked in the fix to the SourceForge version. The fix is in the CALBuffer.cpp file. You only need to rebuild the CALRuntime library.
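
For readers who have not met this class of bug, here is a minimal sketch of a lock that stays held when a failure branch returns early, next to a guarded variant that releases it on every path. It is not the actual CALBuffer.cpp code: TrackedLock, init_pinned_leaky and init_pinned_guarded are hypothetical names made up for illustration, and the real fix may look quite different.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical lock used only for illustration; it is not brook::ThreadLock.
class TrackedLock {
public:
    void lock() {
        // Spin until the flag flips from false to true.
        while (held_.exchange(true)) std::this_thread::yield();
    }
    void unlock() {
        assert(held_.load());   // unlocking a free lock is a logic error
        held_.store(false);
    }
    bool is_held() const { return held_.load(); }
private:
    std::atomic<bool> held_{false};
};

// Leaky shape: the failure branch returns while the lock is still held,
// so the next caller of lock() would wait forever.
bool init_pinned_leaky(TrackedLock& lock, bool alloc_ok) {
    lock.lock();
    if (!alloc_ok) return false;    // missing unlock()
    lock.unlock();
    return true;
}

// Safer shape: std::lock_guard releases the lock on every return path.
bool init_pinned_guarded(TrackedLock& lock, bool alloc_ok) {
    std::lock_guard<TrackedLock> guard(lock);
    return alloc_ok;
}
```

Calling init_pinned_leaky with alloc_ok set to false leaves is_held() returning true, which matches the symptom in the first backtrace: the next lock attempt never returns.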

0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Thanks for the prompt reply and fix. I managed to build a version that doesn't exhibit this hang, but sundry other problems appear. Flushing the data buffers takes quite a long time and should, in my opinion, not be necessary. I am constantly reusing the same 20 stream objects and would expect memory to be reused on the GPU side as well. Instead it runs out, causing noticeable lag spikes and degrading performance to a lower level than what I get with non-pinned buffers.

In addition, it looks like the flush may have corrupted buffers that are in the process of being read. (Not 100% sure about this, though.)

Frank

0 Likes
gaurav_garg
Adept I

Deadlock? (hang) when reading from pinned memory

One easy solution is to edit the file CALBufferMgr.cpp: in the function CALBufferMgr::_createPinnedBuffer, comment out line 1161, if(!tmpBuffer->initializePinnedBuffer(cpuPtr, funcPtr)), and rebuild CALRuntime.

0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Originally posted by: gaurav.garg One easy solution is to edit the file CALBufferMgr.cpp: in the function CALBufferMgr::_createPinnedBuffer, comment out line 1161, if(!tmpBuffer->initializePinnedBuffer(cpuPtr, funcPtr)), and rebuild CALRuntime.

I am not convinced that this is the answer. It seems that _createPinnedBuffer() tries to maintain a cache, _pinnedBufferCache, of available buffers. But whereas CALBufferMgr::_createHostBuffer() and CALBufferMgr::_createPCIeHostBuffer() actually reuse buffers in the cache, _createPinnedBuffer() never looks for available space in the cached list of buffers and always ends up creating a new GPU resource, even when one is available. I think what is needed is a simple cache lookup that will return and reuse the buffers.

 

0 Likes
gaurav_garg
Adept I

Deadlock? (hang) when reading from pinned memory

The Brook+ runtime cannot reuse cached pinned buffers because these buffers are specific to the pinned pointer passed to streamRead. So every time you need a pinned buffer, it has to be created specifically for that host pointer.
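
If a pinned buffer really is bound to one host pointer, a lookup keyed on that pointer would still let repeated reads from the same address share a buffer. The sketch below shows only that idea. PinnedBuffer and PinnedBufferCache are hypothetical types, not Brook+ or CAL classes, and whether the runtime could safely do this is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical stand-in for a pinned GPU resource; not a Brook+ or CAL type.
struct PinnedBuffer {
    const void* host_ptr;
    std::size_t bytes;
};

class PinnedBufferCache {
public:
    // Returns the cached buffer for host_ptr, creating one only on a miss
    // or when the requested size differs from the cached one.
    std::shared_ptr<PinnedBuffer> acquire(const void* host_ptr, std::size_t bytes) {
        assert(host_ptr != nullptr);
        auto it = cache_.find(host_ptr);
        if (it != cache_.end() && it->second->bytes == bytes) {
            return it->second;              // reuse: same pointer, same size
        }
        auto buf = std::make_shared<PinnedBuffer>(PinnedBuffer{host_ptr, bytes});
        ++created_;
        cache_[host_ptr] = buf;             // insert, or replace a stale entry
        return buf;
    }
    std::size_t created() const { return created_; }
    std::size_t size() const { return cache_.size(); }
private:
    std::map<const void*, std::shared_ptr<PinnedBuffer>> cache_;
    std::size_t created_ = 0;
};
```

With this shape, 20 streams that always read from the same 20 host pointers would create 20 buffers in total rather than one per read call.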

0 Likes
frankas
Journeyman III

Brook pinned memory completely broken ?

Originally posted by: gaurav.garg The Brook+ runtime cannot reuse cached pinned buffers because these buffers are specific to the pinned pointer passed to streamRead. So every time you need a pinned buffer, it has to be created specifically for that host pointer.

I have pretty much arrived at the conclusion that pinned streams don't really work at all in Brook. I believe there may be another bug here: when the cache is flushed, it deletes all buffers, regardless of whether they are in use or not. I suspect this is the cause of the data corruption that I see.
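
To illustrate the suspected problem, here is a minimal sketch of a flush that releases only idle entries and leaves in-use ones alone. CacheEntry, its in_flight flag and flush_idle are hypothetical names; the real cache in CALBufferMgr may track usage differently, or not at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache entry; in_flight models "a transfer still uses this".
struct CacheEntry {
    int id;
    bool in_flight;
};

// Removes only idle entries and returns how many were released.
// Entries with in_flight == true survive the flush untouched.
std::size_t flush_idle(std::vector<CacheEntry>& cache) {
    const std::size_t before = cache.size();
    cache.erase(std::remove_if(cache.begin(), cache.end(),
                               [](const CacheEntry& e) { return !e.in_flight; }),
                cache.end());
    assert(cache.size() <= before);
    return before - cache.size();
}
```

A flush that erased every entry unconditionally would also free the buffer behind a pending read, which is one way the corruption described above could arise.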

One solution for me would be to use CAL, but I keep thinking that part of the problem in Brook is that the buffers aren't explicitly tied to streams. Instead, each time you read to a buffer, a (slow) sequential search is performed to locate a suitable temporary buffer (a binary search on a sorted cache would be faster). But if the GPU buffer were persistently tied to the lifetime of the stream object, none of this would be needed. You would have to manage your streams more carefully to avoid unnecessarily gobbling up GPU memory.
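
The binary-search idea can be sketched as follows, assuming the cache keeps free buffer sizes in ascending order. find_fit is a hypothetical helper, not a Brook+ function, and it covers only the lookup, not the bookkeeping needed to keep the list sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the smallest buffer that can hold `needed` bytes,
// or -1 if none fits. `sizes` must be sorted in ascending order.
long find_fit(const std::vector<std::size_t>& sizes, std::size_t needed) {
    assert(std::is_sorted(sizes.begin(), sizes.end()));
    // lower_bound finds the first element >= needed in O(log n) steps.
    auto it = std::lower_bound(sizes.begin(), sizes.end(), needed);
    if (it == sizes.end()) return -1;
    return static_cast<long>(it - sizes.begin());
}
```

For a cache of a few dozen buffers the difference from a linear scan is likely negligible, so this would matter mainly if the cache grows large between flushes.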

Without such a scheme, pinned streams simply don't deliver the promised speed advantages.

regards, Frank

0 Likes
gaurav_garg
Adept I

Brook pinned memory completely broken ?

These pinned buffers are not tied to streams; they are just temporary buffers on the host side for data transfer between host and GPU.

The buffers that are associated with streams are local buffers created in the method BufferMgr::getBuffer(), and these buffers are tied to the Stream lifetime.

0 Likes
frankas
Journeyman III

Brook pinned memory completely broken ?

Originally posted by: gaurav.garg These pinned buffers are not tied to streams; they are just temporary buffers on the host side for data transfer between host and GPU.

The buffers that are associated with streams are local buffers created in the method BufferMgr::getBuffer(), and these buffers are tied to the Stream lifetime.

I don't suppose this is possible using just the ordinary Brook API? If so, could you please give a code example showing how to create a pinned Stream object with a persistent host buffer?

Frank

0 Likes
frankas
Journeyman III

Deadlock? (hang) when reading from pinned memory

Originally posted by: gaurav.garg The Brook+ runtime cannot reuse cached pinned buffers because these buffers are specific to the pinned pointer passed to streamRead. So every time you need a pinned buffer, it has to be created specifically for that host pointer.

This is not what I am seeing. I call stream read with the same source pointers every time, but new temporary buffers are still created, and memory is eventually exhausted, causing lag spikes / buffer corruption / hangs.

0 Likes