
Double Buffering...

Question asked by corry on Apr 17, 2012
Latest reply on Apr 18, 2012 by corry

So the subject really doesn't capture the question, but it's a bit complicated, as I've already done about as much detective work as I can.

Background: I'm implementing double buffering so the CPU can do some post-processing of GPU-created data buffers. There are just some things the CPU is much better suited to doing. So I figured I'd double up all my buffers and use calCtxSetMem to do the actual "page flipping" (not really a single page...).

So I implemented it, and all looked OK until I looked at the performance numbers. I get a weird alternating pattern: numbering the kernel runs from 0, the even runs go at the same speed as the non-double-buffered version did. The odd ones (pun intended? you decide) run at almost double the speed.

So I investigated further. The big difference between the two? Mapping the input data. The even runs take ~2.5 seconds to map 8 MB. The odd runs take ~1.6 seconds to map the same amount of data (I verified that the call is being told it's the same amount of data).

 

I even pulled up in the debugger the structures I created to store the resource parameter info. They are identical apart from the resource numbers assigned by CAL.

 

So far, I've checked the first 128 bytes of returned data, and it's all correct in both cases. So why does it take so much longer every other time? Is what I'm doing for double buffering simply not allowed, with unpredictable results? If it is allowed, is there some way to speed things up?

 

It did help me in that I hadn't realized this kernel was so... so... so very memory bound. I had thought of ways to reduce the memory transfers somewhat, but didn't expect 8 MB to make that much difference! (The total time to run the kernel, map the memory, and check the results is about 2.8 seconds, and apparently 2.6 of that is memory transfer...)

 

I'm at a complete loss here, short of doing some sort of exhaustive check to ensure all results from the odd case come back correctly under all inputs and conditions. I'm hoping there's a logical explanation for the behavior so I can skip that.
