Archives Discussions

Raistmer · ‎05-04-2012

I have a few questions about these profiler results:

1) How is it possible to get simultaneous writes in 2 queues (to the same device)? There are 3 writes to the same device that overlap in time.

2) Why is there such a big gap between the 2 kernel executions? What prevents the next kernel from executing? And, as a result of that gap, why is the ReadBuffer API call time so long?
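
Both questions can also be checked against the raw start/end timestamps instead of the timeline picture. The following is only a minimal sketch in Python: the helper names `overlapping_pairs` and `idle_gaps` are hypothetical, and the event names and nanosecond values are invented for illustration, not taken from this session.

```python
from itertools import combinations


def overlapping_pairs(events):
    """Return the name pairs whose [start, end) intervals intersect."""
    pairs = []
    for (name_a, start_a, end_a), (name_b, start_b, end_b) in combinations(events, 2):
        # Two intervals overlap when each one starts before the other ends.
        if start_a < end_b and start_b < end_a:
            pairs.append((name_a, name_b))
    return pairs


def idle_gaps(events):
    """Return (previous, next, gap) for consecutive events ordered by start time."""
    ordered = sorted(events, key=lambda event: event[1])
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[1] - prev[2]  # next start minus previous end
        if gap > 0:
            gaps.append((prev[0], nxt[0], gap))
    return gaps


# Hypothetical (name, start_ns, end_ns) tuples, NOT values from the posted session.
writes = [
    ("write_q0", 1000, 5000),
    ("write_q1", 2000, 6000),
    ("write_q0_b", 5500, 7000),
]
kernels = [("kernel_1", 10000, 12000), ("kernel_2", 30000, 33000)]

print(overlapping_pairs(writes))
print(idle_gaps(kernels))
```

With real numbers, the first list would show which writes actually overlap, and the second would show how long the device sat idle between the two kernels.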

EDIT: the write overlap can be seen more clearly here:

Raistmer · ‎05-04-2012

And another picture that is not quite clear:

There are a buffer read and a map, both 4 KB in size.

The read is performed on a usual read-write buffer, while the mapping uses host pinned memory. Before, I saw (via command-line sprofile 2.3) a considerable speedup in such a map operation because it's zero copy. But here both the read and the map take about the same (and very large) amount of time...
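
One way to see where the time of the read and the map goes is to split each command into phases using the four timestamps OpenCL event profiling exposes (queued, submit, start, end, all in nanoseconds). The sketch below is in Python and uses made-up numbers; the function name `phase_breakdown` and every value are assumptions for illustration, not data from this profile. If both commands turned out to be dominated by waiting rather than executing, that would be consistent with them taking a similar total time whether or not the map is zero copy.

```python
def phase_breakdown(queued, submit, start, end):
    """Split one command's lifetime into three phases (all values in ns)."""
    return {
        "waiting_in_queue": submit - queued,   # host queue until submission
        "waiting_on_device": start - submit,   # submitted but not yet running
        "executing": end - start,              # actual transfer or map work
    }


# Hypothetical timestamps, NOT taken from the posted session.
read_cmd = phase_breakdown(queued=0, submit=40_000, start=9_000_000, end=9_020_000)
map_cmd = phase_breakdown(queued=0, submit=35_000, start=9_100_000, end=9_101_000)

for name, phases in (("read", read_cmd), ("map", map_cmd)):
    total = sum(phases.values())
    print(name, phases, "total:", total)
```

In this invented case the map executes 20 times faster than the read, yet both totals are about 9 ms because almost all of the time is spent waiting before the command starts.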

Wenju · ‎05-08-2012

Hi Raistmer:

Maybe you just used two or more graphics cards. Could you please provide some more information about this, such as the APP Profiler session file?

Thank you.

Raistmer · ‎05-08-2012

Hi, Wenju

Logs were acquired on a C-60 APU with APP Profiler 2.4.

lbin · ‎05-09-2012

Hi Raistmer

Can you share your atp file through the AMD help desk (http://developer.amd.com/support/KnowledgeBase/pages/HelpdeskTicketForm.aspx?Category=7) and let us know after you have submitted it?

Thanks

Raistmer · ‎05-15-2012

Uploaded, ticket: http://developer.amd.com/support/KnowledgeBase/pages/ticketdetails.aspx?TicketID=1707

Just remove the txt extension from the file name.

Raistmer · ‎05-13-2012

OK, I will try to find that session and upload it.

For now, another "funny" picture from my poor C-60:

The speed of the GPU->GPU copy (the same single GPU in the system) is just unbelievable. My kernel is just hidden between these 8 KB copies. ~37 KB per second: the speed of light.

What could cause this? There are no other OpenCL programs in the background. Could some Flash plugin in the background create such an effect? Could it be a hugely overloaded bus (and how could it become so overloaded?) Is it possible that I've seen the effect of GPU memory swapping under the Win7 WDDM driver?

Please note, it's an "inside GPU" transfer, not a host<->GPU transfer. So it's not an overloaded PCI-e bus but the memory bus in the C-60 APU...

For comparison, the same fragment of code under normal conditions:

Here the memory copy takes just a small fraction of the elapsed time; most of the time is taken by kernels...
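
To put the ~37 KB per second figure quoted above in perspective, the time a single 8 KB copy would take at that rate can be worked out directly. This is only a small arithmetic sketch in Python; it assumes "kb" means kilobytes of 1024 bytes, and the helper name `copy_duration_s` is hypothetical.

```python
KB = 1024  # assumption: "kb" in the post means kilobytes of 1024 bytes


def copy_duration_s(size_bytes, rate_bytes_per_s):
    """Time in seconds needed to move size_bytes at a given sustained rate."""
    return size_bytes / rate_bytes_per_s


size = 8 * KB            # one copy, as described in the post
observed_rate = 37 * KB  # ~37 KB per second, as described in the post

duration = copy_duration_s(size, observed_rate)
print(f"{duration * 1000:.0f} ms per 8 KB copy")
```

At that rate each 8 KB copy would take roughly a fifth of a second, which is why the kernels look hidden between the copies on the timeline.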

Please explain/comment on these profiler results