Raistmer
Adept II

Please explain/comment this profiler results

APP_profiler2.4_C-60.png

I have a few questions about these profiler results:

1) How is it possible to get simultaneous writes in 2 queues (to the same device)? There are 3 writes to the same device that overlap in time.

2) Why is there such a big gap between the 2 kernel executions? What prevents the next kernel from executing? And, as a result of that gap, why does the ReadBuffer API call take so long?
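
Both questions come down to comparing start and end timestamps of timeline entries. Here is a minimal sketch of that check, assuming the entries are available as (name, start, end) tuples in nanoseconds. The function names and all timestamp values are hypothetical and are not taken from the screenshots.

```python
from itertools import combinations


def overlapping_pairs(intervals):
    """Return name pairs whose [start, end) intervals intersect."""
    pairs = []
    for (name_a, start_a, end_a), (name_b, start_b, end_b) in combinations(intervals, 2):
        # Two intervals overlap when each one starts before the other ends.
        if start_a < end_b and start_b < end_a:
            pairs.append((name_a, name_b))
    return pairs


def gaps_between(intervals):
    """Return idle time between consecutive intervals, ordered by start."""
    ordered = sorted(intervals, key=lambda item: item[1])
    return [max(0, nxt[1] - cur[2]) for cur, nxt in zip(ordered, ordered[1:])]


# Hypothetical timestamps in nanoseconds (illustration only).
writes = [("write_q0", 0, 500), ("write_q1", 200, 700), ("write_q0_b", 650, 900)]
kernels = [("kernel_1", 1000, 1200), ("kernel_2", 5000, 5300)]

print(overlapping_pairs(writes))  # pairs of writes that overlap in time
print(gaps_between(kernels))      # idle time between the two kernels
```

Running the same check on real start/end values would show whether the writes truly overlap on the device or only appear to at the drawn resolution of the timeline.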

EDIT: the write overlap is seen more clearly here:

APP_profiler2.4_C-60_buffer_write_overlap.png

6 Replies
Raistmer
Adept II

And another picture that is not quite clear:

APP_profiler2.4_C-60_buffer_map_no_gain.png

There are a buffer read and a buffer map, both 4 KB in size.

The read is performed on a regular read-write buffer, while the mapping uses host pinned memory. Previously (via command-line sprofile 2.3) I saw a considerable speedup for such a map operation because it's zero copy. But here both the read and the map take about the same (and very large) amount of time...


Hi Raistmer:

Maybe you are using two or more graphics cards. Could you please provide some more information about this, such as the APP Profiler session file?

Thank you.


Hi, Wenju

The logs were acquired on a C-60 APU with APP Profiler 2.4.

lbin
Staff

Hi Raistmer

Can you share your atp file through the AMD help desk (http://developer.amd.com/support/KnowledgeBase/pages/HelpdeskTicketForm.aspx?Category=7) and let us know after you have submitted it?

Thanks


Uploaded, ticket: http://developer.amd.com/support/KnowledgeBase/pages/ticketdetails.aspx?TicketID=1707

Just remove the .txt extension from the file name.

Raistmer
Adept II

Ok, I will try to find that session and upload it.

For now, here is another "funny" picture from my poor C-60:

APP_profiler2.4_C-60_too_slow_copy.png

The speed of the GPU->GPU copy (same single GPU in the system) is just unbelievable. My kernel is simply hidden between these 8 KB copies. ~37 KB per second: speed of light.
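
To put the quoted figures in perspective, the arithmetic can be sketched as follows. This assumes "kb" means kilobytes of 1024 bytes; the helper names are made up for illustration, and no values are read from the profiler output itself.

```python
def throughput_kb_per_s(size_bytes, duration_s):
    """Effective transfer rate in KB/s for a copy of size_bytes taking duration_s."""
    return size_bytes / 1024 / duration_s


def duration_for(size_bytes, rate_kb_per_s):
    """Time in seconds needed to move size_bytes at rate_kb_per_s."""
    return size_bytes / 1024 / rate_kb_per_s


copy_size = 8 * 1024  # one 8 KB copy, as mentioned in the post
rate = 37.0           # ~37 KB/s, as mentioned in the post

# At that rate a single 8 KB copy takes roughly 8 / 37 of a second.
print(f"{duration_for(copy_size, rate):.3f} s per 8 KB copy")
```

A single 8 KB device-to-device copy taking on the order of a fifth of a second is far outside what memory bandwidth could explain, which suggests the time is spent waiting rather than copying.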

What could cause this? There are no other OpenCL programs in the background. Could some Flash plugin in the background create such an effect? Could it be a hugely overloaded bus (and how could it become so overloaded?) Is it possible that I'm seeing the effect of GPU memory swapping under the Win7 WDDM driver?

Please note, it's an "inside GPU" transfer, not a host<->GPU transfer. So it's not an overloaded PCI-e bus but the memory bus in the C-60 APU...

For comparison, the same code fragment under normal conditions:

APP_profiler2.4_C-60_normal_copy.png

Here the memory copy takes just a small fraction of the elapsed time; most of the time is taken by kernels...
