I'm trying to measure the runtime of my kernels across multiple command queues, to confirm that the queues overlap well. However, when I run it under the CUDA profiler on a GT430, the output always drops some lines, so I turned to the AMD platform, hoping that gDebugger and the APP Profiler would draw a complete runtime chart of the multiple command queues.
However, I found that these two tools are really shitty on Linux. gDebugger in particular always crashes when it starts the application, and the tutorial HTML pages are obviously ported from Windows. Well, the guys who wrote this deserve some complaints, at least from me.
Can anyone recommend some robust profiler tools?
Thanks in advance.