mindsporter
Journeyman III

Understanding profiling info and optimizing launch times

Understanding gathered profiling info in order to optimize launch times and pipelining

I need some help understanding the profiling information that I have gathered so I can perform some optimizations. The GPU being used is HD5970. The OS is Linux. The order of commands in the experiment is:

4 non-blocking clEnqueueWriteImage()s

5 clEnqueueNDRangeKernel()s

1 non-blocking clEnqueueReadImage()

clFinish()

All commands are on a single command queue. The event wait list is empty for all enqueue commands. Attached is the output with the times reported by clGetEventProfilingInfo() for the different stages of each command. The times are in nanoseconds, measured from the queueing of the first image write (0 ns).

Questions:

1. Why does there appear to be a delay of the order of milliseconds between the queueing and submission of commands?

2. Why does there appear to be a similar delay between the submission and start of the commands?

3. Why do some of the commands further down in the list appear to start before commands above them, given that the command queue is in-order?

4. If there is a known issue with these numbers, is it somehow possible to infer the actual values?

COMMAND      QUEUED    SUBMIT       START        END
ImageWrite1  0 ns      151730 ns    26804937 ns  26805292 ns
ImageWrite2  7982 ns   4357611 ns   26804937 ns  26805292 ns
ImageWrite3  14109 ns  8074416 ns   26804937 ns  26805292 ns
ImageWrite4  20582 ns  11748876 ns  26804937 ns  26805292 ns
kernel1      45609 ns  15382279 ns  26714762 ns  26805292 ns
kernel2      53920 ns  16298516 ns  26350001 ns  26805292 ns
kernel3      61233 ns  16988515 ns  26348697 ns  26805292 ns
kernel4      68034 ns  17475625 ns  26347526 ns  26805292 ns
kernel5      88535 ns  17634632 ns  26615571 ns  26805292 ns
ImageRead1   90667 ns  17790585 ns  25952634 ns  26805292 ns
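To make the questions concrete, the intervals they refer to can be computed directly from the reported timestamps. What follows is only a minimal sketch in plain C, not part of the original experiment: the timestamps are hard-coded from the output above, and the names `cmd_times`, `reported`, `queue_to_submit`, `submit_to_start`, `count_out_of_order_starts` and `print_report` are made up for illustration. It does not call OpenCL at all; it only does arithmetic on the numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the profiling output. All times are in nanoseconds,
 * relative to the queueing of the first image write. */
typedef struct {
    const char *name;
    unsigned long long queued, submit, start, end;
} cmd_times;

/* Values copied from the reported output, in enqueue order. */
const cmd_times reported[] = {
    {"ImageWrite1",     0ULL,   151730ULL, 26804937ULL, 26805292ULL},
    {"ImageWrite2",  7982ULL,  4357611ULL, 26804937ULL, 26805292ULL},
    {"ImageWrite3", 14109ULL,  8074416ULL, 26804937ULL, 26805292ULL},
    {"ImageWrite4", 20582ULL, 11748876ULL, 26804937ULL, 26805292ULL},
    {"kernel1",     45609ULL, 15382279ULL, 26714762ULL, 26805292ULL},
    {"kernel2",     53920ULL, 16298516ULL, 26350001ULL, 26805292ULL},
    {"kernel3",     61233ULL, 16988515ULL, 26348697ULL, 26805292ULL},
    {"kernel4",     68034ULL, 17475625ULL, 26347526ULL, 26805292ULL},
    {"kernel5",     88535ULL, 17634632ULL, 26615571ULL, 26805292ULL},
    {"ImageRead1",  90667ULL, 17790585ULL, 25952634ULL, 26805292ULL},
};
const size_t reported_count = sizeof reported / sizeof reported[0];

/* Delay between queueing and submission (question 1). */
unsigned long long queue_to_submit(const cmd_times *c)
{
    assert(c->submit >= c->queued); /* guard against unsigned wrap-around */
    return c->submit - c->queued;
}

/* Delay between submission and start (question 2). */
unsigned long long submit_to_start(const cmd_times *c)
{
    assert(c->start >= c->submit); /* guard against unsigned wrap-around */
    return c->start - c->submit;
}

/* Number of commands whose START is earlier than the START of the
 * command enqueued immediately before them (question 3). */
size_t count_out_of_order_starts(const cmd_times *c, size_t n)
{
    size_t count = 0;
    for (size_t i = 1; i < n; ++i) {
        if (c[i].start < c[i - 1].start)
            ++count;
    }
    return count;
}

/* Print both delays for every command, then the out-of-order count. */
void print_report(const cmd_times *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        printf("%-12s queued->submit %10llu ns  submit->start %10llu ns\n",
               c[i].name, queue_to_submit(&c[i]), submit_to_start(&c[i]));
    }
    printf("commands starting before their predecessor: %zu\n",
           count_out_of_order_starts(c, n));
}
```

Calling `print_report(reported, reported_count)` from a `main` lists both delays per command. On these numbers, ImageWrite2 waits 4349629 ns between queueing and submission, and five commands (kernel1 to kernel4 and ImageRead1) report a START earlier than the command enqueued just before them.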

3 Replies
himanshu_gautam
Grandmaster

Hi mindsporter,

Please send a test case to streamdeveloper@amd.com.

Hi Himanshu,

Sorry, but I am unable to send our test case because of code-sharing restrictions. I gave an overview of it in my previous post. I am sure you have plenty of dummy OpenCL programs at your disposal with which you should be able to reproduce observations similar to ours. As I understand from this post, we are not the only ones facing drawn-out kernel launch delays. The other issues I mention may also have surfaced previously in different forms. If so, there must be enough in-house knowledge about these issues to give us an idea of whether and when we can expect them to be resolved, or of how to work around them.

Cheers,

Albert.

You can check out the ATI Stream SDK 2.3. It is expected to be released soon.
