TheG

Submitting commands is slow

Discussion created by TheG on Mar 19, 2011
Latest reply on Mar 21, 2011 by jeff_golds
submit time >> work time

Hi,

I'm seeing some odd performance problems in my program. The host code calls three kernels in a loop, writing two input buffers at the start of each iteration and reading back one output buffer at the end.

The cltrace output for one iteration is attached. The problem is most visible for the first buffer write: the write itself (COMMAND_END - COMMAND_START) takes 0.4 ms, but the time from queuing to submission (COMMAND_SUBMIT - COMMAND_QUEUED) is 8.3 ms! The other commands take roughly 0.6 to 1.2 ms to submit, which for the kernels and the final read is more than their actual working time.
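To make the arithmetic explicit, here is a minimal sketch of how I get those two numbers for the first write. The variable names are just for illustration, and it assumes the four timestamps in the CL_COMMAND_WRITE_BUFFER record are QUEUED, SUBMIT, START and END in that order, in nanoseconds:

```python
# Timestamps (ns) copied from the first CL_COMMAND_WRITE_BUFFER record in the trace.
# Assumption: the four values are QUEUED, SUBMIT, START, END, in that order.
queued = 4186701426019
submit = 4186709736643
start = 4186709872368
end = 4186710280613

NS_PER_MS = 1_000_000

# Time the command sat in the queue before it was handed to the device.
submit_delay_ms = (submit - queued) / NS_PER_MS
# Time the device actually spent on the write.
exec_time_ms = (end - start) / NS_PER_MS

print(f"submit delay: {submit_delay_ms:.1f} ms")
print(f"exec time:    {exec_time_ms:.1f} ms")
```

This gives about 8.3 ms of submit delay against about 0.4 ms of actual work.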

In synthetic test cases the submit time is around 0.01 ms, which looks much more plausible.

So what can slow things down like this?

All tests were run on a Radeon HD 5870 using SDK 2.3.

20 clGetMemObjectInfo 4186701410374 4186701412819
54 clEnqueueWriteBuffer 4186701418197 4186701429441 4596 CL_COMMAND_WRITE_BUFFER 4186701426019 4186709736643 4186709872368 4186710280613 0 0x04B504F0 0 0x04B4D788 Cypress 22848
20 clGetMemObjectInfo 4186709139709 4186709142642
20 clGetMemObjectInfo 4186709146553 4186709148509
54 clEnqueueWriteBuffer 4186709513220 4186709523975 4596 CL_COMMAND_WRITE_BUFFER 4186709521042 4186710152686 4186710280612 4186710979352 0 0x04B504F0 0 0x04B4D788 Cypress 38400
17 clRetainMemObject 4186709528375 4186709530331
39 clSetKernelArg 4186709533264 4186709535709
18 clReleaseMemObject 4186709538642 4186709540598
17 clRetainMemObject 4186709543042 4186709545486
39 clSetKernelArg 4186709547442 4186709549886
18 clReleaseMemObject 4186709552331 4186709554286
17 clRetainMemObject 4186709556731 4186709558686
39 clSetKernelArg 4186709561131 4186709563086
18 clReleaseMemObject 4186709565531 4186709567486
17 clRetainMemObject 4186709569931 4186709572375
39 clSetKernelArg 4186709574331 4186709576775
18 clReleaseMemObject 4186709579220 4186709581175
66 clEnqueueNDRangeKernel 4186709584109 4186709594375 4592 CL_COMMAND_NDRANGE_KERNEL 4186709591931 4186710702687 4186710978864 4186711075167 0 0x04B504F0 0 0x04B4D788 Cypress 0x04B658D0 KernelA {4800} {NULL}
17 clRetainMemObject 4186709598286 4186709600731
39 clSetKernelArg 4186709602686 4186709605131
18 clReleaseMemObject 4186709607575 4186709609531
17 clRetainMemObject 4186709611975 4186709613931
39 clSetKernelArg 4186709616375 4186709618331
18 clReleaseMemObject 4186709620775 4186709623220
17 clRetainMemObject 4186709625175 4186709627620
39 clSetKernelArg 4186709629575 4186709632020
18 clReleaseMemObject 4186709634464 4186709636420
17 clRetainMemObject 4186709638864 4186709640820
39 clSetKernelArg 4186709643264 4186709645220
18 clReleaseMemObject 4186709648153 4186709650109
66 clEnqueueNDRangeKernel 4186709652553 4186709659886 4592 CL_COMMAND_NDRANGE_KERNEL 4186709657442 4186710797042 4186711074189 4186711178058 0 0x04B504F0 0 0x04B4D788 Cypress 0x04B65A10 KernelB {4800} {NULL}
17 clRetainMemObject 4186709662820 4186709664775
39 clSetKernelArg 4186709667220 4186709669664
18 clReleaseMemObject 4186709671620 4186709674064
17 clRetainMemObject 4186709676020 4186709678464
39 clSetKernelArg 4186709680420 4186709682864
18 clReleaseMemObject 4186709685309 4186709687264
17 clRetainMemObject 4186709689709 4186709691664
39 clSetKernelArg 4186709694109 4186709696064
18 clReleaseMemObject 4186709698509 4186709700464
66 clEnqueueNDRangeKernel 4186709703398 4186709710731 4592 CL_COMMAND_NDRANGE_KERNEL 4186709708286 4186710850819 4186711178057 4186711215975 0 0x04B504F0 0 0x04B4D788 Cypress 0x04B65810 KernelC {4800} {NULL}
52 clEnqueueReadBuffer 4186709713664 4186712140020 4595 CL_COMMAND_READ_BUFFER 4186709718064 4186710888953 4186711215975 4186712029042 0 0x04B504F0 0 0x04B4D788 Cypress 998400
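In case it helps anyone reading the trace, the per-command submit delay and execution time can be pulled out with a few lines of script. This is only a rough sketch: `command_timings` is a made-up helper, not part of any SDK tool, and it assumes that the four numbers following each `CL_COMMAND_*` token are the QUEUED, SUBMIT, START and END timestamps in nanoseconds. Only the six enqueue records are copied in; the cheap API calls in between are left out.

```python
NS_PER_MS = 1_000_000

# The six enqueue records from the trace above, copied verbatim.
TRACE = """
54 clEnqueueWriteBuffer 4186701418197 4186701429441 4596 CL_COMMAND_WRITE_BUFFER 4186701426019 4186709736643 4186709872368 4186710280613 0 0x04B504F0 0 0x04B4D788 Cypress 22848
54 clEnqueueWriteBuffer 4186709513220 4186709523975 4596 CL_COMMAND_WRITE_BUFFER 4186709521042 4186710152686 4186710280612 4186710979352 0 0x04B504F0 0 0x04B4D788 Cypress 38400
66 clEnqueueNDRangeKernel 4186709584109 4186709594375 4592 CL_COMMAND_NDRANGE_KERNEL 4186709591931 4186710702687 4186710978864 4186711075167 0 0x04B504F0 0 0x04B4D788 Cypress 0x04B658D0 KernelA {4800} {NULL}
66 clEnqueueNDRangeKernel 4186709652553 4186709659886 4592 CL_COMMAND_NDRANGE_KERNEL 4186709657442 4186710797042 4186711074189 4186711178058 0 0x04B504F0 0 0x04B4D788 Cypress 0x04B65A10 KernelB {4800} {NULL}
66 clEnqueueNDRangeKernel 4186709703398 4186709710731 4592 CL_COMMAND_NDRANGE_KERNEL 4186709708286 4186710850819 4186711178057 4186711215975 0 0x04B504F0 0 0x04B4D788 Cypress 0x04B65810 KernelC {4800} {NULL}
52 clEnqueueReadBuffer 4186709713664 4186712140020 4595 CL_COMMAND_READ_BUFFER 4186709718064 4186710888953 4186711215975 4186712029042 0 0x04B504F0 0 0x04B4D788 Cypress 998400
"""


def command_timings(trace_text):
    """Return (command, submit_delay_ns, exec_time_ns) for each CL_COMMAND_* record.

    Works on the whitespace-separated token stream, so it does not matter
    how the trace happens to be wrapped across lines.
    """
    tokens = trace_text.split()
    rows = []
    for i, tok in enumerate(tokens):
        if tok.startswith("CL_COMMAND_"):
            # Assumed field order: QUEUED, SUBMIT, START, END.
            queued, submit, start, end = (int(t) for t in tokens[i + 1:i + 5])
            rows.append((tok, submit - queued, end - start))
    return rows


for name, submit_ns, exec_ns in command_timings(TRACE):
    print(f"{name:<26} submit {submit_ns / NS_PER_MS:5.2f} ms  exec {exec_ns / NS_PER_MS:5.2f} ms")
```

For the three kernels this shows about 1.1 ms of submit delay each, against roughly 0.04 to 0.1 ms of execution.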
