pinzo
Journeyman III

kernel submit time is too long

submit time >> enqueue time and execution time

Hi,

I have written host code that calls one kernel, and I use profiling to measure the kernel's execution time and launch time. I read in the AMD programming guide that the launch time (CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_QUEUED) is about 225 microseconds and that profiling adds about 40 microseconds.

Instead, my kernel launch time (start - queued) is about 2.64 ms, while the execution time (end - start) is 0.43 ms. Most of that 2.64 ms comes from the submit time (start - submit) = 2.61 ms, while (submit - queued) = 0.03 ms.

Why is the submit time so long? Is this normal? What can I do?

Thanks very much,

bye

errcode = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*) &hits_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 0: %d\n", errcode);
errcode = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*) &candy_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 1: %d\n", errcode);
errcode = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*) &nn_evt_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 2: %d\n", errcode);

size_t local_work_size = 256;
size_t global_work_size = local_work_size * GRID_DIM;
size_t group_work = global_work_size / local_work_size;
cl_ulong start, end, queued, submit;

errcode = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, &local_work_size, 0, NULL, &event);
clWaitForEvents(1, &event);
if (errcode != CL_SUCCESS) printf("failed NDrange: %d\n", errcode);

clEnqueueBarrier(queue);
clFlush(queue);
clFinish(queue);

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &queued, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &submit, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
clReleaseEvent(event);
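For readers following the numbers in this post, here is a minimal sketch of how the four profiling timestamps break down into the intervals discussed above. It is not part of the original code: the struct `profile_breakdown` and the function `breakdown_ns` are hypothetical names chosen for illustration, and the sketch uses plain `uint64_t` values in place of `cl_ulong` so that it compiles without the OpenCL headers. It assumes the timestamps are in nanoseconds, which is the unit OpenCL profiling counters use.

```c
#include <stdint.h>

/* Intervals between the four profiling timestamps, in milliseconds. */
typedef struct {
    double queue_to_submit_ms;  /* SUBMIT - QUEUED */
    double submit_to_start_ms;  /* START  - SUBMIT (the "submit time" in the post) */
    double launch_ms;           /* START  - QUEUED (the "launch time" in the post) */
    double execution_ms;        /* END    - START */
} profile_breakdown;

/* Hypothetical helper: all four inputs are timestamps in nanoseconds,
 * in the order the command passes through them. */
static profile_breakdown breakdown_ns(uint64_t queued, uint64_t submit,
                                      uint64_t start, uint64_t end)
{
    const double ns_per_ms = 1.0e6;
    profile_breakdown b;
    b.queue_to_submit_ms = (double)(submit - queued) / ns_per_ms;
    b.submit_to_start_ms = (double)(start - submit) / ns_per_ms;
    b.launch_ms          = (double)(start - queued) / ns_per_ms;
    b.execution_ms       = (double)(end - start) / ns_per_ms;
    return b;
}
```

Calling `breakdown_ns(0, 30000, 2640000, 3070000)` reproduces the figures reported here: 0.03 ms from queued to submit, 2.61 ms from submit to start, 2.64 ms of launch time, and 0.43 ms of execution. The input values are made up to match those figures; they are not the raw counters from the actual run.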

maximmoroz
Journeyman III

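This is probably lazy buffer allocation: the runtime may not allocate the buffers on the device until the first kernel that uses them runs.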

Try profiling the 2nd kernel run instead of the 1st one.

pinzo
Journeyman III

I tried wrapping the whole host program in a for loop, but every repetition has the same submit time.

What should I do?

maximmoroz
Journeyman III

pinzo, try running the following code:

 

errcode = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*) &hits_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 0: %d\n", errcode);
errcode = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*) &candy_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 1: %d\n", errcode);
errcode = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*) &nn_evt_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 2: %d\n", errcode);

size_t local_work_size = 256;
size_t global_work_size = local_work_size * GRID_DIM;
size_t group_work = global_work_size / local_work_size;
cl_ulong start, end, queued, submit;

/* First run: warm-up only, not profiled. */
errcode = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, &local_work_size, 0, NULL, &event);
clWaitForEvents(1, &event);
clReleaseEvent(event); /* release the warm-up event before the handle is reused */

/* Second run: this is the one that gets profiled. */
errcode = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, &local_work_size, 0, NULL, &event);
clWaitForEvents(1, &event);
if (errcode != CL_SUCCESS) printf("failed NDrange: %d\n", errcode);

clEnqueueBarrier(queue);
clFlush(queue);
clFinish(queue);

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &queued, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &submit, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
clReleaseEvent(event);

pinzo
Journeyman III

I have just tried it, and the submit time is only slightly shorter than before: it now takes 2.43 ms.

Any ideas?

Thank you

maximmoroz
Journeyman III

What are your OS, driver, and AMD APP SDK versions?

pinzo
Journeyman III

I use Scientific Linux 5.5, ati-stream-sdk-v2.3-lnx64, and ATI Catalyst 10.12.

maximmoroz
Journeyman III

Well, I have an obvious suggestion: try a newer APP SDK and driver. I remember they said that they improved kernel launch time in one of the recent releases.

P.S. Don't use Catalyst 11.6, though; it has several flaws. Use 11.5 instead.

http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_APP_SDK_Release_Notes_Developer.pdf : Improved kernel launch times

pinzo
Journeyman III

I installed Catalyst 11.5 and APP SDK 2.4. The submit time has improved and now takes only 446 microseconds, but there is another problem: with the new APP SDK version the execution time is 2.77 ms instead of 0.43 ms.

Why?

Thank you very much for your answer
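As a side note for readers comparing the two sets of measurements, the sketch below adds the submit-to-start interval and the execution time for each configuration. The function name `submit_to_end_ms` is hypothetical, and the sketch assumes that the 446 microsecond figure is the same (start - submit) interval as the 2.61 ms reported earlier, which the post does not state explicitly.

```c
/* Hypothetical helper: time from submission to completion, in milliseconds,
 * given the (start - submit) and (end - start) intervals. */
static double submit_to_end_ms(double submit_to_start_ms, double execution_ms)
{
    return submit_to_start_ms + execution_ms;
}

/* Figures reported in this thread (assumed to be comparable intervals). */
static double total_sdk_2_3_ms(void) { return submit_to_end_ms(2.61, 0.43); }
static double total_sdk_2_4_ms(void) { return submit_to_end_ms(0.446, 2.77); }
```

Under that assumption the totals come out to roughly 3.04 ms with SDK 2.3 and roughly 3.22 ms with SDK 2.4, so the time from submission to completion is similar in both cases. That might mean some work previously counted in the submit phase is now counted inside the execution window, but nothing in the thread so far confirms this.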

maximmoroz
Journeyman III

Pinzo, show me the kernel's source code.
