pinzo

kernel submit time is too long

Discussion created by pinzo on Jun 26, 2011
Latest reply on Jun 29, 2011 by pinzo
submit time >> enqueue time and execution time

hi,

I have written host code that calls one kernel, and I use profiling to measure the kernel execution time and the launch time. I read in the AMD programming guide that the launch time (CL_PROFILING_COMMAND_START - CL_PROFILING_COMMAND_QUEUED) is about 225 microseconds and that profiling adds about 40 microseconds.

Instead, my kernel launch time (start - queued) is about 2.64 ms, while the execution time (end - start) is 0.43 ms. Most of that 2.64 ms comes from the submit time (start - submit) = 2.61 ms, while (submit - queued) = 0.03 ms.

Why is the submit time so long? Is this normal? What can I do?

thanks very much,

bye

errcode = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*) &hits_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 0: %d\n", errcode);
errcode = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void*) &candy_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 1: %d\n", errcode);
errcode = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void*) &nn_evt_dbuff);
if (errcode != CL_SUCCESS) printf("failed arg 2: %d\n", errcode);

size_t local_work_size = 256;
size_t global_work_size = local_work_size * GRID_DIM;
size_t group_work = global_work_size / local_work_size;
cl_ulong start, end, queued, submit;

errcode = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, &local_work_size, 0, NULL, &event);
clWaitForEvents(1, &event);
if (errcode != CL_SUCCESS) printf("failed NDrange: %d\n", errcode);

clEnqueueBarrier(queue);
clFlush(queue);
clFinish(queue);

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &queued, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &submit, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
clReleaseEvent(event);
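
To make the breakdown unambiguous, here is a minimal sketch of how the three intervals can be derived from the four timestamps read above. It is not the exact code from my program: the names profile_breakdown, breakdown, ns_to_ms and print_breakdown are made up for illustration, and uint64_t stands in for cl_ulong so the snippet compiles without the OpenCL headers. The profiling counters are reported in nanoseconds, hence the division by 1e6 to get milliseconds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the three intervals between the four
 * profiling timestamps, all in nanoseconds. */
typedef struct {
    uint64_t queue_to_submit_ns;  /* SUBMIT - QUEUED */
    uint64_t submit_to_start_ns;  /* START  - SUBMIT (the "submit time" in question) */
    uint64_t start_to_end_ns;     /* END    - START  (kernel execution) */
} profile_breakdown;

/* Split the launch into its stages. The timestamps are assumed to be
 * ordered queued <= submit <= start <= end. */
static profile_breakdown breakdown(uint64_t queued, uint64_t submit,
                                   uint64_t start, uint64_t end)
{
    profile_breakdown b;
    assert(queued <= submit && submit <= start && start <= end);
    b.queue_to_submit_ns = submit - queued;
    b.submit_to_start_ns = start - submit;
    b.start_to_end_ns    = end - start;
    return b;
}

/* Convert nanoseconds to milliseconds. */
static double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}

/* Print each stage in milliseconds. */
static void print_breakdown(const profile_breakdown *b)
{
    printf("queued -> submit: %.3f ms\n", ns_to_ms(b->queue_to_submit_ns));
    printf("submit -> start : %.3f ms\n", ns_to_ms(b->submit_to_start_ns));
    printf("start  -> end   : %.3f ms\n", ns_to_ms(b->start_to_end_ns));
}
```

In the host code this would be called as breakdown(queued, submit, start, end) after the four clGetEventProfilingInfo calls, followed by print_breakdown on the result.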
