
robert_dietrich
Journeyman III

Strange Profiling Results for Asynchronous Commands

I want to get correct profiling information for asynchronous commands in an in-order command queue. The OpenCL spec says the following about in-order command queues: "A prior command on the queue completes before the following command begins."

Therefore I modified the NVIDIA SDK 3.0 oclDotProduct example, enabled the queue for profiling and gave events to the clEnqueueWriteBuffer and clEnqueueNDRangeKernel calls. After a blocking clEnqueueReadBuffer I printed out the CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END for each call.

The call sequence is: 2 non-blocking clEnqueueWriteBuffer -- 1 clEnqueueNDRangeKernel -- 1 blocking clEnqueueReadBuffer.

According to the OpenCL spec, the commands should execute one after another. The computed results are correct on both CUDA and ATI Stream GPUs, but the profiling information on ATI Stream is strange: the three non-blocking commands start at different OpenCL timestamps but all end at the same one. In my test case:

clStartTime: 159722421037346; clStopTime: 159722471600728
clStartTime: 159722422219277; clStopTime: 159722471600728
clStartTime: 159722466764412; clStopTime: 159722471600728

The duration of the calls is measured correctly, but the reported intervals overlap. If I insert flushes between the calls, the OpenCL timestamps follow one another cleanly. Even clEnqueueBarrier does not serialize the non-blocking commands in this in-order queue.
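To make the overlap concrete, here is a minimal C sketch that takes the three start/end pairs quoted above, computes the duration each pair implies, and counts adjacent commands whose intervals overlap. The type `cmd_interval` and the functions `duration_ns` and `count_overlaps` are names made up for this illustration and are not part of OpenCL; the only fact assumed from the spec is that profiling timestamps are in nanoseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One profiled command: CL_PROFILING_COMMAND_START / _END values in nanoseconds.
 * (Hypothetical helper type, not an OpenCL type.) */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} cmd_interval;

/* Timestamps copied from the post: two buffer writes, then the kernel. */
static const cmd_interval posted[3] = {
    {159722421037346ULL, 159722471600728ULL},
    {159722422219277ULL, 159722471600728ULL},
    {159722466764412ULL, 159722471600728ULL},
};

/* Elapsed time of a single command in nanoseconds. */
static uint64_t duration_ns(cmd_interval c)
{
    assert(c.end_ns >= c.start_ns);
    return c.end_ns - c.start_ns;
}

/* Count adjacent pairs in which the later command starts before the
 * earlier one has ended. For a serialized in-order queue this should be 0. */
static size_t count_overlaps(const cmd_interval *cmds, size_t n)
{
    size_t overlaps = 0;
    for (size_t i = 1; i < n; i++) {
        if (cmds[i].start_ns < cmds[i - 1].end_ns)
            overlaps++;
    }
    return overlaps;
}
```

Calling `count_overlaps(posted, 3)` on the values above reports that both adjacent pairs overlap, which is the behaviour described in this post.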

Has anyone seen the same behaviour, and does anyone know a solution or workaround?

I am using: SUSE SLED 11, ATI-Stream 2.01, AMD 5770, CUDA SDK 3.0 OpenCL

0 Likes
9 Replies
omkaranathan
Adept I

robert,

Could you post the modified code?

 

0 Likes

The CUDA SDK 3.0 oclDotProduct is just an example. I have attached my adaptation of oclDotProduct.cpp.

// --------------------------------------------------------
// Core sequence... copy input data to GPU, compute, copy results back

/* Profiling test */
cl_event evts[3];

// Asynchronous write of data to GPU device
shrLog("clEnqueueWriteBuffer (SrcA and SrcB)...\n");
ciErrNum = clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcA, CL_FALSE, 0,
                                sizeof(cl_float) * szGlobalWorkSize * 4, srcA,
                                0, NULL, &evts[0]);
//clFlush(cqCommandQueue);
ciErrNum |= clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcB, CL_FALSE, 0,
                                 sizeof(cl_float) * szGlobalWorkSize * 4, srcB,
                                 0, NULL, &evts[1]);
shrCheckErrorEX (ciErrNum, CL_SUCCESS, pCleanup);
//clFlush(cqCommandQueue);

// Launch kernel
shrLog("clEnqueueNDRangeKernel (DotProduct)...\n");
ciErrNum = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL,
                                  &szGlobalWorkSize, &szLocalWorkSize,
                                  0, NULL, &evts[2]);
shrCheckErrorEX (ciErrNum, CL_SUCCESS, pCleanup);

// Read back results and check accumulated errors
shrLog("clEnqueueReadBuffer (Dst)...\n\n");
ciErrNum = clEnqueueReadBuffer(cqCommandQueue, cmDevDst, CL_TRUE, 0,
                               sizeof(cl_float) * szGlobalWorkSize, dst,
                               0, NULL, NULL);
shrCheckErrorEX (ciErrNum, CL_SUCCESS, pCleanup);

/* Profiling test: query the four timestamps of each event */
for (int i = 0; i < 3; i++) {
    cl_ulong ocl_strttime, ocl_stoptime, ocl_submittime, ocl_queuedTime;
    clGetEventProfilingInfo(evts[i], CL_PROFILING_COMMAND_START,
                            sizeof(cl_ulong), &ocl_strttime, NULL);
    clGetEventProfilingInfo(evts[i], CL_PROFILING_COMMAND_END,
                            sizeof(cl_ulong), &ocl_stoptime, NULL);
    clGetEventProfilingInfo(evts[i], CL_PROFILING_COMMAND_SUBMIT,
                            sizeof(cl_ulong), &ocl_submittime, NULL);
    clGetEventProfilingInfo(evts[i], CL_PROFILING_COMMAND_QUEUED,
                            sizeof(cl_ulong), &ocl_queuedTime, NULL);
    // cl_ulong is 64-bit; cast so that %llu is correct on every platform
    printf("CL_strtTime: %llu; CL_stopTime: %llu; CL_submitTime: %llu; CL_queuedTime: %llu\n",
           (unsigned long long)ocl_strttime, (unsigned long long)ocl_stoptime,
           (unsigned long long)ocl_submittime, (unsigned long long)ocl_queuedTime);
}
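The four values queried in the loop can be turned into per-phase times. The sketch below is plain C with no OpenCL dependency; `cmd_profile`, `cmd_phases` and `split_phases` are hypothetical names introduced only for this illustration. It assumes the timestamps are in nanoseconds and ordered QUEUED <= SUBMIT <= START <= END, which is what one would expect from a conforming implementation but is stated here as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* The four profiling timestamps of one command, in nanoseconds.
 * (Hypothetical helper type, not an OpenCL type.) */
typedef struct {
    uint64_t queued_ns;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit_ns;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start_ns;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end_ns;     /* CL_PROFILING_COMMAND_END    */
} cmd_profile;

/* Time spent in each phase of the command's life cycle. */
typedef struct {
    uint64_t queue_wait_ns;   /* enqueued on the host until submitted to the device */
    uint64_t submit_wait_ns;  /* submitted until execution starts */
    uint64_t exec_ns;         /* execution time on the device */
} cmd_phases;

/* Split one command's timestamps into phase durations.
 * Assumes the ordering QUEUED <= SUBMIT <= START <= END. */
static cmd_phases split_phases(cmd_profile p)
{
    assert(p.queued_ns <= p.submit_ns);
    assert(p.submit_ns <= p.start_ns);
    assert(p.start_ns <= p.end_ns);

    cmd_phases out;
    out.queue_wait_ns  = p.submit_ns - p.queued_ns;
    out.submit_wait_ns = p.start_ns - p.submit_ns;
    out.exec_ns        = p.end_ns - p.start_ns;
    return out;
}
```

Filling a `cmd_profile` from the four `cl_ulong` values of each event and printing `exec_ns` next to the raw timestamps makes it easier to see whether a long START-to-END span is real execution time or time spent waiting on earlier commands.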

0 Likes

Developers are looking into the issue, thanks for reporting.

0 Likes

I just installed ATI Stream SDK 2.1 and linux drivers 10.4. The problem seems to be solved now!

0 Likes

Actually, the issue is not solved for ATI/AMD GPUs! I had simply used the wrong device. For CPUs, the profiling information is correct.

0 Likes

This is being looked into by the developers; you can expect it to be fixed in an upcoming release.

0 Likes

I just tried the new Stream SDK 2.2 with ATI Catalyst 10.7, but the profiling information is still incorrect for my ATI Radeon HD 5770. In contrast, my NVIDIA Tesla C2050 card and the CPU return correct profiling information with the current ATI Stream SDK.

0 Likes

Originally posted by: robert.dietrich I just tried the new Stream SDK 2.2 with ATI Catalyst 10.7, but profiling information are still incorrect for my ATI Radeon HD 5770. In contrast my NVIDIA Tesla C2050 card and the CPU return correct profiling information with the current ATI Stream SDK.

This is not fixed yet. You can expect a fix in an upcoming release.

0 Likes

With Stream SDK 2.6, Catalyst 11.12 and AMD Radeon 6950 the profiling results for asynchronous commands are correct.

0 Likes