Not getting kernel execution time

I am not able to get the kernel running time for the attached program. Is there anything wrong with the program? I have a Radeon HD 6970 and OpenCL SDK 2.4.

The same behavior is observed for other programs too, including ones that take more than a minute to finish.

//
// Copyright (c) 2010 Advanced Micro Devices, Inc. All rights reserved.
//

// A minimalist OpenCL program.

#include <CL/cl.h>
#include <stdio.h>

#define NWITEMS 512

// A simple memset kernel
const char *source =
"__kernel void memset( __global uint *dst ) \n"
"{ \n"
" dst[get_global_id(0)] = get_global_id(0); \n"
"} \n";

int main(int argc, char ** argv)
{
    cl_ulong startTime, endTime, runTime;
    cl_event myEvent;

    // 1. Get a platform.
    cl_platform_id platform;
    clGetPlatformIDs( 1, &platform, NULL );

    // 2. Find a gpu device.
    cl_device_id device;
    clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // 3. Create a context and command queue on that device.
    cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue( context, device, 0, NULL );

    // 4. Perform runtime source compilation, and obtain kernel entry point.
    cl_program program = clCreateProgramWithSource( context, 1, &source, NULL, NULL );
    clBuildProgram( program, 1, &device, NULL, NULL, NULL );
    cl_kernel kernel = clCreateKernel( program, "memset", NULL );

    // 5. Create a data buffer.
    cl_mem buffer = clCreateBuffer( context, CL_MEM_WRITE_ONLY, NWITEMS * sizeof(cl_uint), NULL, NULL );

    // 6. Launch the kernel. Let OpenCL pick the local work size.
    size_t global_work_size = NWITEMS;
    clSetKernelArg(kernel, 0, sizeof(buffer), (void*) &buffer);
    clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &myEvent);
    clFinish( queue );

    // 7. Look at the results via synchronous buffer map.
    cl_uint *ptr;
    ptr = (cl_uint *) clEnqueueMapBuffer( queue, buffer, CL_TRUE, CL_MAP_READ, 0, NWITEMS * sizeof(cl_uint), 0, NULL, NULL, NULL );
    int i;
    for(i=0; i < NWITEMS; i++)
        printf("%d %d\n", i, ptr[i]);

    // 8. Compute running time and display
    startTime = endTime = runTime = 0;
    clGetEventProfilingInfo( myEvent, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, NULL );
    clGetEventProfilingInfo( myEvent, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, NULL );
    runTime = endTime - startTime;
    printf(" Running time = %lu\n", runTime);

    return 0;
}

2 Replies

Why don't you try the APP Profiler that comes with the APP SDK on Windows? It's integrated into VS2010. Works like magic!


There doesn't appear to be anything wrong. You said you observe this for other programs that take more than a minute to execute. Do you also see this with SDK samples like constantBandwidth and LDSBandwidth?