akumar8
Journeyman III

Not getting kernel execution time

I am not able to get the kernel running time for the attached program. Is there anything wrong with the program? I have a Radeon HD 6970 and OpenCL SDK 2.4.

I see the same behavior with other programs that take more than a minute to finish.

//
// Copyright (c) 2010 Advanced Micro Devices, Inc. All rights reserved.
//

// A minimalist OpenCL program.

#include <CL/cl.h>
#include <stdio.h>

#define NWITEMS 512

// A simple memset kernel
const char *source =
"__kernel void memset( __global uint *dst ) \n"
"{ \n"
" dst[get_global_id(0)] = get_global_id(0); \n"
"} \n";

int main(int argc, char ** argv)
{
    cl_ulong startTime, endTime, runTime;
    cl_event myEvent;

    // 1. Get a platform.
    cl_platform_id platform;
    clGetPlatformIDs( 1, &platform, NULL );

    // 2. Find a gpu device.
    cl_device_id device;
    clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // 3. Create a context and command queue on that device.
    cl_context context = clCreateContext( NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue( context, device, 0, NULL );

    // 4. Perform runtime source compilation, and obtain kernel entry point.
    cl_program program = clCreateProgramWithSource( context, 1, &source, NULL, NULL );
    clBuildProgram( program, 1, &device, NULL, NULL, NULL );
    cl_kernel kernel = clCreateKernel( program, "memset", NULL );

    // 5. Create a data buffer.
    cl_mem buffer = clCreateBuffer( context, CL_MEM_WRITE_ONLY, NWITEMS * sizeof(cl_uint), NULL, NULL );

    // 6. Launch the kernel. Let OpenCL pick the local work size.
    size_t global_work_size = NWITEMS;
    clSetKernelArg(kernel, 0, sizeof(buffer), (void*) &buffer);
    clEnqueueNDRangeKernel( queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, &myEvent);
    clFinish( queue );

    // 7. Look at the results via synchronous buffer map.
    cl_uint *ptr;
    ptr = (cl_uint *) clEnqueueMapBuffer( queue, buffer, CL_TRUE, CL_MAP_READ, 0, NWITEMS * sizeof(cl_uint), 0, NULL, NULL, NULL );
    int i;
    for(i=0; i < NWITEMS; i++)
        printf("%d %d\n", i, ptr[i]);

    // 8. Compute running time and display
    startTime = endTime = runTime = 0;
    clGetEventProfilingInfo( myEvent, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, NULL );
    clGetEventProfilingInfo( myEvent, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, NULL );
    runTime = endTime - startTime;
    printf(" Running time = %lu\n", runTime);

    return 0;
}

0 Likes
2 Replies
jaidotsh
Staff

Why don't you try the APP Profiler that comes with the APP SDK on Windows? It's integrated into VS2010. Works like magic!

0 Likes

There doesn't appear to be anything wrong. You said you observe this for other programs that take more than a minute to execute. Do you also see this with SDK samples like constantBandwidth and LDSBandwidth?

0 Likes