lenux
Journeyman III

OpenCL CodeXL Profiler executes the kernel several times

I'm trying to profile some OpenCL code with CodeXL (or, more to the point, sprofile).

Profiling in performance counter mode always gives me wrong output (the trace option -t does not), so I tried to find out why. After some experimentation I concluded that each kernel is executed three times, which leads to wrong results for kernels that modify existing data instead of overwriting it. The following toy program showcases this behaviour.

Here is the command I run:

  /opt/CodeXL-Linux-1.1.1537.0-x86_64-release/Output_x86_64/release/bin/x86_64/sprofile -o example.csv -w . OpenCLExample

My kernel:

#pragma OPENCL EXTENSION cl_amd_printf : enable

kernel void example_kernel(global const float *a,
                           global const float *b,
                           global float *result)
{
    int gid = get_global_id(0);
    result[gid] = a[gid] * b[gid];
    printf((__constant char *)"DEBUG: example_kernel id: %d result: %g\n", gid, result[gid]);
}

This is what I get:

DEBUG: example_kernel id: 0 result: 0
DEBUG: example_kernel id: 1 result: 2
DEBUG: example_kernel id: 2 result: 8
DEBUG: example_kernel id: 3 result: 18
DEBUG: example_kernel id: 0 result: 0
DEBUG: example_kernel id: 1 result: 2
DEBUG: example_kernel id: 2 result: 8
DEBUG: example_kernel id: 3 result: 18
DEBUG: example_kernel id: 0 result: 0
DEBUG: example_kernel id: 1 result: 2
DEBUG: example_kernel id: 2 result: 8
DEBUG: example_kernel id: 3 result: 18
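
To make the concern concrete: the toy kernel overwrites result[gid], so three passes produce the same values each time. A kernel that updates its data in place would not. The sketch below is plain host-side C, not OpenCL. The inputs a = {0,1,2,3} and b = {0,2,4,6} are an assumption chosen to match the printed results, and accumulate_kernel and run_passes are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* Host-side stand-in for the toy kernel: overwrites result[gid]. */
static void overwrite_kernel(const float *a, const float *b,
                             float *result, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        result[gid] = a[gid] * b[gid];
}

/* Hypothetical in-place variant: accumulates into result[gid]. */
static void accumulate_kernel(const float *a, const float *b,
                              float *result, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        result[gid] += a[gid] * b[gid];
}

typedef void (*kernel_fn)(const float *, const float *, float *, size_t);

/* Zero `out`, then run kernel `k` `passes` times on the assumed inputs. */
static void run_passes(kernel_fn k, int passes, float out[N])
{
    const float a[N] = {0.0f, 1.0f, 2.0f, 3.0f}; /* assumed input */
    const float b[N] = {0.0f, 2.0f, 4.0f, 6.0f}; /* assumed input */

    assert(passes >= 0);
    for (size_t i = 0; i < N; ++i)
        out[i] = 0.0f;
    for (int p = 0; p < passes; ++p)
        k(a, b, out, N);
}
```

With these inputs, run_passes(overwrite_kernel, 3, out) leaves out as {0, 2, 8, 18} no matter how many passes run, while run_passes(accumulate_kernel, 3, out) ends with {0, 6, 24, 54} instead of the single-pass result.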

chesik
Staff

Hi lenux,

When collecting performance counters, the GPU profiler may have to replay the kernel more than once. There is a hardware limit on the number of counters that can be queried for a given kernel dispatch. In order to collect all counters, the GPU profiler will replay the kernel the required number of times. The profiler tracks the buffers used by kernels and will save and restore their state prior to re-dispatching the kernel. It does this to ensure that the replayed kernel behaves identically each time it is re-dispatched. If you are running into a case where this does not appear to be working correctly, the profiler team would be very interested in seeing your test case. Can you share the application so that we can investigate?
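
As a rough illustration of the save-and-restore idea (a host-side C sketch only, not the profiler's actual implementation): replay, scale_in_place and the fixed buffer size N are hypothetical names. The sketch shows why restoring the buffer before each extra pass keeps the final contents equal to those of a single dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4

/* Stand-in for a kernel that modifies its buffer in place. */
static void scale_in_place(float *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

/*
 * Run the kernel `passes` times. When `restore` is non-zero, the buffer
 * is snapshotted before the first pass and copied back before every
 * later pass, so each pass sees the original input.
 */
static void replay(float *data, size_t n, int passes, int restore)
{
    float snapshot[N];

    assert(n <= N && passes >= 1);
    memcpy(snapshot, data, n * sizeof(float));

    for (int p = 0; p < passes; ++p) {
        if (restore && p > 0)
            memcpy(data, snapshot, n * sizeof(float));
        scale_in_place(data, n);
    }
}
```

Starting from {1, 2, 3, 4}, three passes with restore enabled end at {2, 4, 6, 8}, the same as one dispatch. Without restore, the same three passes end at {8, 16, 24, 32}.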

Thanks,

Chris

Hello,

Thanks for the reply. I will try to reduce my code to a compact sample that shows where this does not appear to be working correctly.

Can you tell me what I have to do to stop the behaviour described above?

Thanks