2 Replies Latest reply on Jun 10, 2013 10:44 AM by lenux

    OpenCL CodeXL Profiler executes the kernel several times

    lenux

      I'm trying to profile some OpenCL code with CodeXL (or, more to the point, sprofile).

      This always gives me the wrong output when profiling in performance counter mode (but not when using the trace option -t), so I tried to find out why. After some experimentation I concluded that each kernel is executed three times, which leads to wrong results for kernels that modify existing data instead of overwriting it. The following toy program showcases this behaviour.

       

      Here is the command I run:

        /opt/CodeXL-Linux-1.1.1537.0-x86_64-release/Output_x86_64/release/bin/x86_64/sprofile -o example.csv -w . OpenCLExample

       

      My kernel:

      #pragma OPENCL EXTENSION cl_amd_printf : enable

       

      kernel void example_kernel(global const float *a,
                                 global const float *b,
                                 global float *result)
      {
          int gid = get_global_id(0);
          result[gid] = a[gid] * b[gid];
          printf((__constant char *)"DEBUG: example_kernel id: %d result: %g\n", gid, result[gid]);
      }

       

      This is what I get:

      DEBUG: example_kernel id: 0 result: 0

      DEBUG: example_kernel id: 1 result: 2

      DEBUG: example_kernel id: 2 result: 8

      DEBUG: example_kernel id: 3 result: 18

      DEBUG: example_kernel id: 0 result: 0

      DEBUG: example_kernel id: 1 result: 2

      DEBUG: example_kernel id: 2 result: 8

      DEBUG: example_kernel id: 3 result: 18

      DEBUG: example_kernel id: 0 result: 0

      DEBUG: example_kernel id: 1 result: 2

      DEBUG: example_kernel id: 2 result: 8

      DEBUG: example_kernel id: 3 result: 18

        • Re: OpenCL CodeXL Profiler executes the kernel several times
          chesik

          Hi lenux,

           

          When collecting performance counters, the GPU profiler may have to replay the kernel more than once. There is a hardware limit on the number of counters that can be queried for a given kernel dispatch. In order to collect all counters, the GPU profiler will replay the kernel the required number of times. The profiler tracks the buffers used by kernels and will save and restore their state prior to re-dispatching the kernel. It does this to ensure that the replayed kernel behaves identically each time it is re-dispatched. If you are running into a case where this does not appear to be working correctly, the profiler team would be very interested in seeing your test case. Can you share the application where this does not appear to be working correctly so that we can investigate?

           

          Thanks,

          Chris
