I'm trying to profile some OpenCL code with CodeXL (or, more to the point, sprofile).
This always gives me wrong output when profiling in performance counter mode (but not when using the trace option -t), so I tried to find out why. After some experimentation I concluded that each kernel is executed three times, which leads to wrong results for kernels that modify existing data instead of overwriting it. The following toy program showcases this behaviour.
Here is the command I run:
/opt/CodeXL-Linux-1.1.1537.0-x86_64-release/Output_x86_64/release/bin/x86_64/sprofile -o example.csv -w . OpenCLExample
My kernel:
#pragma OPENCL EXTENSION cl_amd_printf : enable

kernel void example_kernel(global const float *a,
                           global const float *b,
                           global float *result)
{
    int gid = get_global_id(0);
    result[gid] = a[gid] * b[gid];
    printf((__constant char *)"DEBUG: example_kernel id: %d result: %g\n", gid, result[gid]);
}
This is what I get:
DEBUG: example_kernel id: 0 result: 0
DEBUG: example_kernel id: 1 result: 2
DEBUG: example_kernel id: 2 result: 8
DEBUG: example_kernel id: 3 result: 18
DEBUG: example_kernel id: 0 result: 0
DEBUG: example_kernel id: 1 result: 2
DEBUG: example_kernel id: 2 result: 8
DEBUG: example_kernel id: 3 result: 18
DEBUG: example_kernel id: 0 result: 0
DEBUG: example_kernel id: 1 result: 2
DEBUG: example_kernel id: 2 result: 8
DEBUG: example_kernel id: 3 result: 18
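To make clear why the repeated execution matters, here is a minimal host-side sketch in plain C (not OpenCL, and not my real code). It assumes the profiler simply runs the same kernel three times over the same buffers, which is what the output above suggests. The names overwrite_item, accumulate_item and replay are hypothetical; accumulate_item stands in for any kernel that updates result in place.

```c
#include <stddef.h>

/* Signature shared by both per-work-item bodies below. */
typedef void (*item_fn)(size_t gid, const float *a, const float *b, float *result);

/* Same body as example_kernel: result depends only on a and b,
   so running it more than once cannot change the outcome. */
static void overwrite_item(size_t gid, const float *a, const float *b, float *result)
{
    result[gid] = a[gid] * b[gid];
}

/* Hypothetical in-place variant: result depends on its previous value,
   so every extra run adds the product once more. */
static void accumulate_item(size_t gid, const float *a, const float *b, float *result)
{
    result[gid] += a[gid] * b[gid];
}

/* Run fn over n work items, `passes` times in a row, on the same buffers.
   This only imitates the assumed replay; it is not how sprofile is implemented. */
static void replay(item_fn fn, int passes, size_t n,
                   const float *a, const float *b, float *result)
{
    for (int p = 0; p < passes; ++p)
        for (size_t gid = 0; gid < n; ++gid)
            fn(gid, a, b, result);
}
```

With inputs such as a = {0, 1, 2, 3} and b = {0, 2, 4, 6} (assumed here because they reproduce the printed results) and result initialised to zero, replay(overwrite_item, 3, ...) still gives 0, 2, 8, 18, whereas replay(accumulate_item, 3, ...) gives 0, 6, 24, 54 instead of the expected 0, 2, 8, 18.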