I'm using the command-line profiler (sprofile) included in the CodeXL 1.1 bundle to profile a program that makes multiple kernel calls. I have two problems. First, when profiled, the program stops unexpectedly after a few kernel calls (it runs correctly when not profiled). Second, the profiler reports zeros for some essential performance counters, which doesn't make sense. For instance, can the "Wavefronts" counter be zero for a kernel call with multiple work-groups?
Here is part of the output:
#ProfileFileVersion=2.6
#ProfilerVersion=2.6.2153
#Application=/home/elias/mounts/smbhome/work/oclmicrobenchmarks/_profile/oclmicrobenchmarks
#ApplicationArgs=1
#Device AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ Platform Vendor=Advanced Micro Devices, Inc.
#Device AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ Platform Name=AMD Accelerated Parallel Processing
#Device AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ Platform Version=OpenCL 1.2 AMD-APP (1113.2)
#Device AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ CLDriver Version=1113.2 (sse2)
#Device AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ CLRuntime Version=OpenCL 1.2 AMD-APP (1113.2)
#Device AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ NumberAppAddressBits=32
#Device Capeverde Platform Vendor=Advanced Micro Devices, Inc.
#Device Capeverde Platform Name=AMD Accelerated Parallel Processing
#Device Capeverde Platform Version=OpenCL 1.2 AMD-APP (1113.2)
#Device Capeverde CLDriver Version=1113.2 (VM)
#Device Capeverde CLRuntime Version=OpenCL 1.2 AMD-APP (1113.2)
#Device Capeverde NumberAppAddressBits=32
#OS Version=Ubuntu 12.04.2 LTS \n \l
#DisplayName=
#ListSeparator=,
Method, ExecutionOrder, ThreadID, CallIndex, GlobalWorkSize, WorkGroupSize, Time, LocalMemSize, VGPRs, SGPRs, ScratchRegs, FCStacks, Wavefronts, VALUInsts, SALUInsts, VFetchInsts, SFetchInsts, VWriteInsts, LDSInsts, VALUUtilization, VALUBusy, SALUBusy, FetchSize, CacheHit, MemUnitBusy, MemUnitStalled, WriteUnitStalled, LDSBankConflict, WriteSize, GDSInsts
VectorReset__k1_Capeverde1, 1, 2379, 45, {16777216 1 1}, { 64 1 1}, 1.77008, 0, 3, 18, 0, NA, 262144.00, 7.00, 1.00, 0.00, 3.00, 1.00, 0.00, 100.00, 12.97, 1.85, 1.00, 0.00, 84.59, 10.89, 30.89, 0.00, 65536.00, 0.00
VectorAdd__k2_Capeverde1, 2, 2379, 52, {16777216 1 1}, { 64 1 1}, 3.74015, 0, 3, 24, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorRead__k3_Capeverde1, 3, 2379, 62, {16777216 1 1}, { 64 1 1}, 2.36385, 0, 3, 20, 0, NA, 262144.00, 9.00, 3.00, 2.00, 7.00, 0.00, 0.00, 100.00, 12.38, 4.13, 131073.75, 0.00, 80.50, 0.00, 0.00, 0.00, 0.00, 0.00
VectorReduction__k4_Capeverde1, 4, 2379, 70, {16777216 1 1}, { 64 1 1}, 2.47363, 256, 4, 14, 0, NA, 262144.00, 40.00, 33.00, 1.00, 7.00, 1.00, 13.00, 47.38, 52.80, 43.56, 65538.75, 20.01, 82.40, 5.10, 0.00, 0.00, 0.12, 0.00
VectorCalcPI__k5_Capeverde1, 5, 2379, 78, {268435456 1 1}, { 64 1 1}, 38.28548, 256, 6, 18, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorSequentialCalcPI__k6_Capeverde1, 6, 2379, 85, { 1 1 1}, { 1 1 1}, 2.80504, 0, 6, 18, 0, NA, 1.00, 229384.00, 32775.00, 0.00, 4.00, 1.00, 0.00, 1.56, 1.21, 0.14, 0.44, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorCalcPIDP__k7_Capeverde1, 7, 2379, 93, {268435456 1 1}, { 64 1 1}, 209.16815, 512, 15, 20, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorReset__k1_Capeverde1, 8, 2379, 104, {16777216 1 1}, { 128 1 1}, 1.83867, 0, 3, 18, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorAdd__k2_Capeverde1, 9, 2379, 111, {16777216 1 1}, { 128 1 1}, 3.65896, 0, 3, 24, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorRead__k3_Capeverde1, 10, 2379, 121, {16777216 1 1}, { 128 1 1}, 2.31111, 0, 3, 20, 0, NA, 262144.00, 9.00, 3.00, 2.00, 7.00, 0.00, 0.00, 100.00, 12.64, 4.21, 131073.75, 0.00, 51.14, 0.00, 0.00, 0.00, 0.00, 0.00
VectorReduction__k4_Capeverde1, 11, 2379, 129, {16777216 1 1}, { 128 1 1}, 3.32178, 512, 4, 14, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
VectorCalcPI__k5_Capeverde1, 12, 2379, 137, {268435456 1 1}, { 128 1 1}, 49.44948, 512, 6, 18, 0, NA, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00
I'm using a 32-bit Ubuntu 12.04 system with Catalyst 13.06 BETA installed and a Radeon HD 7750.
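To make the discrepancy concrete, here is a minimal sketch of a sanity check over output like the above. It assumes 64 work-items per wavefront, which is consistent with the non-zero rows (16777216 / 64 = 262144), and it assumes the file layout shown: '#' metadata lines followed by a comma-separated header and data rows. The function names are hypothetical and not part of CodeXL.

```python
import csv
import io
import math

# Assumption: 64 work-items per wavefront (matches the non-zero rows above).
WAVEFRONT_SIZE = 64


def parse_dims(field):
    """Turn a size field such as '{16777216 1 1}' into a tuple of ints."""
    return tuple(int(v) for v in field.strip().strip("{}").split())


def expected_wavefronts(global_size, group_size):
    """Estimate wavefronts launched: work-groups times wavefronts per group."""
    # Integer ceiling division per dimension gives the work-group count.
    groups = math.prod((g + l - 1) // l for g, l in zip(global_size, group_size))
    items_per_group = math.prod(group_size)
    per_group = (items_per_group + WAVEFRONT_SIZE - 1) // WAVEFRONT_SIZE
    return groups * per_group


def zero_counter_rows(profile_text):
    """Return (Method, CallIndex, expected) for rows reporting 0 wavefronts."""
    # Drop the '#' metadata lines; what remains is the header plus data rows.
    body = [ln for ln in profile_text.splitlines()
            if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(body)), skipinitialspace=True)
    suspicious = []
    for row in reader:
        expected = expected_wavefronts(parse_dims(row["GlobalWorkSize"]),
                                       parse_dims(row["WorkGroupSize"]))
        if float(row["Wavefronts"]) == 0.0 and expected > 0:
            suspicious.append(
                (row["Method"].strip(), int(row["CallIndex"]), expected))
    return suspicious
```

Passing the text of the profiler output file to zero_counter_rows should list the calls whose "Wavefronts" value is 0.00 even though the launch dimensions imply at least one wavefront, for example the VectorAdd call with CallIndex 52 above.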