I'll try to mention information that could be relevant.
My application uses multiple OpenCL kernels to process data that is also shared with OpenGL. These kernels are invoked one after another. For one of the last passes over the data there are two different algorithms:
- Updates the whole buffer several times (data fits into full work groups)
- Updates the buffer selectively, skipping work by returning early in some work items (the intention is to save time by having whole work groups skip work)
The kernel for these two algorithms is repeatedly invoked on the data.
As an example, for both algorithms the work size is 256x256 and the work group size is 16x16. I can dynamically switch between the two algorithms by changing a kernel argument (both algorithms are implemented in the same kernel). The kernels are profiled with OpenCL queue events inside the application, and the profiling results seem plausible.
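To make the dispatch geometry concrete, here is a minimal host-side sketch in Python (not the actual kernel). It emulates a 256x256 work size with 16x16 work groups and counts how many work groups would be completely idle under the selective algorithm. The `mode` values, the `active` mask and all function names are assumptions made up for illustration; only the sizes come from the description above.

```python
GLOBAL_SIZE = (256, 256)  # work size from the description
LOCAL_SIZE = (16, 16)     # work group size from the description

FULL_UPDATE = 0       # hypothetical value of the kernel argument: algorithm 1
SELECTIVE_UPDATE = 1  # hypothetical value of the kernel argument: algorithm 2


def num_groups(global_size=GLOBAL_SIZE, local_size=LOCAL_SIZE):
    """Number of work groups per dimension; the sizes must divide evenly."""
    assert all(g % l == 0 for g, l in zip(global_size, local_size))
    return tuple(g // l for g, l in zip(global_size, local_size))


def work_item_runs(mode, active, gx, gy):
    """Emulates the early return at the top of the kernel.

    `active` is a hypothetical 2D mask (list of rows) of elements needing work.
    """
    if mode == SELECTIVE_UPDATE and not active[gy][gx]:
        return False  # corresponds to an early `return;` in the kernel
    return True


def idle_groups(mode, active):
    """Counts work groups in which every work item returns early."""
    groups_x, groups_y = num_groups()
    idle = 0
    for group_y in range(groups_y):
        for group_x in range(groups_x):
            busy = any(
                work_item_runs(mode, active,
                               group_x * LOCAL_SIZE[0] + lx,
                               group_y * LOCAL_SIZE[1] + ly)
                for ly in range(LOCAL_SIZE[1])
                for lx in range(LOCAL_SIZE[0])
            )
            if not busy:
                idle += 1
    return idle
```

With a mask in which only the top-left 16x16 block is active, `idle_groups(SELECTIVE_UPDATE, active)` counts every group except that one as idle, while `idle_groups(FULL_UPDATE, active)` counts none. This only models which work items do work; it says nothing about how the hardware schedules them.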
But when I run the application with the AMD APP Profiler, the 2nd algorithm behaves unpredictably and seems to give random results.
Unfortunately, due to the size of the application and its required dependencies, I cannot provide a test case at the moment.
Please consider the following questions:
Has anybody else seen this problem?
Could there be some race condition inside the application that is exposed by the use of the Profiler?
Could it be the differing workload (skipping idle work groups), or does someone use similar kernels without seeing this problem?
System: Radeon 6950, Ubuntu 10.10, Catalyst 11.5, APP SDK 2.4, APP Profiler 2.2
Maybe someone with knowledge of the internals of the Profiler can tell if a race condition could be exposed by the Profiler. I believe the algorithm is correct, but you never know.
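To illustrate the kind of race condition I mean, here is a small host-side Python sketch. It assumes, purely hypothetically, that a work item reads a neighbouring element from the same buffer it writes to; I am not claiming the real kernel does this. Under that assumption the result depends on the order in which work items run, which a profiler could plausibly change, whereas a double-buffered version does not. All names are made up for the example.

```python
import random


def step_in_place(data, order):
    """Each 'work item' i reads its left neighbour from the buffer it also writes."""
    buf = list(data)
    n = len(buf)
    for i in order:
        buf[i] = buf[i] + buf[(i - 1) % n]
    return buf


def step_double_buffered(data, order):
    """Reads come from the input only; writes go to a separate output buffer."""
    n = len(data)
    out = [0] * n
    for i in order:
        out[i] = data[i] + data[(i - 1) % n]
    return out


def order_dependent(step, data, trials=20, seed=0):
    """Returns True if any shuffled execution order changes the result."""
    rng = random.Random(seed)
    reference = step(data, list(range(len(data))))
    for _ in range(trials):
        order = list(range(len(data)))
        rng.shuffle(order)
        if step(data, order) != reference:
            return True
    return False
```

Calling `order_dependent(step_in_place, data)` and `order_dependent(step_double_buffered, data)` on the same input shows the difference: only the in-place variant is sensitive to execution order. If the real kernel has a dependency of this kind, any change in scheduling could change the output.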