I am debugging a kernel that is not very complex, but stepping over the source code is extremely slow. When I let it run to another breakpoint in the middle of the code, it takes a very long time to get there. I reduced the problem size to a 16x16 global work size with an 8x8 local work group, and it is still SLOW. This makes the debugger useless.
The kernel does use a lot of global and/or texture memory. This memory is needed for the computation and cannot be reduced. I am not sure whether this is the cause of the slowness, but I really don't think a debugger would be this slow just because the kernel uses a lot of memory.