There's a nasty bug in the C++ OpenCL bindings shipped with AMD APP SDK 2.4.
The cl::KernelFunctor class overloads operator() for a varying number of kernel arguments. The overloads also accept a vector of cl::Event, the events to wait for before the kernel is executed. That argument is ignored: it is never passed on to the command queue.
The fix is simple enough: name the argument and pass it into the call to queue_.enqueueNDRangeKernel.
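To illustrate the bug and the fix without needing an OpenCL runtime, here is a small self-contained model of the pattern. It is not the actual cl.hpp source: Event, MockQueue, BuggyFunctor and FixedFunctor are hypothetical stand-ins, and the wait list is assumed to be passed as a pointer to a std::vector<Event>. The buggy functor leaves its wait-list parameter unnamed and passes a null pointer to the queue. The fixed one names the parameter and forwards it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cl::Event (not the real type).
struct Event { int id; };

// Hypothetical stand-in for cl::CommandQueue: it only records how many
// events the last enqueue call was asked to wait for.
struct MockQueue {
    std::size_t lastWaitCount = 0;

    int enqueueNDRangeKernel(const std::vector<Event>* events, Event* event) {
        lastWaitCount = (events != nullptr) ? events->size() : 0;
        if (event != nullptr) event->id = 42;  // pretend an event was created
        return 0;                              // 0 plays the role of CL_SUCCESS
    }
};

// Buggy pattern: the wait-list parameter is unnamed, so it cannot be
// forwarded and a null pointer is passed instead.
struct BuggyFunctor {
    MockQueue& queue_;
    Event operator()(const std::vector<Event>* /* ignored */) {
        Event event{};
        queue_.enqueueNDRangeKernel(nullptr, &event);
        return event;
    }
};

// Fixed pattern: name the parameter and pass it through to the queue.
struct FixedFunctor {
    MockQueue& queue_;
    Event operator()(const std::vector<Event>* events) {
        Event event{};
        queue_.enqueueNDRangeKernel(events, &event);
        return event;
    }
};

// Checks that only the fixed functor forwards the wait list.
inline void checkWaitListForwarding() {
    std::vector<Event> waitList{{1}, {2}};
    MockQueue q;

    BuggyFunctor buggy{q};
    buggy(&waitList);
    assert(q.lastWaitCount == 0);  // wait list silently dropped

    FixedFunctor fixed{q};
    fixed(&waitList);
    assert(q.lastWaitCount == 2);  // both events reach the queue
}
```

In the real bindings the change should amount to the same thing: give the cl::Event vector parameter a name and use it in place of the null wait-list argument in the enqueueNDRangeKernel call.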
I've posted a bug report on the Khronos forum, but AMD may want to escalate this, since you ship cl.hpp as part of the APP SDK.