I'm currently investigating a problem with my OpenCL code: it runs correctly on Nvidia's stack but returns incorrect results with the latest beta of AMD's stack.
I'm running Ubuntu 9.04, executing the code at http://code.google.com/p/pyrit/source/browse/#svn/trunk/cpyrit_opencl
There are no errors thrown at all. All API calls return CL_SUCCESS or CL_COMPLETE.
While investigating, I found that the kernel opencl_pmk_kernel() does not seem to execute the calls to sha1_process() at all: the value of 'pmk_ctx' in line 196 is always identical to the value of 'temp_ctx' in line 187. The kernel also finishes very quickly, which is another hint that sha1_process() is never actually run.
I suspect that the compiler falls into some optimization trap and completely removes the code between lines 188 and 195.
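For anyone who wants to reproduce the observation on the host side, a check along the following lines could count how many results come back unchanged after the kernel has run. This is only a sketch: `sha1_state` and `count_unchanged` are hypothetical names, and the five-word layout is an assumption about what a SHA-1 context looks like, not the actual struct from the kernel source.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a SHA-1 context: five 32-bit chaining words.
   The real struct in the kernel source may be laid out differently. */
typedef struct {
    uint32_t h[5];
} sha1_state;

/* Returns the number of entries whose output state is byte-for-byte
   identical to the input state, i.e. entries where the kernel apparently
   did no hashing work at all. */
size_t count_unchanged(const sha1_state *in, const sha1_state *out, size_t n)
{
    size_t unchanged = 0;
    for (size_t i = 0; i < n; i++) {
        if (memcmp(in[i].h, out[i].h, sizeof in[i].h) == 0)
            unchanged++;
    }
    return unchanged;
}
```

If the count equals the number of work-items on one stack and is zero on the other, that would support the idea that the hashing step is being skipped rather than computed incorrectly.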
Maybe someone can provide some insight? Maybe my definitions need to be more explicit?