Using gDEBugger, I was able to get a proper trace from clAmdBlasSgemmEx(), which fails somewhere during its execution in clCreateKernelsInProgram() with CL_OUT_OF_HOST_MEMORY (-6) on a (128, 256, 8) matrix multiplication.
The error is only visible when running with gDEBugger attached. If I run the same code by itself or through the Visual Studio debugger, I get random faults later in the program.
I think this error should always be reported through the return code, so that I can stop execution and easily pinpoint where things went wrong.
This was run on an ATI FirePro v8800 under 64-bit Windows 7, with Visual Studio Professional 2012, AMD APP SDK 2.8 (64-bit), and clAmdBlas 1.10.274.
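
For reference, this is roughly the kind of check I would like to be able to rely on. It is only a minimal sketch: cl_status_name() and check_status() are hypothetical helpers of my own, not part of OpenCL or clAmdBlas, and the numeric values are the standard status codes from CL/cl.h. I am assuming the status returned by clAmdBlasSgemmEx() can be passed in as a plain int.

```c
#include <stdio.h>

/* Hypothetical helper: map a few OpenCL status codes to their names.
 * The numeric values are the ones defined in CL/cl.h. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case 0:  return "CL_SUCCESS";
    case -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5: return "CL_OUT_OF_RESOURCES";
    case -6: return "CL_OUT_OF_HOST_MEMORY";
    default: return "unrecognized status";
    }
}

/* Hypothetical helper: returns 1 if the call succeeded, 0 otherwise.
 * On failure it logs the name of the call and the status it returned. */
static int check_status(int status, const char *call)
{
    if (status == 0) {
        return 1;
    }
    fprintf(stderr, "%s failed: %s (%d)\n",
            call, cl_status_name(status), status);
    return 0;
}
```

With something like this, the caller could write `if (!check_status(err, "clAmdBlasSgemmEx")) return EXIT_FAILURE;` right after the call, where err holds the value the call returned. That only helps if the failure inside clCreateKernelsInProgram() is actually propagated to the caller, which is what I am asking for.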