I recently ported my neural network app to OpenCL to run on my ATI Radeon HD 5870, and got about a 4x performance improvement. Just for yuks, I changed the mode from GPU to CPU, and surprisingly, the app ran about 12% faster than in GPU mode. So I have two questions.
1) What threading model is being used for CPU mode? Is this available as a package outside of OpenCL?
2) I am using an int4 to access the array of connections. Is SSE being used for the components val.x, val.y, val.z and val.w?
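
To make question 2 concrete, here is a rough plain-C stand-in for the kind of access pattern meant. It is only a sketch, not the actual kernel: the struct int4_sketch replaces OpenCL's built-in int4 so that it compiles as ordinary C, the function name weighted_sum4 and the weights array are made up for illustration, and it assumes each component of the int4 is used as a separate index into the array.

```c
#include <stddef.h>

/* Plain-C stand-in for OpenCL's built-in int4 (hypothetical name). */
typedef struct { int x, y, z, w; } int4_sketch;

/* Hypothetical neuron update: look up four connected inputs by index
   and sum them, each scaled by its own weight. Every component of
   conn is used as an independent index, so the four loads may touch
   non-adjacent memory locations. */
float weighted_sum4(const float *activations,
                    const float *weights,
                    int4_sketch conn)
{
    return activations[conn.x] * weights[0]
         + activations[conn.y] * weights[1]
         + activations[conn.z] * weights[2]
         + activations[conn.w] * weights[3];
}
```

The point of the sketch is that the four indices are packed together in one value, but the data they point at need not be contiguous.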