I am trying to improve performance on an x86 multicore CPU using OpenCL.
I rewrote my kernel to use vector integer types instead of scalar integers.
However, this improved performance only slightly (by a few percent).
In the non-OpenCL version, by contrast, using SSE doubles the performance.
Does the current kernel compiler really generate SSE instructions for vector integer operations?
I suspect that vector operations are being emulated with scalar operations, at least for integers.
How can I tell whether SSE instructions are generated?
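
For reference, here is a minimal sketch of the kind of scalar versus SSE integer code the non-OpenCL comparison refers to. It is not the actual benchmark code: the function names add_scalar and add_sse and the choice of a simple 32-bit element-wise add are assumptions made purely for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar baseline: one 32-bit add per iteration. */
static void add_scalar(const int32_t *a, const int32_t *b,
                       int32_t *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Hypothetical SSE2 version: four 32-bit adds per iteration.
 * _mm_add_epi32 maps to the paddd instruction. Unaligned loads and
 * stores are used so the arrays need no special alignment. */
static void add_sse(const int32_t *a, const int32_t *b,
                    int32_t *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

For plain C like this, compiling with `gcc -S` and searching the assembly output for packed integer instructions such as `paddd` shows whether SSE was used. Note that an optimizing compiler may also auto-vectorize the scalar loop. Whether a comparable listing can be obtained for an OpenCL kernel depends on the vendor's tooling.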