    Using vector integer types does not improve performance on x86 multicore


      I am trying to improve performance on an x86 multicore CPU using OpenCL.

      I rewrote my kernel to use vector integer types instead of scalar integers.
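
      To illustrate the kind of rewrite I mean, here is a minimal sketch assuming a simple element-wise add. The kernel names (add_scalar, add_vector), the arguments, and the choice of int4 are hypothetical and only for illustration; they are not my actual kernel. The sources are shown as C strings, as they would be passed to clCreateProgramWithSource.

```c
/* Hypothetical scalar kernel: each work-item adds one int. */
static const char *const scalar_kernel_src =
    "__kernel void add_scalar(__global const int *a,\n"
    "                         __global const int *b,\n"
    "                         __global int *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* Hypothetical vector kernel: each work-item adds one int4,
 * i.e. four ints, so the global work size is a quarter of the
 * scalar version's (assuming the element count is a multiple of 4). */
static const char *const vector_kernel_src =
    "__kernel void add_vector(__global const int4 *a,\n"
    "                         __global const int4 *b,\n"
    "                         __global int4 *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";
```

      In principle the int4 add could map to a single 128-bit SSE2 integer add, which is why I expected a larger gain.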

      However, it improves performance only slightly (by a few percent).

      In the non-OpenCL case, using SSE doubles the performance.
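
      For comparison, the non-OpenCL SSE version looks roughly like the following sketch, again assuming a simple element-wise integer add. The function names add_ints_scalar and add_ints_sse2 are hypothetical; the intrinsics are the standard SSE2 ones from emmintrin.h.

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stddef.h>

/* Plain scalar reference: one 32-bit add per iteration. */
void add_ints_scalar(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* SSE2 version: four 32-bit adds per iteration via _mm_add_epi32.
 * Unaligned loads/stores are used so the pointers need no special
 * alignment; the tail loop handles n not divisible by 4. */
void add_ints_sse2(const int *a, const int *b, int *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(c + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; ++i)
        c[i] = a[i] + b[i];
}
```

      Both functions produce identical results; only the number of elements processed per instruction differs.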

      Does the current kernel compiler really generate SSE instructions for vector integer operations?

      I suspect that vector operations are being emulated with scalar operations, at least for integers.

      How can I tell whether SSE instructions are being generated?