When I use vector types in OpenCL code, such as int4, char4, or uchar8, are they guaranteed to be compiled to SSE instructions on the CPU? That does not seem to be the case for my kernel, judging by the output of the AMD APP KernelAnalyzer. What should I pay attention to in my OpenCL code if I want to take advantage of vectorization on the CPU?