Originally posted by: zhuzxy
When I use vector types in OpenCL code, such as int4, char4, or uchar8, are they guaranteed to be translated into SSE instructions on the CPU? That does not seem to be the case for my kernel, judging by the output of the AMD APP Kernel Analyzer. What should I pay attention to in my OpenCL code if I want to take advantage of vectorization on the CPU?
The compiler will generate SSE instructions if you use vector types and an equivalent SSE instruction exists for the operation you perform on them.
Please paste your kernel code here so that we can verify whether SSE instructions are generated or not.
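To make the "equivalent SSE instruction" condition concrete, here is a minimal sketch of two kernels. The kernel and argument names are hypothetical and are not taken from the original poster's code, and the actual output depends on the compiler and runtime version, so treat the comments as expectations to check in the Kernel Analyzer rather than as guarantees.

```c
// Hypothetical kernels for illustration only; the names add_int4 and
// div_int4 do not come from the original post.

// int4 is 128 bits wide, the same size as an SSE register, and SSE2 has a
// packed 32-bit integer add (paddd), so this operation has a direct
// SSE equivalent that a CPU compiler can map it to.
__kernel void add_int4(__global const int4 *a,
                       __global const int4 *b,
                       __global int4 *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] + b[i];   // component-wise add
}

// SSE has no packed integer divide instruction, so this operation has no
// direct equivalent. A compiler would have to expand it, typically into
// per-component scalar code, even though the source uses a vector type.
__kernel void div_int4(__global const int4 *a,
                       __global const int4 *b,
                       __global int4 *out)
{
    size_t i = get_global_id(0);
    out[i] = a[i] / b[i];   // component-wise integer divide
}
```

Comparing the generated code for these two kernels should show whether the missing SSE instructions in your kernel come from the operations you use rather than from the vector types themselves.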