A Definitive Answer
I have seen several posts where SSE support is mentioned with respect to AMD/Intel chips when using explicit OpenCL Vectors. Speed ups within my own code seem to agree.
Ex)
double2 d2 = (double2) (1.2, 3.4);
double2 d3 = 3.14*d2;
What I have read indicates that this code will be compiled into a single SSE instruction. I would assume that higher multiples of 128-bits would be broken up into multiple passes (I.e. double4, double8, float16, etc.)
SSE, however, is limited to 128-bits (float4/double2 in one clock cycle), while AVX can handle up to 256-bits (float8/double4 in one clock cycle). I have looked around the internet and seen that AMD is going to be adding support for Intel's AVX instructions to their CPU's. (I don't know if this has already been completed) When this happens, when will OpenCL add support for AVX accelerated Vector Instructions?
Also, Since OpenCL has already been ported to run on Intel CPU's does the Stream SDK 2.3 support AVX accelerated Vector Instructions on Intel (Sandy Bridge) Processors? If not, (given that Intel's Sandy Bridge CPU released relatively recently) is this something which is scheduled to be added in the next SDK release, or something which will only be added when AMD chips support it?
thank you,
-Chris