With a CPU context, if I use the float4 and int4 data types and all I am doing is literal loads, additions, subtractions and multiplications, will my code be compiled into SIMD instructions that use MMX/SSE/etc.?
So when people say that AMD has a slightly weaker SSE implementation than Intel (which supposedly explains why some video codecs perform better on Intel than on AMD), what exactly do they mean? And what are the ramifications for OpenCL kernels?
Here's part of a table from AMD's website. I am guessing the difference is SSE4a vs. SSE4. Are there OpenCL features that will work better on Intel than on AMD because of this?
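|                              | AMD Phenom Processors                       | Intel Core 2 Quad      |
|------------------------------|---------------------------------------------|------------------------|
| 3D & Multimedia instructions | 3DNow!™ technology, SSE, SSE2, SSE3, SSE4a  | SSE, SSE2, SSE3, SSE4  |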