In a CPU context, if I use the float4 and int4 data types and all I do is literal loads, additions, subtractions, and multiplications, will my code be compiled into SIMD instructions (MMX, SSE, etc.)?
Yes.
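To make that concrete, here is a rough host-side analogue rather than an actual OpenCL kernel. It assumes GCC or Clang, whose `vector_size` extension gives 4-wide types that behave much like OpenCL's float4 and int4 for `+`, `-` and `*`. The function names `madd4` and `imul4` are made up for illustration, and what a particular OpenCL CPU runtime emits may differ.

```c
#include <stddef.h>

/* Stand-ins for OpenCL's float4/int4 (assumption: GCC/Clang vector
   extension, not OpenCL C itself). Each value holds four 32-bit lanes. */
typedef float float4 __attribute__((vector_size(16)));
typedef int   int4   __attribute__((vector_size(16)));

/* out[i] = (a[i] + b[i]) * scale - bias, computed on all four lanes at once.
   scale and bias are vector literals, like float4 literals in a kernel. */
void madd4(const float4 *a, const float4 *b, float4 *out, size_t n)
{
    const float4 scale = {2.0f, 2.0f, 2.0f, 2.0f};
    const float4 bias  = {1.0f, 1.0f, 1.0f, 1.0f};
    for (size_t i = 0; i < n; ++i)
        out[i] = (a[i] + b[i]) * scale - bias;
}

/* Lane-wise 32-bit integer multiply, the int4 counterpart. */
void imul4(const int4 *a, const int4 *b, int4 *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i];
}
```

Compiling this with something like `gcc -O2 -S` on x86 and reading the assembly should show packed instructions such as `addps`, `mulps` and `subps` for the float version. The integer multiply is the one operation here whose instruction choice depends on the SSE level: the single-instruction form, `pmulld`, is part of SSE4.1, and without it the compiler has to use a longer sequence.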
So when people say that AMD has a slightly weaker SSE implementation than Intel (which would explain why some video codecs perform better on Intel than on AMD), what are they talking about? And what are the ramifications for OpenCL kernels?
Here's part of a table from AMD's website. I am guessing the difference is SSE4a vs. SSE4. Are there OpenCL features that will work better on Intel than AMD because of this?
| | AMD Phenom Processors | Intel Core 2 Quad |
|---|---|---|
| 3D & Multimedia instructions | 3DNow!™ technology, SSE, SSE2, SSE3, SSE4a | SSE, SSE2, SSE3, SSE4 |