I think I have found a bug in AMD K11 (or maybe in all other generations as well).
I was writing some code and spent 4 hours hunting for a bug in it. The code looked like this:
I always use this kind of combination to store to memory when there is no other way to optimize (it also costs me less than _mm_shuffle+_mm_store_pd), and on an x86 OS it works perfectly. But when I recompiled my program for x64, I noticed incorrect results, so I spent 4 hours trying to find and fix my own errors. In the end, I simply tried changing this code to:
After that change, my program ran correctly. I suspect this is a bug in the SIMD pipeline (probably in the instruction decoder).
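The original snippets are not shown in the post, so the exact instruction sequence is unknown. As a hypothetical reproduction aid only, the sketch below assumes the "combination" was a pair of half-stores (_mm_storeh_pd / _mm_storel_pd) writing a vector's two lanes in swapped order, and compares it against the _mm_shuffle_pd + _mm_store_pd path on the same inputs. The function names are made up for illustration. Building it once as 32-bit and once as 64-bit on the affected CPU is one way to check whether the two paths really diverge.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86/x64 only */

/* Hypothetical path A: write the lanes of v into dst in swapped order
   using two 64-bit half-stores (typically MOVHPD / MOVLPD). */
static void store_swapped_halves(double *dst, __m128d v)
{
    _mm_storeh_pd(&dst[0], v); /* high lane -> dst[0] */
    _mm_storel_pd(&dst[1], v); /* low lane  -> dst[1] */
}

/* Hypothetical path B: same result via one shuffle plus one aligned
   128-bit store. dst must be 16-byte aligned for _mm_store_pd. */
static void store_swapped_shuffle(double *dst, __m128d v)
{
    __m128d swapped = _mm_shuffle_pd(v, v, 1); /* {v[1], v[0]} */
    _mm_store_pd(dst, swapped);
}

/* Runs both paths on n different vectors and returns how many
   results disagree; 0 means the two store sequences matched. */
int compare_stores(int n)
{
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        _Alignas(16) double a[2];
        _Alignas(16) double b[2];
        /* _mm_set_pd(high, low) */
        __m128d v = _mm_set_pd((double)i + 0.5, (double)(-i));
        store_swapped_halves(a, v);
        store_swapped_shuffle(b, v);
        if (a[0] != b[0] || a[1] != b[1])
            ++mismatches;
    }
    return mismatches;
}
```

Calling compare_stores(1000000) from a small main and checking that it returns 0 in both the 32-bit and the 64-bit build would show whether this particular pattern behaves differently between them.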
Can anybody from AMD's CPU division pick up this report?