How Can I Optimize My SSE Program on an AMD CPU?

Discussion created by zorroblsa on Nov 25, 2008
Latest reply on Feb 3, 2009 by tanja1

I wrote an SSE assembly function to calculate a vector-matrix multiplication. It works well on an Intel CPU, taking only 30% of the time of the FPU code generated by VC8. But on my AMD CPU (Athlon X2 3600+), it takes about twice as long as the FPU version. I also tried 3DNow!, which performed even worse. Is AMD's SIMD just slow?

Can someone help me? Any suggestions are welcome.