Hi,
for the AMD Phenom(tm) II X6 1045T processor
under 64-bit Fedora 14 Linux, I use the following
optimization flag with gcc 4.5.1:
-march=amdfam10
It should be.
I practically always use -march=native and let the compiler find the best fit...
@Brane2:
Thanks for the tip. When I ask gcc to show the native options via:
gcc -march=native -Q --help=target
I get the following results for the SSE-related parts
(the full result is in the code attachment):
...
-march= amdfam10
.....
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
.....
-mtune= amdfam10
This seems strange, since amdfam10 should support
SSE. It could be related to this gcc bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43718
-m128bit-long-double [disabled]
-m32 [disabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [enabled]
-m80387 [enabled]
-m96bit-long-double [enabled]
-mabi=
-mabm [enabled]
-maccumulate-outgoing-args [disabled]
-maes [disabled]
-malign-double [disabled]
-malign-functions=
-malign-jumps=
-malign-loops=
-malign-stringops [enabled]
-march= amdfam10
-masm=
-mavx [disabled]
-mbranch-cost=
-mcld [disabled]
-mcmodel=
-mcrc32 [disabled]
-mcx16 [enabled]
-mfancy-math-387 [enabled]
-mfma [disabled]
-mfma4 [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath=
-mfused-madd [enabled]
-mglibc [enabled]
-mhard-float [enabled]
-mieee-fp [enabled]
-mincoming-stack-boundary=
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax [disabled]
-mlarge-data-threshold=
-mlwp [disabled]
-mmmx [disabled]
-mmovbe [disabled]
-mms-bitfields [disabled]
-mno-align-stringops [disabled]
-mno-fancy-math-387 [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [enabled]
-momit-leaf-frame-pointer [disabled]
-mpc
-mpclmul [disabled]
-mpopcnt [enabled]
-mpreferred-stack-boundary=
-mpush-args [enabled]
-mrecip [disabled]
-mred-zone [enabled]
-mregparm=
-mrtd [disabled]
-msahf [enabled]
-msoft-float [disabled]
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
-msseregparm [disabled]
-mssse3 [disabled]
-mstack-arg-probe [disabled]
-mstackrealign [enabled]
-mstringop-strategy=
-mtls-dialect=
-mtls-direct-seg-refs [enabled]
-mtune= amdfam10
-muclibc [disabled]
-mveclibabi=
-mxop [disabled]
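One way to cross-check the listing above without relying on -Q --help=target is to ask gcc which SIMD macros it predefines for a given set of flags. The following is only a minimal shell sketch: the helper names filter_simd_macros and simd_macros_for are made up for illustration, and it assumes gcc is on the PATH.

```shell
# Keep only the SIMD-related predefined macros (MMX, 3DNow!, SSE*, SSSE3, AVX)
# from a "gcc -dM -E" macro dump read on stdin.
filter_simd_macros() {
    grep -E '^#define __(MMX|3dNOW|SSE|SSSE3|AVX)'
}

# Print the SIMD macros gcc predefines for the given flags.
# (Hypothetical helper; it preprocesses empty input and dumps the macro table.)
simd_macros_for() {
    gcc "$@" -dM -E - </dev/null | filter_simd_macros
}

# Example calls (output depends on the machine and gcc version):
#   simd_macros_for -march=native
#   simd_macros_for -march=amdfam10
```

If -march=native really resolves to amdfam10, both example calls should list the same macros. On x86-64, SSE and SSE2 are part of the baseline, so __SSE__ and __SSE2__ are expected to show up even without any -march flag.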
@Brane2:
I used the native option with a test case, which is a dense linear
solver based on LU factorization. The speedup I get is around 6 to
8 percent. Is this reasonable?
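For reference, a percentage like this can be derived from two wall-clock timings of the same test case, one per build. The sketch below assumes "speedup" means baseline time divided by native time, minus one; the function name speedup_percent and the sample timings are hypothetical and are not measurements from this thread.

```shell
# speedup_percent BASE_SECONDS NATIVE_SECONDS
# Prints how much faster the second run is, in percent, with one decimal.
speedup_percent() {
    awk -v base="$1" -v native="$2" \
        'BEGIN { printf "%.1f\n", (base / native - 1) * 100 }'
}

# Hypothetical timings: 10.0 s for the baseline build, 9.3 s with -march=native.
speedup_percent 10.0 9.3   # prints 7.5
```

Timings for such a comparison are best taken from several runs of each binary, since a difference of a few percent can be close to run-to-run noise.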
I have no idea. I'm not trying to make a bomb- I'm just a hobbyist ;o)
WRT that bug: interesting.
It says that -Q --help=target prints the options incorrectly, without taking -march into account.
But if I use "-march=native -msse4.1 -Q --help=target", it prints -msse/-msse2/-msse3/-mssse3/-msse4.1 as enabled... I