Running SPEC CPU 2017 507.cactuBSSN_r on an EPYC 9654, compiled with AOCC 4.0. When I compile with "-Ofast -march=znver4 -flto", the program goes into an infinite loop in re_match_2_internal. Compiling with znver3 instead does not go into an infinite loop.
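When comparing flag sets like this, a timeout wrapper can help separate a genuine hang from a slow run. The following is only a minimal sketch: the function name `run_with_timeout` and the time limit are illustrative assumptions and are not part of SPEC CPU 2017 or AOCC.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as "ok", "failed", or "hang".

    "hang" means the process was still running after timeout_s seconds.
    Pick a limit well above the normal runtime seen with a known-good
    build (for example the znver3 build), since this is only a heuristic.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child process before re-raising.
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


# Quick self-check using the Python interpreter itself as the command.
print(run_with_timeout([sys.executable, "-c", "pass"], 10))
```

To use it on a benchmark, the command list would be the binary and arguments from the run directory, which are not shown in this thread.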
Hi AlexMericas
Thank you for writing to us.
I am trying to reproduce the issue at my end. I have tried the flags you mentioned, "-Ofast -march=znver4 -flto". I am seeing a different issue, not the infinite loop.
Can you please share your config file so that I can reproduce the issue at my end?
Thank you
Best Regards
Hemanth
Sure, I can share my profile. How do I upload a file? I only see options for photos or videos.
Hello AlexMericas
You can click on the expand toolbar button (three dots symbol), select source code (HTML symbol), and paste the contents of the config file there, or
you can upload it to any drive and paste the link here so that we can access it.
Thank you
Hemanth
#------------------------------------------------------------------------------
# AMD AOCC 400 SPEC CPU 2017 Rate Configuration File for 64-bit Linux
# based off of AMD's amd_rate_aocc400_genoa_B1.cfg
#------------------------------------------------------------------------------
#--------- Label --------------------------------------------------------------
# Arbitrary string to tag binaries (no spaces allowed)
# Two Suggestions: # (1) EDIT this label as you try new ideas.
% define label "aocc400-ofast-zen4" # (2) Use a label meaningful to *you*.
#--------- Preprocessor -------------------------------------------------------
%ifndef %{bits} # EDIT to control 32 or 64 bit compilation. Or,
% define bits 64 # you can set it on the command line using:
%endif # 'runcpu --define bits=nn'
%ifndef %{build_ncpus} # EDIT to adjust number of simultaneous compiles.
% define build_ncpus 64 # Or, you can set it on the command line:
%endif # 'runcpu --define build_ncpus=nn'
# Don't change this part.
%if %{bits} == 64
% define model -m64
%elif %{bits} == 32
% define model -m32
%else
% error Please define number of bits - see instructions in config file
%endif
#--------- Global Settings ----------------------------------------------------
# For info, see:
# https://www.spec.org/cpu2017/Docs/config.html#fieldname
# Example: https://www.spec.org/cpu2017/Docs/config.html#tune
backup_config = 0
command_add_redirect = 1
flagsurl = $[top]/config/flags/gcc.xml
ignore_errors = 1
iterations = 1
label = % {label}-m%{bits}. # fix this line because the @#$@$ upload process will not accept it
line_width = 1020
log_line_width = 1020
makeflags = --jobs=%{build_ncpus}
mean_anyway = 1
output_format = txt,csv
preenv = 1
reportable = 0
nobuild = 0
verify_binaries = no # Somewhat dangerous!
tune = base,peak # EDIT if needed: set to "base" for old GCC.
# See note "Older GCC" above.
#--------- How Many CPUs? -----------------------------------------------------
# Both SPECrate and SPECspeed can test multiple chips / cores / hw threads
# - For SPECrate, you set the number of copies.
# - For SPECspeed, you set the number of threads.
# See: https://www.spec.org/cpu2017/Docs/system-requirements.html#MultipleCPUs
#
# q. How many should I set?
# a. Unknown, you will have to try it and see!
#
# To get you started, some suggestions:
#
# copies - This config file defaults to testing only 1 copy. You might
# try changing it to match the number of cores on your system,
# or perhaps the number of virtual CPUs as reported by:
# grep -c processor /proc/cpuinfo
# Be sure you have enough memory. See:
# https://www.spec.org/cpu2017/Docs/system-requirements.html#memory
#
# threads - This config file sets a starting point. You could try raising
# it. A higher thread count is much more likely to be useful for
# fpspeed than for intspeed.
#
intrate,fprate:
copies = 128 # EDIT to change number of copies (see above)
intspeed,fpspeed:
threads = 4 # EDIT to change number of OpenMP threads (see above)
# default copy counts:
default:
copies = 128
# Bind commands for assigning affinity:
submit = numactl --localalloc --physcpubind=$SPECCOPYNUM -- $command
################################################################################
#------- Compilers ------------------------------------------------------------
default:
#
%define aocc_dir "/opt/AMD/aocc-compiler-4.0.0" # EDIT
################################################################################
# Paths and Environment Variables
# # ** MODIFY AS NEEDED (modification should not be necessary for runs) **
# ################################################################################
# # Allow environment variables to be set before runs:
preenv = 1
#
# # Necessary to avoid out-of-memory exceptions on certain SUTs:
preENV_MALLOC_CONF = retain:true
#
# # Define the name of the directory that holds AMD library files:
%define binary_package_name
%define lib_dir %{binary_package_name}_lib
# Set the shared object library path for runs and builds:
#preENV_LD_LIBRARY_PATH = $[top]/%{lib_dir}/lib:$[top]/%{lib_dir}/lib32:%{ENV_LD_LIBRARY_PATH}
#
preENV_LD_LIBRARY_PATH = %{aocc_dir}/lib
#preENV_LD_LIBRARY_PATH = %{gcc_dir}/lib64/:%{gcc_dir}/lib/:/lib64:%{ENV_LD_LIBRARY_PATH}
SPECLANG = %{aocc_dir}/bin/
CC = $(SPECLANG)clang %{model}
CXX = $(SPECLANG)clang++ %{model}
FC = $(SPECLANG)flang %{model}
CLD = $(SPECLANG)clang %{model}
CXXLD = $(SPECLANG)clang++ %{model}
FLD = $(SPECLANG)flang %{model}
# How to say "Show me your version, please"
CC_VERSION_OPTION = --version
CXX_VERSION_OPTION = --version
FC_VERSION_OPTION = --version
default:
%if %{bits} == 64
sw_base_ptrsize = 64-bit
sw_peak_ptrsize = 64-bit
%else
sw_base_ptrsize = 32-bit
sw_peak_ptrsize = 32-bit
%endif
#--------- Portability --------------------------------------------------------
default: # data model applies to all benchmarks
EXTRA_PORTABILITY = -DSPEC_LP64
# *** Benchmark-specific portability ***
# # Anything other than the data model is only allowed where a need is proven.
# # (ordered by last 2 digits of benchmark number)
#
500.perlbench_r: #lang='C'
PORTABILITY = -DSPEC_LINUX_X64
521.wrf_r: #lang='F,C'
CPORTABILITY = -DSPEC_CASE_FLAG
FPORTABILITY = -Mbyteswapio
523.xalancbmk_r: #lang='CXX'
PORTABILITY = -DSPEC_LINUX
526.blender_r: #lang='CXX,C'
CPORTABILITY = -funsigned-char
527.cam4_r: #lang='F,C'
PORTABILITY = -DSPEC_CASE_FLAG
################################################################################
# Default libraries and variables
################################################################################
###############################################################################
# AOCC 4.0.0 workarounds that do not count as PORTABILITY
################################################################################
# The workarounds in this section would not qualify under the SPEC CPU
# PORTABILITY rule.
# - In peak, they can be set as needed for individual benchmarks.
# - In base, individual settings are not allowed; set for whole suite.
# Use EXTRA_CFLAGS, EXTRA_CXXFLAGS, and EXTRA_FFLAGS for them.
#
# See:
# https://www.spec.org/cpu2017/Docs/runrules.html#portability
# https://www.spec.org/cpu2017/Docs/runrules.html#BaseFlags
#######################
# Default workarounds #
#######################
default:
# Allow unused compile/link arguments without triggering warnings during build:
EXTRA_CFLAGS = -Wno-unused-command-line-argument
EXTRA_CXXFLAGS = -Wno-unused-command-line-argument
EXTRA_FFLAGS = -Wno-unused-command-line-argument
LDOPTIONS = -Wno-unused-command-line-argument
####################
# Base workarounds #
####################
#
# *** NONE ***
#
##############################
# Integer workarounds - base #
##############################
#
intrate=base:
# The following is necessary for 502/602 gcc:
EXTRA_LDFLAGS = -z muldefs
#########################
# FP workarounds - base #
#########################
#
# *** NONE ***
#
####################
# Peak workarounds #
####################
#
# *** NONE ***
#
##############################
# Integer workarounds - peak #
##############################
502.gcc_r=peak: #lang='C'
EXTRA_CFLAGS = -Wno-unused-command-line-argument \
-fgnu89-inline
EXTRA_LDFLAGS = -z muldefs
#####################################
# Floating Point workarounds - peak #
#####################################
#
# *** NONE ***
#
################################################################################
# Tuning Flags
################################################################################
# Libraries:
# EXTRA_LIBS = -lamdalloc -lamdlibm -lm
# MATHLIBOPT = #clearing this variable or else SPEC will set it to -lm
# VECMATHLIB = -fveclib=AMDLIBM
# # Variables:
# OPT_ROOT = -march=znver3 $(VECMATHLIB) -ffast-math
# OPT_ROOT_BASE = -O3 $(OPT_ROOT)
# OPT_ROOT_PEAK = -Ofast $(OPT_ROOT) -flto #Ofast enables -ffast-math
#####################
# Base tuning flags #
#####################
default=base:
OPTIMIZE = -Ofast -march=znver4 -flto
# Libraries:
EXTRA_LIBS = -lflang
#EXTRA_LIBS = -lamdlibm -lm -lamdalloc -lflang
# Don't put the AMD and mvec math libraries in MATH_LIBS because it will trigger a reporting issue
# because GCC won't use them. Forcefeed all benchmarks the math libraries in EXTRA_LIBS and clear
# out MATH_LIBS.
#MATH_LIBS =
default:
basepeak = yes
#--------- EDIT to match your version -----------------------------------------
default:
sw_compiler001 = AMD Optimizing Compiler (AOCC) 4.0.0
sw_compiler002 = AOCC_4.0.0-Build 434 2022_10_28 based on LLVM Mirror.Version.14.0.6
#--------- EDIT info about you ------------------------------------------------
# To understand the difference between hw_vendor/sponsor/tester, see:
# https://www.spec.org/cpu2017/Docs/config.html#test_sponsor
intrate,intspeed,fprate,fpspeed: # Important: keep this line
hw_vendor = Dell
tester = Alex Mericas
test_sponsor = Rivos Inc
license_num = 6386
# prepared_by = # Ima Pseudonym # Whatever you like: is never output
hw_model000 = PowerEdge R7615 (AMD EPYC 9654 96-Core Processor)
#--------- EDIT system information --------------------------------------------
intrate,intspeed,fprate,fpspeed: # Important: keep this line
# Example # Brief info about field
################################################################################
# Hardware, firmware and software information
################################################################################
hw_avail =Feb-2023
sw_avail =Nov-2022
hw_cpu_name =AMD EPYC 9654
hw_cpu_nominal_mhz =2450
hw_cpu_max_mhz =3500
hw_ncores =96
hw_nthreadspercore =1
hw_ncpuorder =1 chips
hw_other =None # Other perf-relevant hw, or "None"
fw_bios =Dell
sw_base_ptrsize =64-bit
hw_pcache =32 KB I + 32 KB D on chip per core
hw_scache =1 MB I+D on chip per core
hw_tcache000 =384 MB I+D on chip per chip, 32 MB shared / 8
hw_tcache001 = cores
hw_ocache =None
# sw_file = # ext99 # File system
# sw_os001 = # Linux Sailboat # Operating system
# sw_os002 = # Distribution 7.2 SP1 # and version
sw_other = jemalloc # Other perf-relevant sw, or "None"
# sw_state = # Run level 99 # Software state.
power_management = # briefly summarize power settings
# Note: Some commented-out fields above are automatically set to preliminary
# values by sysinfo
# https://www.spec.org/cpu2017/Docs/config.html#sysinfo
# Uncomment lines for which you already know a better answer than sysinfo
__HASH__
Well, I tried to copy/paste it, but it was rejected as spam. I'll see if I can find an upload site.
Try this file
Hello AlexMericas
Thanks for sharing your config file. I am able to reproduce the issue at my end with it. I have filed a bug report and will keep you updated on its progress.
Thanks
Hemanth
Hi @AlexMericas
The bug has been fixed in the latest AOCC compiler release, 4.1.0:
https://www.amd.com/en/developer/aocc.html#downloads
Please check, and feel free to contact us if you have any issues.
Thanks
Hemanth
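Before re-running, it may be worth confirming which AOCC build the toolchain actually reports. A minimal sketch follows. It assumes the compiler's `--version` text contains an `AOCC_x.y.z` token, as in the `sw_compiler002` string from the config file above. Whether 4.1.0 prints the same format is an assumption, and the helper names are hypothetical.

```python
import re


def parse_aocc_version(version_text):
    """Return (major, minor, patch) from text containing 'AOCC_x.y.z', else None."""
    match = re.search(r"AOCC_(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fix(version_text, fixed_in=(4, 1, 0)):
    """True if the reported version is at least the release said to contain the fix."""
    version = parse_aocc_version(version_text)
    return version is not None and version >= fixed_in


# Version string recorded in the config file shared earlier in this thread.
old = "AOCC_4.0.0-Build 434 2022_10_28 based on LLVM Mirror.Version.14.0.6"
print(parse_aocc_version(old), has_fix(old))  # prints: (4, 0, 0) False
```

In practice the text would come from running the compiler with `--version`, the same option the config file sets in `CC_VERSION_OPTION`.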