Hi everyone,
I am facing a strange problem. I am using a Ryzen 5 3600 and a test that counts the occurrences of each letter from 'a' to 'z' in a string of 256,000 ASCII characters; I perform this test 100,000 times on the same string.
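To make the setup concrete, here is a hypothetical C harness for the kind of test described above. The real code is in the archive posted later in the thread and may differ. The function name `time_count_test`, the LCG used to fill the string and its fixed seed are assumptions of this sketch. It builds the string once, counts the letters `repeats` times over the same string, and returns the elapsed wall-clock time measured with `clock_gettime(CLOCK_MONOTONIC, ...)`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical harness (not the author's code): builds a string of `size`
 * lowercase letters with a fixed-seed LCG, so every run sees the same string,
 * then counts the letters `repeats` times over that string.
 * Returns elapsed wall-clock seconds, or -1.0 if the allocation fails. */
double time_count_test(size_t size, unsigned repeats, unsigned letters[26])
{
    char *s = malloc(size);
    if (s == NULL)
        return -1.0;

    unsigned state = 12345u;                   /* assumed fixed seed */
    for (size_t i = 0; i < size; ++i) {
        state = state * 1103515245u + 12345u;  /* LCG step (wraps mod 2^32) */
        s[i] = (char)('a' + (state >> 16) % 26);
    }

    memset(letters, 0, 26 * sizeof letters[0]);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned r = 0; r < repeats; ++r)     /* same string every time */
        for (size_t i = 0; i < size; ++i)
            ++letters[s[i] - 'a'];             /* ++letters[ s[i]-'a' ] */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(s);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

With the sizes from the post you would call `time_count_test(256000, 100000, letters)` and run the program 10 times, printing the returned value each time. The counters accumulate over all repetitions; with these sizes each one stays well below the 32-bit limit.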
I didn't notice the problem on Intel processors, but on AMD I get different behaviours if I run the test 10 times:
- either I get nearly the same execution time, around 4.60 seconds,
- or sometimes I get 8 or 9 seconds,
- or I can even get 14 s!
Here, for example, are the execution times of 10 runs:
- run 1: 14.84 s
- run 2: 4.58 s
- run 3: 8.87 s
- run 4: 4.63 s
- run 5: 4.58 s
- run 6: 4.58 s
- run 7: 4.59 s
- run 8: 7.54 s
- run 9: 4.58 s
- run 10: 4.58 s
Basically, the code does this:
    align 16
    .while:
        movzx eax, byte [rdi + rcx]   ; s[i]
        sub   eax, 'a'                ; s[i] - 'a'
        inc   dword [rbx + rax * 4]   ; ++letters[ s[i]-'a' ]
        inc   ecx                     ; ++i
        cmp   ecx, esi                ; if (i < size)
        jne   .while                  ;     goto .while
    .end_while:
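For readers who think in C, here is a sketch of what the loop above computes, with each statement annotated with the instruction(s) it corresponds to. Reading the fragment, `rdi` points to the string, `ecx` is the index `i`, `esi` is the size and `rbx` points to the `letters` array; the function name `count_letters` is hypothetical. Like the assembly, it assumes every byte is in 'a'..'z'. One difference: the assembly is a do-while (the test comes after the first iteration), whereas this `for` loop also handles `size == 0`.

```c
#include <stddef.h>

/* Hypothetical C equivalent of the assembly loop (a sketch, not the
 * author's code). Assumes every byte of s is in 'a'..'z': any other
 * byte would index outside letters[], exactly as in the assembly. */
void count_letters(const char *s, size_t size, unsigned letters[26])
{
    for (size_t i = 0; i < size; ++i) {            /* inc ecx / cmp ecx, esi / jne */
        unsigned idx = (unsigned char)s[i] - 'a';  /* movzx eax, byte [rdi+rcx]; sub eax, 'a' */
        ++letters[idx];                            /* inc dword [rbx + rax*4]: load, add, store */
    }
}
```

Note that `inc dword [rbx + rax * 4]` increments the counter directly in memory, so each iteration performs a load and a store to `letters` in addition to the load of `s[i]`.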
I can provide the code to execute if needed.
My question is: why do I sometimes get such large differences? Does it come from the cache? From branch prediction? From some kind of penalty?
Best regards,
Jean-Michel
Compile it for Linux and it will work fine ;)
The main reason for this behaviour is how Windows handles AMD hardware.
I forgot to say that I am running Linux (Ubuntu 20.04, kernel 5.4.0-58-generic).
Ah, thanks. Now I have to reproduce it myself :/
Can you upload the script somewhere and send me the link via PM?
Hi, the source code is available as an archive here
To perform the test, do the following:
unzip asm_vowels.zip
cd asm_vowels_64
make clean && make configure && make
Then run the following script; you may need to run it several times (sometimes once is enough) to see the problem appear:
./test_methods.sh
You will obtain something like this:
#;average; min; max; stddev;method
-----------------------------------------------------------------------------
4; 5.636; 4.580; 9.660; 1.988;cv_letters_asm
After 10 executions, the average is 5.636 s, but the execution times range from 4.580 s to 9.660 s, which gives a large standard deviation of 1.988 s.
I could also see the same problem with an AMD Ryzen 7 1700X:
4; 11.264; 4.770; 26.380; 9.894;cv_letters_asm
So we get execution times from 4.77 s to 26.38 s!
OK, I will test it, but it will take some time (as I am pretty lazy this year xD)
Thank you for your help. I am not in a hurry; I just want to know the reason for the problem. I tried AMDuProf, but it behaved strangely, telling me the problem is in a function I am not using for this test!