I am writing a simple C program that (more or less) finds the minimum-weight cycle in a graph.
I don't think the details of what the program does are very relevant, but some context will help.
The code contains the main algorithm functions, two test functions, and two benchmark functions.
The benchmark functions create graphs and then run the algorithm, measuring only the algorithm's running time: benchmark_sparse() creates a graph with 2*V edges, and bench_dense() creates one with V*(V-1) edges.
First I run the two tests, then the benchmarks for a set of V sizes.
Normally, I would expect the sparse graphs to run faster than the dense graphs in the benchmarks.
But with GCC and -O1 they run in the same time. With -O2, sparse is slower. With -O3, sparse is faster, as expected.
This -O2 behavior occurs on my AMD Ryzen 7 1700; I tested on an Intel i5 and it does not occur there.
What is even weirder is that this only happens if I run both test1() and test2(). If I run, say, only test1(), the behavior is normal (note that the tests are separate from the benchmarks and should not influence their running time at all).
I looked at the generated assembly: removing a single call instruction to test2 is enough to make the behavior change.
Lastly, the effect shows up when I test two sizes, e.g. 1000 and 2200 (it is more apparent at 2200). If I run only the 2200 case, the behavior again does not occur.
I've attached the C source code and the assembly I used. Each has exactly one line commented out: the call to test2(), which you can restore or remove to see the behavior change. In the assembly it is even clearer. Compile the C with -O2 and run with --bench.
I would like to provide a more reduced test case, but changing almost anything makes the weird behavior vanish, so you wouldn't be able to reproduce it.