Various issues to fix.

Benchmarks should be reproducible and perform more consistently across
compilers. gcc sometimes applies very aggressive optimisations that
defeat the purpose of a benchmark.
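A minimal sketch of one common kind of precaution, assuming the
benchmarks are written in C. The names BENCH_NOINLINE, bench_input,
bench_sink and run_fib_once are hypothetical and do not come from the
benchmark suite. The idea is to hide the input from the optimiser and
to make the result observably used, so the call can neither be folded
to a constant nor removed as dead code.

```c
#include <stdint.h>

/* Hypothetical helper macro. noinline is a GNU attribute that both gcc
 * and clang accept; on other compilers it expands to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define BENCH_NOINLINE __attribute__((noinline))
#else
#define BENCH_NOINLINE
#endif

/* Naive recursive Fibonacci. Keeping it out of line prevents the caller
 * from inlining it and folding a known argument at compile time. */
BENCH_NOINLINE uint64_t fib(uint32_t n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* The input is read through a volatile, so the compiler cannot assume
 * its value. The result is written to a volatile, so the call cannot
 * be discarded. Both variables are assumptions made for this sketch. */
static volatile uint32_t bench_input = 30;
static volatile uint64_t bench_sink;

uint64_t run_fib_once(void)
{
    uint32_t n = bench_input;  /* opaque to the optimiser */
    uint64_t r = fib(n);
    bench_sink = r;            /* result is observably used */
    return r;
}
```

This only guards the call site. A compiler may still restructure the
recursion inside fib itself, so it probably will not remove the whole
gcc/clang gap on its own.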


fib and takeuchi perform very differently under gcc and clang.
Without precautions, results can differ by a factor of 100.
With some precautions, the difference is still rather large.

With some other precautions, clang is about 4 times faster in fib.

lorenz is almost 3 times slower on clang, and produces a different result.

nqueens is about 30% slower on clang.


General: Tune the run length of each benchmark.
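
One possible way to tune the length, sketched under assumptions: double
the iteration count until a single run takes at least a target wall
time. The names bench_fn, spin, seconds_for and calibrate are
hypothetical. clock() from the C standard library is used for
portability; it measures processor time, so a monotonic wall clock may
be a better fit on some platforms.

```c
#include <time.h>

/* Hypothetical benchmark signature: run the workload `iterations` times. */
typedef void (*bench_fn)(unsigned long iterations);

/* Example workload for the sketch; the volatile accumulator keeps the
 * loop from being optimised away. */
static volatile unsigned long spin_acc;

void spin(unsigned long iterations)
{
    for (unsigned long i = 0; i < iterations; i++) {
        spin_acc += i;
    }
}

/* Time one run of fn, in seconds of processor time. */
static double seconds_for(bench_fn fn, unsigned long iters)
{
    clock_t start = clock();
    fn(iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Double the iteration count until one run lasts at least
 * target_seconds, or until max_iters is reached. */
unsigned long calibrate(bench_fn fn, double target_seconds,
                        unsigned long max_iters)
{
    unsigned long iters = 1;
    while (iters < max_iters && seconds_for(fn, iters) < target_seconds) {
        /* Clamp instead of doubling past max_iters (also avoids overflow). */
        iters = (iters > max_iters / 2) ? max_iters : iters * 2;
    }
    return iters;
}
```

For example, calibrate(spin, 1.0, 1UL << 30) would look for an iteration
count at which spin runs for roughly one second. The count found with
one compiler could then be fixed so both compilers do the same work.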