Interesting optimisation results for Ryzen (lower is better; the left bars, "Modified Vrad", show the improvement achieved by fixing a "false sharing" issue in Valve's Source engine, an issue that, I presume, causes unnecessary inter-core communication):
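To make the "false sharing" idea concrete, here is a minimal C++17 sketch. It is not the vrad code, whose details are not shown here, and `PlainCounter`, `PaddedCounter` and `run_counters` are hypothetical names. Each thread increments only its own counter, yet in `PlainCounter` several counters sit on the same cache line, so every write forces that line to bounce between cores. `PaddedCounter` gives each counter its own line. The 64-byte line size is an assumption (it is typical for current x86 CPUs, Zen included); C++17 also defines `std::hardware_destructive_interference_size`, but not every standard library provides it.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Adjacent PlainCounters in an array share cache lines (8 bytes each, so
// up to eight per 64-byte line): writes by different threads collide.
struct PlainCounter {
    std::atomic<std::uint64_t> value{0};
};

// Assumed 64-byte cache line: alignas pads each counter to a line of its own.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == 64, "one counter per assumed cache line");

// Each thread increments only counters[t]. The atomic is used so the compiler
// cannot collapse the loop into a single add; no two threads touch the same counter.
template <typename Counter>
std::uint64_t run_counters(unsigned num_threads, std::uint64_t iterations) {
    std::vector<Counter> counters(num_threads);  // default-constructed in place
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    // Sum all per-thread counters after the threads have finished.
    std::uint64_t total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;
}
```

Both `run_counters<PlainCounter>(8, 10'000'000)` and `run_counters<PaddedCounter>(8, 10'000'000)` return the same total; timing them (for example with `std::chrono::steady_clock`) on a multi-core machine typically shows the padded version running markedly faster, because only the layout differs.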
The poster explains the poor scaling: it is probably due to Amdahl's Law, as a good portion of the code is sequential and is therefore not sped up by parallelisation.
"The speedup in the fixed section of the code is likely a fair bit higher than what's shown here."
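Amdahl's Law accounts for both the weak scaling and the poster's remark. If a fraction $p$ of the runtime can be parallelised over $n$ threads, the overall speedup is bounded as below. The values $p = 0.9$ and $n = 16$ (the thread count of an 8-core Ryzen with SMT) are illustrative assumptions, not figures from the benchmark:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad
S(16)\big|_{p=0.9} = \frac{1}{0.1 + 0.05625} = \frac{1}{0.15625} = 6.4, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = 10
```

The same formula explains why the fixed section's own speedup is hidden: if the fix makes a fraction $f$ of the runtime $s$ times faster, the whole program only gains $1 / \big((1 - f) + f/s\big)$. With assumed values $f = 0.5$ and $s = 4$, that is $1 / (0.5 + 0.125) = 1.6$, so a 4x improvement in the fixed code shows up as only 1.6x overall.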
But, yes, the case shows how easy it is to introduce a scaling problem when the programmer does not understand multi-threading and shared memory well. Luckily, in this case the shared counter was useless and could simply be removed.
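Where such a count is actually needed, a common remedy is to let each thread count locally and combine the results once at the end. The sketch below contrasts the two patterns in C++17; `count_shared` and `count_local` are hypothetical names and the loop body stands in for real work, so this illustrates the technique rather than reproducing the vrad fix (which, per the post, simply dropped the counter).

```cpp
#include <atomic>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Contended pattern: every thread hammers one shared atomic counter,
// so its cache line migrates between cores on almost every increment.
std::uint64_t count_shared(unsigned num_threads, std::uint64_t items_per_thread) {
    std::atomic<std::uint64_t> processed{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&processed, items_per_thread] {
            for (std::uint64_t i = 0; i < items_per_thread; ++i) {
                // ... process item i (placeholder for real work) ...
                processed.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return processed.load();
}

// Uncontended pattern: each thread keeps a private count and publishes it
// with a single write, so shared memory is touched once per thread.
std::uint64_t count_local(unsigned num_threads, std::uint64_t items_per_thread) {
    std::vector<std::uint64_t> per_thread(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&per_thread, t, items_per_thread] {
            std::uint64_t local = 0;  // private to this thread
            for (std::uint64_t i = 0; i < items_per_thread; ++i) {
                // ... process item i (placeholder for real work) ...
                ++local;
            }
            per_thread[t] = local;  // one write; join() makes it visible below
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(per_thread.begin(), per_thread.end(), std::uint64_t{0});
}
```

Both functions return the same total. The second design keeps the hot loop free of shared writes, which is exactly what removing a useless shared counter achieves, while still producing a count if one is wanted.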