Why do results from PerformanceTest V10 differ from previous releases?
With the introduction of PerformanceTest V10 (March 2020), a number of changes were made to the benchmark algorithms.
The motivations for the changes were:
- We wanted to start using new CPU instructions only available in modern CPUs (e.g. AVX512 and FMA). It is important to note that the AVX512 part of the benchmark is fairly small: it is only a portion of the Extended Instructions Test, which is one of eight tests. If AVX512 isn't available then FMA instructions are still used (if available). While the performance difference between SSE instructions (in very old CPUs) and AVX512 is fairly dramatic, the difference between FMA and AVX512 is less so. AVX512 isn't used at all in the single threaded test. (A minimal feature-detection sketch follows the list below.)
- We wanted to use a more up-to-date compiler (Visual Studio 2019 instead of 2013), which also brings better code optimization, better profiling and ease of development.
- We wanted better support for out-of-order execution, a feature that newer CPUs handle much better than old ones.
- We updated the 3rd-party libraries used for some of the CPU tests (including more modern versions of GZip, Crypto++ and Bullet Physics).
- We fixed a number of bugs that hurt performance (such as variable alignment issues and previously poorly chosen compiler optimization flags).
- We completely rewrote some of the CPU tests. For example, the unpopular TwoFish encryption code was removed and replaced with the more widely used elliptic curve encryption.
- We improved the algorithms to push more data through the CPU, which also places more load on the cache and memory subsystem. So older CPUs, and those with inadequate cache or memory bandwidth, are expected not to perform as well in PT10.
- We wanted to build in support for comparing x86 PC CPU benchmarks to ARM and iOS systems.
- For the 2D tests we started to make use of new built-in Win10 features that were not available in Win7 (and support for Vista was terminated). These include the rendering of SVG image files and PDF files. This hurts Win7 scores, but makes the tests more relevant, as displaying SVGs on web pages and viewing PDF files are common tasks.
- Also for the 2D tests, we started to make more use of DirectX (DX11) to render graphics. Performance is better than with GDI+, but it takes more coding effort.
- We wanted to place more load on the video card for 2D and 3D, so screen resolution was increased in many of the tests. 1080p is now the expected minimum resolution to run the tests.
- We added a new 4KQD1 disk test (4K block size with a queue depth of 1), as this has become a de facto industry standard over the last five years. (A sketch of such a test also follows the list.)
- We had been using mostly the same benchmark code since 2012. After 8 years, hardware had moved on, and it was time for an update to maintain long-term relevance and continue to reflect the performance of modern real-world applications.
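As an illustration of the instruction set point above, here is a minimal sketch (not PassMark's actual dispatch code) of how a benchmark can pick the widest instruction set the CPU supports, falling back from AVX512 to FMA to SSE. The function names are our own, and a production check would also confirm OS support for the wider registers via _xgetbv().

```cpp
#include <intrin.h>   // MSVC CPUID intrinsics
#include <cstdio>

// CPUID.01H:ECX bit 12 indicates FMA support.
static bool cpu_has_fma() {
    int r[4];
    __cpuid(r, 1);
    return (r[2] & (1 << 12)) != 0;
}

// CPUID.(EAX=07H,ECX=0):EBX bit 16 indicates AVX512 Foundation support.
static bool cpu_has_avx512f() {
    int r[4];
    __cpuidex(r, 7, 0);
    return (r[1] & (1 << 16)) != 0;
}

int main() {
    if (cpu_has_avx512f())
        std::puts("dispatching AVX512 kernel");
    else if (cpu_has_fma())
        std::puts("dispatching FMA kernel");
    else
        std::puts("dispatching SSE fallback kernel");
}
```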
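And for the 4KQD1 disk test, the sketch below shows the general shape of such a measurement on Windows (again, a sketch, not PassMark's implementation): one outstanding 4K unbuffered read at a time, at random offsets, with IOPS derived from the elapsed time. The file name "testfile.bin" stands in for a pre-created test file.

```cpp
#include <windows.h>
#include <malloc.h>   // _aligned_malloc
#include <chrono>
#include <cstdio>
#include <random>

int main() {
    const DWORD kBlock = 4096;   // 4K block size
    HANDLE h = CreateFileA("testfile.bin", GENERIC_READ, 0, nullptr,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING,   // bypass the OS cache
                           nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(h, &size);
    const long long blocks = size.QuadPart / kBlock;

    // FILE_FLAG_NO_BUFFERING requires a sector-aligned buffer.
    void* buf = _aligned_malloc(kBlock, kBlock);

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<long long> pick(0, blocks - 1);

    const int kReads = 10000;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < kReads; ++i) {
        LARGE_INTEGER pos;
        pos.QuadPart = pick(rng) * kBlock;
        SetFilePointerEx(h, pos, nullptr, FILE_BEGIN);
        DWORD got = 0;
        // Queue depth 1: each read completes before the next is issued.
        ReadFile(h, buf, kBlock, &got, nullptr);
    }
    auto t1 = std::chrono::steady_clock::now();

    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("4K QD1 random read: %.0f IOPS\n", kReads / secs);

    _aligned_free(buf);
    CloseHandle(h);
    return 0;
}
```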
So the new individual PT10 results can't, at least on the surface, be compared to the PT9 results. They are genuinely different; this is probably the biggest algorithm overhaul we have made in 20 years of development.
BUT the CPUMark value, which is a combined result derived from all the individual tests, was scaled back to PT9 levels. So the PT10 CPUMark is somewhat comparable to the PT9 CPUMark: more so for CPUs from the same era, less so when comparing old CPUs against new ones.
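To make the scaling idea concrete, purely as an illustration (this is not PassMark's published formula): a composite mark can be built as a geometric mean of the subtest results, multiplied by a calibration constant chosen so that scores for a set of reference CPUs land at roughly the same level as the previous version produced.

```cpp
#include <cmath>
#include <vector>

// Hypothetical composite: geometric mean of subtest scores, rescaled by a
// calibration constant so the new composite lines up with the old version's
// score range on reference hardware.
double composite_mark(const std::vector<double>& subtest_scores,
                      double calibration) {
    double log_sum = 0.0;
    for (double s : subtest_scores)
        log_sum += std::log(s);
    return calibration * std::exp(log_sum / subtest_scores.size());
}
```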
The net effect of these changes on the CPU results is to reward CPUs with modern instruction sets, better IPC and faster memory buses. So in PerformanceTest V10 there is a larger gap between modern and old CPUs compared to V9; i.e. old CPUs suddenly look worse in PT10.
Our public benchmark charts have also been updated to reflect the algorithm changes. There was some significant volatility in the results during the release month (March 2020), but this has now settled down.
We understand that many people don't like change, whatever the reason. So we have also kept the old PerformanceTest V9 results and the V9 software available for download on our web site.