What is a Believable Benchmark?

By Mario Rodrigues

Date: May 7, 2002

As the latest P4s are run through the different sets of benchmarks, hardware sites have been moving over to the latest suites that this year has to offer. Two things have been seriously lacking in too many reviews: first, an adequate explanation of what these new benchmarks bring to the table, and second, an acknowledgment of the big difference in results between last year's benchmarks and the current ones.

The Tech Report highlighted an eye-popping example of this phenomenon in Content Creation Winstone. Last year's benchmark showed AMD's Athlon XP 2100+ comfortably beating all comers. With this year's version, it was relegated to fourth place. How can a benchmark show such contradictory results from one year to the next? Are we saying that content creation software from 2001 performs better on Athlon, and software from 2002 performs better on Intel's P4? How will the consumer differentiate between these widely differing results when making a purchasing decision? What confidence can the consumer put in these ambiguous results? Any skeptic would think that this latest benchmark had been specifically tuned to favor Intel's P4. The Tech Report has also reported that Athlon users have seen slowdowns when moving from Lightwave 7.0 to 7.0b.

Business Winstone 2001 is another benchmark that currently highlights the P4's poor performance. Top-of-the-range 1.7 GHz P4-M notebooks from Sony and Dell show disappointing performance when running mainstream productivity applications. Note the PIII-M's 570 MHz frequency disadvantage. Will this year's Business Winstone show Intel's P4 in a far better light?

Ace's Hardware has recently published some finite element analysis (FEA) benchmarks that show the superiority of the Athlon over the P4. One engineer commented, "Why is the Athlon so much faster in the code? This is not obvious by having a look at SPECfp." Johan De Gelas, Senior Editor at Ace's Hardware, said about these results, "With all due respect to the spec committee, but this shows once again that SpecFP overshoots it's mark."

The questioning of benchmarks' veracity continues unabated. Speaking at last year's Platform Conference in San Jose, Randall Kennedy, the Director of Research for Competitive Systems Analysis, stated that BAPCo, the organization responsible for the most popular application-level benchmark, SysMark 2000, was simply a "front" for the Santa Clara chip maker. "It's my understanding that all of the other companies listed on BAPCo’s website have effectively just fallen by the wayside" when it comes to guiding the benchmark effort, he said. He wasn't too complimentary of Ziff Davis' Winstone suite either: "Basically the benchmarks out there today say what Intel wants said."

Our own Van Smith delved into benchmarketing 101 and came out with some very unsavory results. Real World Technologies published "Benchmarking: The Money Game," which looks at benchmarking from all viewpoints.

It is clear that today's benchmarks still engender strong skepticism about their credibility. They are simply not believed to be real-world or fair. Hardware sites need to do a far better job of bringing these home truths to the consumer. As it is, too many reviewers simply adopt the latest benchmarks and assume them to be the gold standard from which definitive conclusions can be drawn. This is clearly bunkum. Reviewers need to show a higher degree of skepticism toward the benchmarks on offer, and back their results with solid corroborating evidence.

With BAPCo and MadOnion having already been herded into the Intel camp, AMD's attempt to introduce its TPI (True Performance Initiative) seems to be a very tall order. AMD may have to do what Intel has already done and set up its own affiliated benchmark organizations to redress Intel's advantage. With these benchmark houses stacked against it, it's amazing that AMD has managed to stay in the game for this long.

We've had the megahertz myth well and truly beaten. But has the benchmark myth now taken its place?

