Benchmarks: AMD's New Athlon XP 1800+, Part 2

By Van Smith

Date: October 11, 2001

I apologize for the terseness of this review, but I have been sick the last two days.


Memory Performance Continued

As we have seen, the Athlon XP has very good memory latency.  We will now look at the second important measure of a memory subsystem: bandwidth.

The term "bandwidth" refers to the rate at which data can be transferred.  The Pentium 4 enjoys an innate advantage in bandwidth thanks to its 400 MHz front-side bus (100 MHz quad-pumped).  This advantage is demonstrated clearly in BandwidthBurn.
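As a rough worked example of where that advantage comes from, peak theoretical bandwidth is simply the base clock times the transfers per clock times the bus width.  The sketch below assumes 64-bit (8-byte) data paths throughout and a nominal 133 MHz double-pumped bus for both the Athlon XP and PC2100 DDR SDRAM.  Those figures are assumptions for illustration, not values reported by any of the benchmarks, and the function name is hypothetical.

```python
def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock, bus_width_bytes):
    """Peak theoretical bandwidth in MB/s, with 1 MB = 10^6 bytes.

    MHz * transfers/clock * bytes/transfer = 10^6 bytes/s = MB/s.
    """
    return base_clock_mhz * transfers_per_clock * bus_width_bytes


# Pentium 4 front-side bus: 100 MHz quad-pumped, 64-bit (width assumed).
p4_fsb = peak_bandwidth_mb_s(100, 4, 8)

# Athlon XP front-side bus: nominal 133 MHz double-pumped, 64-bit (assumed).
athlon_fsb = peak_bandwidth_mb_s(133, 2, 8)

# PC2100 DDR SDRAM: nominal 133 MHz double-pumped, 64-bit (assumed).
pc2100 = peak_bandwidth_mb_s(133, 2, 8)

print(p4_fsb)      # 3200
print(athlon_fsb)  # 2128
print(pc2100)      # 2128
```

On these assumptions the P4's bus has headroom to spare, so on PC2100 the memory itself, not the front-side bus, sets the ceiling for main memory bandwidth.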

Note: AMD has contacted us with instructions on how to boost the Athlon's performance on BandwidthBurn.  The technique, called Block Prefetch, is simple and can be implemented easily without resorting to assembly language or special compilers.  Block Prefetch is not yet implemented in the copy of BandwidthBurn used to generate the results below.  Also note that the test reports bandwidth using 1 MB/s = 10^6 B/s, the convention used by Stream.  Finally, the test appears to overestimate true bandwidth very slightly.  It employs a correction factor, calculated at startup, to counteract the latency of the timer routines.  We have not yet attempted to correct the overestimation because the error is small, but it is enough to bring the P4's bandwidth on PC2100 DDR SDRAM very slightly above its maximum theoretical throughput.
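To illustrate how a timer correction factor can push reported bandwidth upward, here is a minimal sketch.  It is not BandwidthBurn's actual code: the function names are hypothetical, and the calibration method (taking the smallest gap between two back-to-back timer reads) is an assumption chosen for simplicity.

```python
import time


def timer_overhead(samples=1000):
    """Estimate the cost of one timer call as the smallest gap
    observed between two back-to-back timer reads."""
    best = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        best = min(best, t1 - t0)
    return best


def bandwidth_mb_s(bytes_moved, elapsed_s, overhead_s=0.0):
    """Bandwidth in MB/s (1 MB = 10^6 bytes) after subtracting
    the estimated timer overhead from the measured time."""
    corrected = elapsed_s - overhead_s
    if corrected <= 0:
        raise ValueError("elapsed time does not exceed the timer overhead")
    return bytes_moved / corrected / 1e6


overhead = timer_overhead()
print(bandwidth_mb_s(2_000_000, 1.0))        # no correction applied
print(bandwidth_mb_s(2_000_000, 1.0, 0.5))   # an exaggerated correction
```

Because the correction is subtracted from the elapsed time, any overestimate of the timer latency shrinks the denominator and inflates the reported bandwidth, which is consistent with results landing slightly above the theoretical maximum.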

Reading the chart below from left to right, the curves show the bandwidth of the L1 cache, then the L2 cache, and finally main memory.  It is evident that the Athlon uses an exclusive cache architecture, which gives it about 64 kB more effective data cache than the P4.  Note that the main memory values for the P4 are slightly greater than what is physically possible for DDR SDRAM.  As stated above, we believe this is likely due to the latency correction factor skewing results slightly higher than actual.  We must also point out, however, that Intel chips tend to produce curves that drop off only gradually to main memory bandwidth after leaving the L2 cache, unlike chips such as the Athlon, which recede to main memory bandwidth almost immediately.
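The 64 kB figure can be checked with a little arithmetic.  The sketch below assumes a 64 kB L1 data cache and a 256 kB exclusive L2 for the Athlon XP, and an 8 kB L1 data cache and a 256 kB L2 for the Pentium 4, with the P4's L2 treated as duplicating its L1 contents.  These cache sizes are not stated in this article and the function name is hypothetical.

```python
def effective_data_cache_kb(l1_data_kb, l2_kb, exclusive):
    """Effective data cache capacity in kB.

    In an exclusive hierarchy L1 and L2 hold different lines, so their
    capacities add.  Otherwise L2 is assumed to duplicate whatever is
    in L1, so the L2 size bounds the total.
    """
    return l1_data_kb + l2_kb if exclusive else l2_kb


athlon_xp = effective_data_cache_kb(64, 256, exclusive=True)   # assumed sizes
pentium_4 = effective_data_cache_kb(8, 256, exclusive=False)   # assumed sizes

print(athlon_xp)              # 320
print(pentium_4)              # 256
print(athlon_xp - pentium_4)  # 64
```

On those assumptions, the Athlon's curve should stay at cache speeds until the working set reaches roughly 320 kB, while the P4's should begin falling toward main memory bandwidth at about 256 kB.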

The first test involves initializing an array with a constant and then retrieving that data.  The results are the average of a write plus a read.  The Pentium 4 dominates this test, as the chip currently offers unparalleled raw bandwidth.  However, a problem clouds this win for the P4, as we will see in the next graph.

The next chart shows a simple variation of the test above that adds a single integer multiplication to the assignment statement.  In Stream parlance, this is an integer scaling operation.  Although the Athlon XP is only modestly impacted by this additional operation, the Intel Pentium 4 suffers severe performance degradation.
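For readers unfamiliar with these kernels, the two loops can be sketched as follows.  This is not BandwidthBurn's source: the function names are hypothetical, and the scaling loop follows the Stream-style form dst[i] = factor * src[i], which is an assumption about how the multiplication is applied.  Python is used only to show the shape of the loops; interpreter overhead makes it unsuitable for measuring real memory bandwidth.

```python
from array import array


def fill_then_read(n, constant):
    """First test: write a constant to every element, then read it back."""
    data = array("q", [0]) * n      # n 64-bit signed integers
    for i in range(n):
        data[i] = constant          # write pass
    checksum = 0
    for i in range(n):
        checksum += data[i]         # read pass
    return checksum


def scale(src, factor):
    """Second test: the same kind of assignment with one integer
    multiplication added per element (Stream-style scaling)."""
    dst = array("q", [0]) * len(src)
    for i in range(len(src)):
        dst[i] = factor * src[i]
    return dst


print(fill_then_read(1000, 7))               # 7000
print(list(scale(array("q", [1, 2, 3]), 3)))  # [3, 6, 9]
```

The only difference between the two inner loops is one multiply per element, which is why such a large drop on the P4 is surprising.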

This demonstrates the Pentium 4's quirky and unbalanced behavior, which makes it generally less desirable than the much more even-performing and robust AMD Athlon XP design.



SiSoftware's Sandra Tests

We are using a beta version of Sandra, version 8.17XP, which has specific bandwidth optimizations for the Athlon.  Note that the Athlon XP actually beats the Pentium 4 in floating-point bandwidth even though the Pentium 4 enjoys a superior chipset, the VIA P4X266.

Also note the extremely poor performance of the Pentium 4's FPU.  Originally SiSoftware intended to report SSE2 values alone for FPU performance.  Even though I am directly responsible for the compromise in Sandra whereby both the FPU and SSE2 values are now displayed, few reviewers have noticed both values, and many continue to report SSE2 results as floating-point unit performance.  This is incorrect and misleading, since SSE2 will not be utilized by legacy software (still the vast majority of existing programs), nor can all algorithms easily be adapted to use SSE2.


ZD Benchmarks

It is illuminating to look at the relative performance of the two CPUs through a couple of older Ziff-Davis benchmarks.

The Athlon XP dominates the venerable CPUmark 99 test, and this does not change for FPU WinMark 99 below.

If you wonder why ZD does not continue to use these tests, this should make it clear.


Scientific Benchmarks

Dr. Tim Wilkens' ScienceMark also shows the Athlon XP with a significant advantage over the Pentium 4.

Two simple tests in Mathematica 4.0 show the two platforms performing at parity.  This is likely due to the bandwidth-intensive nature of these calculations.  The first test measures the time it takes to calculate Pi to 500,000 decimal places.  The second test measures the time it takes to plot the following:

ParametricPlot3D[{u Cos[u] (4 + Cos[v + u]), u Sin[u] (4 + Cos[v + u]), u Sin[v + u]}, {u, 0, 4 \[Pi]}, {v, 0, 2 \[Pi]}, PlotPoints -> {600, 20}]


Miscellaneous Tests

The following collection of tests, all written in Delphi, unexpectedly gives the Athlon XP wins in every instance.  However, both the Random Dots test and the Plot Lines test are latency-intensive.  The P4 system was using a Radeon DDR graphics card, but that should not have a significant impact on these very simple 2D tests, where nearly all modern graphics cards perform identically.  We will rerun the tests with a GeForce3 soon and report the results.

To be continued...

We will finish our initial Athlon XP evaluation with an analysis soon.



