AMD at MPF: Hammering Open Bottlenecks

By Joel Hruska

Date: October 18, 2001

I spent over an hour on the phone yesterday with Richard Sah and Sean Cleveland, who briefed me on the developments at the Microprocessor Forum (MPF) concerning the K8, or “Hammer” as it is more commonly known.  This is a brief report covering a few of the points we discussed.

The K8:  A Model Of Efficiency

The K8 is AMD’s newest family of CPUs.  The chips will be produced on a 0.13 micron SOI [ed: Silicon-On-Insulator -- a technology that potentially wastes less energy and therefore allows chips to run cooler and faster] process and are set to debut in volume in the second half of next year.  AMD has designed the K8 family to be an all-in-one solution, from high end server products to low power / high performance notebooks.  Judging by the success its predecessor, the Athlon or K7, has had filling these market positions, it’s likely the K8 will extend that trend.

One thing that was repeatedly stressed to me is how focused AMD is on improving processor efficiency.  The K8 will feature enhanced TLB structures, better branch prediction, and a higher IPC (instructions per clock) rating than will be found on any K7-era processor.  Furthermore, the chip will carry an onboard memory controller, a move which AMD expects will raise total expenses by less than 1% and expand the core by only a minuscule amount while providing a dramatic drop in memory latency.  Unlike today’s motherboard-mounted memory controllers (typically residing in the north bridge of the chipset), which run at slow front-side bus (FSB) speeds, the K8’s memory controller will run at the full speed of the processor and will initially support PC1600, PC2100, and PC2700 memory, with plans to support even higher speeds at a later date.
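To put the three supported memory grades in perspective, here is a rough back-of-the-envelope sketch of their peak theoretical bandwidth.  It assumes a standard 64-bit (8-byte) DDR DIMM data path, which is not something AMD stated in the briefing, and the function and table names are purely illustrative.

```python
# Peak theoretical bandwidth of the DDR grades the K8 is said to support.
# Assumption (not from the briefing): a 64-bit (8-byte) DIMM data path.
# The "PC" number in each module name is the rounded peak figure in MB/s.

BUS_WIDTH_BYTES = 8  # assumed 64-bit data path

# Module grade -> effective transfer rate, in millions of transfers per second.
DDR_GRADES = {
    "PC1600": 200.0,   # DDR200
    "PC2100": 266.67,  # DDR266
    "PC2700": 333.33,  # DDR333
}


def peak_bandwidth_mb_s(transfers_mt_s, bus_bytes=BUS_WIDTH_BYTES):
    """Peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return transfers_mt_s * bus_bytes


def bandwidth_table():
    """Return a dict mapping each module grade to its peak MB/s."""
    return {name: peak_bandwidth_mb_s(rate) for name, rate in DDR_GRADES.items()}


if __name__ == "__main__":
    for name, mb_s in bandwidth_table().items():
        print(f"{name}: roughly {mb_s:.0f} MB/s peak")
```

These are ceilings, not measured throughput; the latency savings AMD is promoting come from where the controller sits, not from the module grade.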

The K8 will feature a 32-stage pipeline.  Though this may seem staggeringly long, it is important to understand that with the entire memory controller on-die, a great many of those stages are taken up by the controller itself.  Counted on the same basis as the K7’s pipeline, the K8 has a 12-stage pipe, only two stages more than current Athlons.  This very small rise in the number of pipeline stages should be more than compensated for by enhanced TLB structures and better branch prediction.  As has also been noted numerous times, the K8 should deliver unmatched 32-bit performance while delivering excellent 64-bit performance as well.

The K8’s Memory Controller

The K8’s on-board memory controller is a major innovation among x86 processors.  As was previously noted, the integrated memory controller will run at full processor speed, eliminating a great deal of memory subsystem latency.  Furthermore, the onboard nature of the memory controller allows each processor in a dual or n-way server system to have its own dedicated controller, a feat not currently possible in any x86 server system.

This is yet another example of how the K8 is not just a faster processor, but a more efficient one.  By granting each chip access to an independent memory controller, AMD removes one of the largest performance bottlenecks from multiprocessing platforms.
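A toy model makes the multiprocessing argument concrete.  The sketch below compares the memory bandwidth available to each CPU when every processor shares one controller against the case where each has its own.  The 2.0 GB/s controller figure and the function names are hypothetical, and the model assumes every CPU is hitting memory at once and ignores the cost of reaching memory attached to another processor.

```python
# Toy comparison: one shared memory controller versus one controller per CPU.
# All figures are hypothetical; this is an illustration, not AMD data.
# Assumes every CPU demands memory at the same time and ignores the cost of
# reaching memory attached to a different processor.


def per_cpu_bandwidth_shared(controller_gb_s, n_cpus):
    """Every CPU contends for the same controller, so the pie is split."""
    return controller_gb_s / n_cpus


def per_cpu_bandwidth_dedicated(controller_gb_s, n_cpus):
    """Each CPU owns a controller, so its share does not shrink with n_cpus."""
    return controller_gb_s


def aggregate_bandwidth_dedicated(controller_gb_s, n_cpus):
    """Total system memory bandwidth grows with the processor count."""
    return controller_gb_s * n_cpus


if __name__ == "__main__":
    controller = 2.0  # hypothetical GB/s per controller
    for n in (1, 2, 4, 8):
        shared = per_cpu_bandwidth_shared(controller, n)
        dedicated = per_cpu_bandwidth_dedicated(controller, n)
        print(f"{n} CPU(s): shared {shared:.2f} GB/s each, "
              f"dedicated {dedicated:.2f} GB/s each")
```

In the shared case each added processor gets a thinner slice of the same controller; in the dedicated case the slice stays the same size and the system total climbs.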

HyperTransport:  Flexibility Incarnate

It does no good to eliminate the memory bottlenecks, however, if the system bus itself remains hopelessly clogged with data.  This is where AMD believes HyperTransport technology will make the crucial difference.

HyperTransport allows up to 6.4 gigabytes (GB) per second of data to be transmitted per link (3.2 GB per second in each direction), with up to three links per CPU.  This means it is possible to deliver up to 19.2 GB per second of aggregate bandwidth per CPU on an n-way server board, a huge increase over the current limits of Infiniband or Intel’s “hub” interconnect.
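The arithmetic behind those HyperTransport figures is simple enough to sketch.  The constants below come straight from the numbers AMD quoted; the function names are illustrative only.

```python
# HyperTransport bandwidth arithmetic, using the figures quoted by AMD:
# 3.2 GB/s in each direction per link, and up to three links per CPU.

GB_S_PER_DIRECTION = 3.2
DIRECTIONS_PER_LINK = 2
MAX_LINKS_PER_CPU = 3


def link_bandwidth_gb_s():
    """Combined bandwidth of one link, counting both directions."""
    return GB_S_PER_DIRECTION * DIRECTIONS_PER_LINK


def cpu_aggregate_gb_s(links=MAX_LINKS_PER_CPU):
    """Combined bandwidth across all of one CPU's links."""
    return link_bandwidth_gb_s() * links


if __name__ == "__main__":
    print(f"Per link: {link_bandwidth_gb_s():.1f} GB/s")
    print(f"Per CPU with {MAX_LINKS_PER_CPU} links: {cpu_aggregate_gb_s():.1f} GB/s")
```

Note that the headline number counts traffic in both directions on every link at once, so it is a best-case ceiling rather than what any single transfer will see.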

Furthermore, HyperTransport is an extremely flexible system.  It is possible to connect CPUs to each other using HyperTransport links, to connect PCI-X or PCI slots to CPUs, or any combination thereof.

The system is also fully 3GIO-compatible (when 3GIO becomes more than just an idea, that is) and is designed to scale extremely well.

Why Stay With x86?

This is the point where AMD’s and Intel’s approaches to 64-bit technology diverge most sharply.  While Intel has pinned its hopes for 64-bit computing on expensive, radical and quirky IA-64 technology, AMD has chosen to take existing x86 technology and extend it into the 64-bit range.  AMD’s reason for doing so is simple: it’s stupid to throw away decades of code and the knowledge base that comes with it.

As for the desktop, AMD believes these users are not likely to need much in the way of 64-bit computing for the next few years, so 32-bit performance is extremely important.  Unlike Intel’s Itanium, whose kludgy 32-bit code performance is just slightly faster than your average dead turtle, the AMD Hammer has been designed to provide high levels of performance on both types of code.

Furthermore, it is unlikely that there will be an overnight sea change from 32-bit code to 64-bit applications; existing 32-bit systems such as the AMD Athlon XP will constitute a majority of deployed systems for several more years to come.  Therefore it is important that the Hammer run the huge amounts of legacy software still present in the marketplace.  As much as modern tech-lovers gnash their teeth at the thought, there are still a large number of systems out there running Windows 95.

Hammer Speed Grades and Performance

AMD was unwilling to give details on what speeds the Hammer would debut at next year or what its performance would be, but the gentlemen I spoke with did inform me that the chip was running SPECINT2000 numbers twice as fast [ed: hmm… “running” sounds like they’ve got samples – strange about those “rumors” of an early Hammer] as the fastest x86 platform today.  Needless to say, this is an extremely well-performing processor if that is true. 

Hammer chips will not be marketed under Model Numbers, but will be labeled under the True Performance Initiative Standard.  For those of you who have not heard of it, the True Performance Initiative is the new standard of performance that AMD is attempting to formulate with a large number of other industry personnel.  The Model Number ratings that have been so controversial of late are only the first step in moving towards TPI, and TPI will be a rating system that many major players in the tech market will have input on. 

The goal of TPI is to give consumers a more inclusive and comprehensive system for expressing a system’s total performance rather than using MHz [ed: judging system performance by MHz can be like claiming a Chihuahua is ten times faster than a race horse because it takes ten times more steps per second] or even Model Number ratings.  While AMD believes its Model Numbers are more transparently indicative of its processors’ true performance, the company wishes to continue to enhance the process and make the rating system easier for consumers to understand while establishing objective, independent verification.
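The Chihuahua point can be put in numbers.  As a first approximation, the work a processor gets done per second is its clock speed multiplied by its IPC, just as an animal’s ground speed is steps per second multiplied by stride length.  The two chips below are entirely hypothetical, with figures invented for illustration; this sketch has nothing to do with how TPI or Model Numbers are actually calculated.

```python
# First-order model: throughput is roughly clock speed times work per clock.
# Both chips and all of their figures are hypothetical, invented purely to
# illustrate why MHz alone is a poor yardstick. This is not the TPI formula.


def relative_throughput(clock_mhz, ipc):
    """Millions of instructions retired per second under this simple model."""
    return clock_mhz * ipc


def faster_chip(chips):
    """Return the name of the chip with the highest modeled throughput.

    `chips` maps a chip name to a (clock_mhz, ipc) tuple.
    """
    return max(chips, key=lambda name: relative_throughput(*chips[name]))


HYPOTHETICAL_CHIPS = {
    "high-clock chip": (2000, 0.9),  # more steps per second, shorter stride
    "high-IPC chip": (1500, 1.3),    # fewer steps per second, longer stride
}

if __name__ == "__main__":
    for name, (mhz, ipc) in HYPOTHETICAL_CHIPS.items():
        score = relative_throughput(mhz, ipc)
        print(f"{name}: {mhz} MHz x {ipc} IPC = {score:.0f}")
    print("Faster under this model:", faster_chip(HYPOTHETICAL_CHIPS))
```

Real workloads muddy this considerably, since IPC varies from one program to the next, which is part of why a rating built on a broad benchmark suite is harder to design than a single multiplication.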

The Pentium 4’s Achilles Heel — SSE2 Compatibility

Although AMD did not dwell on this point much, it’s a definite fact: the K8 will be fully SSE2 compatible.  This could be a nightmare for Intel, as some of the few Pentium 4 performance wins depend on SSE2 optimizations that the Athlon core currently cannot exploit.  With the K8 already set to improve on the K7’s efficiency, clock speed, and IPC ratings, harnessing the SSE2 instruction set to it, with double the number of SSE2 registers found in the Pentium 4, could be the stealth weapon in the K8’s arsenal that smashes the P4 into the ground.  (Note:  Use of the phrase “Hammer” the P4 into the ground was carefully avoided.)

Still, a great deal can change in twelve months.  Intel’s roadmap shows that by this time next year we should be seeing P4s with 512 KB of onboard cache running at around 2.5 GHz with other possible improvements [ed: like green shag carpet on the dashboard – err, no that’s my car, nevermind].  It is likely that Intel will begin shipping P4s with Jackson Technology [ed: as in pop star Michael Jackson -- err, no, as in a technology that provides for simultaneous on-die multithreading which could boost performance in some cases by as much as 20-30%] enabled within this timeframe.  Reportedly this technology currently exists on all P4s, but is disabled.  One thing is certain: Chipzilla will not sit idly by and watch the K8 crush its empire, be it flagging or not, into dust.

Conclusion

Within a year the first Hammer desktops should begin arriving.  AMD has indicated that they will push 64-bit technology into the server / workstation / high performance desktop market first and expand into mainstream and value segments relatively quickly.  This is in stark contrast to Intel’s IA-64 strategy which does not focus on the desktop at all.

AMD is betting on easy upgradability, 32-bit compatibility with class-leading speed, and excellent 64-bit performance.

For the same markets, Intel is betting on a new, unproven architecture (but an architecture with a number of brilliant minds behind it – albeit a committee of minds suckled on Intel’s famous bureaucracy).   Intel is also counting on their customers’ willingness to upgrade to a new platform while abandoning investments in existing applications.  The chipmaker continues to ride the Pentium 4 (to include whatever improvements are made to this chip), but perhaps most potent is its considerable marketing might leveraged into established channels.

Both companies have risky strategies and both have strengths and weaknesses, but if their roles were reversed we would say that AMD would be insane to produce Itanium, while Intel would have a clear path for dominance with x86-64 and Hammer.

It’s going to be another interesting year.
