
Benchmarking Transmeta's efficeon

By Van Smith

Date: April 4, 2004

Transmeta’s new flagship processor officially debuted in the United States last week with the introduction of the Sharp Actius MM20 “thin and light” notebook.  We were able to subject the tiny notebook to a battery of tests and what we found is very surprising.

===================================

The Sharp Actius MM20

A marvel of miniaturization, our Sharp Actius MM20 features a bright and clear 10.4” XGA (1024x768) LCD display, a Lilliputian 20GB Hitachi hard drive, integrated 10/100 Ethernet and 802.11b/g wireless networking, a PCMCIA Type II card slot (which the modem has to use), two USB 2.0 connectors, 512MB of PC2100 memory (soldered to the main board and not expandable), an ATi Mobility Radeon and, of course, a 1GHz Transmeta efficeon microprocessor.

 The bundled docking station converts the two-pound, 0.62” thick notebook into a USB 2.0 hard drive.  This innovative feature works well and obviates the need for a full docking station.


The Sharp is dwarfed by the full-sized HP.

The drastically shrunken keyboard is cramped and, consequently, relatively difficult to use, but this is a tradeoff for the degree of portability that the little Sharp offers.  Key travel is correspondingly limited, so 120 words-per-minute touch typists will likely be put off by the Actius.  Then again, the one-spindle Actius was not designed for use as a primary computer, but to augment other systems when extreme portability is required.

 

Sharp spared little expense in the Actius design.  For example, the MM20’s very thin motherboard uses “blind vias,” a pricey method of connecting to buried traces by drilling only partway through the PCB.  Although expensive, blind vias allow for higher surface mount component densities.

===================================

The Transmeta efficeon TM8600

Transmeta’s first break into the x86 MPU marketplace was with its Crusoe line of processors.  The immediate predecessor to efficeon, Crusoe was rife with limitations.  We enumerated Crusoe’s weaknesses in a previous article, but for completeness, we list them here again. 

For the most part, the Transmeta efficeon admirably addresses the bulk of Crusoe’s shortcomings.  Dubbed “Astro” in development, the Transmeta efficeon advances beyond its predecessor in the following important respects:

 

Feature                                                | TM5800                                | TM8600
-------------------------------------------------------+---------------------------------------+----------------------
chipset interconnect                                   | shared PCI bus                        | 400MHz HyperTransport
graphics interconnect                                  | shared PCI bus                        | AGP 4x
L2 cache                                               | 512kB                                 | 1MB
memory controller                                      | SDRAM + DDR SDRAM (limited to 1 DIMM) | DDR SDRAM
clock speed                                            | up to 1GHz                            | up to 1.2GHz
VLIW instruction length                                | 128 bits                              | 256 bits
maximum number of 32-bit instructions per clock cycle  | 4                                     | 8
MMX                                                    | yes                                   | yes
SSE                                                    | no                                    | yes
SSE2                                                   | no                                    | yes
die size                                               | 55 mm^2                               | 119 mm^2

With so many clear-cut advantages over the TM5800, the TM8600 should rocket ahead of its ancestor in performance.  It will need to, because the enormous and costly die size of the TM8600 means that it will have to compete with the “big boys” like Intel’s Banias and Dothan, AMD’s Thoroughbred and Barton and even Intel’s Pentium 4.

===================================

Die Size Comparison

 How big is efficeon?  The chart below tells the story.


Only the P4-Northwood is bigger than efficeon

With such a relatively enormous die size compared with its predecessor’s, we project that the TM8600 efficeon’s costs are 4-6 times higher than its Crusoe compatriot (fabrication costs rise along an exponential curve as a function of die size).  
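To give a feel for why cost climbs faster than area, here is a rough sketch of a textbook cost-per-good-die estimate.  It is not the model behind our 4-6x projection.  The 200mm wafer, the Poisson yield formula, the equal wafer cost for both chips and the defect densities are all assumptions chosen for illustration; only the 55 mm^2 and 119 mm^2 die sizes come from the table above.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=200.0):
    """Approximate gross dies per wafer with a simple edge-loss correction.
    The 200mm default wafer diameter is an assumption."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss

def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under an assumed Poisson defect model."""
    die_area_cm2 = die_area_mm2 / 100.0
    return math.exp(-die_area_cm2 * defects_per_cm2)

def relative_die_cost(large_mm2, small_mm2, defects_per_cm2):
    """Cost per good large die divided by cost per good small die,
    assuming both are cut from wafers that cost the same."""
    good_large = dies_per_wafer(large_mm2) * poisson_yield(large_mm2, defects_per_cm2)
    good_small = dies_per_wafer(small_mm2) * poisson_yield(small_mm2, defects_per_cm2)
    return good_small / good_large

if __name__ == "__main__":
    # Defect densities below are hypothetical, not measured TSMC figures.
    for d0 in (0.0, 0.5, 1.0, 1.5):
        ratio = relative_die_cost(119.0, 55.0, d0)
        print(f"D0 = {d0:.1f}/cm^2 -> TM8600 die costs about {ratio:.1f}x the TM5800 die")
```

Even with a perfect process the ratio exceeds the plain 119/55 area ratio because fewer large dies fit around the wafer edge, and it grows from there as the assumed defect density rises.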

In fact, the TM8600’s costs are probably higher than even Intel’s Northwood Pentium 4, since the latter uses mature fab technology owned by Intel, while Transmeta employs a third party fab in the form of TSMC.  Such high fabrication expenses will force Transmeta to position efficeon against Intel’s successful Centrino line and AMD’s powerful but cheap Mobile Athlon XP.

 At ~12mm x 10mm, the efficeon’s die is so large that it is comparable to the 15mm square package size for VIA’s nanoBGA C3. 

===================================

Something Fishy About efficeon

While testing the new TM8600, we became frustrated with the high degree of run-to-run result variability that we were seeing.  For instance, our successive Sandra 2004 Whetstone results were 900, 810 and 768.  We initially assumed that this problem was due to nasty “Code Morphing” artifacts, but we noticed that the trend always seemed to be downward.

On a lark, we took out an old COSBI tool, FibBurn, which repeatedly times a Fibonacci sequence and graphs each iteration.  What we saw, virtually immediately, was textbook throttling!
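For readers who want to try the idea themselves, the following is a minimal sketch of a FibBurn-style loop.  It is not the COSBI source; the function names, the iteration count and the "last three versus first three" slowdown measure are our own illustrative assumptions.  It simply times the same Fibonacci workload over and over so that a steady rise in run time stands out.

```python
import time

def fib(n):
    """Iteratively compute the n-th Fibonacci number (fib(0) == 0)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def time_iterations(work, iterations):
    """Run `work` repeatedly and return the wall-clock duration of each run."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        work()
        durations.append(time.perf_counter() - start)
    return durations

def slowdown_ratio(durations, window=3):
    """Mean of the last `window` timings divided by the mean of the first `window`.
    A value well above 1.0 means later runs took longer than early ones."""
    first = sum(durations[:window]) / window
    last = sum(durations[-window:]) / window
    return last / first

if __name__ == "__main__":
    timings = time_iterations(lambda: fib(20000), 20)
    for i, seconds in enumerate(timings, start=1):
        print(f"iteration {i:2d}: {seconds * 1000:.2f} ms")
    print(f"slowdown ratio: {slowdown_ratio(timings):.2f}")
```

On a machine that holds its clock speed the ratio should hover near 1.0, apart from noise caused by background tasks; a short run like this one is far too brief to heat up most processors, so raise the iteration count for a real test.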

CPU “Throttling” is a technique for combating thermal meltdown by slowing down the throughput of a processor.  We were the first to educate the public about CPU throttling several years ago when Intel quietly introduced it inside of the Willamette Pentium 4.

Truly surprised to see that such an undemanding integer algorithm could cause efficeon to throttle in a matter of seconds, we expanded FibBurn to include two other OSMark tests, Whetstone and Ray Lischner’s pi calculation algorithm.  We make available this hacked together tool here (note, this program is offered without warranty of any kind; use at your own risk).

All three algorithms easily triggered throttling in the Transmeta TM8600 efficeon as is clearly evident in the graphs below.


Textbook throttling: Note how the Transmeta efficeon’s performance gets slower over time.

Ray Lischner’s pi computation routine slowed down by more than 1.5 seconds over time on efficeon:

Perhaps the most distressing sign of throttling was from Whetstone where the efficeon ramped down frequency almost immediately:


efficeon throttles almost immediately under Whetstone!

This last result is the obvious reason why SiSoftware Sandra’s Whetstone test degraded with each run: the Transmeta efficeon was throttling!

In case you were wondering what behavior “WhetBurn” produces on other systems, the graphs are most often dull, flat lines that persist throughout the test.  Below are results taken from this Athlon XP 1800+ laptop.  The EKG-like blips are from background tasks that I did not shut down.  If you examine the scale on the left, even these bumps are relatively tiny.


The Mobile Athlon XP is essentially invariant regardless of how long “WhetBurn” runs.

It is our standard practice to report the best out of three (or more, depending on the test) benchmark runs as our official results.  Even though with efficeon’s throttling problem this might unfairly make Transmeta’s chip look much better than it actually performs under normal usage, we will continue this practice throughout this report.
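The size of that favor can be put in numbers using the three Sandra Whetstone runs quoted earlier.  The sketch below is only arithmetic on those figures; the helper name is ours, and we assume that a higher score is better.

```python
from statistics import mean

def summarize_runs(scores):
    """Compare best-of-N reporting with the average of successive runs.
    Assumes higher scores are better and that `scores` is in run order."""
    best = max(scores)
    average = mean(scores)
    return {
        "best": best,
        "mean": average,
        "best_over_mean": best / average,
        "first_to_last_drop": (scores[0] - scores[-1]) / scores[0],
    }

if __name__ == "__main__":
    sandra_whetstone = [900, 810, 768]  # successive runs reported in this article
    summary = summarize_runs(sandra_whetstone)
    print(f"best of three:       {summary['best']}")
    print(f"mean of three:       {summary['mean']}")
    print(f"best exceeds mean by {100 * (summary['best_over_mean'] - 1):.1f}%")
    print(f"first-to-last drop:  {100 * summary['first_to_last_drop']:.1f}%")
```

For these runs the best score of 900 sits about 9% above the mean of 826, and the third run is almost 15% below the first.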

===================================

Benchmark Results

Our first two tests are simple spreadsheet simulations.  Although the Sharp Actius MM20 is equipped with a relatively robust ATi Mobility Radeon, its performance is not very good, falling behind a 1GHz VIA C3 system using the integrated graphics core of the VIA CN400 chipset.  Although influenced by graphics controller selection, this test scales well with CPU performance as is demonstrated by the scores between the 1.33GHz C3 and the 1GHz C3 where the only difference between the two systems is the software defined clock speed.

The Sharp’s laggardly position appears to be solely attributable to the Transmeta efficeon.  Although a big jump ahead of the 1GHz TM5800 Crusoe system, the efficeon looks slow compared to its Intel and AMD competitors.

As we have detailed in previous articles, the Digital Signal Processor-like architecture of Transmeta’s processors makes them well suited for many small benchmarks.  On the Fibonacci test, the efficeon’s relative performance is a tiny bit better, but it is only a small advance beyond the TM5800.

The n-body test below is a simulation of 4 bodies interacting through an inverse square law, confined to a plane.  The orbits are calculated using Euler’s Last Point Approximation.  “n-body” is extremely floating point intensive.
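For the curious, here is a minimal sketch of one time step of such a simulation.  It is not the OSMark code.  We assume that the “last point approximation” corresponds to what is commonly called the Euler-Cromer method, in which each velocity is updated first and the new velocity is then used to move the body; the function names, the gravitational constant of 1.0 and the starting conditions are illustrative.

```python
import math

def euler_cromer_step(positions, velocities, masses, dt, g=1.0):
    """Advance all bodies in the plane by one time step, in place.
    Forces follow an inverse square law; g = 1.0 is an arbitrary unit choice."""
    n = len(masses)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            r2 = dx * dx + dy * dy
            # g / r^3 times the displacement gives an inverse square magnitude.
            f = g / (r2 * math.sqrt(r2))
            acc[i][0] += f * masses[j] * dx
            acc[i][1] += f * masses[j] * dy
            acc[j][0] -= f * masses[i] * dx
            acc[j][1] -= f * masses[i] * dy
    for i in range(n):
        # Update velocity first, then move using the NEW velocity
        # (the end-of-interval, or "last point", value).
        velocities[i][0] += acc[i][0] * dt
        velocities[i][1] += acc[i][1] * dt
        positions[i][0] += velocities[i][0] * dt
        positions[i][1] += velocities[i][1] * dt

def total_momentum(velocities, masses):
    """Return the (x, y) components of the system's total momentum."""
    px = sum(m * v[0] for m, v in zip(masses, velocities))
    py = sum(m * v[1] for m, v in zip(masses, velocities))
    return px, py

if __name__ == "__main__":
    pos = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    vel = [[0.0, 0.5], [-0.5, 0.0], [0.0, -0.5], [0.5, 0.0]]
    mass = [1.0, 2.0, 1.0, 2.0]
    for _ in range(1000):
        euler_cromer_step(pos, vel, mass, 0.001)
    print("positions:", pos)
    print("total momentum:", total_momentum(vel, mass))
```

Nearly all of the work is floating point multiplies, divides and square roots in the inner pair loop, which is why a test of this kind leans so heavily on the FPU.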

The efficeon’s FPU is not particularly fast, but it does appear to surpass the Crusoe’s.  Still, it is a lot slower than the FPU integrated into the much cheaper Mobile Athlon XP 1800+.

On trigonometric tests, the efficeon fares similarly.

You might be surprised by how strongly the VIA C3 performs in the Trig Curves test, especially compared with how poor it is in the Trig Curve 2 test.  The difference between the two benchmarks lies in the fact that the first test plots every computation whereas the second test only plots one out of every hundred steps.  While this makes the “Trig Curves” much more sensitive to 2d performance, the real reason why the C3 looks much better is because its FPU is “uncorked” on this first test. 

On low level tests, clock-for-clock the C3’s FPU is actually faster than the P4’s, but because of pipeline limitations imposed by its simple, scalar, in-order design it is very easy to choke off this performance with blocking pipeline stalls.  The Trig Curves test allows each FPU instruction to proceed without blocking the next.

Although predominated by video controller throughput, the following 2d-tests are also CPU sensitive.

Again, Transmeta efficeon fails to impress despite its relatively powerful ATi Mobility Radeon.

Speaking of ATi, you might be as surprised as I was to see my HP ze4315 Mobile Athlon XP notebook outperforming the Athlon 64 3200+ system on two of the tests.  The reason for this is that ATi products scream through some 2d tasks.  While my notebook has a modest ATi Radeon IGP 320M north bridge with integrated graphics, it whips the Athlon 64 system using an nVidia 440MX AGP card.

Both Intel platforms are using Intel “Extreme” integrated graphics, which appear to have adequate 2d performance.

Despite having a good 2d core, efficeon – and all Transmeta products – performs badly on these tests.  The reason for this will be made clear very soon.

An example of artificial intelligence, the Maze test generates and then solves complicated 2d mazes.  Efficeon stinks on it, being slower than even its Crusoe predecessor.
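To show the flavor of code involved, here is a small generate-then-solve sketch.  The algorithms used by the actual Maze test are not described in this article, so the depth-first carving and breadth-first solving below are stand-ins that we assume are representative, and all names are illustrative.

```python
import random
from collections import deque

def generate_maze(width, height, rng):
    """Carve a maze with an iterative depth-first search.
    Returns a dict mapping each cell (x, y) to the set of cells it opens onto."""
    links = {(x, y): set() for x in range(width) for y in range(height)}
    stack = [(0, 0)]
    visited = {(0, 0)}
    while stack:
        x, y = stack[-1]
        unvisited = [c for c in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                     if c in links and c not in visited]
        if not unvisited:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(unvisited)
        links[(x, y)].add(nxt)  # knock down the wall between the two cells
        links[nxt].add((x, y))
        visited.add(nxt)
        stack.append(nxt)
    return links

def solve_maze(links, start, goal):
    """Breadth-first search; returns the list of cells from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        for neighbor in links[cell]:
            if neighbor not in previous:
                previous[neighbor] = cell
                queue.append(neighbor)
    path = []
    cell = goal
    while cell is not None:
        path.append(cell)
        cell = previous[cell]
    path.reverse()
    return path

if __name__ == "__main__":
    maze = generate_maze(20, 15, random.Random(2004))
    route = solve_maze(maze, (0, 0), (19, 14))
    print(f"solved a 20x15 maze in {len(route) - 1} moves")
```

Both halves are dominated by data-dependent branches and scattered lookups rather than arithmetic, which is the kind of “branchy” code discussed below.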

AMD processors shine on Maze (except for the GX2) which no doubt indicates one reason why they are so good for gaming.  The P4-Celeron does not like branchy code, a phobia that it shares with all Transmeta products.

Fern Fractal is another floating point intensive test, and the AMD processors predictably clean up on it because of their bully FPUs.  Compared with Intel’s Banias, the efficeon looks pretty good, but it is still performing only slightly faster than Crusoe.

The RichEdit test is a word processor simulation based upon Windows’s Rich Edit control which is the foundation for many word processors like Microsoft Word.  For most systems, this test is one of the most video controller limited of all, but the TM5800’s painfully bad showing is likely due to the processor since its discrete ATi Rage XL is not that bad.

Although the Athlon 64 looks to be embarrassing the rest of the field, I have observed that nVidia cards often cheat on this test by rendering the first few lines, then giving up and hopping to the end and rendering the last few sentences.  If you are trying to scan through a document, this is not a good optimization decision.  But it does make for good benchmark numbers.

Thanks to its ATi Mobility Radeon, the efficeon looks good next to the Athlon XP 1800+, but slips farther behind Banias which is itself beaten by the VIA C3-CN400 combo.

Our Pi Test is based upon Ray Lischner’s famous pi computation routine.  Mr. Lischner gave us permission to incorporate this code into our benchmark.  The Pi Test is the first benchmark where the Transmeta efficeon really shines.

Despite its clock speed disadvantage, efficeon almost matches both the Mobile Athlon XP 1800+ and the Intel Banias 1400.  “Astro” is also much faster than Crusoe on the pi calculation test.

No performance profiling regimen would be complete without two stalwarts from the world of benchmarking, Dhrystone and Whetstone.  While the efficeon is not humiliated in Dhrystone, compared with either the Athlon XP or Intel Banias, it is quite weak in Whetstone.

Now we arrive at the memory tests where we will discover one big reason why the efficeon is so poor in the Maze test and why it is going to be such a very bad choice for gaming or office applications.

Efficeon is pretty strong in terms of memory bandwidth, shown above with the blue bars.  Transmeta’s last best hope almost keeps up with Intel’s Banias, which, again, is using Intel’s integrated chipset. 

Only the Transmeta and the Athlon 64 system are using discrete graphics.  All of the other systems would see a jump in bandwidth if coupled with discrete graphics cores, so efficeon’s relative parity is a little misleading.

Nevertheless, take a close look at the red bars.  Despite its integrated DDR SDRAM memory controller and discrete graphics core, the efficeon is very weak when handling randomized memory accesses.  This is a shortcoming it shares with all other Transmeta processors and the P4-Celeron.
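The difference between the two access patterns can be sketched as follows.  This is not our memory test: the names and sizes are illustrative, and in an interpreted language such as Python the interpreter’s own overhead hides most of the hardware effect, so treat it as a picture of the technique rather than a measuring tool.  Each load depends on the result of the previous one, so the processor cannot hide the latency of a trip to main memory.

```python
import random
import time

def random_cycle(n, rng):
    """Sattolo's algorithm: a random permutation that forms a single cycle,
    so following it visits every one of the n slots before repeating."""
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def sequential_cycle(n):
    """The orderly counterpart: slot i simply points at slot i + 1."""
    return [(i + 1) % n for i in range(n)]

def chase(nxt, steps):
    """Follow the chain of indices; each step depends on the one before it."""
    index = 0
    for _ in range(steps):
        index = nxt[index]
    return index

def time_chase(nxt, steps):
    """Return the wall-clock seconds needed to take `steps` dependent loads."""
    start = time.perf_counter()
    chase(nxt, steps)
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 1 << 18  # illustrative size; a real test must far exceed the L2 cache
    print(f"sequential: {time_chase(sequential_cycle(n), n):.3f} s")
    print(f"randomized: {time_chase(random_cycle(n, random.Random(1)), n):.3f} s")
```

Using a single-cycle permutation matters: an ordinary shuffle can produce short loops that fit in cache and make the randomized case look faster than it really is.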

Weakness on our memory latency test is a very good predictor of poor performance on business applications, so efficeon will be quite sluggish in many of the applications that the typical user exercises.

It is worth commenting on the performance of the AMD Athlon XP 1800+ and the Athlon 64.  Again, the Mobile Athlon XP uses an integrated ATi north bridge.  While providing very good 2d performance, ATi’s integrated solution does so at the expense of CPU memory performance.  In other words, ATi’s IGP hogs the memory for its integrated graphics core.

The Athlon 64 truly embarrasses the field in the memory latency test, and this time the win is for real.  The Athlon 64 is a true monster when it comes to randomized, fragmented memory accesses thanks to its highly efficient integrated memory controller.  This is a big reason why the Athlon 64 is so dominating in gaming benchmarks.

Next we come to our threading tests.  We chose each test in order to examine three dramatically different threading conditions.  Unless an SMP/SMT/multicore design is egregiously hosed up, all such systems should love the Orthogonal Thread test where neither code nor data are shared between the threads.

Most MP systems should also scale well with the Identical Thread test, whose name is largely self explanatory.  However, by sharing both code and data, interprocessor communication is stressed and if the MP system is not fully fleshed out, this can result in contention between the two processors for cache lines which will degrade performance.

Most MP/SMT systems freak out when confronted with the Maze Thread test which simply spawns two Maze threads.  Performance often plummets with the addition of the second processor.  The Intel P4, for example, is only about 1/3 as fast on this test when HyperThreading is turned on.

The efficeon doesn’t look particularly good on any of the thread tests, but, predictably, fares worst on the Maze Thread test.  The efficeon is not bad on the Orthogonal Thread test which spawns Pi and Fibonacci threads.  It also comes close to the Mobile Athlon XP system in the Identical Thread test, which creates two 5,000,000 element quick sort threads.

Befitting its DSP similarities, the efficeon is often good in decoding tasks, but it is still not particularly good in the JPG decoding test.

The efficeon advances well beyond the Crusoe it supplants on the two image manipulation tests below.  Its Image Resize results look very good, but this test is largely a function of the graphics core.

The efficeon breaks away from Crusoe on our MP3 encoding test which uses the popular open source Japanese GoGo MP3 encoder.  However, the efficeon’s performance pales compared with its Intel and AMD rivals.

The Transmeta efficeon is pretty poor on our Web Page Load test which instantiates Internet Explorer and loads up a handful of web pages copied from our site.  This highlights the branch averse nature of all Transmeta products.  Despite its advantage in JPG decoding, the efficeon slips greatly when compared with even VIA’s chips.

The efficeon does not present much of an advance beyond Crusoe when zipping files.

Security has become very important in our insecure world and the single most important encryption task is AES encoding/decoding.  Our AES Encrypt/Decrypt test below makes one obvious impression.

The mite-sized, inexpensive, humble VIA C3 obscenely massacres all comers.  I know of no other performance area where a CPU demonstrates such a mind boggling lead.  The results above show that when VIA says it has an “Advanced Cryptography Engine” it really means it.  This kind of quantum leap forward in performance is reminiscent of the time when the first FPUs were integrated into general purpose microprocessors.

Still, if you take a magnifying glass and look really hard, you’ll see that efficeon is actually doing very well on this test compared with the Athlon XP and Intel Banias.

We will skip the file copy test since the efficeon was at a big disadvantage with its lil’ Hitachi 20GB notebook drive.

On the overall OSMark scores, two things stand out.  The first is the ridiculously huge encryption/decryption performance lead put in by the VIA C3’s ACE unit.  The second obvious item is that the efficeon really looks pretty bad for being Transmeta’s 119mm^2 flagship.  Efficeon is just slightly better than half as fast as the two mobile AMD and Intel processors in this report.  In fact, efficeon is no better than the VIA C3 even when discounting that chip’s elephantine encryption/decryption lead.

As a final comment on the performance numbers, Banias fans might be surprised to see the lowly Mobile Athlon XP 1800+ usurping the much more expensive Banias 1.4GHz.  The Mobile AMD Athlon XP line is a remarkably unappreciated family of mobile processors.

===================================

Conclusion

The Transmeta efficeon is that company’s last, best chance for survival.  Moreover, it represents the acme of ideologically pure VLIW development.  Featuring a laundry list of very real architectural advances beyond the Crusoe, efficeon raised the hopes of many that Transmeta had finally turned the corner on performance.

Unfortunately, the efficeon is a staggering failure by nearly every measure.

Performance is unambiguously lackluster.  In fact, efficeon is only slightly faster than Crusoe.  If it weren’t for the other Transmeta products and the 366MHz AMD Geode, thrown in for comic relief, the efficeon would be dead last even when compared to the minuscule VIA C3.

For this slight performance nudge beyond its predecessor, Transmeta pays with a die that is over twice as large and at least four times as expensive.  And the design will hardly ramp beyond 1GHz, a glacial speed compared with all other modern processors.

As the fat cherry on top of this black sundae of disaster, the efficeon thermally throttles at the slightest provocation.  Unlike Intel’s P4 throttling apparatus, the efficeon leverages “LongRun” to actually reduce clock speed and voltage.  While this is a superior approach, it still means the same thing to end users: reduced performance.  And efficeon significantly throttles back even under relatively light loads.

We suspect that the reason efficeon throttles so horrendously is because Transmeta greatly desired to reduce its recommended Thermal Design Power so that it could secure fanless 1GHz designs.  The problem is that as soon as you start needing that 1GHz that you paid for, the efficeon is throttling down to 933MHz, 800MHz or even lower.

In other words, the 1GHz fanless efficeon appears to be outright hucksterism.

Transmeta projects that the maximum clockspeed that the 0.13-micron efficeon will achieve is 1.2GHz and, beyond any length of reason, positions this chip against 1.7GHz Banias and 1.8GHz Dothan!  Well you can’t claim that Transmeta’s marketing folks are short on outrageous audacity.

When it comes to unmitigated catastrophes in the history of computing, Intel’s Prescott still owns the title, but efficeon trails closely behind.

From nearly its inception, Transmeta has been trying to turn its vices into virtues.  For instance the name “efficeon” is a profoundly ironic turn of logic considering how very inefficient this processor is.  Using over twice the die size of its predecessor, the efficeon barely manages to squeak beyond the Crusoe in benchmarks, yet can’t ramp to significantly higher clock speeds either.

Transmeta has been creative in inventing markets for their products, but the efficeon will be a hard sell anywhere.  One area where Transmeta has found success with Crusoe is in the burgeoning thin client arena, but Transmeta dare not push efficeon into this space because it will be forced to collide head on with much cheaper processors that trounce it in performance.

The single most important metric in thin clients is Remote Desktop Performance.  The graph below shows how efficeon fares against VIA’s C3.

Lastly, Transmeta appears to be significantly overoptimistic in its projections for its 90nm shrink of efficeon where they are telling customers that they will reach 2GHz by the end of the year.  Folks, it ain’t gonna happen unless Fujitsu has jujitsu in its process tech.

Transmeta states that there are no significant architectural changes between the current 0.13-micron efficeon and the 90nm Fujitsu part.  They project that the current 0.13-micron efficeon will top out at 1.2GHz while the 90nm efficeon TM8800 will jump ahead to 2GHz.  VIA’s C3, using essentially the same TSMC 0.13-micron process technology, will easily reach 1.5GHz.  VIA is projecting that C5J, a substantially improved C3 fabricated at IBM East Fishkill using the most advanced 90nm SOI technology in the world, will hit 2GHz.

Intel’s Dothan, Banias’s followup, will be puffing hard to reach 2GHz even though Banias tops out at 1.7GHz, which is way, way higher than the 0.13-micron efficeon.

So where does efficeon’s magical megahertz come from at 90nm?  Count us with you on that question.

To end on a positive note, Sharp’s Actius MM20 is a dandy little thin and light, despite its lethargic, throttling Transmeta efficeon baggage.

Finally, we will release a late OSMark beta next week for those interested in previewing what we believe is the most useful benchmark suite around.  Source code will be released under the GNU General Public License with the official launch, which will occur soon after the beta preview.

===================================
