Pentium 4 Thermal Throttling
By Van Smith
Date: July 23, 2001
At InQuest Market Research, I was involved in producing two highly controversial articles discussing Intel Pentium 4 thermal throttling: "Can the P4 Recover?" and "The War Escalates: Athlon 4 takes on Pentium 4." However, neither of these pieces precisely reflected either my assessment of this subject or my motivation in writing about it. This article is intended to encapsulate a few points I wanted to draw attention to.
1. At the time, the P4's maximum power dissipation rate was being widely misrepresented.
This is a clear statement of fact that is now common knowledge. Some Intel supporters, while eventually forced to acknowledge this point, now suggest it is better to compare "typical" power consumption of one chip with "typical" power consumption of another. The problem here is that there are no independent guidelines set to determine what constitutes a "typical" workload, so comparisons from one manufacturer to the next are invalid.
Prior to the InQuest article, many reviewers, with Intel's guidance (I attended a press briefing where this occurred), compared the Pentium 4's Thermal Design Power, a somewhat arbitrary value at roughly 75% of peak power, with the Athlon's worst case maximum power. Not surprisingly, under these conditions the Pentium 4 bested the Athlon. However, contrasted correctly under worst case conditions, the 1.4 GHz Athlon dissipates 72 W, while the comparably performing 1.8 GHz Pentium 4 consumes roughly 87-88 W (power is calculated at Vcc_mid = 1.6625 V, Icc_max = 52.7 A) under the same circumstances.
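As a rough check on the arithmetic above, the short Python sketch below multiplies the quoted voltage and current, then applies the "roughly 75%" ratio to show why a Thermal Design Power figure can look better than the Athlon's worst case. The function name is hypothetical, and the 75% estimate is only the approximation mentioned in this article, not a published Intel TDP value.

```python
def worst_case_power_watts(vcc_volts: float, icc_amps: float) -> float:
    """Worst-case electrical power as supply voltage times maximum current."""
    return vcc_volts * icc_amps


# Figures quoted in the article for the 1.8 GHz Pentium 4.
p4_max = worst_case_power_watts(1.6625, 52.7)

# Assumption: TDP taken as exactly 75% of peak, per the article's "roughly 75%".
p4_tdp_estimate = 0.75 * p4_max

# Worst-case figure quoted in the article for the 1.4 GHz Athlon.
athlon_max = 72.0

print(f"P4 1.8 GHz worst case:      {p4_max:.1f} W")
print(f"P4 estimated TDP (75%):     {p4_tdp_estimate:.1f} W")
print(f"Athlon 1.4 GHz worst case:  {athlon_max:.1f} W")
```

Under these assumptions the estimated TDP falls below the Athlon's 72 W while the worst-case figure sits well above it, which is the mismatch the comparison above describes.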
Intel utilizes aggressive clock gating (effectively shutting off portions of the chip) with the Pentium 4 in order to reduce power consumption under typical loads. Under many tasks, this action does reduce power issues, but under load the Pentium 4 can still be demanding. In fact, worst case maximum power is usually defined by what is achievable through software. CPU vendors run such programs (usually closely guarded applications to help prevent the development of thermal viruses) as part of the validation process.
Some have pointed to the relatively low temperatures the P4 usually runs at as evidence of much lower power consumption. One contributor is certainly clock gating -- Intel quickly shuts down portions of the chip that are not being utilized -- which allows the chip to cool off quickly and not heat up as fast. Under load, the P4's temperature can ramp rapidly and track or even exceed the temperature of AMD chips. However, the Pentium 4 has an advantage from heat sinks much larger than those used on its Athlon rivals.
2. The P4's unique (in the PC CPU world) and elaborate Thermal Monitor exists.
The Pentium 4's Thermal Monitor can trigger a 50% duty cycle (the chip will run for about 2 µs, then sleep for about 2 µs) if a certain temperature is measured by its dedicated thermal probe. This sensor is independent of the legacy Pentium III thermal diode exposed to monitoring programs. The Thermal Monitor's sensor is embedded in an area of the P4 die susceptible to "hot spots." This means that current monitoring programs cannot reliably give any indication of when throttling will occur. According to Intel's P4 Thermal Design Guide, "The Thermal Monitor’s temperature sensor and the thermal diode are independent and isolated devices with no direct correlation to one another."
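To put the 50% duty cycle in concrete terms, here is a minimal Python sketch. It assumes that useful work scales linearly with the fraction of time the clock is running, which ignores memory-bound workloads and any cost of stopping and restarting the clock. The function name and the 1.7 GHz example are illustrative only.

```python
def effective_clock_ghz(nominal_ghz: float, run_us: float, sleep_us: float) -> float:
    """Average clock rate when the chip alternates between running and sleeping.

    Assumption: throughput is proportional to the running fraction of time.
    """
    duty_cycle = run_us / (run_us + sleep_us)
    return nominal_ghz * duty_cycle


# A 1.7 GHz part running 2 us on, 2 us off (the 50% duty cycle described above).
throttled = effective_clock_ghz(1.7, run_us=2.0, sleep_us=2.0)
print(f"Effective clock while throttled: {throttled:.2f} GHz")
```

Under that linear assumption, a throttled 1.7 GHz chip would do about as much work as one clocked at half that rate for as long as the throttling lasts.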
The Thermal Monitor must be enabled for a system to be within Intel's specs; however, some systems may override this with a chipset-driven mechanism (see below). Intel's P4 Datasheet states:
If automatic mode is disabled the processor will be operating out of specification and cannot be guaranteed to provide reliable results. Regardless of enabling of the automatic or On-Demand modes, in the event of a catastrophic cooling failure, the processor will automatically shut down when the silicon has reached a temperature of approximately 135 °C.
The automatic shutdown facility is a legacy feature originating in the Pentium III.
Prior to the InQuest articles, the Pentium 4's Thermal Monitor -- and particularly its throttling features -- was absent from media coverage. Many Intel supporters even publicly denied its existence, but today the P4 Thermal Monitor is common knowledge.
3. Given the existence of the Pentium 4's Thermal Monitor, chip reviewers need to consider this variable when evaluating processors.
The P4 presented the possibility that performance could actually be impacted by sustained load, workload type and thermal conditions. This was a primary motivation for my wanting this information reported.
4. The P4 not only has automatic hardware throttling via the Thermal Monitor, but facilitates advanced ACPI triggered throttling.
Working in conjunction with the Thermal Monitor, throttling can be imposed in steps of 12.5% by either a chipset mechanism or via software. This can potentially be imperceptible to the eye and, unlike the automatic hardware feature, can leave no traces, making the phenomenon very elusive to detect.
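The 12.5% step size implies eight possible duty-cycle levels. The Python sketch below simply enumerates them and the corresponding effective clock for a 1.7 GHz part. It is an illustration only: the function name is hypothetical, the linear scaling of performance with duty cycle is an assumption, and which levels a given chipset or operating system actually exposes is not something this article establishes.

```python
STEP = 0.125  # 12.5% throttling granularity described above


def throttle_levels(step: float = STEP) -> list[float]:
    """Duty-cycle fractions from full speed down to the smallest step."""
    count = round(1 / step)
    return [i * step for i in range(count, 0, -1)]


nominal_ghz = 1.7  # example clock speed; any nominal frequency works
for duty in throttle_levels():
    # Assumption: effective clock scales linearly with duty cycle.
    print(f"{duty * 100:5.1f}% duty cycle -> {nominal_ghz * duty:.4f} GHz effective")
```

A single step down from 100% to 87.5% is the kind of small change that could be hard to notice by eye, which is why measured benchmark results matter here.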
5. The P4 has acute "Hot Spots," and it appears this is the reason why the Thermal Monitor exists.
If relying solely on a Pentium III style mechanism, the chip could fail or be damaged before the failsafe is triggered due to extreme temperature gradients across the P4 die. The P4's large die size as well as Intel's aggressive use of clock gating make the chip susceptible to localized overheating without the use of throttling. Since it is impractical to monitor all potential hot spots, throttling enforces thermal guard bands to bring a window of safety.
An Intel white paper goes into detail about Pentium 4 hot spots and how the Thermal Monitor is implemented to combat them.
6. I have witnessed what I believe is P4 throttling in Quake III.
The performance degradation was reproducible across several different GeForce2 Ultra graphics cards, each from a different manufacturer. The problem existed for one 1.7 GHz Pentium 4, but not for another. This problem was also localized to Intel D850GB motherboards (it occurred on both boards that I tested) -- the chip did not throttle on an Abit board.
The type of information listed above had not been discussed prior to the InQuest articles. I realize that the InQuest reports went beyond merely stating the issues and, in fact, implied that Pentium 4 throttling is a rampant, everyday occurrence. This is not the message I wanted to disseminate, but the decision was not mine. It was my desire to simply present the data much as I have done here.
I must state here that there appears to be some capacity for almost instantaneous hot spot throttling in the Pentium 4. Quoting an Intel white paper on the subject:
First, the latency between critical temperature detection and power reduction should be low. In this case, low latency refers to periods on the order of 100's of microseconds. Reaction times significantly longer than this would allow the die temperature to potentially reach a point at which it no longer operates reliably.
I have not seen any definitive evidence for instantaneous hot spot throttling, but it seems to be a possibility under certain loads.
Though I may disagree with the tone of the InQuest articles, the Thermal Monitor does exist, the ACPI throttling capacity is there, I have witnessed what appears to have been throttling in one 1.7 GHz Pentium 4 while running QIII, and the maximum power dissipation for this chip was being misstated prior to the InQuest reports.
I believe the Pentium 4 that displayed apparent throttling behavior under Quake III was a part that slipped through a hole in Intel's validation process. This is a manufacturing issue, not a design issue and is therefore correctible through more exhaustive validation cycles. This is a very common occurrence that rarely is cause for alarm unless a large percentage of chips are impacted, which I do not believe to be the case here.
I suspect the slow emergence of 1.8 GHz Pentium 4 parts may be related to more intensive quality control measures. As Intel ramps the Willamette (the current 0.18 micron Pentium 4) to higher speeds, the likelihood of any part throttling under normal, demanding conditions will increase due to rising thermal demands. Chips that fail to qualify at 1.8 GHz will simply be binned down to lower speeds. Hopefully, for Intel, this does not drastically reduce yields at these highest clock speeds.
Do I think that the Thermal Monitor is a bad "feature"? I believe that the Pentium 4's expansive die size, its extensive use of clock gating, and Intel's desire to push the chip to higher and higher clock speeds together make the Thermal Monitor's throttling feature a necessity. The Thermal Monitor would only be "bad" if it commonly impacted performance when the chip is under load. That does not appear to be the case at current Pentium 4 clock speeds.
However, the onus is upon us, hardware reviewers, to ensure that when we test the Pentium 4 -- and perhaps upcoming processors from other vendors -- our methodology exposes the chip to thermal conditions comparable to what our readers might subject their systems to.
Under these circumstances we need to investigate whether system performance is impacted by load under unfavorable, but in-spec temperatures. In the second InQuest article, I was apparently able to trigger throttling under typical, but rigorous Quake III test conditions on one 1.7 GHz Pentium 4. I will be sure to repeat this for all processors from now on -- it requires little extra effort.
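One simple way to look for this kind of degradation is to repeat the same benchmark many times and flag runs that fall noticeably below the early baseline. The Python sketch below is a hypothetical illustration of that idea, not the procedure used for the InQuest tests. The function name, the warm-up count, and the 5% tolerance are all assumptions, and the frame rates are placeholder values rather than measurements.

```python
from statistics import median


def flag_slow_runs(fps_runs: list[float], warmup: int = 3, tolerance: float = 0.05) -> list[int]:
    """Return indices of runs slower than the baseline by more than `tolerance`.

    The baseline is the median of the first `warmup` runs (an assumed choice).
    """
    if len(fps_runs) <= warmup:
        raise ValueError("need more runs than the warm-up count")
    baseline = median(fps_runs[:warmup])
    floor = baseline * (1.0 - tolerance)
    return [i for i, fps in enumerate(fps_runs) if i >= warmup and fps < floor]


# Placeholder frame rates for repeated runs of one timedemo (not real data).
runs = [100.0, 101.0, 99.0, 100.0, 90.0, 80.0]
print("Runs below baseline:", flag_slow_runs(runs))
```

A flagged run does not prove throttling by itself, since background tasks or driver behavior can also slow a run. It only marks results worth repeating under controlled ambient temperature and with a second processor sample.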
Reviewing processors was already a daunting job, but with the P4 we now have another variable that must be evaluated, or, at the very least, acknowledged.
On a discouraging note, Intel has flatly refused to grant VHJ access to a 1.8 GHz Pentium 4 for testing despite our several inquiries. Budget permitting, we will test one of these processors as soon as possible.
Intel's actions, if they continue, will place us at a distinct disadvantage relative to other members of the media, to whom Intel commonly distributes review products well in advance of release. This will likely result in our reviews of Intel's products appearing substantially later than theirs. We apologize for this, but it is beyond our control.