Analysis: How serious are the bugs in Intel's Core 2 Duo?

Posted by Rick C. Hodgin

Chicago (IL) - Last week, Intel released a Core 2 microcode update to BIOS and OS vendors. The fallout from that release has many analysts scratching their heads about the possible danger lurking in Core 2 microprocessors. The big question is a matter of trust: How can consumers trust a microprocessor that does not always compute correctly? Let's take a look at everything that entails.
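
For readers who want to see which microcode revision their own system is running, here is a minimal sketch, assuming a Linux kernel recent enough to expose a "microcode" field in /proc/cpuinfo (newer kernels do; very old ones may not). BIOS-delivered and OS-loaded updates both end up reflected in this value.

    /* Minimal sketch: print the loaded microcode revision for each
     * logical CPU. Assumes a Linux kernel that exposes a "microcode"
     * field in /proc/cpuinfo. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f) {
            perror("fopen /proc/cpuinfo");
            return 1;
        }

        char line[256];
        while (fgets(line, sizeof line, f)) {
            /* Matching lines look like: "microcode : 0xba" */
            if (strncmp(line, "microcode", 9) == 0)
                fputs(line, stdout);
        }

        fclose(f);
        return 0;
    }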

As of today, Intel has publicly released information on 105 bugs in its Core 2 CPUs. The Core 2 microprocessor was officially introduced by Intel on July 27, 2006 - and even at that time there were at least 18 known bugs. In CPU parlance, such bugs are called "errata": documented behavior that deviates from the published specifications.

Errata range in scope from completely innocuous quirks to full-on system crashes and corrupted data. Of the 105 errata currently listed by Intel, only four have already been fixed in hardware. Those fixes shipped in the most recent L2 stepping; anyone with the earlier B2 or B3 steppings still has those errata. Only 32 of the 105 are planned to be fixed at some point in the future. That leaves 69 errata which Intel currently has no plans to fix. Some of those do have software workarounds and are therefore not considered to require a hardware fix.
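
Stepping names like B2 and L2 correspond to the family/model/stepping values the CPUID instruction reports, which is how software matches a chip against a specification update. Here is a minimal sketch using GCC's <cpuid.h> helper; the decoding follows Intel's documented leaf-1 layout, while the specific numeric IDs behind names like B2 or L2 would have to be looked up in the processor's specification update.

    /* Minimal sketch: decode family, model, and stepping from CPUID
     * leaf 1 using GCC's <cpuid.h> helper. */
    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 1 not supported\n");
            return 1;
        }

        /* Leaf 1, EAX layout: bits 3:0 stepping, 7:4 model,
         * 11:8 family, 19:16 extended model, 27:20 extended family. */
        unsigned int stepping   = eax & 0xF;
        unsigned int model      = (eax >> 4) & 0xF;
        unsigned int family     = (eax >> 8) & 0xF;
        unsigned int ext_model  = (eax >> 16) & 0xF;
        unsigned int ext_family = (eax >> 20) & 0xFF;

        /* Per Intel's documentation, the displayed model includes the
         * extended bits for families 0x6 and 0xF. */
        if (family == 0x6 || family == 0xF)
            model |= ext_model << 4;
        if (family == 0xF)
            family += ext_family;

        printf("family 0x%x, model 0x%x, stepping 0x%x\n",
               family, model, stepping);
        return 0;
    }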

The concern many analysts and consumers have with this most recent microcode update is that the Intel chips are in some way flawed to the point where data corruption is imminent for the user. While that can happen under the specific conditions in which the errata surface, the truth is that many operating systems and compilers have, for a long time, included code that knows how to deal with these kinds of annoying quirks.
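
What that workaround code looks like varies, but the general pattern is simple: identify the CPU once at startup, then steer execution around the buggy case on affected parts. The sketch below is hypothetical; the stepping value and both routines are placeholders for illustration, not a real Intel erratum.

    /* Hypothetical workaround dispatch. AFFECTED_STEPPING and both
     * routines are placeholders; they do not refer to a real erratum. */
    #include <stdio.h>
    #include <cpuid.h>

    #define AFFECTED_STEPPING 0x6  /* placeholder stepping ID */

    static unsigned int read_stepping(void)
    {
        unsigned int eax, ebx, ecx, edx;
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 0;
        return eax & 0xF;  /* CPUID leaf 1: stepping in EAX bits 3:0 */
    }

    static void do_operation_fast(void)     { puts("fast path"); }
    static void do_operation_fallback(void) { puts("workaround path"); }

    void do_operation(void)
    {
        static int affected = -1;  /* cached after the first check */
        if (affected < 0)
            affected = (read_stepping() == AFFECTED_STEPPING);

        if (affected)
            do_operation_fallback();  /* steers around the buggy case */
        else
            do_operation_fast();
    }

    int main(void)
    {
        do_operation();
        return 0;
    }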

For example, when Intel released Core (the predecessor of Core 2) on January 5, 2006, it was in essence an extension of the long-running P6 architecture dating back to the original Pentium Pro in 1995. As a result, even the most recent processor at that time (Core), one manufactured a full decade after the original Pentium Pro, still carried with it many errata that had been known for a long time.

Intel did not fix those long-standing, known issues because they were of no real consequence to general computing. In fact, Intel noted on several of its errata that the behavior had only ever been observed in the laboratory, not in real-world software. Plus, there were software workarounds which would keep them from surfacing at all.

The fact is all hardware devices contain errata of some kind. And with CPUs, those errata only become notable issues when they affect existing software. And thanks to rigorous internal testing, the truly significant errors almost never make it out of the factory.

Consider the upcoming 45 nm Penryn chips from Intel. It was widely reported back in January that Intel had A0 silicon booting Windows - meaning the very first 45 nm silicon out of the fab worked well enough to boot Windows without crashing. It made big news worldwide and speaks very strongly of Intel's design and manufacturing capabilities.

I mention the January A0 silicon here to indicate how much lead time there is between initial creation and production ramp-up. From January until their release (sometime in Q4 for servers and desktops), these 45 nm chips will do nothing but be tested, fixed, tested again, and retested in a continuous cycle until they're ready. It is during these testing cycles that nearly all CPU errata are whittled away. CPU manufacturers throw a barrage of tests at their products to make sure nothing like the Pentium FDIV bug ever creeps up again. The disastrous PR pummeling Intel took over that erratum taught the entire industry quite well.
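
The FDIV bug is also a reminder of how simple an erratum's trigger can be. The classic check, which circulated widely at the time, divides two specific constants and multiplies back: on a correct FPU the residue comes out exactly zero for these particular values, while the flawed Pentium famously returned 256.

    /* The classic Pentium FDIV check. On a correct FPU this
     * computation yields exactly zero for these constants; the flawed
     * Pentium famously returned 256. 'volatile' keeps the compiler
     * from folding the arithmetic at compile time. */
    #include <stdio.h>

    int main(void)
    {
        volatile double x = 4195835.0;
        volatile double y = 3145727.0;
        double residue = x - (x / y) * y;

        if (residue == 0.0)
            printf("FPU divides correctly (residue = %g)\n", residue);
        else
            printf("FDIV-style flaw detected (residue = %g)\n", residue);
        return 0;
    }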

Still, there is not a CPU in existence that does not have some kind of errata. And the average user running everyday operating systems and software, doing mostly everyday things, will never see any significant problems due to any errata which might happen to exist. In fact, real software errors in the OS or in applications are far more likely to cause problems than a faulty CPU is.

Even for extreme users, those who max out their system's memory and run custom-developed software, the errata almost always have a workaround. If the software developer took the time to code those workarounds into their specialty software, then it is unlikely any real erratum would crop up and cause a hardware-induced failure. In fact, such failures are very rare.

To put Intel's 105 known errata, with 32 planned fixes and 69 with no fix planned, into perspective, we can also look at AMD's errata for its aging AMD64 technology. Both companies are quite open about publishing errata data for their products. As of this date, AMD has documented 169 errata. In AMD's case, all of them have either been fixed or are planned to be fixed, save seven.

It is a very good thing to know just how many errors your CPU has, and both Intel and AMD make the process of finding out very easy. However, the fact that errors exist does not mean your system will be unstable, that it will process data incorrectly under ordinary circumstances, or that there is anything you need to worry about. Most errata require very specific conditions before they manifest: a particular sequence of operations, an overheated part, or slow, fast or faulty memory, for example.

If those contributing factors all line up, then yes, your CPU can fail you. But consider the complexity of the CPU itself, its testing, its lengthy in-house validation time between creation and production sales, and the track record of general computing success (no recent outcry from users over something like the FDIV bug, for example). Then think for a moment about the countless experiences we've all had with buggy software. The real danger is our software, not our hardware. People make more out of errata issues than they need to.

In truth, considering the complexity of CPUs, it's amazing they work at all. Given just how much is inside (hundreds of millions of transistors and miles and miles of wiring), the fact that only 105 errors have been discovered, many of them with software workarounds, speaks very highly of the process involved in taking the design from drawing board to end product.

I will not hold any typical errata against Intel, AMD or anyone else. The testing policies and procedures in place are more than sufficient for almost everybody. On top of which, both Intel and AMD are completely open about their errata because there's no reason not to be. Faulty equipment means fewer sales, and if you can document a software workaround for a faulty aspect of the equipment, then that just turns a problem into another solution.