Essex Junction (VT) – A statement from a senior IBM engineer, buried deep within a Q&A with veteran journalist Ed Sperling published last week by Electronic News, casts a sharp ray of light on an otherwise undiscussed topic: defects in the course of processor production. Defects, IBM vice president of semiconductor and technology services Tom Reeves admitted, crop up in about one in ten processors – specifically, digital ASICs – that are fabricated, and weeding those defects out is part of the everyday work of producing chips. But with today’s multicore chips, that defect number is compounded as core counts grow. As a result, Reeves told Sperling, as few as one Cell processor in every ten fabricated may be defect-free upon inspection.
With standard silicon germanium (SiGe) single-core processors, IBM can achieve yields of up to 95%, Reeves told Electronic News. But “with a chip like the Cell processor,” he then remarked, “you’re lucky to get 10 or 20 percent.”
But Reeves went further, making a comment that is raising the eyebrows of many game console enthusiasts who had assumed the sole purpose of the multiple cores in the Cell processor for Sony’s upcoming PlayStation 3 was to improve performance: He implied that because the Cell contains as many as eight identical synergistic processing elements (SPEs), while Sony requires the use of only seven, some production units could, in fact, ship with one core in eight being defective without any impact on the customer.
It gets better. Reeves stated outright that we’re entering an era of redundant logic, which enables manufacturers to produce processing components that compensate for their own defects. With such systems in place, he said, yields could conceivably increase in a best-case scenario to 40% – still significantly lower than the 95% yields that IBM and others enjoyed during the single-core, “one-by-one” era. The picture that emerged from the Electronic News interview is one not of multiple, powerful processor units pounding out code in parallel, but instead a kind of “RAID array” for the CPU, where unit failure could be considered part of everyday life.
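Reeves’ yield figures can be sanity-checked with a simple defect model. In the sketch below, we assume – purely for illustration – that fatal defects strike each of the Cell’s eight replicated SPE blocks independently with some fixed probability; the 15% per-block defect rate and the function name are our own assumptions, not IBM figures.

```python
from math import comb

# Illustrative defect model - NOT IBM data. Assumes fatal defects hit each
# of the 8 replicated SPE blocks independently with probability p_defect.

def yield_rate(p_defect: float, blocks: int = 8, spares: int = 0) -> float:
    """Probability that at most `spares` of `blocks` are defective."""
    q = 1.0 - p_defect
    return sum(comb(blocks, k) * p_defect**k * q**(blocks - k)
               for k in range(spares + 1))

# With a hypothetical 15% chance of a fatal defect per SPE:
strict = yield_rate(0.15, spares=0)    # all eight SPEs must be good: ~27%
tolerant = yield_rate(0.15, spares=1)  # one bad SPE tolerated: ~66%
```

Under these assumed numbers, tolerating a single spare block more than doubles the modeled yield – the qualitative effect Reeves describes, even though the real per-block defect rates are not public.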
Is this guy serious? We asked Insight64 principal analyst Nathan Brookwood. “He is serious,” he told TG Daily. “Yields always go down as chip size increases, so designers of large chips often use redundancy to increase yields. Memory chips have done this for years, as have the cache blocks on CPUs, but it’s harder to design redundancy into logic circuits – unless you replicate the entire logic block, which is what Cell does. Sony needs to balance performance, cost, and availability, so it makes sense that they would sacrifice a core or two in order to get lower cost or more useful chips.”
By far the biggest single application for the Cell processor, in terms of acquiring installed base, will be its introduction in Sony’s PS3 this November. In its quarterly report last April, Sony told investors it intends to sell 6 million PS3s between November 2006 and March 2007. If this is indeed the case, borrowing Reeves’ numbers, the IBM/Sony/Toshiba joint effort (STI) will need to fabricate at least 15 million Cell processors, and toss out 60% or more of those units after fabrication. But even then, it would appear to be a safe bet, based on Reeves’ logic, that about half the number of processors that complete the full production cycle will have one SPE unit that’s defective. Since PS3 will only use seven of the eight SPEs anyway, the user should not know the difference.
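The production math above is easy to verify. The figures in this sketch come straight from the story – 6 million consoles and Reeves’ 40% best-case yield – and nothing else is assumed.

```python
# Back-of-the-envelope check of the article's production math.
consoles_needed = 6_000_000  # Sony's stated PS3 target, Nov 2006 - Mar 2007
best_case_yield = 0.40       # Reeves' best case with redundant logic

chips_to_fabricate = consoles_needed / best_case_yield
chips_discarded = chips_to_fabricate - consoles_needed

print(f"{chips_to_fabricate:,.0f} fabricated, {chips_discarded:,.0f} scrapped")
# 15,000,000 fabricated, 9,000,000 scrapped - i.e. 60% of the run
```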
IBM’s engineering division for Cell was contacted by TG Daily for comment yesterday, and has yet to return our inquiries.
Are Cell’s SPEs really redundant logic units?
Whether Reeves’ logic holds up depends in great measure on whether each SPE in a Cell processor can be considered a redundant part. In an August 2005 interview, Cell’s principal designer at IBM, Dr. H. Peter Hofstee, explained to us in rather extensive detail the differences between a synergistic processor element in a Cell and a core in an Intel or AMD processor. A Cell comprises a single principal processor element (PPE) – essentially a current-generation Power processor – and that element is not replicated. The other processing elements – the SPEs – are there to handle what is called scalar code: tasks that involve repetitive and reiterative operations, such as shading a texel or dividing a complex number. The SPEs, it was made clear to us, are not replicates of the PPE.
One term used to explain the instruction set the Cell uses is single-instruction/multiple-data (SIMD). Essentially, a SIMD instruction applies a logical operation to multiple sets of data, and those multiple sets can then be processed independently in a processor geared to handle scalar code. A GPU accomplishes this by implementing multiple pipelines – Nvidia’s 7900 GTX utilizes 24 pixel pipelines and 8 vertex pipelines. Cell accomplishes this using SPEs. However, as is the case with graphics processing, the data flow itself has not been multiplied. In other words, the processor is still doing one thing, but just breaking up the steps in-between and delegating them to the SPEs.
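The SIMD idea described above can be sketched in a few lines of ordinary code. This is a conceptual illustration of “one instruction, many data elements,” not Cell’s actual instruction set; the function name is our own.

```python
# Conceptual sketch of SIMD: one operation applied across a whole lane of
# data at once. An illustration of the programming model, not Cell's ISA.

def simd_add(lane_a, lane_b):
    """Apply a single 'add' operation to every element pair in one step."""
    return [a + b for a, b in zip(lane_a, lane_b)]

# A scalar CPU would loop, adding one pair per instruction;
# a SIMD unit (or one Cell SPE) processes the whole lane in one go.
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
# result == [11, 22, 33, 44]
```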
“One common perception that I think is not accurate,” Dr. Hofstee told us at that time, “is that, because the synergistic processors have a single data flow – which is SIMD – a lot of people seem to think that you can only use SPEs appropriately for problems that are SIMD parallel. I think that is a misperception.” In the Cell architecture, there are no multiple caches for the multiple SPEs, nor separate scalar and vector register sets. Instead, each SPE has one big, unified register file of 128 registers available to it at all times. And replacing the multiple caches is something Dr. Hofstee refers to as a local store, the middle tier of a three-tier memory architecture that lets the SPEs access a single, bigger pool of memory.
“The reason we went to the local store, three-level memory hierarchy – registers, local store, and shared memory,” Dr. Hofstee explained, “is something called the memory wall: the fact that microprocessors have gotten faster by a factor of 1000 in the last 20 years, and latency hasn’t gone down all that much.” He referred to a principle named for Intel engineer Pat Gelsinger, called “Gelsinger’s Law:” a corollary of Moore’s Law which states that each time the number of transistors on a processor is doubled, it delivers not double the performance but only about a 40 percent improvement. It was this “law” which helped drive Intel – and AMD – toward multicore architecture in the first place.
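That version of “Gelsinger’s Law” is easy to put into numbers. The sketch below uses the 40 percent figure quoted above; the function name and the choice of four doublings are purely illustrative.

```python
# "Gelsinger's Law" as the article states it: each doubling of transistors
# buys only ~40% more single-thread performance. The 1.4x factor is the
# article's figure; everything else here is illustrative.

def after_doublings(n):
    """Return (transistor multiple, performance multiple) after n doublings."""
    return 2.0 ** n, 1.4 ** n

transistors, performance = after_doublings(4)
# 16x the transistors buy roughly 3.8x the performance - the widening gap
# that pushed Intel and AMD toward spending transistors on extra cores.
```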
Clearly the goal of multicore architecture from this vantage point was not to create “redundant logic,” but rather multiplied logic – a way of doubling the horsepower and achieving something closer to double the performance. But as Dr. Hofstee explained, those portions of the processor that a manufacturer chooses to replicate could easily end up contending with one another for priority when it comes time for them to share a single computer system. Case in point: when two cores want the same area of memory.
“When you have a miss and you have to wait for memory, I sort of compare it to this ‘bucket brigade,’” remarked Dr. Hofstee. “You might have 100 people in the bucket brigade between the core (fire) and the memory (water), but if you only have five outstanding fetches – a bucket brigade with 100 people and five buckets – it just isn’t going to be very efficient. So in conventional microprocessors, if I take a conventional microprocessor and I double the memory bandwidth, I might only see a very incremental performance improvement, because in fact, delivered memory bandwidth is limited by the latency induced.” In other words, if you replicate everything, you create new latencies when everything has to work together. Therefore, you don’t replicate everything – not in smart processor architecture.
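Hofstee’s bucket brigade is, in queueing terms, an instance of Little’s law: delivered bandwidth is capped by the number of in-flight fetches times the fetch size, divided by the round-trip latency. The framing and every number below are our own illustrative assumptions, not Cell specifications.

```python
# Hofstee's bucket brigade, framed via Little's law:
#   delivered bandwidth = in-flight requests * bytes per request / latency.
# All numbers below are illustrative assumptions, not Cell specifications.

def delivered_bandwidth(in_flight, line_bytes, latency_ns):
    """Bytes/second actually achieved with a fixed number of outstanding
    memory fetches (the 'buckets') and a fixed round-trip latency."""
    return in_flight * line_bytes / (latency_ns * 1e-9)

few_buckets = delivered_bandwidth(5, 128, 100.0)     # ~6.4 GB/s
many_buckets = delivered_bandwidth(100, 128, 100.0)  # ~128 GB/s
# Doubling the peak (pin) bandwidth changes neither number: with only five
# buckets in flight, latency, not wire speed, is the cap.
```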
So what does this mean, applied to the numbers IBM’s Tom Reeves provided last week? Certainly there are replicated elements in the Cell processor, and we’ve learned that even a horsepower-hungry PS3 doesn’t need all of them. But if two SPEs aren’t really the same as two cores, then perhaps it should not follow that the number of defects to be anticipated should necessarily double, or otherwise be compounded by the number of SPEs. In short, eight SPEs should not necessarily mean eight times the number of defects, any more than doubling the number of transistors (in accordance with Moore’s Law) should reduce the yield rate below the 95% mark which foundries, at one time, enjoyed.
Thus if the Cell processor really is lucky to see 10% to 20% yields, as Reeves indicated, then if you take Dr. Hofstee’s explanation into account, there must be some other reason for it. Nonetheless, “the real surprise here is that Reeves gave an estimate of the actual yield,” Insight64’s Nathan Brookwood told us. “Semi guys normally won’t say anything about yield, on or off the record.”