Intel squeezes 1.8 TFlops out of one processor

Posted by Wolfgang Gruener

San Francisco (CA) - Intel today revealed more details about its 80-core research CPU, which likely is the most powerful floating point engine built to date. 200 of these processors achieve the same floating point performance as today's most powerful supercomputer. But don't get excited just yet: The CPU won't come to market and the number of cores isn't everything to future CPU design, Intel said.

We first heard about Intel's "Terascale" project - an effort to enable the transfer of terabits rather than gigabits of data - not quite two years ago and did not make much of what sounded like science fiction back then. But the company apparently has dedicated substantial resources to this project and was able to show a wafer with 80-core terascale processors at its most recent developer forum in fall of 2006. Just in time for the International Solid-State Circuits Conference (ISSCC), which will be held in San Francisco, the company has more data to discuss the capabilities of the processor.

At 0.95 volts and 3.16 GHz - the clock speed that was indicated at the fall developer forum - the processor provides a data bandwidth of 1.62 Tb/s and a floating point performance of 1.01 TFlops, according to Intel. About ten years ago, Intel needed more than 10,000 Pentium Pro processors to achieve a similar performance. Even more impressive than the chip's speed is its power consumption: At 3.16 GHz, the CPU consumes 62 watts, which is less than the firm's current Core 2 Duo desktop processors and about half of the firm's 2.66 GHz quad-core Xeon X5355 processors (which are believed to provide a floating point performance of about 50-60 GFlops).

 

In terms of the popular performance-per-watt discipline, Intel said that its 32-bit 80-core processor achieves about 16 GFlops per watt.
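The performance-per-watt figure can be checked against the numbers quoted for the 3.16 GHz operating point. The following is a minimal sketch using only figures from this article; the per-core values assume that performance and power are spread evenly across all 80 cores, which Intel did not state.

```python
# Figures quoted in the article for the 3.16 GHz operating point.
CORES = 80
TFLOPS = 1.01
WATTS = 62.0


def gflops_per_watt(tflops, watts):
    """Convert TFlops to GFlops and divide by the power draw in watts."""
    return tflops * 1000.0 / watts


efficiency = gflops_per_watt(TFLOPS, WATTS)

# Assumption: performance and power are divided evenly among the cores.
gflops_per_core = TFLOPS * 1000.0 / CORES
watts_per_core = WATTS / CORES

print(f"{efficiency:.1f} GFlops per watt")
print(f"{gflops_per_core:.1f} GFlops per core")
print(f"{watts_per_core:.2f} W per core")
```

The result is roughly 16.3 GFlops per watt, in line with the "about 16 GFlops per watt" Intel quoted.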

Intel claims that it can scale the voltage and clock speed of the processor to gain even more floating point performance. For example, at 5.1 GHz, the chip reaches 1.63 TFlops (2.61 Tb/s) and at 5.7 GHz the processor hits 1.81 TFlops (2.91 Tb/s). However, power consumption rises quickly as well: Intel measured 175 watts at 5.1 GHz and 265 watts at 5.7 GHz. Still, considering that just 202 of these 80-core processors could replicate the floating point performance of today's highest performing supercomputer, those power consumption numbers appear even more convincing: The Department of Energy's BlueGene/L system, rated at a peak performance of 367 TFlops, houses 65,536 dual-core processors.
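The scaling figures above can be put side by side with a few lines of arithmetic. This is a rough sketch based only on the numbers quoted in this article; it compares peak ratings and ignores everything a real system would add (interconnect, memory, cooling), so the chip counts and power totals are illustrative rather than a system design.

```python
import math

# (clock in GHz, TFlops, watts) as quoted in the article.
OPERATING_POINTS = [
    (3.16, 1.01, 62.0),
    (5.1, 1.63, 175.0),
    (5.7, 1.81, 265.0),
]
BLUEGENE_L_PEAK_TFLOPS = 367.0  # peak rating quoted in the article


def chips_needed(target_tflops, chip_tflops):
    """Smallest whole number of chips whose combined peak meets the target."""
    return math.ceil(target_tflops / chip_tflops)


for ghz, tflops, watts in OPERATING_POINTS:
    count = chips_needed(BLUEGENE_L_PEAK_TFLOPS, tflops)
    efficiency = tflops * 1000.0 / watts      # GFlops per watt
    total_kw = count * watts / 1000.0         # power of the chips alone
    print(f"{ghz} GHz: {efficiency:.1f} GFlops/W, "
          f"{count} chips, {total_kw:.1f} kW")
```

At 5.7 GHz the division 367 / 1.81 comes to about 202.8, so the figure of 202 chips is that value rounded down; strictly speaking, 203 chips would be needed to reach the full 367 TFlops. The same table also shows the efficiency falling from roughly 16 GFlops per watt at 3.16 GHz to under 7 GFlops per watt at 5.7 GHz.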

According to Intel, the 80-core processor is constructed as a "network of computers on a chip". Each of the 80 tiles integrates two floating point engines as well as a router unit that controls the data communication with the other cores: There are four interfaces to connect to the cores on the bottom, top, right and left; a currently unused fifth "3D" connect is designed to enable the chip to communicate with "stacked" memory.
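The tile-and-router layout described above can be pictured as a two-dimensional grid in which each tile talks only to its direct neighbors. The following is a minimal sketch of that idea, not Intel's routing logic: the 8 x 10 grid shape, the function names and the shortest-path hop count are all assumptions made for illustration, since the article only states that there are 80 tiles with four neighbor links each.

```python
# Hypothetical grid shape: the article only says there are 80 tiles,
# each linked to the tiles on its bottom, top, right and left.
ROWS, COLS = 8, 10


def mesh_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the (row, col) coordinates of the directly adjacent tiles."""
    candidates = [
        (row - 1, col),  # top
        (row + 1, col),  # bottom
        (row, col - 1),  # left
        (row, col + 1),  # right
    ]
    # Tiles on the edge of the grid have fewer than four neighbors.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def hop_count(src, dst):
    """Minimum number of tile-to-tile hops, assuming shortest-path routing."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


print(mesh_neighbors(0, 0))        # corner tile: two neighbors
print(mesh_neighbors(3, 4))        # interior tile: four neighbors
print(hop_count((0, 0), (7, 9)))   # corner to opposite corner
```

In a layout like this, data between distant tiles has to pass through every router along the way, which is one reason the communication fabric matters as much as the floating point engines themselves.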

At this time, the CPU is built to explore floating point capabilities on processors that could open the door to new types of applications. Floating point performance traditionally has only been essential in certain segments of the market where number crunching capability is critical - such as in financial simulations and various scientific applications involving, for example, fluid dynamics or geological research.

Intel believes that a new generation of programmers adapting to multi-threaded programming could create new types of consumer software that will take advantage of this type of capability as well. The company envisions high-definition entertainment on PCs and handhelds, artificial intelligence in "user-aware" environments, instant video communications, photo-realistic gaming and multimedia gaming. There is also the idea of real-time speech recognition, which combines audio with visuals to achieve higher accuracy in this application: In the future, a PC could "read" a user's lip movements to improve the speech recognition we know today.

However, during a presentation, Intel dampened our hopes that such a processor would be in PCs or servers anytime soon. The company said that the 80-core chip is just a research chip that will not become a product for the commercial market. But technologies that are developed within its terascale project could trickle down into mainstream products. According to Intel, the "manycore" design of the chip would be able to house different processor cores, including "general purpose" cores that are necessary, for example, to efficiently run traditional applications such as an operating system. In this view, Intel's approach has some similarity to AMD's design approach of the "Fusion" processor, which is expected to merge general purpose cores with graphics cores. Intel, however, did not say whether it plans to integrate graphics capability into its 80-core chip.

 

The similarities with AMD's future roadmap go even further, as the company conceded that there may be a limit to how many processing cores on one die make sense. With the arrival of the first dual-core computers, Intel often mentioned that the future will bring dozens or even hundreds of cores on one processor; now the company says that there could be a "sweet spot" for the number of cores on a chip. "At some point, the cores are getting into each other's way," an Intel representative said. "It's not just about adding cores. Other improvements are needed as well."

Specifically, Intel indicated that, in the current environment, processors will increasingly gain from the simple addition of cores until 16 cores are reached. After that, the baseline performance of processors will benefit less from the addition of cores and other enhancements will become more important. According to Intel, cache improvements will take center stage, followed by thread scheduling and new instructions.
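Intel did not explain how it arrived at the 16-core figure. One common way to illustrate why extra cores bring diminishing returns is Amdahl's law, which is used here purely as an illustration and is not Intel's stated model. The sketch below assumes a hypothetical workload in which 95 percent of the work can run in parallel; both that fraction and the function name are made up for the example.

```python
def amdahl_speedup(cores, parallel_fraction):
    """Speedup over one core when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 95 percent parallel. Not a figure from Intel.
PARALLEL_FRACTION = 0.95

for cores in (2, 4, 8, 16, 32, 80):
    speedup = amdahl_speedup(cores, PARALLEL_FRACTION)
    print(f"{cores:>2} cores: {speedup:.1f}x speedup")
```

Under this assumption, going from 1 to 16 cores yields a speedup of about 9x, while adding another 64 cores only brings it to about 16x. The exact point where the curve flattens depends entirely on the assumed parallel fraction.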

Both Intel and AMD apparently have come up with similar results for their future roadmaps and the differences will reveal themselves over time. What we already know today is that Intel is betting its money on CPU-like floating point engines and AMD will be taking the GPU-based "stream processing" approach. Nvidia will be another competitor in that field: The company is just about ready to release a public beta of its CUDA technology, which opens up the firm's graphics cards as stream processing engines and capable floating point accelerators as well.