Santa Clara (CA) – Intel is gearing up for its Spring Developer Forum, which will open its doors on April 2 in Shanghai. Senior vice president Pat Gelsinger gave a first outlook on what attendees will be able to see in China. Judging by this first presentation, Intel isn’t slowing down: it will present a full line of new products, ranging from a new Itanium on the high end and a six-core Xeon MP to more details about the next-generation Nehalem micro-architecture, first information about the succeeding 32 nm Sandy Bridge architecture, and Larrabee, its discrete graphics card and floating-point accelerator.
Intel’s IDF pre-briefings for journalists have a long history. Not only do they give us a better idea of what to expect and plan for at the conference, they also help Intel build buzz around the show. It is no different this time around, though we have to say that the buzz created by the marketing folks may be deserved. Three years ago, we wondered if all that Pentium overclocking and Intel’s direction with its technology and IDF still made sense. Two years ago, the conference was revived with enormous efforts behind Core 2 Duo and signs that the giant was awake again. In 2008, this effort has spread across virtually all platforms, showcasing the enormous research and development horsepower of Intel.
Larrabee: A supercomputer graphics card
One of the most anticipated topics will certainly be Larrabee, Intel’s first many-core product, which will be released as a discrete graphics card in the 2009/2010 timeframe. The simple message is that Intel will be taking on AMD and Nvidia in the high-end graphics market, but the key difference from its rivals will be the use of regular IA cores. Just as in conventional graphics cards, there will be lots of programmable processors in a highly parallel environment.
While Intel will pitch Larrabee as a graphics card (one that is expected to consume less than 140 watts), the true power behind it is its floating point engine, which, according to Gelsinger, will crank out more than a teraflop. This puts the card head to head with Nvidia’s Tesla cards and AMD’s Firestream stream processor cards, but Intel believes it will have the decisive edge: Since these are IA cores, the platform can be leveraged by any developer who has already been writing software for regular microprocessors. There is no need to learn CUDA for Tesla or CTM for Firestream. In theory, Intel already has hundreds of thousands of developers who could write software for Larrabee. Gelsinger said that it will use “common libraries and run under the same OS as IA processors.” “The industry believes that [Nvidia’s] CUDA is heavy lifting,” he said.
The challenge remains, however, that Intel will need to educate a new generation of developers to take advantage of an environment that deals with dozens of cores, a skill that so far has only been required to create applications running on supercomputers.
Intel will not be releasing specifications of Larrabee at IDF.
Nehalem: Core 3 Duo
No, Intel has not announced the official name of the Nehalem processors, which will enter mass production in the fourth quarter of this year. We use the name here for simplicity’s sake, and besides, Core 3 Duo would certainly make sense.
Nehalem is expected to be the main topic at IDF, representing the successor of the Core micro-architecture. Nehalem processors will scale from two to eight processing cores, support simultaneous multi-threading (similar to Hyper-Threading, two threads per core), and introduce the QuickPath interconnect, an integrated memory controller, a three-level cache system and an extended instruction set (SSE 4.2). Performance should go up noticeably from Penryn, as Nehalem supports up to 128 in-flight micro-ops (up from 96 in Core) and includes improved algorithms for faster unaligned cache access, faster synchronization primitives and extended branch prediction, which now includes a second branch predictor. Simultaneous multi-threading will add a virtual thread to every core, which allows a 2-core system to handle 4 threads, while an 8-core Nehalem will support 16 threads.
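To put the thread and micro-op figures above into numbers, here is a minimal Python sketch. It only restates the arithmetic from Gelsinger’s presentation; the function names are ours and purely illustrative, and the 4-core entry is our own assumption, as the article only gives the 2-core and 8-core cases.

```python
# Illustrative arithmetic only, based on the figures quoted in the article.
# Function names are hypothetical helpers, not part of any Intel tool.

THREADS_PER_CORE = 2        # simultaneous multi-threading: 2 threads per core
IN_FLIGHT_UOPS_CORE = 96    # in-flight micro-ops in the Core micro-architecture
IN_FLIGHT_UOPS_NEHALEM = 128


def logical_threads(cores: int, threads_per_core: int = THREADS_PER_CORE) -> int:
    """Number of hardware threads a processor with `cores` cores exposes."""
    return cores * threads_per_core


def relative_increase(old: float, new: float) -> float:
    """Fractional increase when going from `old` to `new`."""
    return (new - old) / old


if __name__ == "__main__":
    # The 4-core entry is an assumption; the article names only 2 and 8 cores.
    for cores in (2, 4, 8):
        print(f"{cores} cores -> {logical_threads(cores)} threads")
    growth = relative_increase(IN_FLIGHT_UOPS_CORE, IN_FLIGHT_UOPS_NEHALEM)
    print(f"In-flight micro-op window grows by {growth:.0%}")
```

The larger micro-op window works out to an increase of about one third, which says nothing about real-world performance by itself.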
The new cache system will include three cache levels: a 32 KB L1 instruction cache and a 32 KB L1 data cache per core, a 256 KB low-latency L2 cache per core, as well as 8 MB of shared “last level” or L3 cache. There is an integrated 3-channel DDR3 controller offering “massive amounts of bandwidth” for DDR3-800, 1066 and 1333 memory, according to Intel. The QuickPath interconnect will provide a bandwidth of 25.6 GB/s between Nehalem processors and the Tylersburg chipset.
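Intel did not put a number on those “massive amounts of bandwidth”, but a rough theoretical peak can be estimated. The sketch below assumes the standard 64-bit (8-byte) DDR3 channel width, which is our assumption and not a figure from the presentation, and it ignores all real-world overhead.

```python
# Back-of-the-envelope estimate of theoretical peak memory bandwidth.
# Assumption (not from Intel's presentation): each DDR3 channel is
# 64 bits wide, i.e. 8 bytes per transfer.

BYTES_PER_TRANSFER = 8   # assumed standard 64-bit DDR3 channel
CHANNELS = 3             # per the article: integrated 3-channel controller


def peak_bandwidth_gbs(mega_transfers_per_s: float,
                       channels: int = CHANNELS) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_second = mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels
    return bytes_per_second / 1e9


if __name__ == "__main__":
    # Nominal data rates of the three memory speeds named in the article.
    for speed in (800, 1066, 1333):
        print(f"DDR3-{speed}: about {peak_bandwidth_gbs(speed):.1f} GB/s peak")
```

Under these assumptions, three channels of DDR3-800 top out at 19.2 GB/s and DDR3-1333 at roughly 32 GB/s; sustained bandwidth in practice will be lower.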
Later in the product cycle, Intel plans to offer a Nehalem processor with integrated graphics. Consider it Intel’s version of AMD’s Fusion processor. Gelsinger said that Nehalem has been built around a modular concept, allowing the company to scale the processor either towards performance or low power consumption. Unlike Fusion, the Nehalem+graphics CPU will not use a full-fledged GPU, but a graphics engine derived from Intel’s chipset graphics that is not expected to reach the performance of AMD’s GPU-based version. Gelsinger indicated that future versions of this concept could integrate graphics technology based on Larrabee.
Sandy Bridge: First 32 nm details
In 2009, Intel will shrink Nehalem from 45 nm to 32 nm in “Westmere”. 2010 will bring a new 32 nm architecture code-named “Sandy Bridge”. While this product is still far out in the future, Gelsinger revealed that Sandy Bridge will introduce a new instruction set called “Advanced Vector Extensions”, or AVX for short. Vectors that increase in size from 128 to 256 bits will allow Intel to double the floating point output as well as pull and organize data more efficiently, Gelsinger said. Also, AVX will bring a “three operand, non destructive syntax” for developers, resulting in what Intel believes will be fewer register copies, better register use and smaller code size.
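The two AVX claims, wider vectors and a non-destructive three-operand form, can be illustrated with a toy model. The Python sketch below is not real SSE or AVX code: registers are plain lists, the register names r0 to r2 are hypothetical, and the instruction counts only show why a destructive two-operand add can need an extra register copy.

```python
# Toy model only: "registers" are Python lists, not real SSE/AVX state.
# Register names (r0, r1, r2) and helper names are hypothetical.

def lanes(vector_bits: int, element_bits: int = 32) -> int:
    """How many elements of the given width fit into one vector register."""
    return vector_bits // element_bits


def add_two_operand(regs: dict, dst: str, src: str) -> None:
    """Destructive form: dst = dst + src, so the old value of dst is lost."""
    regs[dst] = [x + y for x, y in zip(regs[dst], regs[src])]


def add_three_operand(regs: dict, dst: str, src1: str, src2: str) -> None:
    """Non-destructive form: dst = src1 + src2, both sources survive."""
    regs[dst] = [x + y for x, y in zip(regs[src1], regs[src2])]


def sum_two_operand(regs: dict) -> int:
    """Compute r2 = r0 + r1 while keeping r0; returns instructions used."""
    regs["r2"] = list(regs["r0"])       # extra register copy to protect r0
    add_two_operand(regs, "r2", "r1")   # r2 = r2 + r1
    return 2


def sum_three_operand(regs: dict) -> int:
    """Same result with a single non-destructive add."""
    add_three_operand(regs, "r2", "r0", "r1")
    return 1


if __name__ == "__main__":
    print("32-bit floats per 128-bit vector:", lanes(128))
    print("32-bit floats per 256-bit vector:", lanes(256))
    width = lanes(256)
    regs = {"r0": [1.0] * width, "r1": [2.0] * width}
    print("two-operand instructions:", sum_two_operand(dict(regs)))
    print("three-operand instructions:", sum_three_operand(dict(regs)))
```

Doubling the vector width doubles the number of single-precision values handled per instruction, which is where the claimed doubling of floating point output comes from; actual gains will depend on the code and on the hardware Intel eventually ships.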
On the mainframe server side, Intel will demonstrate its next-generation Itanium processor, code-named Tukwila. This 2-billion-transistor quad-core monster will integrate 30 MB of cache and offer 96 GB/s of processor-to-processor bandwidth through “QuickPath” as well as a peak memory bandwidth of 34 GB/s. In terms of performance, the manufacturer promises that the new processor will deliver about twice the performance of Montvale while consuming about 25% more power (the CPU is rated at a 130-watt TDP).
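Taken at face value, those two ratios imply a sizable gain in performance per watt. The short Python sketch below works through the arithmetic; the inputs are Intel’s claims as quoted above, while the derived values are our own calculation and not numbers Intel has stated.

```python
# Arithmetic on the claims quoted in the article. The derived values are
# our own calculation, not figures stated by Intel.

PERF_RATIO = 2.0       # "about twice the performance of Montvale"
POWER_RATIO = 1.25     # "about 25% more power"
TUKWILA_TDP_W = 130.0  # TDP quoted in the article


def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance-per-watt improvement over the predecessor."""
    return perf_ratio / power_ratio


def implied_predecessor_power(new_power_w: float, power_ratio: float) -> float:
    """Predecessor power implied by the new part's power and the ratio."""
    return new_power_w / power_ratio


if __name__ == "__main__":
    gain = perf_per_watt_gain(PERF_RATIO, POWER_RATIO)
    old_power = implied_predecessor_power(TUKWILA_TDP_W, POWER_RATIO)
    print(f"Performance per watt: {gain:.2f}x Montvale")
    print(f"Implied Montvale power: {old_power:.0f} W")
```

If both claims hold, Tukwila would deliver about 1.6 times the performance per watt of its predecessor.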
Dunnington will complement the recently launched Tigerton 7300-series Xeon MP processors on the high end. Scheduled to ship to OEMs in the second half of the year, the processor is based on the 45 nm Penryn core and will integrate a total of six cores.
According to Gelsinger, Dunnington will house about 1.9 billion transistors, integrate 16 MB of L3 cache and remain socket-compatible with Caneland platform processors.