How a forgotten Intel invention could revolutionize the CPU

Posted by Theo Valich

Mountain View (CA) - When we talk about processor performance, we usually credit the depth of the pipeline, the number of cores, the size and type of the cache, or the clock speed. However, we rarely hear about how a processor actually moves data between these components, and such technologies usually do not make it into marketing brochures. But Intel has an idea that could change this: the company is considering integrating DRAM into the CPU.

 

One of the most important goals when designing a new chip is to keep the available processing units as busy as possible. One way to achieve this goal is to feed data into the cores as quickly as possible through improved inter-core communication. The progress from one processor generation to the next is obvious: while the 65 nm Kentsfield quad-core provided a bandwidth of about 8 to 9 GB/s, the 45 nm Harpertown chip offers 18 to 20 GB/s.
 
At last week’s Research@Intel Day event, we spotted a technology that holds the potential to multiply the available bandwidth within a processor. In our opinion, this was the most impressive research we saw that day. You may not have heard about it because Intel did not specifically promote it and did not even mention it on the "Demo Cheat-Sheets" handed out to journalists and analysts.

A small research team inside Intel succeeded in reducing the size of a DRAM cell to only two transistors and removing the capacitor completely. Conceivably, these two achievements could change how we use DRAM in the future: for example, expensive and complex SRAM (static RAM) cells could be removed from a CPU entirely and replaced with DRAM.

 

In contrast to Intel’s two-transistor (“2T”) DRAM bit cell, SRAM usually requires six transistors per stored bit. Of course, there is also 1T-SRAM (which uses only one transistor per bit), but this type is very rare (it is used, for example, in Nintendo game consoles such as the GameCube and Wii).
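To put the per-bit transistor counts in perspective, here is a rough sketch of the arithmetic. The 6 MB array size is purely a hypothetical figure chosen for illustration, and the count covers bit cells only; tags, error correction, sense amplifiers and refresh logic are ignored.

```python
# Rough bit-cell transistor count for a memory array.
# Assumption: only the storage cells are counted; tags, ECC and
# peripheral circuitry are ignored.

def cell_transistors(capacity_bytes, transistors_per_bit):
    """Return the number of transistors needed for the bit cells alone."""
    return capacity_bytes * 8 * transistors_per_bit

# Hypothetical array size, not a figure from Intel.
CACHE_BYTES = 6 * 1024 * 1024

sram_6t = cell_transistors(CACHE_BYTES, 6)  # conventional six-transistor SRAM
dram_2t = cell_transistors(CACHE_BYTES, 2)  # two-transistor DRAM cell

print(f"6T SRAM cells: {sram_6t:,} transistors")
print(f"2T DRAM cells: {dram_2t:,} transistors")
print(f"Reduction:     {sram_6t / dram_2t:.0f}x")
```

Whatever array size is plugged in, the cell count drops by a factor of three. The actual savings in die area would depend on the cell layout and on the supporting circuitry, which this sketch does not model.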

SRAM has some advantages over DRAM, including lower power consumption, higher speed and no need to be refreshed. However, SRAM is known to be much more expensive than DRAM and not as dense.

Intel said that it was able to fine-tune its DRAM design and hit a physical clock of 2 GHz using a 65 nm manufacturing process. The resulting 2T-DRAM offers a stunning bandwidth of 128 GB/s. If Intel succeeds in raising the clock speed to the level of its QX9770/9775 processors, the bandwidth would climb to 204.8 GB/s. In other words, Intel would gain more than a 10x improvement over its current L2 cache technology. More importantly, this approach would completely change the programming model, since cache misses would no longer be a concern.
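The bandwidth figures can be checked with simple arithmetic. The sketch below assumes that bandwidth scales linearly with clock speed and that the QX9770/9775 clock is 3.2 GHz (the value implied by the 204.8 GB/s projection); the function names are ours, chosen for illustration.

```python
# Back-of-the-envelope check of the 2T-DRAM bandwidth figures.
# Assumptions: bandwidth scales linearly with clock, and the target
# clock is 3.2 GHz (QX9770/9775 class).

def bytes_per_cycle(bandwidth_gb_s, clock_ghz):
    """GB/s divided by GHz gives bytes moved per clock cycle."""
    return bandwidth_gb_s / clock_ghz

def scaled_bandwidth(bandwidth_gb_s, clock_ghz, target_clock_ghz):
    """Project bandwidth at a different clock, keeping bytes per cycle fixed."""
    return bytes_per_cycle(bandwidth_gb_s, clock_ghz) * target_clock_ghz

width = bytes_per_cycle(128.0, 2.0)            # demonstrated: 128 GB/s at 2 GHz
projected = scaled_bandwidth(128.0, 2.0, 3.2)  # projected at 3.2 GHz

# Compare against the 18-20 GB/s quoted for Harpertown.
gain_vs_20 = projected / 20.0
gain_vs_18 = projected / 18.0

print(f"Bytes per cycle:     {width:.0f}")
print(f"Projected bandwidth: {projected:.1f} GB/s")
print(f"Gain over 18-20 GB/s: {gain_vs_20:.1f}x to {gain_vs_18:.1f}x")
```

At 64 bytes per cycle, the projection lands on 204.8 GB/s, roughly 10 to 11 times the Harpertown figure.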

The scientists believe they will be able to use 45 nm High-k technology to match and exceed the clock speeds of Intel's existing designs. As a next step, DRAM cells are planned to be stacked into Intel's Terascale processors. The Terascale processor itself may migrate to a massive number of x86 mini-cores, which, sooner or later, may turn out to be the successor to the Larrabee and Itanium architectures. In case you are wondering: yes, it looks like there will be a combination of a CPU and the upcoming GPU/accelerator.

Seeing 32 nm wafers at Intel’s Research Day was nice, but at the end of the day, 32 nm is just another manufacturing process. In our opinion, DRAM on the processor is what would make the greatest difference in performance. According to two scientists we talked to, the potential bandwidth would quickly usher in the era of Terascale. If software developers can access low-latency bandwidth of 200 GB/s, many of today’s parallel programming problems could be resolved, since the penalty of a cache miss could be reduced to near zero.