Tech Talk Terascale: TG Daily chats with Intel CTO Justin Rattner
Hillsboro (OR) - TG Daily editor Rick C. Hodgin had the opportunity to sit down for a one-on-one interview with Intel senior fellow, Chief Technology Officer, and Corporate Technology Group director, Justin Rattner. In this interview, Rattner talks about Terascale, reconfigurable computing, software modeling, situational awareness and other future computing possibilities, as well as quirky nanoparticles called catoms. Terascale is receiving a tremendous amount of focus and funding on the research and development side of Intel. Can this future product provide the computing capacity necessary to change the way we think about and use our computers?
TG Daily: Terascale, from my point of view, is one of the most fascinating developments in processor design today. Yet we do not hear much about this technology, with the exception of a presentation slide here and there. Why isn’t there more of a push from inside Intel to give this topic a more dominant stance in the media?
Justin Rattner: I think on the product side [of announcements] we’ve been pretty careful and deliberate with disclosures. To this point Pat [Gelsinger] has been able to get in a little bit [at IDF Spring 2007] in Beijing, for example. And Paul [Otellini] has said more than anyone else has on it. On the product side we’re really managing it. On the research side we’ve actually been as open and up-front about what we’re working on as we ever have been.
As you know, Intel up until a few years ago, we just never talked about anything that wasn’t product.
TG Daily: Exactly. The ‘Intel does not comment on unannounced products’ mantra.
Rattner: Right. I mean, we did not comment on unannounced products. Well, a few years ago we said, “Hey, you gotta talk about technology and the stuff you’re working on, otherwise you just appear to be sort of out of touch with what’s going on”.
TG Daily: Like Yamhill. [Yamhill was the codename for an Intel technology back in the 2003 timeframe that was expected to be the answer to AMD’s 64-bit technology]
Rattner: Right. I think a number of people from the technology groups said that from a leadership point of view you just can’t maintain this radio silence the way we have traditionally in the past. So we’ve been more willing to talk about these things. Although, we still draw a pretty bright line between the work we’re doing, which is more along the lines of precursor work, versus the products which may differ to either a small or large degree, from the research. So, there is still a bit of that.
TG Daily: A key strength that we believe to have seen with Terascale is its flexibility. You have an architecture now which, if you applied IA cores to it, in addition to other cores which augment through SIMD and MIMD abilities, then you’ll have significantly greater legacy support while you’re moving forward. What are the challenges you see right now?
Rattner: Yes, that’s just a fundamental principle of ours, that we don’t want to just throw out the legacy support, and to preserve that programming model across a large number of cores. That is not trivial. Let’s look at the cache, for example. Getting good and expected behavior from that cache requires a lot of innovation in how it’s done and how the function is distributed across the processors. Then there’s a programming challenge.
Many of us have High Performance Computing experience behind us, and we know that these machines can be very challenging. And right now we’re spending more than half of our research budget on Terascale on programming issues. There’s language compiler, tools, runtimes, and all of that, even more so than operating systems.
From a market perspective, we have realized that it’s just not possible to make a broad market entry. We can’t say "Just have at it, folks, and it will all, you know, just sort of work out." It doesn’t work like that. We are very conscious about identifying and understanding the requirements in the markets of interest and the applications of interest, and making sure the architecture addresses and protects those markets well. And then, introducing the products over time in a way that matches up with those markets - and keeps the programming challenges tractable.
That being said, I have no doubts that once you can buy [Terascale] on the streets, there will be all kinds of third parties who develop interesting and powerful ways to use them.
TG Daily: Some of the Terascale demonstrations we have seen reveal phenomenal differences between what we have today and what is possible thanks to so much computing potential. And that’s likely to be especially true when Intel moves up to many more IA cores in future products like Larrabee. [Editor’s note: We were shown some interesting Terascale applications, which included various forms of digital effects for real-time 3D animation, multimedia, mining and graphical enhancement applications. These targeted applications would literally take hours of computing time with today's highest-end dual-socket quad core systems. These were demonstrated as applications for future Terascale products.]
Rattner: Part of the theory we had [with Terascale] was that, if you could deliver that kind of performance in a compact package, that you would open up a new range or a new class of applications. And the reason you were able to see all those things is because we actually focused on this aspect. You can say it was probably a bit non-traditional: Typically the presumption with processors is "There’s always going to be a market for more of these MIPS [Millions of Instructions Per Second] or FLOPS [FLOating point OPerations per Second - both measures of computing speed], so don’t worry about it."
TG Daily: This is what the market is looking for today...
Rattner: Yes. There’s a long-standing joke - the marketing team shows up about a year before the processor's introduction and asks the lab guys, "What really cool thing can we demonstrate with this?" And our answer is "Well, you should’ve thought of that before you started designing it." But in this case we needed to make sure that there is a range of applications that are just not served by anything today. The reality is those applications exist. Performance wise, these applications are one or two orders of magnitude away from anything today.
TG Daily: Looking at the eye-appealing visual effects added to images, movies, audio and video we saw in your labs, isn’t it likely that developers would flock to Terascale once they saw that computing power realized with those kinds of examples?
Rattner: That is what we hope. But still, and especially in the Intel culture, we couldn’t just assert that fact. We actually had to show that, and quantify it and demonstrate by saying, "That application, we want to do that in real-time, or in super-real-time," and that’s just what we did.
TG Daily: Of all of the products you have on your roadmap, is Terascale one that is receiving a significant amount of focus?
Rattner: I would say it’s actually getting an extraordinary amount of focus. There are dedicated organizations within several of our major business units. They get up in the morning and go and do [Terascale]. Still, the mainstream business [IA-core based microprocessor sales, chipsets, etc.] is so huge that all of these research projects pale by comparison. But I do think that, for Terascale, some very serious money is being spent. There are very serious development teams at work, and I’m speaking independent of the research--which continues to progress and will ultimately feed that product pipeline over time.
TG Daily: So, do you see a lot of the various disciplines, such as software groups here, software groups outside, research groups, hardware groups, products groups, sort of all coming together to feed into products like Larrabee and Terascale?
Rattner: Absolutely. In fact, we put the C++ transactional memory compiler up on the "What If" website and the response has been great; there were thousands of downloads. And there’s the Ct technology. That may not have quite the same release mechanism, but there’s tremendous interest in that tool as well. And in all of that, it’s research and development working very close with product [divisions] to put that offering together and engage the developer community long before the targeted silicon. You can actually run it on multi-cores today and it’s pretty useful. But the idea is that you’ll be able to take that same code and drop it on a Terascale class machine and it will just go.
TG Daily: How many research efforts are there working toward heterogeneous presentations on a Terascale core?
Rattner: Well, some of the research efforts we’re undertaking, they’re looking at a particular problem. They say, "We want to go look at this one thing". So, there’s that. Consider the follow-on to Polaris, the Terascale chip: While it does have many cores on it, the cores weren’t really the big thing, it was the interconnect. But we wanted to get past that, even for our own experimental reasons.
I think in most things we’re doing right now, we assume there is a degree of heterogeneity in the designs. And we try to do it in ways that we think are reasonable from the developer’s point of view. We really have to be convinced that a new instruction set would really add significant value. So, when we look at heterogeneity, for the most part it tends to be in fairly fixed functions. Now that’s not to say they’re generic, like you push the bits through and they come out looking like something else.
Right now, we tend to focus more on these fixed functions that are just too expensive, either in time, energy or something else, to do on the programmable engine [i.e. generic CPU]. But we certainly look very hard at what it would take to do this on a standard IA core, compared to putting together a very specific piece of hardware to handle it. We always subject it to a test like, "Could we add an instruction or two, or do something with the register design, or tweak the caches in some way that would let us do this on the programmable side?"
There’s a paper Ivan Sutherland wrote decades ago called "The Wheel of Reincarnation". He talks about the evolution of the graphics processor. As they started off with the graphics processor it had just a couple of instructions. And then they added a few more. And a few more. And then after a while they realized the GPU was as complex as the main processor, and then asked "Why don’t we just put a second main processor in it to do the graphics?" And you’ll see that if you go back a few years, like a decade ago, when there were all of these media processors out there and everybody had to have a media processor. But over several years, the general purpose CPUs reeled that back in and today, you don’t generally see media processors of that sort.
Handheld devices are certainly driving a re-examination of those tradeoffs. If you’ve got to play back music at 17 milliwatts, for example, well you just can’t do that by powering up a processor or, more importantly, powering up a DRAM, to let that processor work. So, in those highly energy constrained environments it makes sense. But I think for A/C powered systems you really want to move much more judiciously.
Something we’re taking a very hard look at is reconfigurability.
TG Daily: What do you mean by "reconfigurability"?
Rattner: Adding a certain amount of reconfigurable hardware, either with the processor or as sort of an adjunct. In that case, instead of a processor you get this chunk of reconfigurable hardware. If you’re dedicating 5% of your active die area to a special purpose function, you really need to have a big enough market for that functionality.
If you’re doing graphics, then you’re probably going to have things like texture processing in hardware. That’s going to be a dedicated function because it’s very clunky to do it in software. But then you step back and think about that and you say, "Wow, I have a fair amount of my real-estate tied up in this thing. Now, if I could reconfigure that, even at power-up, not even going into runtime switching, but if I could reconfigure those gates at power-up so that it is a texture engine in there, or not and it’s going to do something else, like computing the inner-loop of some force equation for molecular dynamics," then that part becomes a lot more interesting. And at that point you feel a lot better about your investment. It’s not like you’ve committed all of this area to this one thing.
There’s going to be a paper at the International Solid State Circuits Conference (ISSCC) next year outlining one of our first experimental efforts in this area.
Of course, this is not FPGA [Field Programmable Gate Arrays, a type of completely reconfigurable hardware often used in early prototype designs, and real-world “quick-and-dirty” applications]. FPGA is a little too fine grained and, as a result, it’s pretty area inefficient and it’s very power inefficient. So we’re looking more at bit-slice arrangements where you have things that look like registers [the fastest internal data components on a CPU] and ALUs [Arithmetic Logic Units, that carry out hard work inside the CPU] and barrel shifters [logic units which shuffle binary bits around] and other base abilities, and you can quickly reconfigure the data paths, and how you hook up those various units to, in some sense, build the degree of pipelining and parallelism that’s required.
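Editor’s note: the coarse-grained idea described above can be illustrated with a toy software model. The sketch below is not Intel’s design; the unit names (`add`, `xor`, `rotl`), the 32-bit width and the `Datapath` class are all assumptions made for illustration. It only shows the principle that a small set of fixed word-level units (an ALU, a barrel shifter) can be rewired into different pipelines by loading a new routing list, rather than by reprogramming individual gates as an FPGA would.

```python
from typing import Callable, Dict, List, Tuple

WIDTH = 32                      # assumed word width for this toy model
MASK = (1 << WIDTH) - 1

def alu_add(x: int, operand: int) -> int:
    """ALU-style add, wrapping at the word width."""
    return (x + operand) & MASK

def alu_xor(x: int, operand: int) -> int:
    """ALU-style bitwise XOR."""
    return (x ^ operand) & MASK

def barrel_rotl(x: int, amount: int) -> int:
    """Barrel-shifter-style rotate left by 'amount' bits."""
    amount %= WIDTH
    return ((x << amount) | (x >> (WIDTH - amount))) & MASK

# The fixed, coarse-grained units available in the fabric (hypothetical set).
UNITS: Dict[str, Callable[[int, int], int]] = {
    "add": alu_add,
    "xor": alu_xor,
    "rotl": barrel_rotl,
}

class Datapath:
    """A chain of units; 'configure' rewires it without changing the units."""

    def __init__(self) -> None:
        self.route: List[Tuple[str, int]] = []

    def configure(self, route: List[Tuple[str, int]]) -> None:
        for name, _ in route:
            if name not in UNITS:
                raise ValueError(f"unknown unit: {name}")
        self.route = list(route)

    def run(self, value: int) -> int:
        # Push the value through each unit in the configured order.
        for name, param in self.route:
            value = UNITS[name](value, param)
        return value

if __name__ == "__main__":
    dp = Datapath()
    dp.configure([("add", 1), ("rotl", 4)])   # add first, then rotate
    print(hex(dp.run(0xF)))                   # 0x100
    dp.configure([("rotl", 4), ("add", 1)])   # same units, different wiring
    print(hex(dp.run(0xF)))                   # 0xf1
```

The same two units produce different functions depending only on how they are hooked together, which is the property the interview attributes to reconfiguring data paths between larger building blocks.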
TG Daily: Are you talking about at runtime, the actual chips are designed with these abilities? Or do you mean that while you’re designing them, well in advance, like during the manufacturing phase, they are then hard-configured in silicon?
Rattner: At runtime absolutely. Well, maybe at power-up.
TG Daily: So it would configure itself at startup, and then run that way until powered down?
TG Daily: Is this something that would be handled by additional BIOS settings? Such as "Make my CPU have a texture engine, or make my CPU be a math processor?"
Rattner: Exactly. And the issues you have when you do it at runtime are really complex. You have to consider, are the reconfigurable gates set the way each thread wants them, and...
TG Daily: ...you could hook it up to the task switcher?
Rattner: That’s exactly what you’d do. But then you’ve got this huge overhead for context switching, and there’s all this extra state data to save.
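Editor’s note: a rough way to see why runtime switching is costly is to count how often the fabric would have to be reloaded under a given thread schedule. The sketch below is a back-of-the-envelope model, not a description of any Intel scheduler; the function names, the thread and configuration labels, and the reload cost are hypothetical.

```python
from typing import Dict, List

def count_reconfigurations(schedule: List[str], wanted: Dict[str, str]) -> int:
    """Count fabric reloads when threads run in 'schedule' order.

    'wanted' maps each thread to the configuration it needs. A reload
    happens whenever the next thread needs a configuration other than
    the one currently loaded.
    """
    loaded = None
    reloads = 0
    for thread in schedule:
        config = wanted[thread]
        if config != loaded:
            loaded = config
            reloads += 1
    return reloads

def switch_overhead_us(schedule: List[str], wanted: Dict[str, str],
                       reload_cost_us: float) -> float:
    """Total time spent reloading, given an assumed per-reload cost."""
    return count_reconfigurations(schedule, wanted) * reload_cost_us

if __name__ == "__main__":
    wanted = {"A": "texture_engine", "B": "md_force_loop"}
    # Alternating threads force a reload on every switch.
    print(count_reconfigurations(["A", "B", "A", "B"], wanted))  # 4
    # Grouping threads with the same needs halves the reloads.
    print(count_reconfigurations(["A", "A", "B", "B"], wanted))  # 2
```

In this model, configuring once at power-up corresponds to a schedule in which every thread wants the same configuration, so only a single load is ever paid.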
TG Daily: Are there real products coming with these reconfigurable abilities?
Rattner: It’s not committed for products yet, but it’s getting a lot of cycles on the research side.
TG Daily: You’ve introduced the ability to create all of this reconfigurable hardware at design time, built to specs. And once all of these components are created, they could be hooked up in whatever manner is required to perform some function to suit its target application. Couldn’t Intel very quickly produce all kinds of different Terascale processors which are explicitly configured to operate for their target, using that kind of reconfigurability at design time rather than power-up?
Rattner: Sure. You could certainly back the process up that way, taking it back to the design stages. But I think our view is to really have it softer than that. Maybe you do it at the time you ship, or you do it at boot time. Once you make those gates fungible, then it’s a matter of "Are you blowing fuses?" Or "Are you just loading up some flip-flops" to hold that state information?
Right now, we’re really trying to understand the tradeoffs. And particularly questions like, "What could you do in a square millimeter?" If you took a processor core and added a square millimeter to it, what sort of new capability would that give you? In fact, one of the tests I had the research teams go do was to go and look at the last two or three instruction set enhancements [SSE2, SSE3 and SSE4] and report back to me if we could’ve implemented them with reconfigurable hardware. Because if they could’ve done that...
TG Daily: Could they?
Rattner: Well it ended up being a self-fulfilling prophecy. Once they realized they couldn’t, they went back to work. [Laughing] But the idea is, "You should be sure you could’ve done the last two or three of those" because if you could do those with reconfigurable hardware then we could’ve brought those features to market years earlier. Maybe even a couple of tock cycles earlier. And then, if they were big hits, we could’ve turned them into hard gates and maybe gotten another 2x or 3x in performance out of them. And then, if it was not really all that exciting, and if it’s defined as part of the architecture and it’s going to be there until some far future time, then they would remain within the range of the reconfigurable hardware.
And, by the way, their reconfigurable performance is going to be getting better anyway. Moore’s Law will just make it better over time. So, for those few people who found that feature interesting, we’ll continue to configure the bits that way and support it. But for the ones that really hit, then we’d move them into hardware and they’d just become part of the machine. So it becomes a very low-cost way of introducing these extensions and seeing whether or not they warrant a full hardware implementation.
TG Daily: And this would all be determined at BIOS? With some setting the user would select?
Rattner: Yes. It would all be configurable at startup. It reads those settings, configures the gates and then off you go.
TG Daily: That's interesting. Has there been actual research in silicon on this? Or is it just emulations and simulations?
Rattner: A lot of that is going to be covered in the ISSCC paper. And, of course, reconfigurable hardware is decades old. It’s nothing new, so I’m not claiming we invented the idea. But that particular paper has been accepted by the governing body, so we’ll certainly be able to talk about it after the first of the year.
We’re also looking at reconfigurability in very specific settings on the communications side, like in digital radio. Part of the digital radio idea is a reprogrammable, or reconfigurable to be more technically correct, baseband processor. So there, in fact, the idea is to reconfigure it on the fly. In order to support multiple protocols, for example, the thing basically just changes in a matter of microseconds. Reconfigurability is something we’re really looking at and is increasingly interesting to us.
TG Daily: Let’s change gears a bit. What other technologies should we be aware of?
Rattner: Well, the digital radio piece is a big aspect of what we’re doing. And there’s also this new thing called DPR (Dynamic Physical Rendering) at the University of Pittsburgh. Catoms, or configurable atoms.
TG Daily: Catoms?
Rattner: They’re basically nanoparticles. I don’t know if you’ve ever read Michael Crichton’s book “Prey”, but it’s kind of like what’s in that book. These nanoparticles in the book reassemble themselves into people. So these guys at the Pittsburgh lab said, "Okay, what if you really wanted to build something like that?" And in the not too distant future we’ll be able to put some sort of electro-mechanical ability and some intelligence into these nanoscopic particles. Those particles would assemble themselves by some sort of electric or magnetic means, maybe mechanically, and then animate.
TG Daily: So this is all theoretical?
Rattner: Oh no, right now they actually have built some. You know, big ones.
TG Daily: Really? How big?
Rattner: Well, they’re macro sized. They’re about the size of a tennis ball or something. But they’re little catoms, and they actually do some of this functionality. They move around and move relative to one another and stuff like that. And we’re going to build smaller ones.
The real issue at this point is how is such a thing programmed? How do individual behaviors of those particles translate into some composite or collective behaviors across that ensemble of particles? And that’s what we’re working on.
In the near term, it’s really about understanding the theory behind these things. Is it really practical? And how do these individual behaviors produce interesting, important collective behaviors? Because you don’t want to have to be globally broadcasting the directions or instructions to each and every nanoparticle.
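Editor’s note: the question of how purely local behaviors add up to a collective result, with no global broadcast, can be illustrated with a very small toy model. This is not the Pittsburgh lab’s algorithm; the function name `relax`, the one-dimensional layout and the midpoint rule are assumptions chosen for simplicity. Each interior particle looks only at its two immediate neighbors and moves to the midpoint between them. No particle is told the overall goal, yet the ensemble settles into an evenly spaced line.

```python
from typing import List

def relax(positions: List[float], steps: int) -> List[float]:
    """Apply a purely local rule to particles on a line.

    On every step, each interior particle moves to the midpoint of its
    two neighbors' previous positions. The two end particles stay fixed.
    """
    current = list(positions)
    for _ in range(steps):
        updated = list(current)
        for i in range(1, len(current) - 1):
            updated[i] = (current[i - 1] + current[i + 1]) / 2.0
        current = updated
    return current

if __name__ == "__main__":
    start = [0.0, 1.0, 1.5, 9.0, 10.0]   # unevenly clumped particles
    final = relax(start, steps=200)
    # The particles end up approximately evenly spaced between the ends.
    print([round(p, 3) for p in final])
```

The even spacing is never stated anywhere in the rule; it emerges from repeated neighbor-to-neighbor interaction, which is the kind of individual-to-collective translation the interview describes as the open research problem.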
Another thing we’re very interested in, and it’s kind of an out-growth of the work we did a few years ago in sensors, is looking at context-aware computing, where the machine is aware of what’s going on.
TG Daily: How would that work?
Rattner: If you had a variety of sensors, or sensor-like inputs, so that you know where you are in space and time. Things like, you know whether the sun is shining, or whether you’re indoors, where you are, where you should be, and all that kind of stuff. Then you can begin to build usage models which emulate real-world activity or behaviors which are much more in tune with whatever the user happens to be doing.
"Are they at work?" "Are they at home?" "Are they at play?" All these kinds of concepts can be built into the model. "What are they doing?" "Did they just get off the airplane and they don’t know where the hotel is?" If we could design a model like that, instead of doing a lot of very explicit work like going to your handheld, launching a browser, going to the web, searching for hotels, etc. If the sensors could provide the context for you, then as you get off the plane and pick up your handheld and it’s already showing nearby hotels, whether they’re in walking distance, maybe how best to get there, etc., all because it’s analyzed those models well enough to produce what’s required. Maybe you could even select one of the options and it starts to guide you there.
TG Daily: This seems to be a big goal for future computers, even the Terascale projects. So, are we talking about computers being better tools that know us? They're more aware of us and our personal needs?
Rattner: Exactly. In fact, I was just talking to one of the automobile manufacturers and was telling him about this. He started laughing and said, "You’ve just described the in-car environment". And, of course, they’d even taken it to other levels, stuff like you can’t operate the digital controls while you’re driving. You’d have to be stopped on the side of the road, car in neutral, before it would engage. And that was for safety reasons. And it’s that kind of awareness, even obeying laws and stuff like that, which all stems from this model research.
They’re actually looking for ways to assess in real-time the driver’s state, or some kind of situational awareness within that vehicle. So, the vehicle can bring the user back to task. It might warn that the driver is about to run off the road, or run into this thing, or go through a red light, and provide usable assistance in that way.
And then if you think about the potential legal ramifications, like if the passenger is sitting in their seat with their feet up on the dash. What if the airbag deploys? Those things come out with force and there’s going to be serious injuries, certainly broken bones. So, the situational awareness can adapt to the occupant’s actions and activities.
They actually want to install cameras to look at who or what is in the passenger seat. Even to look at the driver’s face, read heart rate, sense other biometric data. It’s all about this kind of situational awareness.
TG Daily: And you have actual research projects looking at this? Or are these mental exercises, considerations for future projects?
Rattner: Oh no, this is definitely a major area of focus and research for us. The perceptual tasks, for example, have been a big part of our research for even 10+ years. I mean, the machine learning technologies have just advanced by leaps and bounds over the last ten years or so. And now, when you combine these perceptual tasks with Terascale, those ideas which just seemed completely out of reach a few years ago now look possible.
We have major activity in machine learning. And it’s really part of this larger effort which is computational perception. And, "How do we apply perceptual learning to all of these machine tasks?" And "How do you do it in a multi-dimensional fashion where the person’s emotional state is a consideration, the current environment, like in the vehicle, what’s going on there as well? Traffic, laws, whether or not there’s alcohol on the person’s breath."
And I think in the previous era, the rule-based AI era, that stuff just proved to be impossible because nobody could write enough rules. But, machine learning gives you the power to train by experience. Google had this recent success on language translation, and that’s a perfect example. And the amazing thing was it had no idea it was translating languages! It was just looking at the patterns.
So, that’s another area that’s of great interest. And not just the computational part of it, but when we look at it from the health perspective, in fact, sort of body area networks. ‘What kind of non-invasive sets of sensors could we create?’ ‘What would be useful not just in judging a person’s emotional state, but also useful in monitoring their health?’
TG Daily: Thank you for the interview.
TG Daily attended an Intel sponsored event to facilitate this interview.