Nvidia unveils Tesla, moves into supercomputing
By Wolfgang Gruener   
Wednesday, June 20, 2007 14:20

Santa Clara (CA) – Nvidia today announced Tesla, a third product line next to the GeForce and Quadro graphics products. The company aims to use Tesla cards and the massive floating point horsepower of its graphics processors to take over a portion of the lucrative supercomputing market.

 



 


Each Tesla device is built around a GeForce 8-series GPU and shares the general component layout of the high-end Quadro FX 5600 workstation graphics card, which carries 1.5 GB of memory (Tesla has 1.35 GB). The only noteworthy difference between the FX 5600 and a Tesla card is that the supercomputing-targeted devices lack graphics outputs on the back panel, which, we were told, allows Nvidia to increase the clock speed on Tesla.

While the actual clock speed of the Tesla GPU is kept under wraps, Nvidia said that one processor (used in the C870 add-in card) delivers 518 GFlops; two processors (used in the D870 deskside supercomputer, which integrates two C870 cards) deliver about 1 TFlops; and the Tesla GPU server with four processors reaches about 2 TFlops.

In terms of pure number-crunching horsepower, Nvidia told us that one GeForce GPU can match the combined performance of 40 x86 processors. Beyond raw performance, Tesla also makes a case for power efficiency: the C870 is rated at a maximum power consumption of 170 watts and the GPU server at 800 watts, which may sound like a lot at first glance. However, 40 low-power x86 processors would typically draw 1600 watts. With a common power budget of about 25 kilowatts per server rack, a rack of Tesla GPU servers has a theoretical maximum performance of more than 60 TFlops, a floating point rating that would place it among the 15 fastest supercomputers currently ranked on the Top 500 Supercomputer list.
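The rack figure can be reproduced from the numbers quoted above. The following is only a back-of-the-envelope sketch: it assumes power is the sole constraint on how many GPU servers fit in a rack, ignores the host systems, cooling and physical space, and uses the rounded 2 TFlops per server. The function and variable names are illustrative and do not come from Nvidia.

```python
# Figures quoted in the article (Nvidia's stated ratings, not measurements).
RACK_POWER_BUDGET_W = 25_000  # "about 25 kilowatts" per server rack
GPU_SERVER_POWER_W = 800      # four-GPU Tesla server, maximum rating
GPU_SERVER_TFLOPS = 2.0       # four-GPU Tesla server, rounded as in the article
C870_POWER_W = 170            # single C870 card, maximum rating
X86_GROUP_POWER_W = 1600      # 40 low-power x86 processors, typical draw


def rack_estimate(budget_w, server_w, server_tflops):
    """Return (servers_by_power, rack_tflops) for a purely power-limited rack.

    Assumption: fractional servers are allowed, since this is an upper bound
    rather than a deployable configuration.
    """
    servers_by_power = budget_w / server_w
    return servers_by_power, servers_by_power * server_tflops


servers, tflops = rack_estimate(
    RACK_POWER_BUDGET_W, GPU_SERVER_POWER_W, GPU_SERVER_TFLOPS
)
# Power ratio implied by the claim that one GPU matches 40 x86 processors.
power_ratio = X86_GROUP_POWER_W / C870_POWER_W

print(f"{servers:.2f} servers by power, about {tflops:.1f} TFlops per rack")
print(f"40 x86 processors draw about {power_ratio:.1f}x the power of one C870")
```

Under these assumptions the estimate comes to 31.25 servers and 62.5 TFlops, which is consistent with the "more than 60 TFlops" quoted above.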


Similarities to ATI’s stream processor card, implications for developers

Readers who have been following recent general purpose GPU announcements will remember that ATI has a product in its portfolio that is very similar to the Tesla C870: the stream processor card, which is based on an R580 GPU and 1 GB of memory. Both products follow the same concept of making the massive processing capability of shader processors available to run arbitrary code instead of graphics code.

Developers such as John Stone and James Philips, senior research programmers at the Beckman Institute of Advanced Science and Technology at the University of Illinois, have been looking at accelerators such as GPUs for some time, but have been limited mainly by bugs in shader drivers. Stone told us that much of his work with GPUs in the past was focused “on finding driver bugs” and “writing his applications around them” in order to make the technology usable for scientific simulations. “There can be a lot of rounding errors and because of this very fact, I wasn’t very excited about working with GPUs,” he said.

However, AMD and Nvidia have each come up with a programming model to solve this problem. On AMD’s side, it is called CTM (“close to metal”) and on Nvidia’s side it is CUDA (“Compute Unified Device Architecture”). At this time, the choice between the two appears to come down to a developer’s personal preference: some universities are working with CTM (Stanford’s Folding@Home project, for example), while others are working with CUDA. Stone and Philips are focusing on the Nvidia model because, they say, its C++-based language model is easier to deal with than AMD’s CTM, which uses a low-level assembly language.

While CUDA works very much like a regular programming model and, according to Stone, can deliver results very quickly, the big challenge in exploiting these devices will be the knowledge required to write advanced parallelized code for these GPGPUs. Stone believes that coders who have previously written code for massively parallel supercomputers will have an especially easy transition. Of course, knowledge of the hardware and of graphics processing, along with a close look at the parallelizable parts of an application, helps in taking advantage of the technology.

Shane Ryoo, a graduate research assistant at the University of Illinois at Urbana-Champaign, said that CUDA will allow programmers with some experience in developing threaded applications to get “really good results right off the bat.” However, it is the fine-tuning process that will increase the value of GPGPUs: Ryoo noted that the expert knowledge needed to squeeze the best possible performance out of GPUs can sometimes accelerate application code by a factor of five or more.

Nvidia is well aware of this challenge and has begun assisting universities in establishing classes and developing course material focusing on massively parallel programming and CUDA in particular. Eventually, the company hopes that GPGPU programming will become a standard part of computer science course work and help to educate a whole new generation of programmers. So far, Nvidia has taught courses at the University of Illinois, the University of California, the University of North Carolina and Purdue University. Nvidia said that several universities are developing their own courses, including the University of Virginia, the University of Pennsylvania, Oregon State University and the University of Wisconsin. Caltech, MIT, Berkeley and Stanford have been offering “legacy” GPGPU and GPU programming classes, according to Nvidia chief scientist David Kirk.


 

 



 