AMD’s Markham labs: Flushing out graphics bugs

Posted by Wolfgang Gruener

TG Daily On The Road – Last week we had an opportunity to visit AMD’s former ATI headquarters outside Toronto to get some information on new products we aren’t allowed to talk about just yet. What we can talk about, however, are ATI’s, now AMD’s, validation labs. We did a similar report on Intel a while ago, and even if this is just the place where AMD develops chipsets and graphics cards, it shows just how different the two companies are in size.

 

 



When you buy a computer, you can be certain that its hardware has a history of stress testing, which is usually referred to as validation. The purpose of validation is to uncover potential bugs in the design of products and to prepare them, in this case semiconductors, for mass production. The goal, of course, is that you get a functional chip for the PC in your office and your home.

Every CPU has bugs, some of them serious, some of them not. They are corrected throughout a chip’s lifecycle and are documented in so-called errata. Validation processes have become more visible since Intel’s 1994 Pentium FDIV bug, probably the best known and most serious bug caught in processors in modern times. More recently, you may have heard about a bug in AMD’s Barcelona processor that has been making headlines and cost AMD billions of dollars in market capitalization. As the cost of developing and producing a processor goes up, validation becomes a more and more important link in the development chain.

Granted, the Markham labs represent only a fraction of AMD’s global validation team. Still, we believe that a look behind its doors provides interesting insight into the firm’s effort to test and fix chips that are still in development and yet to be released to the market. There are several stages in this validation chain and we’ll describe a few of the most interesting ones (well, and the ones that AMD was willing to show us) in this article.

Prototyping

Intel and AMD are different companies with different resources. While Intel employs more than 3000 people around the globe in validation jobs, AMD has about 400. Thirty of those work in a “prototyping” hall, which, well, has the purpose of producing sample quantities of motherboards with chipsets and graphics cards. This stage of development is, by the way, also vastly different from Nvidia’s fab-less approach, which in fact does not produce physical prototypes. Instead, Nvidia simulates the hardware design in a little-known, but massive supercomputer installation located at its Santa Clara headquarters. Nvidia’s first silicon of a new chip is the actual chip that goes into the market.

However, prototyping is critical to AMD’s graphics unit and unfortunately we weren’t allowed to take any pictures of this stage. We estimate the size of the hall at about 50 x 25 yards, slightly smaller than one quarter of a football field. There are three production lines with the capability of manufacturing three different boards at the same time. This is not one of those super-clean, dust-free environments you see in pictures; it leaves a surprisingly plain impression. But then AMD does not actually manufacture silicon here, but simply integrates components, such as GPUs (which are manufactured in Taiwan) and connectors, onto boards. Only about half of the production stations are automated and we saw about ten stations in each line that required ATI engineers to manually work on a board.

Production runs in two shifts, or 16 hours per day, employing about 100 people. We were told that, several years ago, this facility actually served as an on-site production line for ATI graphics cards. Today, the installation serves for prototyping only, meaning that most of the products created remain in-house for testing, while others are shipped to partners, journalists and analysts for evaluation purposes. The production capacity is about 4000 boards per month.
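To put the capacity figure into perspective, here is a rough back-of-the-envelope sketch in Python. The 4000 boards per month, the 16-hour day and the three production lines come from what we were told; the 22 working days per month and the even split across lines are our own assumptions, not AMD numbers.

```python
# Figures quoted during the visit
BOARDS_PER_MONTH = 4000   # stated production capacity
HOURS_PER_DAY = 16        # two shifts
PRODUCTION_LINES = 3      # lines in the prototyping hall

# Assumption (not from AMD): 22 working days per month
WORKING_DAYS_PER_MONTH = 22


def boards_per_hour(boards_per_month, working_days, hours_per_day):
    """Average hourly output implied by a monthly capacity figure."""
    return boards_per_month / (working_days * hours_per_day)


total = boards_per_hour(BOARDS_PER_MONTH, WORKING_DAYS_PER_MONTH, HOURS_PER_DAY)
# Assumption: the load is spread evenly across all three lines
per_line = total / PRODUCTION_LINES

print(f"{total:.1f} boards/hour overall, {per_line:.1f} per line")
# prints "11.4 boards/hour overall, 3.8 per line"
```

Under these assumptions, the hall would turn out a finished board roughly every five minutes at full capacity.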

It isn’t hard to guess that a graphics board takes significant time from entering the production line to leaving it. AMD staff told us that it takes “less than a day” to complete a board, which shows just how complex, time-consuming and expensive the prototyping process is. We were told that the actual production cost of a prototype high-end graphics card (which sells for about $400 when in mass production) is somewhere between $2000 and $3000.

Typically, several hundred prototypes are produced that never reach the market. Most of the cards remain in-house and go into validation and stress-testing. A portion of the batch is sent to partners, journalists and analysts for review.

 

Validation

The validation lab is where new chips are put through a variety of stress tests to mature the design for mass production. These tests can take much more than a year, but recently have been made more efficient: Intel, for example, typically completes validation in less than 9 months. ATI graphics chips, we learned, spend about 8 weeks in validation. Staff told us that the Markham lab primarily deals with GPUs and chipsets as well as combinations of chipsets, GPUs and CPUs.

 

 

The Markham lab has about 350 boards in validation at any given time. During our visit last week, the lab was primarily busy running the RS780 Northbridge, the SB700 Southbridge as well as the upcoming Puma mobile platform carrying the Griffin processor.  

On an annual basis, the lab sees between 5000 and 6000 different products, all of them on more or less custom-built boards. The lab itself is less high-tech than you would expect (at least when you look at the obvious), but is full of workbenches that are used to assemble test setups. For example, we saw soldering stations on which boards are made one by one at a per-unit cost of about $500 to $600. There are simple plastic cases that simulate the (temperature) environment of a desktop PC. For more serious temperature tests, AMD uses refrigerator-sized temperature chambers to expose products to a temperature range between 0 and 70 degrees Celsius.

 

 

When errors in chip designs are found and detailed, AMD uses a third party in Canada to conduct the “microsurgery” on its silicon. Intel, by contrast, has on-site installations to repair or modify silicon at all its major development locations.

AMD’s Markham labs extend over several different rooms with different test purposes. Unfortunately, we were not allowed to take pictures in all of them. One of the more secret ones was particularly interesting, as it housed equipment to put HT3, HT1, RAM and a competitor platform we were asked not to mention (gee, what could that be?) under full load. Each of those test installations easily fits on a regular food serving cart, but is outrageously expensive: each unit costs about $1 million, we were told.

 

 

Overall, validation is a pricey process. Engineers mentioned that the Markham validation lab alone has spent $998,000 on Puma testing. Oh, and we were told that Puma has passed the validation process and “is seconds away from shipping.”

Software lab

A big part of graphics is the driver, and we also had a chance to look into AMD’s graphics driver lab. It is separated into different stages, ranging from development to quality assurance, and houses hundreds of mid-tower PCs (we counted about 400) as well as a rack-server installation.

AMD has “nightly” Catalyst driver builds, which are tested on each of the PCs. The company tests drivers on each of its products reaching back to the Radeon 9700, on all Windows operating systems these cards were running on, and on certain Linux variants (Suse, Ubuntu). The hardware platforms are based on Intel (since lots of these PCs date back to times in which ATI had a good relationship with Intel), AMD, SiS and Via. The oldest system currently in use is a Pentium 4 HT PC.
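To get a feel for how quickly such a test matrix grows, here is a minimal Python sketch. It is our own illustration, not AMD’s tooling: the card and Windows names beyond the Radeon 9700 are placeholders, and we simply assume every card is paired with every operating system and platform, which AMD did not confirm.

```python
from itertools import product

# Placeholder lists for illustration only. AMD named the Radeon 9700,
# Suse, Ubuntu and the four platform vendors; everything else is assumed.
CARDS = ["Radeon 9700", "newer card A", "newer card B"]
OPERATING_SYSTEMS = ["Windows XP", "Windows Vista", "Suse", "Ubuntu"]
PLATFORMS = ["Intel", "AMD", "SiS", "Via"]


def build_test_matrix(cards, operating_systems, platforms):
    """Return every (card, OS, platform) combination for one nightly build."""
    return list(product(cards, operating_systems, platforms))


matrix = build_test_matrix(CARDS, OPERATING_SYSTEMS, PLATFORMS)
print(len(matrix))  # 3 cards x 4 operating systems x 4 platforms = 48
```

Even this tiny example yields 48 configurations per nightly build, which helps explain why the lab needs several hundred PCs.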

 

 

The quality assurance stage trims the complexity of the test and focuses “on the bleeding edge” of graphics cards, AMD said.

The obvious question for the staff was whether Windows 7 is already part of the driver testing process, especially in light of the recently shipped Windows 7 M1 versions that apparently support heterogeneous graphics environments (i.e. multiple graphics cards from different vendors within the same system). Quite apparently, it is a delicate topic, and AMD staff politely declined to comment on Windows 7.

The engineer’s smile we received when we asked this question tells us that Windows 7 has already entered the evaluation process at AMD. We just can’t tell you how well the software works with AMD graphics at this time.

But if you want to get a first idea of what Windows 7 M1 is about and what it looks like, you can read our first look article.