SDSC to make on-demand supercomputing available

Posted by Rick C. Hodgin

San Diego (CA) – The National Science Foundation (NSF) will soon begin allowing the San Diego Supercomputer Center (SDSC) to allocate its resources immediately in the event of an emergency. If, for example, an earthquake or other disaster strikes, these emergency jobs will be kicked off right away, giving researchers (and the public) a fast, clear view of the event's impact.

The SDSC is part of the TeraGrid network, a system of nine supercomputing facilities across the United States that communicate over a dedicated internal network running at 30 Gbps. TG Daily recently toured the National Center for Supercomputing Applications (NCSA) in Urbana, Illinois, which is also part of TeraGrid.

Supercomputing facilities follow a regular order of operations. Under normal circumstances, the resources of any facility are planned out well in advance. A batch of jobs is kept constantly at the ready, often with more than 1,000 jobs in the queue waiting to be processed. Centers like SDSC use highly sophisticated scheduling software to keep their supercomputers as busy as possible: the more machines they keep busy, the more data they can process each year.
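
As a rough illustration of that mindset (not SDSC's actual software; the job names and numbers here are invented), a batch queue can be sketched as a priority queue that dispatches whatever jobs fit the free processors:

```python
import heapq
import itertools

counter = itertools.count()   # tie-breaker: earlier submissions win
queue = []                    # the batch queue, often 1,000+ jobs deep

def submit(name, cpus, priority=10):
    """Queue a job; lower priority values run sooner."""
    heapq.heappush(queue, (priority, next(counter), name, cpus))

def dispatch(free_cpus):
    """Start jobs in priority order while processors remain free."""
    started, held = [], []
    while queue and free_cpus > 0:
        priority, seq, name, cpus = heapq.heappop(queue)
        if cpus <= free_cpus:
            free_cpus -= cpus
            started.append(name)
        else:
            held.append((priority, seq, name, cpus))
    for job in held:              # re-queue jobs that didn't fit yet
        heapq.heappush(queue, job)
    return started

submit("climate-run", cpus=128)
submit("protein-fold", cpus=200)
print(dispatch(free_cpus=256))    # ['climate-run']; protein-fold must wait
```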

Given that mindset, scheduling immediate jobs is not generally allowed. When we spoke with the team at NCSA, they indicated that occasional consideration is given when priorities arise. Still, those priorities are typically things like a graduate student needing data processed in time to complete a thesis, and they are known weeks or months in advance and scheduled accordingly. What's being made available now addresses the needs of both researchers and the public.

The NSF is now allowing the SDSC to bypass its normal scheduling process. As soon as a safety concern arises, be it natural, biological or radiological, pre-determined jobs are scheduled immediately, with the center allocating resources for the emergency ahead of already-queued batch jobs.
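
Continuing the hypothetical sketch above, an emergency allocation amounts to submitting a pre-determined job at a priority that sorts ahead of all routine batch work:

```python
EMERGENCY_PRIORITY = 0   # sorts ahead of every routine batch job

def trigger_emergency(name, cpus):
    """Pre-determined emergency jobs jump the normal queue order."""
    submit(name, cpus, priority=EMERGENCY_PRIORITY)

trigger_emergency("quake-animation", cpus=150)
print(dispatch(free_cpus=256))    # ['quake-animation'] starts before waiting batch work
```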

Earthquakes are one example. The energy released in an earthquake, and the potential damage done, is often difficult to convey without animations. With this new emergency capability, within about 30 minutes of a magnitude 3.5 or greater event, researchers (and TV viewers alike) can have animated clips showing in graphical form the full impact of the quake, including depths and energy. As data is recorded by thousands of automated sensors, it automatically feeds into the computer model. In about half an hour, this tool can alleviate fear and provide concise, visual answers about what happened.
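
The trigger itself is also easy to sketch. The hook below is purely illustrative, assuming a simple threshold check on incoming sensor readings; the function names and data are invented, not SDSC's actual pipeline:

```python
MAGNITUDE_THRESHOLD = 3.5   # events at or above this size start a run

def on_sensor_event(magnitude, depth_km, epicenter):
    """Hypothetical hook fired as automated sensor readings arrive."""
    if magnitude >= MAGNITUDE_THRESHOLD:
        # Feed the recorded data to the model and queue the
        # pre-determined visualization job ahead of batch work.
        trigger_emergency("quake-animation", cpus=150)
        print(f"M{magnitude} near {epicenter}, depth {depth_km} km: "
              f"animation expected within ~30 minutes")

on_sensor_event(4.2, depth_km=11.0, epicenter=(32.7, -117.2))
```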

The system can also be used to model weather patterns during extreme conditions, and its application to forecasting damage from biological or radiological events is equally important.

The OnDemand engine at SDSC comprises 256 processors running at 2.33 GHz (64 Intel-based dual-socket, dual-core nodes). It operates at only 2.4 Tflops, but that is sufficient for these emergency jobs, which routinely require about 150 processors for the full 28+ minutes.
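
That 2.4 Tflops figure lines up with a back-of-the-envelope peak calculation, assuming 4 floating-point operations per core per clock (a typical peak for Intel dual-core chips of that era):

```python
cores = 64 * 2 * 2            # 64 nodes x 2 sockets x 2 cores = 256
clock_hz = 2.33e9
flops_per_cycle = 4           # assumed: typical peak for these chips
peak_tflops = cores * clock_hz * flops_per_cycle / 1e12
print(f"peak: {peak_tflops:.2f} Tflops")   # ~2.39, matching the quoted 2.4 Tflops
```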

OnDemand represents a real-life, practical use of a supercomputing facility that directly benefits the general public. While supercomputers already drive science and industry in many areas, this new allocation of existing resources could become part of our daily lives whenever critical situations arise.