San Diego, CA - A diagnostic tool for data center networks is claimed to be able to detect delays as short as tens of millionths of a second.
The new approach - called the Lossy Difference Aggregator - can diagnose delays down to tens of microseconds and packet loss as infrequent as one in a million at every router within a data center network. The solution could be implemented in today’s router designs with almost zero cost in terms of router hardware, say the computer scientists who developed it, and with no performance penalty.
“This is stuff the big traders will be interested in,” said George Varghese, a computer science professor at the UC San Diego Jacobs School of Engineering, “but more importantly, the router vendors for whom such trading markets are an important vertical.”
“Our hope is that this approach will allow router vendors to add fine scale delay and loss tracking, at almost zero cost to router performance, perhaps obviating the desire for expensive external network monitoring boxes at every router,” said Ramana Kompella, a computer science professor at Purdue University. “The next step would be to build the hardware implementation; we are looking into that.”
The usual way to measure latency is to record when each packet arrives at and leaves a router, take the difference, and average over all packets that arrive in a fixed time period. However, a typical router may process 50 million packets in a second, which makes tracking each packet’s arrival and departure a daunting task.
It's no use summing all the arrival times in one counter and all the departure times in another, then subtracting the two counters and dividing by the number of packets. If a packet is lost within a router - which often happens - the lost packet's arrival time is included but its departure time is not, throwing the whole estimate wildly out of whack.
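To see how badly a single loss distorts the two-counter method, here is a minimal Python sketch. The function name, the microsecond timestamps and the use of None to mark a lost packet are all assumptions made for illustration; none of them come from the researchers' paper.

```python
def two_counter_estimate(arrivals_us, departures_us):
    """Average delay from one arrival-time sum and one departure-time sum.

    departures_us[i] is None when packet i was lost inside the router
    (an illustrative convention, not part of any real router design).
    """
    arrival_sum = sum(arrivals_us)
    departed = [d for d in departures_us if d is not None]
    departure_sum = sum(departed)
    # The departure side only knows about packets that actually left.
    return (departure_sum - arrival_sum) / len(departed)


# Four packets, each held for exactly 10 microseconds.
arrivals = [1000, 2000, 3000, 4000]
no_loss = two_counter_estimate(arrivals, [1010, 2010, 3010, 4010])
one_loss = two_counter_estimate(arrivals, [1010, 2010, None, 4010])
print(no_loss)   # 10.0
print(one_loss)  # -990.0
```

With no loss the estimate is the true 10 microseconds. Drop one packet and the missing departure timestamp, which is far larger than any real delay, drags the result to a nonsensical negative value.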
The new system randomly splits incoming packets into groups and then adds up arrival and departure times of each of the groups separately. As long as the number of losses is smaller than the number of groups, at least one group will give a good estimate.
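The grouping idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: the class and function names are hypothetical, a SHA-256 hash of a packet ID stands in for the random split, and the sketch assumes each group also keeps a packet count so that groups that lost a packet can be recognized and discarded (the article does not say how affected groups are identified).

```python
import hashlib


def group_of(packet_id, num_groups):
    """Pick a group from a hash so both measurement points agree (assumed scheme)."""
    digest = hashlib.sha256(packet_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big") % num_groups


class GroupCounters:
    """Per-group timestamp sums and packet counts at one measurement point."""

    def __init__(self, num_groups):
        self.time_sums = [0] * num_groups
        self.counts = [0] * num_groups

    def record(self, packet_id, timestamp_us):
        g = group_of(packet_id, len(self.counts))
        self.time_sums[g] += timestamp_us
        self.counts[g] += 1


def estimate_delay(arrival, departure):
    """Average delay over groups whose packet counts match on both sides."""
    total_delay = 0
    total_packets = 0
    usable_groups = 0
    for g in range(len(arrival.counts)):
        # A count mismatch means at least one packet in this group was lost.
        if arrival.counts[g] > 0 and arrival.counts[g] == departure.counts[g]:
            total_delay += departure.time_sums[g] - arrival.time_sums[g]
            total_packets += arrival.counts[g]
            usable_groups += 1
    if total_packets == 0:
        return None, 0
    return total_delay / total_packets, usable_groups


# 1,000 packets, each delayed 10 microseconds; three are lost in the router.
arrival, departure = GroupCounters(8), GroupCounters(8)
lost = {17, 400, 923}
for packet_id in range(1000):
    t = packet_id * 100
    arrival.record(packet_id, t)
    if packet_id not in lost:
        departure.record(packet_id, t + 10)

average, usable = estimate_delay(arrival, departure)
print(average, usable)
```

Three losses can spoil at most three of the eight groups, so at least five remain usable and the estimate still comes out at the true 10 microseconds. Each measurement point stores only a pair of counters per group rather than a record per packet.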
With this invention built into every router, a data center manager should be able to quickly pinpoint the offending router and interface.
The network manager can then upgrade the router or link, or reassign an offending application that is sending message bursts to another processing path.
“If implemented, this kind of approach should enable investment bankers to turn their attention to tuning their algorithmic trading programs to make more intelligent investments, instead of worrying about delays through obscure routers,” said Varghese.
The full paper is available here.