
MIT breakthrough to speed up data centres

21 July 2014

A breakthrough by researchers at the Massachusetts Institute of Technology (MIT) could change the way web and mobile apps are written and help companies like Facebook keep the cat videos coming.

Their main innovation is a new way of deciding when each packet can transit a data centre to its destination. The MIT software, called Fastpass, uses parallel computing to make those scheduling decisions almost instantly as traffic arrives. The developers think Fastpass may show up in production data centres in about two years.

In today’s networks, packets can spend a lot of their time in big, memory-intensive queues. This is because switches mostly decide on their own when each packet can go on to its destination, and they do so with limited information. Fastpass gives that job to a central server, called an arbiter, that can look at a whole segment of the data centre and schedule packets in a more efficient way, according to Hari Balakrishnan, MIT’s Fujitsu Professor in Electrical Engineering and Computer Science. Balakrishnan co-wrote a paper that will be presented at an Association for Computing Machinery conference next month. The co-authors included Facebook researcher Hans Fugal.
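
To make the division of labour concrete, here is a minimal sketch of the arbiter idea in Python. It is our own illustration, not the researchers' code: hosts ask a central scheduler for permission to send, and it hands back explicit time slots instead of letting packets pile up in switch queues.

    # Minimal sketch of a centralised arbiter (illustrative, not the MIT code).
    # Hosts request permission to send; the arbiter answers with explicit
    # time slots, so packets need never sit in switch queues.
    from collections import deque

    class Arbiter:
        def __init__(self):
            self.requests = deque()        # pending (src, dst, n_packets)
            self.next_free = {}            # (src, dst) -> first free time slot

        def request(self, src, dst, n_packets):
            # Called by an end host that wants to send n_packets to dst.
            self.requests.append((src, dst, n_packets))

        def schedule(self):
            # Grant every pending packet an explicit transmission slot.
            grants = []
            while self.requests:
                src, dst, n = self.requests.popleft()
                slot = self.next_free.get((src, dst), 0)
                grants.extend((src, dst, slot + i) for i in range(n))
                self.next_free[(src, dst)] = slot + n
            return grants                  # "send packet k at time slot t"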

Centralised decisions
Centralised decision-making is all the rage in networking, as vendors implement various versions of software-defined networking (SDN). In fact, Balakrishnan was one of the authors of a key early paper on SDN. But those systems make higher level decisions, such as how to handle various types of traffic, in seconds or minutes. Fastpass applies the same concept to packet-by-packet forwarding decisions, Balakrishnan said.

Their motivation is not so much to make your Facebook page load faster or your Google search results come up sooner, though that might happen. Instead, the inventors of Fastpass want to simplify both applications and switches, and shrink the amount of bandwidth companies require in their data centres.

In switches, tools for managing queues add complexity that raises costs, Balakrishnan said. He envisions future switches with room for very small queues, “just to be defensive,” and correspondingly lower cost and complexity.

By making all packets arrive on time, Fastpass can also save network architects from having to overprovision data centre links for unpredictable bursts of traffic. As the number of users and the volume of data grow, it should be easier to keep up.

There is a similar benefit for developers of distributed applications, which split problems up and send them to different servers around a network for answers.

Developer boon
“Developers struggle a lot with the variable latencies that current networks offer,” said co-author Jonathan Perry, an electrical engineering and computer science graduate student at MIT. With that solved, “It’s much easier to develop complex, distributed programs like the one Facebook implements,” he said.

The current decentralised way of forwarding packets allows for vast networks with little oversight. But because traffic is unpredictable, network designers have to either invest in pipes fat enough to carry the highest possible load or put a queue in each switch to hold packets until they can go out. Usually, it's a balancing act between the two.

“It’s very hard to figure out how big the queues need to be. … This has been a difficult question since 1960,” Balakrishnan said. Making them too big can slow performance, while making them too small can lead to dropped packets and time-consuming retransmissions.
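
A common rule of thumb in that long-running debate sizes a queue at roughly the link's bandwidth multiplied by the round-trip time, so it can absorb a burst for one full control loop. A back-of-the-envelope Python example, with a link rate and round-trip time chosen purely for illustration:

    # Queue sizing by the classic bandwidth-delay rule of thumb.
    # Both figures below are assumptions picked for illustration.
    link_rate_bps = 10e9               # a 10Gbps data-centre link
    rtt_seconds = 100e-6               # a 100-microsecond round trip

    buffer_bits = link_rate_bps * rtt_seconds
    print(f"queue ~ {buffer_bits / 8 / 1024:.0f} KiB")   # ~122 KiB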

Fastpass assigns transmission times and selects paths for each packet, and it can do so more quickly than a typical switch, according to MIT. It is so much faster that even though every packet's scheduling request has to cross the network to the arbiter, a round trip that may take about 40 microseconds, the system still speeds things up overall.

No queues
With that kind of speed, there is essentially no need for queues. In experiments in a Facebook data centre, Fastpass cut the average length of a queue by 99.6%, the researchers say. Latency, or the delay between requesting and receiving an item, went from 3.56 microseconds to 0.23 microseconds.

In the test, an arbiter with just eight cores was able to make decisions for a network carrying 2.2 terabits of data per second, which is equal to a 2,000-server data centre with gigabit-speed links running at full speed, MIT said. The arbiter was linked to a twin system for redundancy.
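
The arithmetic behind that equivalence is easy to check (a quick sanity test in Python, using the figures above):

    # Sanity check: 2,000 servers with gigabit links running flat out.
    servers = 2000
    link_gbps = 1                          # gigabit-speed links
    print(servers * link_gbps / 1000, "Tbps")   # 2.0 Tbps, close to the
                                                # roughly 2.2 Tbps reported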

Instead of having all eight cores work together on assigning transmissions for a single time slot, Balakrishnan’s team gave each core its own slot. One core tries to fill the next time slot while another works two slots ahead, another three slots ahead, and so on.

“You want to allocate for many time slots into the future, in parallel,” Balakrishnan said. Each core looks through the full list of transmission requests, assigns one, and modifies the list, and all the cores can work on the problem simultaneously.
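
A minimal Python sketch of that pipelined allocation follows. It is a simplification under our own assumptions: the cores are simulated one after another rather than running in parallel, and "fitting" a transmission into a slot just means the sender and receiver are both still free in that slot.

    # Pipelined time-slot allocation in miniature: core i owns time slot
    # t+i, scans the outstanding demands, grants what fits, and the
    # trimmed demand list flows on to the next core. Sequential here;
    # the real arbiter runs the cores concurrently.
    def allocate(demands, n_cores=8, start_slot=0):
        # demands: dict mapping (src, dst) -> packets still waiting
        grants = []
        for core in range(n_cores):
            slot = start_slot + core          # each core gets its own slot
            busy = set()                      # hosts already used this slot
            for (src, dst), n in demands.items():
                if n > 0 and src not in busy and dst not in busy:
                    grants.append((src, dst, slot))
                    busy.update((src, dst))
                    demands[(src, dst)] = n - 1
        return grants

    demands = {("A", "B"): 2, ("C", "B"): 1, ("A", "D"): 1}
    print(allocate(demands, n_cores=3))
    # [('A', 'B', 0), ('A', 'B', 1), ('C', 'B', 2), ('A', 'D', 2)]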

Built in
Fastpass, or software like it, could be implemented in dedicated server clusters or even built into specialised chips, Balakrishnan said. The researchers plan to release the Fastpass software as open source, though they warned it’s not production-ready code.

“Anyone with a high-speed data centre should be interested,” he said.


Stephen Lawson, IDG News Service
