This article is part of the Academic Alibaba series and is taken from the paper entitled “Elastic Sketch: Adaptive and Fast Network-wide Measurements” by Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, and Steve Uhlig, accepted by SIGCOMM 2018. The full paper can be read here.
Even at the best of times, network measurements provide indispensable information for network operations, quality of service, network billing, and anomaly detection in data centers and backbone networks. When a network is experiencing problems such as congestion, scan attacks, or DDoS attacks, measurements become all the more important. During such times, however, traffic characteristics vary drastically, significantly degrading the performance of most measurement tasks. To address this, the research team proposes the Elastic sketch, which adapts to changing traffic conditions to keep measurement tasks running.
Keeping a Calm Head Under Fire
Recently, sketch-based solutions have been widely accepted in network measurements due to their high accuracy and speed. ‘Sketch’ simply refers to data streaming algorithms that can be used for network measurements.
Existing measurement solutions mainly focus on achieving a good trade-off among accuracy, speed, and memory usage. Though these are valuable contributions, they do not address the problem of achieving accurate network measurements when traffic characteristics vary. Three characteristics matter in particular.
The first traffic characteristic is the available bandwidth. In data centers, congestion is common and can take up more than half of the network bandwidth. In this case, measurements are especially critical for congestion control and troubleshooting. Because network problems should be handled immediately, one cannot wait until enough bandwidth becomes available to report the sketches.
The second is the packet rate, which is naturally variable and can change drastically. When the network is under attack (a network scan or a DDoS attack), the packet rate is very high, yet the processing speed of existing sketches on software platforms is fixed. As a result, these sketches do not work well when the packet rate suddenly rises, and they are likely to miss important information such as the IP addresses of attackers.
The third is the flow size distribution. It is known that most flows are small (mouse flows), while very few flows are large (elephant flows). One solution is to accurately separate elephant flows from mouse flows and store them in different data structures. However, since the flow size distribution varies, a data structure that can dynamically allocate an appropriate amount of memory for elephant flows would be beneficial.
Elastic sketch is composed of two parts: a heavy part and a light part. The team is proposing a separation technique named Ostracism to keep elephant flows in the heavy part, and mouse flows in the light part. To make it “elastic”, the team uses three methods.
1. To be adaptive to bandwidth, they have designed algorithms to compress and merge the sketches into an appropriate size to fit the current available bandwidth. They also use servers to merge sketches and reduce the bandwidth usage.
2. When the packet rate becomes high, they change the processing method: each packet accesses only the heavy part, so that only the information of elephant flows is recorded and the information of mouse flows is discarded. This boosts processing speed at the cost of a reasonable drop in accuracy.
3. As the number of elephant flows varies and is unknown in advance, they have designed an algorithm to dynamically increase the memory size of the heavy part.
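The two-part design and the first two methods above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the team's implementation: the class name ElasticSketchToy, the vote-based eviction rule, the threshold LAMBDA, the single-array light part, and the fold-by-maximum compression in compress_light are all chosen here for illustration, since this article does not spell out those details. Growing the heavy part (the third method) is left out.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional

LAMBDA = 8  # assumed threshold: evict when negative votes reach LAMBDA x positive votes


@dataclass
class Bucket:
    key: Optional[str] = None  # flow currently held in this heavy bucket
    pos: int = 0               # packets counted for the resident flow
    neg: int = 0               # packets from other flows that hit this bucket
    flag: bool = False         # True if some of the resident's packets may be in the light part


def _index(key: str, seed: int, width: int) -> int:
    # Deterministic hash of (seed, key) reduced to a table index.
    return zlib.crc32(f"{seed}:{key}".encode()) % width


class ElasticSketchToy:
    def __init__(self, heavy_size: int, light_size: int) -> None:
        self.heavy: List[Bucket] = [Bucket() for _ in range(heavy_size)]
        self.light: List[int] = [0] * light_size  # one array of shared counters

    def _light_add(self, key: str, count: int) -> None:
        self.light[_index(key, 1, len(self.light))] += count

    def insert(self, key: str, heavy_only: bool = False) -> None:
        """Record one packet; heavy_only=True models the high-packet-rate mode."""
        b = self.heavy[_index(key, 0, len(self.heavy))]
        if b.key is None:                      # empty bucket: the flow moves in
            b.key, b.pos, b.neg, b.flag = key, 1, 0, False
        elif b.key == key:                     # resident flow: one more positive vote
            b.pos += 1
        else:
            b.neg += 1                         # competing flow: one negative vote
            if b.neg >= LAMBDA * b.pos:        # resident is voted out
                if not heavy_only:
                    self._light_add(b.key, b.pos)
                b.key, b.pos, b.neg, b.flag = key, 1, 1, True
            elif not heavy_only:               # newcomer is counted in the light part
                self._light_add(key, 1)

    def query(self, key: str) -> int:
        b = self.heavy[_index(key, 0, len(self.heavy))]
        light = self.light[_index(key, 1, len(self.light))]
        if b.key == key:
            return b.pos + light if b.flag else b.pos
        return light


def compress_light(counters: List[int], factor: int) -> List[int]:
    """Fold the light part to 1/factor of its width, keeping the max of merged counters."""
    new_width = len(counters) // factor
    if new_width == 0 or new_width * factor != len(counters):
        raise ValueError("factor must divide the number of counters")
    return [max(counters[i::new_width]) for i in range(new_width)]


sketch = ElasticSketchToy(heavy_size=64, light_size=256)
for _ in range(1000):
    sketch.insert("elephant")
for i in range(200):
    sketch.insert(f"mouse-{i}")
print(sketch.query("elephant"), sketch.query("mouse-0"))

# Shrink the light part before reporting it over a congested link.
sketch.light = compress_light(sketch.light, 2)
print(len(sketch.light), sketch.query("mouse-0"))
```

In this toy model the light part can only overestimate a flow's size, and folding by maximum preserves that property because the new width divides the old one, so every counter still lands in the slot that a query will read. Passing heavy_only=True skips every access to the light part, which is the speed-for-accuracy trade described in the second method.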
Addressing the Elephant in the Room
The team implemented their sketch on six different platforms to process six typical measurement tasks. Experimental results show that the Elastic sketch works well when traffic characteristics vary, and that it outperforms peer approaches in both speed and accuracy for each task.
The full paper and results can be read here.