Standing Up to the Surge: Alibaba’s X-Engine Weathers a Storm of Transaction Data


This article is part of the Academic Alibaba series and is taken from the SIGMOD 2019 paper entitled “X-Engine: An Optimized Storage Engine for Large-scale E-commerce Transaction Processing” by Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. The full paper can be read here.

It is often said that e-commerce has achieved things unimaginable just a few years ago. Still, it is interesting to imagine what it would look like if a brick-and-mortar business were to experience what e-commerce giant Alibaba contends with at least once a year: for instance, the 122-fold spike in transactions within the span of a single second that launched last year’s 11.11 Global Shopping Festival. For a shopping mall, this kind of surge would mean a literal stampede of shoppers. For Alibaba, the deluge happened mostly out of view in the depths of its data centers, challenging its systems to once again outdo their yearly best at processing transactions.

The transactions-per-second rate jumps 122-fold at the stroke of midnight as the 2018 11.11 Global Shopping Festival begins

As the largest e-commerce platform in the world, Alibaba often needs to devise solutions to computing challenges that are uniquely its own. Now, to better cope with the sheer load generated by its 600 million-strong shopper base, Alibaba researchers have introduced a write-optimized storage engine known as X-Engine. Among other things, it manages swings in data records’ “temperatures” as they move through a shifting terrain of promotions and flash sales. Based on an LSM-tree tiered storage architecture, X-Engine works as a core component of POLARDB, Alibaba’s distributed database designed to serve online transaction processing.

When Oceans of Data Surge

Alibaba’s transaction database challenges are well summarized in a trio of water-themed metaphors.

The first, known as the tsunami problem, concerns the inherent risk of publicizing a major promotional event. No matter how much the marketplace prepares, the full scale of consumer activity can only be known when the anticipated day arrives, and it often exceeds expectations. The second, the flood discharge problem, concerns how quickly Alibaba’s storage engine can move data from main memory to durable storage while processing highly concurrent transactions, much as drainage systems struggle to carry water away during heavy rainfall. The last, the fast-moving current problem, is somewhat more complex. At its core is the distinction between “hot” records, which are in immediate demand, and “cold” records, which are in only latent demand. Any record can change temperature as conditions in the marketplace change, which in turn alters how data flows through the system.

To address the tsunami problem, X-Engine improves the single-machine capacity of storage engines so that the number of instances needed to handle a spike in transactions is vastly reduced. To handle the flood discharge problem, it makes use of the tiered storage offered by the LSM-tree structure and a number of optimized compaction algorithms, moving records among different tiers that correspond to different “temperatures” of access frequency. Notably, this approach aims to pair each tier with the memory or storage technology best suited to it, including newer technologies such as NVM (non-volatile memory). Finally, to address the fast-moving current problem, X-Engine helps to ensure that emerging hot records located in deep (i.e. cooler) regions of the database can be moved out as quickly as possible for effective caching.
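To make the tiering idea more concrete, the following is a minimal Python sketch of a toy LSM-style store. It is not X-Engine's implementation: the class `ToyTieredStore`, its methods, the two-level layout, and the LRU cache policy are all assumptions chosen for illustration. It shows only the general pattern described above, in which writes land in memory, are flushed to a warm tier, sink to a colder tier through compaction, and are pulled into a cache when a deep record turns hot.

```python
from collections import OrderedDict


class ToyTieredStore:
    """Hypothetical toy LSM-style store: memtable, two levels, and a read cache."""

    def __init__(self, memtable_limit=2, cache_limit=2):
        self.memtable = {}           # newest writes, held in memory
        self.levels = [{}, {}]       # levels[0] is warmer, levels[1] is colder
        self.memtable_limit = memtable_limit
        self.cache = OrderedDict()   # LRU cache for records read from the levels
        self.cache_limit = cache_limit

    def put(self, key, value):
        self.memtable[key] = value
        self.cache.pop(key, None)    # drop any stale cached copy
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Move the memtable into level 0 (the "flood discharge" step)."""
        self.levels[0].update(self.memtable)
        self.memtable.clear()

    def compact(self):
        """Merge level 0 into level 1; newer level-0 values win on conflict."""
        self.levels[1].update(self.levels[0])
        self.levels[0].clear()

    def get(self, key):
        """Return (value, where_found), searching from warmest to coldest."""
        if key in self.memtable:
            return self.memtable[key], "memtable"
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key], "cache"
        for depth, level in enumerate(self.levels):
            if key in level:
                value = level[key]
                self._cache_put(key, value)   # a deep record just turned hot
                return value, f"level{depth}"
        return None, "miss"

    def _cache_put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_limit:
            self.cache.popitem(last=False)    # evict least recently used


store = ToyTieredStore(memtable_limit=2)
store.put("item:1", "in stock")
store.put("item:2", "in stock")   # memtable hits its limit and flushes to level 0
store.compact()                   # records sink to the colder level 1
print(store.get("item:1"))        # ('in stock', 'level1')
print(store.get("item:1"))        # ('in stock', 'cache')
```

In this sketch the first read of `item:1` has to reach the coldest level, while the second is served from the cache. A real engine would add sorted on-disk files, concurrency control, and far more careful compaction scheduling.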

Overview of X-Engine’s technical architecture

Facing the Tide: Experimental Results

X-Engine faced off against the popular storage engines InnoDB and RocksDB in tests designed to simulate real-world e-commerce workloads using benchmark datasets. Most notably, X-Engine achieved remarkably stable QPS (queries per second) performance in the crucial test case designed to emulate the 11.11 Global Shopping Festival workload, outperforming InnoDB and RocksDB by 44% and 31%, respectively. It also greatly improved on the competitors’ throughput in tests designed to reflect the tsunami problem and on their speed in tests for the flood discharge problem.

In future work, X-Engine’s developers will adopt a shared storage design to improve its scalability and apply machine learning methods to predict data temperatures.



Alibaba Tech

First-hand and in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.
