The Achievement of a Computing Lifetime: Alibaba AI•OS 10 Years On

How a set of five basic components became the heart and backbone of Alibaba’s online services

Shen Jiaxiang, Alibaba’s Distinguished Engineer of Search Business Unit

Adapting Key Components: Tracing AI•OS’s Evolution

Over ten years of development, AI•OS has changed considerably to serve an increasingly diverse range of applications, and it is now used across most of Alibaba’s operations.

Overview of the AI•OS framework

Wading deeper: Hippo, AIOps, and end-to-end intelligence

Within the AI•OS system, Hippo is responsible for scheduling physical cluster resources. It is where the enabling platform’s container and isolation technology meets search engineering. Hippo is also the bridgehead where the model training framework PAI-TF and the real-time computing framework Blink are brought into the system by way of aspect-oriented programming (AOP). Today, recommendation and search training tasks run on Hippo’s co-location resource pool. At their peak, these tasks ran on as many as 2,000 hundred-core machines at full capacity, with a seven-day average of 1,300 machines. These resources were obtained for free, and the value these jobs created is beyond estimation.
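
To put those figures in perspective, here is a back-of-the-envelope calculation. It assumes each "hundred-core" machine has exactly 100 cores, which the article does not state precisely, so the totals should be read as rough orders of magnitude rather than reported numbers.

```python
# Rough capacity arithmetic for the co-location pool figures quoted above.
# ASSUMPTION: "hundred-core" is taken as exactly 100 cores per machine.
CORES_PER_MACHINE = 100

peak_machines = 2000      # peak usage quoted in the article
average_machines = 1300   # seven-day average quoted in the article

peak_cores = peak_machines * CORES_PER_MACHINE
average_cores = average_machines * CORES_PER_MACHINE
average_to_peak = average_machines / peak_machines

print(f"peak: {peak_cores:,} cores")
print(f"seven-day average: {average_cores:,} cores")
print(f"average as share of peak: {average_to_peak:.0%}")
```

Under that assumption, the seven-day average corresponds to roughly two thirds of the peak footprint.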

Developing AI•OS through Productization

Among other important contributions, AI•OS’s deep-rooted productization allows it to serve as the backbone of the Alibaba Group’s enabling platform technologies. TPP, TisPlus, and OpenSearch, which are highly targeted enabling platform products for recommendation and search functions, make possible both the big data scenarios and the basic search services at the core of many business units. In the context of globalization, this means the AI•OS system does not require customized development for worldwide deployment, which gives the enabling platform a distinct technical advantage.


Complementary Frameworks

Rather than being a standalone system, AI•OS relies on close relationships with a number of complementary frameworks to deliver the functionality and continued development the Alibaba Group requires of it. Viewed individually, these connections help to illustrate the trajectory of AI•OS and Alibaba’s operations in their present state, and they point toward future developments.

AI•OS and algorithms

In responding to big data business challenges, AI•OS contributes at most 30% of any solution, with algorithms contributing another 30% and products and opportunities accounting for the remainder. However, the 30% contributed by AI•OS is an essential prerequisite for the rest. Unfortunately, this has often been overlooked, as happened in the early days of Taobao’s search function and more recently with Mobile Taobao product recommendations. Few technical fields present the kinds of circumstances that surround AI•OS and algorithm development, where the iterative efficiency of optimization algorithms determines the outcome of any scenario.

AI•OS and Blink

On its way to becoming a universal real-time computing engine, Blink underwent extensive incubation inside the early AI•OS framework. The relationship between the two technologies hinges on real-time computing: engine services in the AI•OS system all require consistent data updates at intervals of several seconds, and Blink is ideally suited to the technical challenges that AI•OS scenarios present. For these reasons, Blink developers place a high value on AOP, while AI•OS developers strongly advocate for Blink in co-location, implement it in Hippo, and merge it with Yarn and its pool. The complementary features in AI•OS and Blink are second only to AI•OS itself and key algorithms in terms of importance to the Alibaba ecosystem.


AI•OS and PAI-TF

At one time, PAI was intended to operate independently, despite its potential to play an important role between Blink and AI•OS. This proved impossible because it was incompatible with the rigid demands of the AI•OS system, especially those of Hippo’s co-location resource pool. Fortunately, the three development teams involved were able to reach a consensus on how the work should be divided. After giving up its own resource pool, PAI-TF successfully supported all model training tasks for search and recommendation algorithms, and also supported AI•OS’s graphical execution engine. In the future, PAI-TF will play a larger role at the core of AI•OS development.

Zooming out: AI•OS and Graph Computing

Graph computing, one of many theories applicable to offline scenarios such as iterative computing, is emerging as a leading field in computing engine science. While the pursuit of faster verification in online services is a given, classic benchmark implementations in big data technology are much rarer. One possible reason is a lack of sufficient technical capability in the industry. The corresponding academic craze is more understandable: graph theory is such a classic of computing that established experts are bound to be captivated by it, and the lack of industry benchmarks also tends to stimulate fervor among researchers. Nevertheless, most big data business scenarios are not typical graph computing problems once fully abstracted. For example, abstracting AI•OS yields the rapid customization of a computation flowchart, which is at most a generalized graph computing model.
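
The notion of a customizable computation flowchart can be illustrated with a toy example. The sketch below is not AI•OS code: the stage names (`recall`, `score`, `rank`), the sample items, and the `run_flow` helper are all hypothetical, chosen only to show how a flow described as a small dependency graph can be evaluated in dependency order using Python’s standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical stages of a search flow; names and data are illustrative
# only and do not correspond to real AI•OS operators.
def recall(query):
    return [item for item in ["red shoes", "blue shoes", "red hat"] if query in item]

def score(candidates):
    # Toy scoring: shorter titles receive higher scores.
    return {item: 1.0 / len(item) for item in candidates}

def rank(scores):
    return sorted(scores, key=scores.get, reverse=True)

# The "flowchart": each node maps to (function, list of upstream nodes).
flow = {
    "recall": (recall, ["query"]),
    "score": (score, ["recall"]),
    "rank": (rank, ["score"]),
}

def run_flow(flow, inputs):
    """Evaluate every node once, in dependency order."""
    graph = {name: set(deps) for name, (_, deps) in flow.items()}
    results = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # externally supplied input, nothing to compute
            continue
        func, deps = flow[name]
        results[name] = func(*(results[d] for d in deps))
    return results

results = run_flow(flow, {"query": "red"})
print(results["rank"])  # ['red hat', 'red shoes']
```

Customizing the flow here means editing the `flow` dictionary, for instance swapping in a different `score` function, while the runner stays unchanged. This is a dataflow graph in a loose sense, which fits the description of a generalized rather than a typical graph computing problem.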


Alibaba Tech

First-hand and in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.

