A Flink Series from the Alibaba Tech Team

Best practice on Apache Flink, the open source big data computing engine

4 min read · Apr 10, 2019

Opening Remarks: We expect more developers will become part of the Flink community

By Wang Shaoxuan, Senior Staff Engineer from Alibaba’s Computing Platform

Apache Flink, formerly known as Stratosphere, began as a project initiated by several doctoral and postgraduate students at the Berlin Institute of Technology in Germany. In 2014 they open-sourced the project and named it Flink. I became aware of Apache Flink in 2015, and went on to witness and help complete its implementation as a stream computing engine at Alibaba Group. For many years now, it has helped Alibaba pull off one successful 11.11 Shopping Festival after another. During the most recent 11.11, in 2018, the Flink engine smoothly supported real-time traffic peaking at 1.7 billion events per second.

Apache Flink has earned industry-wide recognition as the best available stream computing engine. However, Flink is in fact more than a stream processing engine: it is positioned as a set of big data engines with multiple computing capabilities, including streaming, batch, and machine learning.

Lately, Flink has made great breakthroughs in many big data scenarios such as batch processing and machine learning. On the one hand, Flink’s batch computing performance has improved dramatically thanks to Alibaba’s optimization work. On the other hand, the Flink community is gradually expanding its work into many areas, including the Table API, Python, and ML libraries, leading to a significant improvement in the user experience for data science and AI computing. In addition, Flink is gradually deepening its integration with other open source projects, including Hive and notebooks (Zeppelin, Jupyter).

Apache Flink has been open source for only four years, and we expect more companies and developers to join its community and ecosystem and together build it into the world’s best open source big data computing engine.

A Brief History of Flink: Tracing the Big Data Engine’s Open-source Development

From version 1.1.0 to 1.6.0, Apache Flink’s relentless improvement exemplifies open-source development.

Open-source big data computing engine Apache Flink, or Flink for short, has gained popularity in recent years as a powerful framework for both batch processing and stream processing that can be used to create a number of event-based applications… [Read more]

In Search of Data Dominance: Spark Versus Flink

Meet the powerhouse data processing engines dueling to define the next era in big data

When it comes to big data, there’s no avoiding the importance of stream computing and the powerful analytics it enables in real time. It also follows that when it comes to stream computing, there’s no avoiding the field’s two most powerful data processing engines: Spark and Flink… [Read more]

Better to Give and to Receive: Alibaba’s Open-source Contributions to Flink

Between its SQL and Runtime layers, Alibaba has helped optimize Apache Flink for large-scale production environments like its own

As an open-source framework for big data computing, Apache Flink has undergone extensive optimization to meet a range of users’ demands for enhancement. For Alibaba Group, where the framework is deployed in a large-scale production environment, the need for these changes has motivated its real-time computing team to contribute many of Flink’s most valuable optimizations, benefiting the Flink community and Alibaba alike… [Read more]

Flink or Flunk? Why Ele.me Is Developing a Taste for Apache Flink

What is so unique about Flink, and what sets it apart from Storm and Spark?

Engineers at Alibaba’s food delivery app Ele.me (饿了吗) are finding themselves increasingly reliant on Apache Flink, an open source stream processing framework first open-sourced in 2014.

What is so unique about Flink, and what sets it apart from Storm and Spark? This article investigates how Ele.me’s big data platform operates in terms of real-time computing and assesses Flink’s various strengths and weaknesses… [Read more]

From Code Quality to Integration: Optimizing Alibaba’s Blink Testing Framework

Alibaba completes testing and optimization for a revolutionary distributed open-source computing framework in just one year.

As demands on network equipment and infrastructures grow, there is an ever-increasing need for powerful solutions that can still operate when certain components are down. As one such solution, distributed computing frameworks effectively overcome potential network downtime by spreading loads over the entire network… [Read more]

Alibaba Blink: Real-Time Computing for Big-Time Gains

How a new real-time computing framework made the 2017 Double 11 Festival the largest real-time computing phenomenon in history

With the diverse ecosystem it presides over (now spanning e-commerce platforms Taobao and Tmall, advertisement platform Alimama, Ant Financial, Alipay, Alibaba Cloud, and Digital Entertainment, to name but a few), the data on Alibaba’s servers measures into the exabytes and is growing daily by petabyte proportions… [Read more]

Alibaba Tech

First-hand and in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.