A Flink Series from the Alibaba Tech Team

Best practices for Apache Flink, the open source big data computing engine

Opening Remarks: We expect more developers will become part of the Flink community

Apache Flink, formerly known as Stratosphere, is a project initiated by several doctoral and postgraduate students at the Berlin Institute of Technology in Germany. In 2014 they open-sourced the project and named it Flink. I became aware of Apache Flink in 2015 and went on to witness and help complete its implementation as a stream computing engine at Alibaba Group. For many years now, it has helped Alibaba pull off one successful 11.11 Shopping Festival after another. During the most recent 11.11, in 2018, the Flink engine smoothly supported real-time traffic peaking at 1.7 billion transactions per second.

Apache Flink has earned industry-wide recognition as the best available stream computing engine. However, Flink is in fact more than a stream processing engine: it is positioned as a set of big data engines with multiple computing capabilities, including streaming, batch, and machine learning.

Lately, Flink has made great breakthroughs in many big data scenarios such as batch processing and machine learning. On the one hand, Flink’s batch computing has shown exponential improvement with Alibaba’s optimization work. On the other hand, the Flink community is gradually expanding its work into many areas, including the Table API, Python, and ML libraries, leading to a significant improvement in the user experience for data science and AI computing. In addition, Flink is gradually upgrading its integration with other open source projects, including Hive and notebooks (Zeppelin, Jupyter).

Apache Flink has only been open source for four years, and we expect more companies and developers to join its community and ecosystem and together build it into the world’s best open source big data computing engine.

A Brief History of Flink: Tracing the Big Data Engine’s Open-source Development

Open-source big data computing engine Apache Flink, or Flink for short, has gained popularity in recent years as a powerful framework for both batch processing and stream processing that can be used to create a number of event-based applications… [Read more]

In Search of Data Dominance: Spark Versus Flink

When it comes to big data, there’s no avoiding the importance of stream computing and the powerful analytics it enables in real time. It also follows that when it comes to stream computing, there’s no avoiding the field’s two most powerful data processing engines: Spark and Flink… [Read more]

Better to Give and to Receive: Alibaba’s Open-source Contributions to Flink

As an open-source framework for big data computing, Apache Flink has undergone extensive optimization to meet a range of users’ demands for enhancement. For Alibaba Group, where the framework is deployed in a large-scale production environment, the need for these changes has motivated its real-time computing team to contribute many of Flink’s most valuable optimizations, benefiting the Flink community and Alibaba alike… [Read more]

Flink or Flunk? Why Ele.me Is Developing a Taste for Apache Flink

Engineers at Alibaba’s food delivery app Ele.me (饿了吗) are finding themselves increasingly reliant on Apache Flink, an open source stream processing framework open-sourced in 2014.

What is so unique about Flink, and what sets it apart from Storm and Spark? This article investigates how Ele.me’s big data platform operates in terms of real-time computing and assesses Flink’s various strengths and weaknesses… [Read more]

From Code Quality to Integration: Optimizing Alibaba’s Blink Testing Framework

As demands on network equipment and infrastructures grow, there is an ever-increasing need for powerful solutions that can still operate when certain components are down. As one such solution, distributed computing frameworks effectively overcome potential network downtime by spreading loads over the entire network… [Read more]

Alibaba Blink: Real-Time Computing for Big-Time Gains

With the diverse ecosystem it presides over (now spanning e-commerce platforms Taobao and Tmall, advertisement platform Alimama, Ant Financial, Alipay, Alibaba Cloud, and Digital Entertainment, to name but a few), the data on Alibaba’s servers measures into the exabytes and is growing daily by petabyte proportions… [Read more]

Written by Alibaba Tech

First-hand & in-depth information about Alibaba's tech innovation in Artificial Intelligence, Big Data & Computer Engineering. Follow us on Facebook!
