Data lakes or data warehouses? Alibaba puts forward a new take on big data architecture: integrating data lakes and data warehouses into a data lakehouse

Editor’s note

20 years of big data development

1.1 Overview

Figure 1. Daily volume of data processed by Alibaba during Double 11

1.2 Data lakes and data warehouses in the context of big data development

Figure 2. 20 years of big data development

What are data lakes?

Figure 3. Evolution of the architecture of data lakes
Figure 4. Architecture of Alibaba Cloud EMR
Figure 5. Data lake architecture (Source: Internet)

Birth of data warehousing and its relationship with data mid-ends

Figure 6. Architecture of the MaxCompute cloud data warehouse
Figure 7. Architecture of Alibaba Data Mid-end

Data lakes vs. data warehouses

Figure 8. Comparison of data lakes and data warehouses in technology stacks
Figure 9. Flexibility of data lakes vs. maturity of data warehouses

Next-generation big data platform: LakeHouse

1. Data warehouses support access from data lakes.

2. Data lakes support the capabilities of data warehouses.

  • In 2018, Netflix open-sourced Iceberg, an enhanced version of its internal metadata management system (a table format), which provides data warehouse capabilities such as multi-version concurrency control (MVCC). However, the open source Hive Metastore (HMS) had already become the de facto standard. Open source Iceberg is compatible with HMS, but only as a plug-in, which greatly limits its data warehouse management capabilities.
Figure 10. Hudi support matrix (Source: Internet)

Alibaba Cloud LakeHouse

6.1 Overall architecture

Figure 11. Overall architecture of Alibaba Cloud LakeHouse

6.2 Build a data mid-end that integrates data lakes and data warehouses

Figure 12. DataWorks data mid-end that integrates data lakes and data warehouses

6.3 Success story: Sina Weibo uses LakeHouse to build a hybrid cloud mid-end for AI computing

Figure 13. Business pain points for Sina Weibo
Figure 14. Sina Weibo architecture for integration of data lakes and data warehouses

Summary

First-hand & in-depth information about Alibaba's tech innovation in Artificial Intelligence, Big Data & Computer Engineering. Follow us on Facebook!

Alibaba Tech