Data lakes or data warehouses? Alibaba puts forward a new take on big data architecture: integrating data lakes and data warehouses into a data lakehouse solution

Editor’s note

20 years of big data development

Figure 1. Daily volume of data processed by Alibaba during Double 11
Figure 2. 20 years of big data development

What are data lakes?

Figure 3. Evolution of the architecture of data lakes
Figure 4. Architecture of Alibaba Cloud EMR
Figure 5. Data lake architecture (Source: Internet)

Birth of data warehousing and its relationship with data mid-ends

Figure 6. Architecture of the MaxCompute cloud data warehouse
Figure 7. Architecture of Alibaba Data Mid-end

Data lakes vs. data warehouses

Figure 8. Comparison of data lakes and data warehouses in technology stacks
Figure 9. Flexibility of data lakes vs. maturity of data warehouses

Next-generation big data platform: LakeHouse

Figure 10. Hudi support matrix (Source: Internet)

Alibaba Cloud LakeHouse

Figure 11. Overall architecture of Alibaba Cloud LakeHouse
Figure 12. DataWorks data mid-end that integrates data lakes and data warehouses
Figure 13. Business pain points for Sina Weibo
Figure 14. Sina Weibo architecture for integration of data lakes and data warehouses

Summary
