Rise of New Cloud Computing Interface Underpins “Olympics on the Cloud”

Introduction: Alibaba Cloud supported the global live broadcast of the Olympic Games this year for the first time, marking an important step in the Olympics going digital. Technology was of paramount importance for this year’s Olympics, which coincided with the global COVID-19 pandemic. We believe that this historic event will open an era in which more sports enthusiasts choose to “watch games on the cloud” as their primary way of enjoying international sports events.

Authors: Zhimin and Jiaxu

This year’s Olympics recorded several firsts: it was the first Games to be postponed and the first to limit the number of spectators, and it is destined to leave a special mark in Olympic history. Beyond these firsts, China’s technological power also made a historic breakthrough on a key track of this global sports event.

This was a true “Sports Event on the Cloud”. Alibaba Cloud provided a wealth of cloud computing resources, including storage, computing, and networking, to support core events in the games. Its container service also played a critical part, embodying a key trend: the container is becoming a new interface for accessing cloud resources and the preferred method of global application delivery. For example, Alibaba Cloud Container Service for Kubernetes (ACK), the best container execution environment on Alibaba Cloud, and Alibaba Cloud Container Registry (ACR), the best containerized application distribution infrastructure, both helped international sports evolve digitally by delivering cloud-native capabilities that feature efficiency, stability, extreme elasticity, security, and intelligence.

Echoing the progress and transcendence embodied in the Olympic motto of “Faster, Higher, Stronger — Together”, ACK strives to deliver the very best capabilities. During this year’s widely viewed Olympics, Alibaba Cloud’s enhanced container service ACK Pro and container image service for enterprises ACR EE delivered impressive performance, while ensuring a robust base for creating and running upper-level applications, and demonstrating to the world China’s “cloud-native power”.

Every move of this Olympic Games fell under the spotlight because of its unusual timing and the huge challenges faced by the organizers. The official website of the games is the most authoritative platform for publishing real-time event information. Thanks to the high-availability, dual-active architecture of ACK Pro in Frankfurt, Hong Kong, and other regions, the website delivered high performance and provided stable, reliable, and safe access from across the world throughout the period. Alibaba Cloud’s container technology played a key part in guaranteeing timely updates on schedules, events, athletes, and Olympic stories to the world.

Given the huge scale of the Olympic Games, it is no exaggeration to describe its data needs as “massive”. A huge database is the only solution to process such information efficiently. The database received information from event result reporting applications, collected information such as the event start time and the athlete’s performance for centralized processing, and then sent data to other applications.

The event database used ACK Pro to create a high-availability architecture supporting remote disaster recovery across multiple regions, including Tokyo and Frankfurt, helping to ensure data security, business continuity, and comprehensive data protection for applications. In addition, the system has stringent real-time requirements for data collection, processing, and output, and the performance of ACK Pro and ACR EE fully met them. Thanks to ACR EE’s large-scale distribution of container images and ACK Pro’s extreme elasticity, nodes could expand rapidly and pods could respond to traffic bursts.
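The article does not say which autoscaling mechanism the event workloads relied on. As a minimal sketch, assuming the standard Kubernetes Horizontal Pod Autoscaler rule (desired replicas = ceil(current replicas × current metric / target metric)), the following shows how a pod count might follow a traffic burst. The function name, the replica limits, and the example numbers are illustrative assumptions, and the real autoscaler also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Simplified Kubernetes HPA rule, clamped to [min_replicas, max_replicas].

    Hypothetical helper for illustration; not taken from the article.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Core HPA formula: scale in proportion to metric pressure.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Example: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

In this sketch the replica count grows in proportion to the load, which is only useful in practice if new pods can pull their images and start quickly; that is where fast image distribution matters.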

In addition, the rapid DevOps deployment capabilities of container technology were leveraged in automatic media labeling to integrate data from different sources, such as athletes’ entry time and scoring time. This allowed databases to be established and the metadata related to OBS video images to be enhanced via AI. The project relied on ACK Pro during deployment and setup to improve the automation of media labeling.

Despite the strict limits on watching the games on site, technology enhanced public interaction with the games through a variety of novel and interesting online approaches. For example, an Olympic-themed adventure mobile game launched by PinQuest allowed users to embark on their own “Olympic Village Adventure” on mobile phones. The game is supported by Alibaba Cloud Serverless Kubernetes (ASK), which brings extreme elasticity to key modules. It was initiated and quickly released more than 10 days before the event, fully demonstrating the rapid deployment and extreme elasticity of containers.

Rome was not built in a day. What enabled the extensive application and satisfying performance of container services at this year’s Olympic Games was Alibaba’s core technical capabilities, accumulated and honed over more than 10 years of cloud-native transformation.

Core Technical Capabilities of ACK

ACK provides the most competitive container service in the industry and has been China’s largest player in terms of market share for many years. In addition to large-scale sports events such as the Olympics, ACK has also become the backbone of large-scale business or entertainment events such as the Double 11 shopping carnival, the 618 shopping carnival, and the Spring Festival Gala. It supports the group’s core e-commerce businesses, retail cloud Jushita, logistics cloud Cainiao CPaaS (Communications Platform as a Service), Middleware MSE, edge clouds CDN and ENS, as well as allowing AI, databases and DingTalk audio and video modules to go cloud-native. These practices have helped ACK accumulate core technical competitiveness in diverse environments.

Figure 1: Overall structure of Alibaba Cloud’s container services

Alibaba Cloud’s container services are now available in 24 regions around the world, covering China, Asia Pacific, North America, and Europe. They enable global deployment and come with built-in best practices for high availability as well as disaster recovery and backup solutions. This makes them highly suitable for global business scenarios seeking to improve system availability and stability. For the Olympic Games, with its very stringent data reliability and SLA demands, the customer deployed multiple trans-continental container clusters based on ACK Pro and ACR EE across the Frankfurt, Hong Kong, and Tokyo regions, and recorded zero failures and satisfactory stability throughout the process.
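The article does not describe how traffic was routed between the regional clusters. As a rough sketch of the general idea behind a multi-region deployment, the snippet below picks the lowest-latency region that is currently healthy and falls back to the next one when a region is down. The function name, the region keys, and the latency figures are hypothetical values chosen for illustration, not measurements from the event.

```python
from typing import Optional


def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> Optional[str]:
    """Return the healthy region with the lowest latency, or None if none is healthy.

    Hypothetical helper; real deployments usually delegate this to DNS or
    global load balancing rather than application code.
    """
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Made-up latencies as seen by a client in Europe.
latency = {"frankfurt": 20.0, "hong-kong": 180.0, "tokyo": 220.0}

print(pick_region(latency, {"frankfurt", "hong-kong", "tokyo"}))
print(pick_region(latency, {"hong-kong", "tokyo"}))  # Frankfurt unavailable
```

The point of the sketch is only that every region can serve requests, so losing one region degrades latency rather than availability.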

ACK is one of the world’s first service platforms to pass Kubernetes conformance certification. It provides high-performance containerized application management services and supports lifecycle management of enterprise-level Kubernetes containerized applications. As a leading domestic cloud computing container platform, ACK has been growing together with its customers in all kinds of industries since its debut in 2015.

ACK has upgraded its technical capabilities over the past year. The upgrades include a 30% performance improvement of its cloud-native container network, Terway, over the community network; support in its high-performance CSI storage for efficient volume management on large-scale X-Dragon hosts used for databases; and an extreme elasticity upgrade for ASK. In terms of large-scale scheduling, ACK efficiently and stably manages tens of thousands of container clusters, the largest container cluster group in China, and was the first vendor in China to pass the large-scale certification (10,000 nodes, 1 million pods) of the China Academy of Information and Communications Technology (CAICT).

A professional managed cluster (ACK Pro) is a cluster type that evolved from the standard managed cluster (ACK) and inherits all the advantages of the original managed cluster, such as managed master nodes and master node high availability. Compared with the original managed cluster, ACK Pro enhances the reliability, security, and scheduling features of the cluster and supports an SLA that includes a compensation standard. This makes it suitable for enterprise customers who run large-scale businesses in production and have high requirements for stability and security.

· More reliable management of the master node: stable support to the management of large-scale clusters; etcd disaster recovery and backup, and hot and cold data backup mechanism to maximize the availability of the cluster database; and observability of key indicators of the managed components to help you better predict risks.

· A more secure container cluster: encrypted disk used by default for storage on the management plane etcd; kms-plugin component installed on the data plane to encrypt and store secret data on disks; open-type security management and an advanced version of security management with stronger detection and automatic repair capabilities for running containers.

· Smarter container scheduling: ACK integrates a kube-scheduler with enhanced scheduling performance, offers multiple intelligent scheduling algorithms, supports NPU scheduling, and optimizes container scheduling in business scenarios such as massive data computing and high-performance data processing.

· SLA assurance: ACK supports an SLA with a compensation term, with the availability of the cluster’s API server reaching 99.95%.
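To put the 99.95% figure in perspective, the short calculation below converts an availability percentage into a downtime budget. It assumes a 30-day measurement period, which is an illustrative choice; the actual SLA terms define the period and how unavailability is counted.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget implied by an availability target over a period.

    The 30-day default is an assumption made for this example.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# 99.95% over 30 days leaves roughly 21.6 minutes of API server downtime.
print(round(allowed_downtime_minutes(99.95), 1))
```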

Alibaba Cloud Container Registry (ACR) is a securely managed and efficient distribution platform for OCI-compliant cloud-native products such as container images and Helm charts. ACR EE supports acceleration throughout the chain such as global synchronization acceleration, large-scale and large-image distribution acceleration, and multi-code-source construction acceleration, and is seamlessly integrated with ACK to help enterprises cut delivery complexity and create an all-in-one solution for cloud-native applications.

  1. Management of diverse OCI-compliant products, including multi-architecture container images (such as Linux, Windows, and ARM images), Helm Chart v2 and v3, and other OCI-compliant products.
  2. Multi-dimensional security mechanisms to ensure encrypted storage of cloud native products, image security scans and multi-dimensional vulnerability reports to protect storage and content security; separated access control for container images and Helm chart networks and fine-grained action audits to ensure product access security.
  3. Application distribution acceleration to support inter-region synchronization around the world and enable more efficient container image distribution; P2P distribution acceleration to ensure fast deployment and scalability of services.
  4. More efficient delivery of cloud-native applications. ACK provides a cloud-native application delivery chain that supports observability, traceability, and independent configuration throughout the chain. It also supports policy-based automatic blocking, so that applications can be updated in one go and delivered automatically to multiple global scenarios, improving the efficiency and security of cloud-native application delivery.

ACK is the largest container service in China in terms of user size, supporting tens of thousands of Kubernetes clusters, so efficient and stable management of the massive clusters is essential. ACK uses the following methods to build a stability assurance system.

· Integrated operation and maintenance

The ACK unified operation and maintenance platform integrates cluster monitoring, alerting, logging, inspection, metadata management, asset management, and other functions across the network, and supports real-time observation and management of all clusters in the 24 connected regions. For example, if an exception occurs in a master component or a system component of a user’s Kubernetes cluster, or the cluster suffers an abnormal event, the problem can be observed on the operation and maintenance platform and an alarm is triggered automatically. This efficient management platform enables ACK to manage tens of thousands of clusters in the network with higher stability.

· All-scenario diagnosis

ACK provides a Center for Internet Security (CIS) reinforcement service that allows users to perform in-depth inspection and diagnosis of the key elements of cluster operations, including the network, nodes, components, and services. It offers professional inspection and diagnosis with a user-friendly experience to enhance users’ cluster management capabilities. Users can inspect running clusters and services and generate inspection reports. With ACK, users do not just deploy Kubernetes and use its strengths; more importantly, they benefit from professional Kubernetes expertise built deep into the service.

· Well-structured support pre-plan system

ACK developed a full-process support solution targeting the Olympic Games based on its existing support solutions, including pre-plans, contingency plans, fault drills, and duty scheduling. ACK has rich experience in providing support and its capabilities are constantly honed during annual events like the Double 11 shopping carnival, the 618 shopping carnival, and the Spring Festival Gala. These large-scale events entail complex and comprehensive tasks and ACK has recorded nearly zero failures supporting them.

Apart from these major events, ACK also organizes regular chaos-based fault drills and surprise drills internally. In such drills, faults are injected into the system at random, and the ACK team members on duty receive an alarm and handle the issue immediately according to the plans in the system. Such regular training has tempered the team’s emergency response and allowed it to meet the 1-5-10 objective (namely, issue the alarm within 1 minute, locate the fault within 5 minutes, and resolve the fault within 10 minutes). These support systems, repeatedly honed in real-life operations, were applied to the Olympics support program, thereby guaranteeing stable and smooth progress of the games.
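As an illustration of how a drill result could be scored against the 1-5-10 objective, here is a minimal sketch. It assumes that all three limits are measured from the moment the fault is injected, which the article does not specify; the `Incident` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Timestamps of one drill, in seconds (hypothetical record layout)."""
    fault_at: float     # when the fault was injected
    alarm_at: float     # when the alarm fired
    located_at: float   # when the fault was located
    resolved_at: float  # when the fault was resolved


# 1-5-10 limits in minutes, assumed to be measured from fault injection.
LIMITS_MIN = {"alarm": 1, "locate": 5, "resolve": 10}


def check_1_5_10(inc: Incident) -> dict[str, bool]:
    """Return, per stage, whether the drill met its time limit."""
    elapsed = {
        "alarm": inc.alarm_at - inc.fault_at,
        "locate": inc.located_at - inc.fault_at,
        "resolve": inc.resolved_at - inc.fault_at,
    }
    return {stage: elapsed[stage] <= limit * 60 for stage, limit in LIMITS_MIN.items()}


# Alarm after 45 s and located after 4 min, but resolved only after 11 min 40 s.
print(check_1_5_10(Incident(fault_at=0, alarm_at=45, located_at=240, resolved_at=700)))
```

Tracking each stage separately shows where a drill fell short: in the example, detection and diagnosis were on time and only the resolution step missed its limit.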

Trends in Containerization and Global Application Delivery

ACK was deeply engaged in supporting this year’s Olympic Games and delivered steady performance on the core tasks associated with the official website and event data processing, using its industry-leading cloud-native technology, products, and services. In collaboration with Alibaba Cloud’s other services, ACK contributed to the success of the “Olympics on the Cloud”.

ACK will also play a supporting role in the upcoming Paralympic Games and Winter Olympics. Alibaba Cloud continues to build efficient, secure, intelligent, and boundless container capabilities and rock-solid service quality, so that technology and the Olympic rings can enhance each other and more sectors and enterprises around the world can speed up their digital transformation.

Alibaba Tech

First hand and in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.
