How China’s Biggest Logistics Platform Handles Task Dispatch

With millions of packages to be delivered each day, Cainiao needs a powerful engine to streamline operations

In China, more than 100 million packages are delivered every day. Cainiao, a logistics platform affiliated with Alibaba Group, handles a significant portion of those packages. Cainiao faces daunting challenges in ensuring timely deliveries and keeping customers satisfied when things fall behind schedule.

In response to this challenge, the Alibaba tech team developed a lightweight scheduled task dispatch engine, capable of dealing with the sheer scale of Cainiao’s delivery market. This system works through the real-time monitoring of each link in the logistics chain, and by removing dependence on fixed hard disk storage.

What is the standard solution to task dispatch?

The standard solution couples task timing with task storage: each machine stores its own scheduled tasks and triggers them with a local timer. This is easy to implement and works well for a stand-alone machine or a small amount of traffic. However, applying this approach in a distributed, traffic-intensive scenario creates considerable complexity and several challenges:

1. A load balancing strategy must be designed for scheduling tasks across the cluster.

2. Even if requirement 1 is met, scheduled tasks in some scenarios may be concentrated at a single point in time, putting excessive pressure on one node of the cluster.

3. Capacity must be estimated reasonably in advance, to prevent later linear extension of the storage from becoming too complicated.

The main restriction of the above solution is the coupling of task timing and task storage. Decoupling time from storage renders task storage effectively stateless, greatly increasing the range of storage options available and significantly reducing storage constraints.

How can the time wheel decouple task timing from the task itself?

The figure below illustrates how the time wheel algorithm works.

Time wheel
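As a minimal sketch of the time-wheel idea (in Python, with names of my own choosing rather than the engine's actual code): the wheel is a ring of slots, a task is placed in the slot the clock hand will reach after the task's delay, and one common variant keeps a remaining-rounds counter for delays longer than a full revolution.

```python
class TimeWheel:
    """Toy time wheel: a ring of slots, each holding a task list."""

    def __init__(self, slots=8):
        self.slots = [[] for _ in range(slots)]
        self.current = 0  # clock hand position

    def add(self, task, delay_ticks):
        """Schedule task delay_ticks ticks from now (delay_ticks >= 1)."""
        rounds, offset = divmod(delay_ticks, len(self.slots))
        if offset == 0:  # lands exactly on a full revolution
            rounds -= 1
        slot = (self.current + offset) % len(self.slots)
        self.slots[slot].append((rounds, task))

    def tick(self):
        """Advance the hand one slot and return the tasks now due."""
        self.current = (self.current + 1) % len(self.slots)
        due, remaining = [], []
        for rounds, task in self.slots[self.current]:
            if rounds == 0:
                due.append(task)
            else:
                remaining.append((rounds - 1, task))
        self.slots[self.current] = remaining
        return due
```

Note that the task payload itself is irrelevant to the wheel; only the delay determines which slot a task occupies, which is exactly what makes it possible to decouple timing from task storage.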

How can time wheel scheduled storage tasks help the solution?

In response to this problem, the Alibaba tech team created a new time wheel and linked list-based solution that can do the following:

· Support discrete scheduled messages

· Address the complexity of managing individual task lists for each clock hand position on the time wheel

The following figure illustrates the solution’s key points.

Alibaba MQ scheduled message solution

The central idea behind this solution is to write each task to disk and chain the tasks there into a linked list. To achieve this, each task record stores a file offset that points to the next task, so once the header of the linked list is obtained, the entire task list can be read back.

With this solution, the entire task list does not need to be stored for each clock hand position. For each clock hand position, only the header of the linked list needs to be stored. This reduces demand on memory.
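A minimal sketch of this offset-chaining idea (in Python, using an in-memory bytearray as a stand-in for the disk file; the record layout and names are illustrative assumptions, not the engine's actual format):

```python
import struct

# Stand-in for the disk file: records are only ever appended.
log = bytearray()
NO_NEXT = 0xFFFFFFFF  # sentinel marking the end of the linked list

def append_task(payload: bytes, next_offset: int) -> int:
    """Append one task record and return its file offset.
    Record layout: 4-byte next-offset, 2-byte length, payload."""
    offset = len(log)
    log.extend(struct.pack(">IH", next_offset, len(payload)))
    log.extend(payload)
    return offset

def read_list(header_offset: int):
    """Follow next-offsets from the list header, yielding each payload."""
    offset = header_offset
    while offset != NO_NEXT:
        next_offset, size = struct.unpack_from(">IH", log, offset)
        start = offset + 6
        yield bytes(log[start:start + size])
        offset = next_offset

# Each new task points at the previous header, so the time wheel slot
# only needs to remember the offset of the newest record.
header = NO_NEXT
for name in (b"task-1", b"task-2", b"task-3"):
    header = append_task(name, header)
```

Traversing `read_list(header)` yields the tasks newest-first, and the clock hand position stores nothing but the single header offset.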

This solution is acceptable for a system positioned as middleware, but for a system positioned as an ordinary application, its deployment requirements are too high, because a fixed hard disk is needed.

With the current trends of containerization and dynamic capacity management, making ordinary applications depend on fixed hard disks adds complexity to system operation and maintenance. The lightweight scheduled task dispatch engine helps address this issue.

What makes the lightweight scheduled task dispatch engine so useful?

The lightweight scheduled task dispatch engine removes the system's dependence on disk storage. Since the engine doesn't use disks, it must rely on specialized storage services (such as HBase, Redis, and so on), so the underlying storage is replaced with a storage structure compatible with those services.

Adjusted time wheel and linked list solution

Each task is designed as a row in a structured table, and the file offset is replaced with a task ID (the earlier figure "Alibaba MQ scheduled message solution" shows the system that uses the file offset). The ID is used to chain the tasks on the linked list, and only the ID of the linked list header is associated with the time wheel. The change here is simply to replace the file offset with an ID, thereby removing the dependence on disk.
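The ID-chained variant can be sketched the same way (in Python, with a plain dict standing in for a storage service such as HBase or Redis; the row fields and function names are my own assumptions):

```python
import itertools

# Stand-in for a key-value storage service (any store with get/put works).
task_table = {}
_ids = itertools.count(1)

def put_task(payload, next_id=None):
    """Insert a task row chained to the given next task; return its ID."""
    task_id = next(_ids)
    task_table[task_id] = {"payload": payload, "next_id": next_id}
    return task_id

def read_chain(header_id):
    """Walk the chain of task IDs starting from the list header."""
    while header_id is not None:
        row = task_table[header_id]
        yield row["payload"]
        header_id = row["next_id"]

# As with file offsets, the time wheel slot stores only the header ID.
header = None
for payload in ("pkg-101", "pkg-102"):
    header = put_task(payload, header)
```

The traversal logic is identical to the disk version; only the pointer type changed from a file offset to a storage-service key, which is what removes the disk dependence.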

These stages of the solution address some issues, but there are still some problems to be considered.

Problem 1: A single linked list cannot be extracted in parallel, which limits extraction efficiency. When many tasks are scheduled for the same time, scheduled task processing is severely delayed.

Problem 2: Tasks are not stored on local disks; the entire cluster's scheduled tasks sit in centralized storage, yet each node in the cluster maintains its own time wheel.

How can task linked list partitioning accelerate extraction of single linked lists?

Linked lists are efficient to write but inefficient to read. When a long task linked list is mounted at a certain point in time, concurrent reads cannot be used to speed up its extraction.

How does partitioning improve task queue extraction efficiency for a certain point in time?

To allow concurrent extraction, the single task list at each clock hand position is split into multiple sublists, each with its own header. The following figure shows the solution after this partitioning adjustment.

Design after partitioning
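A minimal sketch of partitioned extraction (in Python; the round-robin split and the thread pool are illustrative assumptions, not the engine's actual mechanics):

```python
from concurrent.futures import ThreadPoolExecutor

def partition(tasks, n):
    """Split one slot's task list into n sublists, round-robin."""
    parts = [[] for _ in range(n)]
    for i, task in enumerate(tasks):
        parts[i % n].append(task)
    return parts

def extract(sublist):
    # Placeholder for reading one chained sublist out of storage;
    # here we just "process" each task by uppercasing its name.
    return [t.upper() for t in sublist]

tasks = [f"task-{i}" for i in range(10)]

# Each sublist has its own header, so the sublists can be pulled
# concurrently instead of walking one long chain sequentially.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = pool.map(extract, partition(tasks, 4))
extracted = [t for part in results for t in part]
```

With the single list split this way, the read cost of one clock hand position is divided across workers, addressing Problem 1 above.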

How can data be recovered after cluster deployment and node restart?

1. How can “time wheel metadata” be recovered when it is not written to disk?

The scheduled task storage problem can be solved using the adjusted time wheel and linked list solution and the partitioned design detailed above.

Specialized storage services can also be used to address the metadata storage problem. However, this does not fully solve it, and it leads to the next challenge.

2. How can the cluster node obtain its own metadata?

Metadata can sometimes be associated with a node's IP or MAC address. Under current dynamic capacity scheduling conditions, though, it is impractical for a machine to keep a fixed IP, as there is no guarantee which application a given machine will run. Since the physical environment cannot be relied on, only the logical environment can be used to distinguish each node's time wheel.

Therefore, each node in the cluster is assigned a unique logical ID, and each machine obtains its time wheel data through its own ID.

3. How are logical IDs assigned to nodes?

The following figure illustrates node registration and ID allocation.

Node registration and ID allocation process

Note that the master node itself is also registered and assigned an ID. This is because ordinary applications make no separation between master and non-master nodes: the master node participates in the computing work like any other node, but is additionally responsible for cluster management.
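The registration and ID allocation described above can be sketched as follows (in Python; in reality the engine would coordinate this through shared storage rather than a single in-process object, and the class and method names here are assumptions):

```python
class Master:
    """Toy master node that hands out unique logical IDs."""

    def __init__(self):
        self.next_id = 1
        self.members = {}  # logical ID -> node name

    def register(self, node_name):
        """Register a node and return its newly allocated logical ID."""
        node_id = self.next_id
        self.next_id += 1
        self.members[node_id] = node_name
        return node_id

master = Master()
master_id = master.register("node-A")  # the master registers itself too
worker_id = master.register("node-B")
```

Because the ID is logical rather than tied to an IP or MAC address, a restarted container can reclaim its time wheel data by presenting the same ID, regardless of which physical machine it lands on.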

The following figure illustrates single-node startup.

Single-node startup process

How can cluster management be automated?

In dynamic scheduling scenarios, even the application owner is not aware of cluster extension and reduction operations.

The cluster itself, however, can perceive that it has been extended or reduced, because the master node tracks the survival status of the other nodes. The master node then adjusts the cluster load according to changes in cluster capacity.

How can scheduled task extraction clustering relieve cluster pressure?

The preceding sections described a set of cluster management solutions. Still, if concurrency is concentrated in a single node and that node can only perform extraction by itself, pressure on that node becomes very high whenever its number of expired tasks spikes at a given moment.

Computing capabilities can be optimized using the cluster management capabilities described in this article: collections of expired task linked lists can be distributed to other cluster nodes through soft load balancing. Since task data is stored centrally, any node that obtains the ID of a task linked list header can extract all the tasks on that list, thus balancing cluster pressure. The following figure illustrates the process.

Cluster parallel task extraction
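A minimal sketch of spreading expired-list headers across nodes (in Python; round-robin stands in for whatever soft load balancing policy the engine actually uses):

```python
def assign_headers(header_ids, node_ids):
    """Distribute expired task-list header IDs across cluster nodes,
    round-robin, so each node extracts its share of the sublists."""
    assignment = {node: [] for node in node_ids}
    for i, header in enumerate(header_ids):
        assignment[node_ids[i % len(node_ids)]].append(header)
    return assignment

# Because task data is centrally stored, a node needs only a header ID
# to pull the whole sublist, so any node can take any header.
plan = assign_headers([101, 102, 103, 104, 105], ["n1", "n2"])
```

Each node then walks its assigned chains independently, so a burst of expired tasks on one time wheel no longer lands on a single machine.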


(Original article by He Bibo何碧波)

Alibaba Tech

First-hand & in-depth information about Alibaba's tech innovation in Artificial Intelligence, Big Data & Computer Engineering. Follow us on Facebook!
