How Do We Set Up Data Middle Office Based on DataWorks?

May 7, 2021

By Huanbo

1. New Retail business model

Many people think that setting up Data Middle Office is an unachievable quest, but in truth it is manageable as long as we fully understand the nature of the business. This is because Data Middle Office is flexible and can be tailored to each retailer’s business operations. In this article, we will talk about how to set up New Retail Data Middle Office for a retailer.

A retailer may run a variety of business operations, such as online e-commerce platforms, brick-and-mortar stores, official apps, distribution channels, and supply chains. Before we collect data from these business operations to set up Data Middle Office, we must understand the retailer’s business model. The business model defines the business operations that the retailer needs to run, and only then can we plan Data Middle Office for the retailer.

For example, suppose a retailer used to run their business mostly in brick-and-mortar stores and now also sells through an online app. However, the online stock is not synchronized with the offline stock, and the commodities available for sale on the online app differ from those in the brick-and-mortar stores. In this sense, the retailer still follows the traditional retail business model and is only getting a foot in the door with e-commerce to explore new possibilities.

To set up Data Middle Office, we must first break away from the traditional business model and design a new one that truly blends online and offline business operations. Setting up Data Middle Office is often considered to be a crucial project for a retailer.

After a new business model is formulated, we must define the specific business operations that the retailer needs to run. For example, the retailer may need to deliver fresh food to consumers’ doorsteps within XX minutes, direct foot traffic to the online app, restructure the brick-and-mortar stores into warehouses that are connected to the online app, sell the same commodities at the same prices on the online app and in the brick-and-mortar stores, and allow consumers to order essentials online and pick them up in store.

Once we have identified these business operations as part of the retailer’s new business model, we can discuss how Data Middle Office supports them and how we can leverage data to run the new business model as a closed loop.

2. Technical architecture for New Retail business operations

Figure 1. Technical architecture for New Retail business operations

After we have defined the specific business operations that the retailer needs to run, we must design a technical architecture for Data Middle Office. A number of software vendors offer mature software systems, such as enterprise resource planning (ERP) and warehouse management system (WMS), that are suitable for business operations such as stores and supermarkets. Due to the abundance of options, the retailer may find it difficult to decide whether they should buy off-the-shelf software systems or build their own software systems from scratch.

Today, some mature ERP and logistics management systems have also been digitized to keep up with the times. However, these systems cannot deliver what Data Middle Office delivers.

Data Middle Office not only digitizes and structures data but also intelligently makes recommendations on traffic, logistics performance, process optimization, and financial policies. For example, suppose the retailer has 100 fish in stock and splits the stock into two parts: 10 fish for sale on the online app and 90 fish for sale in the brick-and-mortar stores. After the 10 fish allocated to the online app are sold out, no more fish can be sold on the online app, even if plenty of fish are still left in the brick-and-mortar stores. With Data Middle Office, the retailer can integrate their online and offline data and commodities. As a result, all 100 fish can be sold on the online app and in the brick-and-mortar stores at the same time on a “first come, first served” basis until they are sold out. In addition, the retailer can use algorithms to streamline out-of-stock alerts, discounting, cross-selling, and supply chain adjustments. Data Middle Office also empowers the retailer to restructure business operations. These capabilities distinguish Data Middle Office from the simple digitization applied in mature software systems.
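The difference between split stock and shared stock can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the retailer’s or Alibaba’s implementation: the classes `PartitionedStock` and `SharedStock` are hypothetical, and an in-process lock stands in for the transactional inventory service a real system would use.

```python
import threading


class PartitionedStock:
    """Traditional split: each channel can only sell its own fixed quota."""

    def __init__(self, online: int, offline: int) -> None:
        self.quota = {"online": online, "offline": offline}

    def sell(self, channel: str, qty: int = 1) -> bool:
        if self.quota[channel] >= qty:
            self.quota[channel] -= qty
            return True
        return False  # this channel is sold out even if the other channel still has stock


class SharedStock:
    """Integrated stock: one pool shared by all channels, first come, first served."""

    def __init__(self, total: int) -> None:
        self.total = total
        self._lock = threading.Lock()

    def sell(self, channel: str, qty: int = 1) -> bool:
        # The lock serializes concurrent orders, so whichever order arrives first wins.
        with self._lock:
            if self.total >= qty:
                self.total -= qty
                return True
            return False


# Fifteen online orders for fish against 100 fish in stock.
split = PartitionedStock(online=10, offline=90)
shared = SharedStock(total=100)
online_sold_split = sum(split.sell("online") for _ in range(15))
online_sold_shared = sum(shared.sell("online") for _ in range(15))
print(online_sold_split, online_sold_shared)  # 10 15
```

With the split, five of the 15 online orders fail even though 90 fish remain in the stores; with the shared pool, all 15 succeed and 85 fish remain available to either channel.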

If the retailer has the technical capabilities, we recommend that they develop all their crucial business systems on their own. During the development process, the retailer can digitize traditional business operations, including transactions, brick-and-mortar stores, warehousing, transportation and distribution, procurement, supply chain, and labor force management. This way, the retailer can benefit from a closed-loop system that is engineered based on the new business model to encompass all these business operations. The closed-loop system is shared among the retailer’s online apps, e-commerce platforms, and brick-and-mortar stores. This lowers the data barriers that stand in the way of setting up Data Middle Office.

In the preceding figure, the data layer is an integral element of the closed-loop system. In addition to the design of a business system, the unified construction of Data Middle Office is also important for the retailer to implement enterprise engineering. That is also what we will focus on today.

Figure 2. New Retail Data Middle Office team

Data Middle Office is not only a solution but also the responsibility of a team. We recommend that the retailer set up an independent Data Middle Office team dedicated to managing data assets, which are as valuable as the retailer’s commodities, membership, and equipment. The team members build, manage, and operate the data assets. They make the best use of the data assets to drive the transition to a smart supply chain, and they extract more value from the data assets by efficiently collecting, managing, and structuring the data to support the retailer’s business operations.

Figure 3. General architecture of New Retail Data Middle Office

The preceding figure shows the general architecture of New Retail Data Middle Office. The retailer can modify this general architecture based on their varying business requirements.

This general architecture is built mostly on the combined capabilities of Alibaba Cloud DataWorks and Alibaba Cloud MaxCompute, which have supported Alibaba Group in setting up Data Middle Office for over 11 years.

In this general architecture, the source data layer simply obtains data from the business system, and the data access layer operates on a more complicated level because it needs to support access to a large amount of structured and unstructured data from all sources, including apps, brick-and-mortar stores, couriers, electric motorcycles, Internet of Things (IoT) devices, and employees. Then, the data proceeds to the data manipulation layer, which processes unstructured data into structured data. All of that structured data is aggregated to comprise the data assets layer.

The data assets layer provides data that can be directly used to run business operations. However, we want to build a data service layer above the data assets layer to make the data easier to understand. We also want the data service layer to be transparent to users: the users of the Data Middle Office architecture should not have to search through a large number of tables to find the data they need. To achieve these goals, we also suggest a data application layer, which consists of applications provided as products that give users convenient access to data. These applications can run on various devices to support data pass-through. These devices include personal computers, DingTalk, Handy Baby, and small IoT devices such as small black-and-white screens.

In addition, a complete management system is provided to help the retailer efficiently run business operations and conduct operations and maintenance (O&M) practices.

This general architecture is designed to be hierarchical and business-oriented.

Figure 4. Technical architecture of New Retail Data Middle Office

Based on the hierarchical, business-oriented general architecture of Data Middle Office, we can go further and design a more specific technical architecture for Data Middle Office based on the retailer’s business requirements.

Data Middle Office must support computing on large amounts of batch data and streaming data at the same time. For batch data, we recommend Alibaba Cloud MaxCompute, which has been proven over years of processing nearly all batch data for Alibaba Group. During Double 11 in 2020, MaxCompute processed up to 1.7 EB of data per day. For streaming data, we recommend Alibaba Cloud Realtime Compute for Apache Flink, which can process up to 4 billion messages per second.

In addition to data computing, Data Middle Office is also responsible for data storage. For example, the data processed by Realtime Compute for Apache Flink can be stored in Alibaba Cloud Hologres, a real-time interactive analytics service. Hologres supports write speeds of up to 596 million records per second. It also supports online queries from Alibaba Cloud Elasticsearch and can respond to queries on petabytes of data in sub-second time. The stored data is pooled in Hologres to form a real-time data warehouse. The stored data can also power a wide range of data services, such as the metric service, detail service, characteristic service, and tag service. These data services are available on operations platforms, DingTalk suites, smart management devices, and other devices that the retailer most commonly uses for business operations.

Data Middle Office also incorporates a data mart O&M management layer, which provides capabilities such as metadata management, data quality control, disaster recovery management, and data governance.

This technical architecture is more like a manifesto of the retailer’s technical requirements that the technical team must consider when they set up Data Middle Office.

3. DataWorks-based New Retail Data Middle Office

After we have finalized the business model, the technical architecture for business operations, and the technical requirements for Data Middle Office, we can proceed to technology selection and technology surveying to find out what products and systems are required to fulfill the technical architecture of New Retail Data Middle Office.

As mentioned above, we recommend that the retailer develop their own business system. However, the retailer does not need to develop their own Data Middle Office technologies.

Alibaba Cloud offers a complete suite of well-developed products and services that are required to set up Data Middle Office. For example, the retailer can select MaxCompute as the batch data warehousing solution, the combination of Realtime Compute for Apache Flink, MaxCompute, and Hologres as the streaming data warehousing solution, and DataWorks as the tool to develop and govern data.

DataWorks serves almost all business units of Alibaba Group. Every day, tens of thousands of employees, including operators, product managers, data engineers, algorithm engineers, and R&D engineers, use DataWorks, which also serves a large number of third-party users on Alibaba Cloud. The following figure shows the architecture of DataWorks.

Data integration is the first step in setting up Data Middle Office. DataWorks provides a range of data integration capabilities, such as batch synchronization, incremental synchronization, real-time synchronization, and database migration. These capabilities allow the retailer to integrate various complex data sources securely, stably, flexibly, and quickly over the Internet, within a data center, or in a virtual private cloud (VPC). For example, DataWorks supports offline synchronization for more than 50 data sources and real-time synchronization for more than 10 data sources.
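Incremental synchronization is commonly implemented with a watermark on a last-modified timestamp. The sketch below illustrates that idea only; it is not DataWorks code, and the `Row` type, the `modified_at` column, and the `incremental_sync` function are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Row:
    id: int
    amount: float
    modified_at: datetime  # assumes the source table maintains a last-modified column


def incremental_sync(source: List[Row], target: Dict[int, Row], watermark: datetime) -> datetime:
    """Upsert rows modified after `watermark` into `target` and return the new watermark."""
    new_watermark = watermark
    for row in source:
        if row.modified_at > watermark:
            target[row.id] = row  # insert new rows, overwrite changed ones
            new_watermark = max(new_watermark, row.modified_at)
    return new_watermark


source = [
    Row(1, 10.0, datetime(2021, 5, 1, 8, 0)),
    Row(2, 25.5, datetime(2021, 5, 1, 9, 30)),
]
target: Dict[int, Row] = {}
wm = incremental_sync(source, target, datetime.min)  # first run copies everything, like a batch sync
source[0] = Row(1, 12.0, datetime(2021, 5, 2, 7, 0))  # row 1 is updated in the business database
wm = incremental_sync(source, target, wm)  # second run copies only row 1
print(target[1].amount, wm)
```

A strict `>` comparison can miss rows that are committed late with a timestamp equal to the watermark, which is one reason real-time synchronization often reads the database’s change log instead of polling a timestamp column.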

In addition, DataWorks provides a unified metadata management service, a unified task scheduling service, and a wealth of all-in-one data development tools. These services and tools significantly increase data development efficiency throughout the data development lifecycle.

Above the data integration, metadata center, task scheduling, and data development layers lie more essential layers, including the data governance, data service, and open interface layers. The open interface layer of DataWorks provides more than 100 APIs. As most retailers use self-developed or purchased business systems, they can modify a number of features based on their business requirements and integrate their business systems with various other self-developed systems and project systems. For example, these retailers can modify their business systems to allow alerts to be pushed to their own monitoring and alerting systems.

Figure 6. Technical architecture of DataWorks-based New Retail Data Middle Office

When we compare the earlier discussed technical architecture of New Retail Data Middle Office with the technical architecture of DataWorks-based New Retail Data Middle Office, we find that the data collection process in the former serves as a counterpart of the data integration process in the latter.

Basically, DataWorks can meet all the data synchronization requirements proposed in the technical architecture of New Retail Data Middle Office, and it goes further. DataWorks can perform offline, online, and real-time data development at the same time by using its DataStudio, HoloStudio, and StreamStudio modules. It provides data services and open interfaces for integration with existing systems and products. In addition, it provides data map and data governance capabilities, which may seem unimportant but actually play vital roles in Data Middle Office.

In the preceding sections, we have covered the preparations that the retailer must make before they start to set up Data Middle Office. These preparations include understanding the nature of the business, designing a satisfactory Data Middle Office architecture, and selecting the required technologies.

Next, we need to determine the goal that the retailer wants to attain by using Data Middle Office. The goal is not a simple KPI, but more of a mission or an intention. To be specific, the retailer aspires to build an intermediate layer that features rich data (full-link and multi-dimensional), reliable quality (standard and accurate), and stable running (timely and fault-free) and provides reliable data services, data products, and business applications for the upper layers.

Many people may say that we are simply building a data warehouse or a data mart, but we are not. If the retailer only processes data and stores the results in MaxCompute, open source Hadoop, or a database, Data Middle Office indeed functions only as a data warehouse or a data mart. However, Data Middle Office does more than that. It enables the retailer to produce data that can be directly used to run business operations and can even generate business value. That is what distinguishes Data Middle Office from a data warehouse or a data mart.

After the goal is determined, we handle each request step by step. When the business team submits a request to the data development team, they may state only that they want a sales amount, without specifying its scope. For example, the business team may not specify the period or region the sales amount covers, or whether refunds are included.

With that in mind, firstly, we need to design an efficient metric system, which is developed as a product of Data Middle Office. Secondly, we design a functional data model that is used to normalize data, because fields in tables cannot be directly used for business operations. Thirdly, we need to develop data processing tasks based on the data model. Finally, we open the data results to business operations by using data services, which can be tables, APIs, reports, and even products.
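The request for “a sales amount” becomes unambiguous once every limit is written down as part of the metric definition. The following Python sketch is a hypothetical illustration of that idea; the `MetricDefinition` fields, the metric names, and the sample orders are assumptions, not a DataWorks feature.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Order:
    order_id: str
    region: str
    order_date: date
    amount: float
    refunded: bool


@dataclass(frozen=True)
class MetricDefinition:
    """Makes every limit of a metric explicit instead of leaving it to interpretation."""
    name: str
    start: date  # inclusive
    end: date  # inclusive
    region: Optional[str]  # None means all regions
    exclude_refunds: bool


def compute_sales(metric: MetricDefinition, orders: Iterable[Order]) -> float:
    total = 0.0
    for o in orders:
        if not metric.start <= o.order_date <= metric.end:
            continue
        if metric.region is not None and o.region != metric.region:
            continue
        if metric.exclude_refunds and o.refunded:
            continue
        total += o.amount
    return total


orders = [
    Order("A1", "Hangzhou", date(2021, 5, 1), 100.0, refunded=False),
    Order("A2", "Hangzhou", date(2021, 5, 2), 50.0, refunded=True),
    Order("A3", "Shanghai", date(2021, 5, 2), 80.0, refunded=False),
]
week = (date(2021, 5, 1), date(2021, 5, 7))
gross = MetricDefinition("hz_sales_gross_1w", *week, region="Hangzhou", exclude_refunds=False)
net = MetricDefinition("hz_sales_net_1w", *week, region="Hangzhou", exclude_refunds=True)
print(compute_sales(gross, orders), compute_sales(net, orders))  # 150.0 100.0
```

The two metrics read the same data and differ only in how refunds are treated, which is exactly the kind of ambiguity a metric system is meant to remove.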

Figure 8. Data mart architecture — general hierarchy

Figure 9. Data mart architecture — functionality positioning

The preceding figure shows a general hierarchy that is used to build a data model or a data mart. This hierarchy includes a source data layer (ODS), a detail data layer (DWD), a summary data layer (DWS), and a data application layer (ADS). Although these concepts are widely used, people often understand the layers of this hierarchy differently. Therefore, we must clearly define the characteristics and responsibilities of each layer:

· ADS must be oriented to business instead of to development. It provides data that can be processed in the shortest possible time or even be used directly for business operations.

· DWS must aggregate data into the metrics that ADS requires.

· DWD must provide the detail data that DWS requires. We recommend that the dimensional modeling method be used to build this layer. The retailer needs to manage both dimension tables and fact tables. Each dimension table contains a number of dimensions, such as enumeration dimensions, and the fact tables can be periodic snapshot fact tables. The fields defined in DWD must be unambiguous and comprehensible. Otherwise, DWS cannot properly process the data that DWD produces, which causes problems in all downstream product applications.

· ODS must synchronize data from the business system without any additional processing. Some teams configure ODS to perform extract, transform, load (ETL) processing on the data. However, ETL processing causes inconsistencies between the data in ODS and the data in the business system, which is the last thing we want. We need to ensure that the data in ODS is exactly the same as the data in the business system. That way, any issue that occurs can only be caused by the middleware or storage rather than by business logic, which simplifies troubleshooting later on.
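In practice, each layer would be a MaxCompute table produced by SQL tasks. The Python sketch below compresses the four layers into functions so that each layer’s responsibility is visible; the table contents, the field names, the status code mapping, and the assumption that amounts are stored in cents are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# ODS: rows exactly as synchronized from the business system, with no transformation.
ods_orders = [
    {"id": 1, "store_id": "S01", "status": "2", "amount_cents": 1250},
    {"id": 2, "store_id": "S01", "status": "0", "amount_cents": 800},
    {"id": 3, "store_id": "S02", "status": "2", "amount_cents": 3000},
]
STATUS_NAMES = {"0": "cancelled", "2": "paid"}  # hypothetical code mapping


def build_dwd(ods_rows: List[dict]) -> List[dict]:
    """DWD: unambiguous detail rows with decoded codes and amounts in yuan."""
    return [
        {
            "order_id": r["id"],
            "store_id": r["store_id"],
            "status": STATUS_NAMES.get(r["status"], "unknown"),
            "amount": r["amount_cents"] / 100,
        }
        for r in ods_rows
    ]


def build_dws(dwd_rows: List[dict]) -> Dict[str, float]:
    """DWS: aggregate detail rows into a per-store paid-amount metric."""
    paid_amount: Dict[str, float] = defaultdict(float)
    for r in dwd_rows:
        if r["status"] == "paid":
            paid_amount[r["store_id"]] += r["amount"]
    return dict(paid_amount)


def build_ads(dws_metrics: Dict[str, float], top_n: int = 1) -> List[Tuple[str, float]]:
    """ADS: a business-facing answer, here the top stores by paid amount."""
    return sorted(dws_metrics.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


dwd = build_dwd(ods_orders)
dws = build_dws(dwd)
ads = build_ads(dws)
print(ads)  # [('S02', 30.0)]
```

Note that the ODS rows are never modified: all decoding happens when DWD is built, so ODS stays identical to the business system.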

4. Set up Data Middle Office based on DataWorks

Figure 10. Data development platforms of DataWorks

Previously, I shared some thoughts on the setup of Data Middle Office, including the design, architecture, goal, and related requirements. In this part, I will talk about how to set up Data Middle Office based on DataWorks and share my experience in using DataWorks. DataWorks has served not only customers of Alibaba Cloud, but also almost all business units of Alibaba Group since 2009. It is designed to be open, versatile, and flexible. When the data development personnel of retailers use DataWorks, a series of problems may arise due to excessive flexibility or lack of common practices. The content in this part aims to help them prevent these problems.

Figure 11. Data development — Synchronize data

· We recommend that all data in the business databases be synchronized to the hm_ods project for centralized storage and management.

· Only one copy of the data is synchronized, to save storage space.

· Permanent storage must be configured to support data backtracking and auditing.

The first step of the Data Middle Office setup process is to synchronize business data into Data Middle Office. In this step, we must ensure that the following requirements are met:

1. All business data of the retailer is synchronized to one project.

2. Only one copy of data is synchronized. To facilitate data management, reduce costs, and avoid data ambiguity, we must not repeatedly synchronize data.

3. The data source is valid. If any data from the data source is invalid, all the related data processing results are inaccurate.

4. Synchronized data is permanently stored to support data backtracking and auditing. In some online databases, some data may be archived or deleted due to traffic restrictions. Permanent storage ensures that the original data can be restored from the ODS layer when the business system needs to use historical data.
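The permanent-storage requirement is commonly met by writing each day’s synchronized data into its own date partition and never deleting ODS partitions. The sketch below is a hypothetical in-memory stand-in for such a table (`OdsTable` and `ods_orders` are illustrative names; `hm_ods` is the project named above). Overwriting a partition on rerun keeps exactly one copy of each day’s data.

```python
import copy
from datetime import date
from typing import Dict, List


class OdsTable:
    """Hypothetical ODS table: one full snapshot per daily partition, never deleted."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.partitions: Dict[date, List[dict]] = {}

    def write_partition(self, ds: date, rows: List[dict]) -> None:
        # Overwrite rather than append, so a rerun of the sync task leaves exactly one copy per day.
        # The rows are stored as-is (deep-copied), with no ETL applied.
        self.partitions[ds] = copy.deepcopy(rows)

    def read_as_of(self, ds: date) -> List[dict]:
        """Backtrack to the data exactly as it was synchronized on `ds`."""
        return self.partitions[ds]


business_db = [{"id": 1, "status": "paid"}, {"id": 2, "status": "paid"}]
ods_orders = OdsTable("hm_ods.ods_orders")
ods_orders.write_partition(date(2021, 5, 1), business_db)
business_db.pop(0)  # the online database later archives or deletes an old order
ods_orders.write_partition(date(2021, 5, 2), business_db)
restored = ods_orders.read_as_of(date(2021, 5, 1))
print(len(restored), restored[0])  # 2 {'id': 1, 'status': 'paid'}
```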

Figure 12. Data development — Develop code to process data

· Data processing is a process of implementing business logic.

· We must ensure the correctness of business logic, and the stability and timeliness of data output.

The second step is to develop data. This step requires strong coding and development skills. Most developers use SQL. In my experience, data development is a process of implementing business logic. We must ensure the correctness of the business logic and the stability, timeliness, and rationality of the data output. The data development editor of DataWorks is a powerful coding tool. It supports visualized data processing and can be used to review code and verify parts of the code. These features are very helpful in daily data development.

Figure 13. Data development — Code samples

· To ensure data consistency and simplify data use for downstream applications, I recommend that the business logic be encapsulated at the DWD layer.

· If any changes occur at the data source, we can convert code or data formats to keep the data structure of the DWD layer stable and prevent excessive changes to the downstream applications.

· To develop a good data model, we must collaborate with the business team in design and development. The business system must be appropriately designed, and any changes in the business system must be perceived in a timely manner.

As all programmers know, code is developed based on programming paradigms. From my experience in Java development, I abstracted the following steps for the data development process:

Step 1: code conversion and data format conversion. As mentioned earlier, many business systems have unique data processing mechanisms. In Internet businesses, ambiguous constructs such as JSON fields, media fields, and delimiter-separated values are often used to solve performance problems or filter data. Code conversion removes this ambiguity. For example, we can convert enumerated values such as 0, 2, and a into strings that are easy to understand.

Different business systems may use different data types. For example, time information can be presented in various timestamp or string formats. When we build a data mart, we must convert data formats based on related standards to make sure that the data formats are consistent.
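A minimal sketch of both conversions follows. The payment-channel codes, the assumption that all source systems use China Standard Time, and the milliseconds heuristic are illustrative assumptions, not rules from any particular business system.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

# Hypothetical codes; the real mapping must come from the business system's documentation.
PAYMENT_CHANNEL = {"0": "cash", "2": "mobile_payment", "a": "gift_card"}
CST = timezone(timedelta(hours=8))  # China Standard Time, assumed for all source systems
TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"


def decode_channel(code: str) -> str:
    # Unmapped codes become "unknown" so that they stay visible downstream instead of being dropped.
    return PAYMENT_CHANNEL.get(code, "unknown")


def normalize_time(value: Union[int, float, str]) -> str:
    """Convert epoch seconds, epoch milliseconds, or 'YYYY/MM/DD HH:MM:SS' strings to one format."""
    if isinstance(value, (int, float)):
        seconds = value / 1000 if value > 1e11 else value  # heuristic: very large numbers are milliseconds
        dt = datetime.fromtimestamp(seconds, tz=CST)
    else:
        dt = datetime.strptime(value, "%Y/%m/%d %H:%M:%S").replace(tzinfo=CST)
    return dt.strftime(TARGET_FORMAT)


print(decode_channel("2"))  # mobile_payment
print(normalize_time(1620345600))  # 2021-05-07 08:00:00
print(normalize_time(1620345600000))  # 2021-05-07 08:00:00
print(normalize_time("2021/05/07 08:00:00"))  # 2021-05-07 08:00:00
```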

Step 2: business judgement. In this step, conditions are used to make judgements and derive business results. For example, a business system may store age data but usually has no field or business logic called “young people”. In this case, we can apply the condition “people younger than 30 years old are young people” to determine whether a person is young.
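A business judgement like this usually becomes a derived column in the DWD layer. The sketch below assumes the age is already available as an integer; the field names and the `YOUNG_AGE_LIMIT` constant are hypothetical.

```python
from typing import Optional

YOUNG_AGE_LIMIT = 30  # rule agreed with the business team: younger than 30 means "young"


def is_young(age: Optional[int]) -> Optional[bool]:
    """Derive a business label that the source system does not store."""
    if age is None:
        return None  # keep unknown ages unknown instead of labeling them "not young"
    return age < YOUNG_AGE_LIMIT


members = [{"member_id": 1, "age": 24}, {"member_id": 2, "age": 30}, {"member_id": 3, "age": None}]
labeled = [{**m, "is_young": is_young(m["age"])} for m in members]
print([m["is_young"] for m in labeled])  # [True, False, None]
```

Returning None for a missing age, rather than False, keeps members with unknown ages from being silently counted as “not young”.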

Step 3: data connection. In this step, another table is joined to supplement the data in the current table.

Step 4: data aggregation. A large amount of data needs to be aggregated at the DWS layer.

Step 5: data filtering. Invalid data is filtered out in this step.

Step 6: conditional selection. In this step, conditions such as WHERE clauses are used to select data. To some extent, conditional selection is similar to data filtering.
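Steps 3 through 6 correspond roughly to the JOIN, GROUP BY, and WHERE clauses of a SQL task. The Python sketch below walks through them on hypothetical order and SKU data (the `is_test` flag and the SKU codes are assumptions), applying the steps in an order that keeps intermediate data small.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "sku": "FISH-01", "qty": 2, "is_test": False},
    {"order_id": 2, "sku": "MILK-02", "qty": 1, "is_test": False},
    {"order_id": 3, "sku": "FISH-01", "qty": 5, "is_test": True},
    {"order_id": 4, "sku": "FISH-01", "qty": 1, "is_test": False},
]
sku_dim = {"FISH-01": {"category": "fresh"}, "MILK-02": {"category": "dairy"}}

# Step 5, data filtering: drop invalid records such as test orders.
valid = [o for o in orders if not o["is_test"]]

# Step 3, data connection: a left join with the SKU dimension adds the category.
joined = [{**o, "category": sku_dim.get(o["sku"], {}).get("category", "unknown")} for o in valid]

# Step 6, conditional selection: the equivalent of WHERE category = 'fresh'.
fresh = [o for o in joined if o["category"] == "fresh"]

# Step 4, data aggregation: total quantity per SKU, as a DWS table would hold it.
qty_by_sku = defaultdict(int)
for o in fresh:
    qty_by_sku[o["sku"]] += o["qty"]

print(dict(qty_by_sku))  # {'FISH-01': 3}
```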

Step 7: business analysis. Retailers need to analyze business data frequently. Some business teams use NoSQL databases such as MongoDB, or relational databases such as MySQL. In these systems, a single field may hold many different business attributes. In my experience over recent years, when we construct the DWD layer of a data mart, we must parse all JSON or MAP fields into column fields of fixed formats. As mentioned earlier, data formats must be consistent and data must be directly visible to users.
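Parsing a JSON field into fixed columns can be sketched as follows. The column names, their target types, and the `ext_json` field name are hypothetical; the point is that every output row gets the same typed columns, with missing keys becoming None (NULL).

```python
import json
from typing import Any, Callable, Dict

# Hypothetical fixed DWD columns and their target types for a JSON "ext" field.
EXT_COLUMNS: Dict[str, Callable[[Any], Any]] = {
    "delivery_type": str,
    "promised_minutes": int,
    "coupon_id": str,
}


def flatten_ext(row: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the JSON field `ext_json` with fixed, typed columns."""
    ext = json.loads(row.get("ext_json") or "{}")
    flat = {k: v for k, v in row.items() if k != "ext_json"}
    for column, cast in EXT_COLUMNS.items():
        value = ext.get(column)
        # Missing keys become None (NULL), so every row has the same columns.
        flat[column] = cast(value) if value is not None else None
    return flat


row = {"order_id": 7, "ext_json": '{"delivery_type": "home", "promised_minutes": "30"}'}
print(flatten_ext(row))
# {'order_id': 7, 'delivery_type': 'home', 'promised_minutes': 30, 'coupon_id': None}
```

Keys that are not in the schema are ignored here; in practice, a new key appearing in the source is worth an alert, because it usually signals a change in the business system.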

To ensure data consistency and simplify data use for downstream applications, we should try to encapsulate the business logic at the DWD layer. If any changes occur at the data source, we can convert code or data formats to keep the data structure of the DWD layer stable and prevent excessive changes to the downstream applications. To develop a good model, we must collaborate with the business team in design and development. The business system must be appropriately designed and be capable of perceiving changes in a timely manner. The setup of Data Middle Office is impossible without collaboration with the business team.

Figure 14. Data development — Configure task scheduling policies

In the data development process described above, DataWorks functions as an IDE. However, DataWorks is far more than an IDE. As an end-to-end (E2E) big data development and governance platform, DataWorks can also run data development code for retailers through its task scheduling feature. New retail is very complex. For example, fresh food may need to be delivered within 30 minutes, common e-commerce products may need to be delivered within two or three days, and many pre-sale and pre-order activities are launched. Simple scheduling systems may not be able to meet these scheduling requirements. DataWorks provides multiple task scheduling cycles, such as monthly, weekly, and daily, for users to select. It can schedule 15 million tasks per day during Double 11, and it outperforms other scheduling systems in scheduling flexibility and stability.

In the closed-loop business system of a retailer, all business stages are interrelated, so data tasks also depend on each other. As a result, the entire task scheduling chain is very complex.

My team made many attempts and innovations in task scheduling, and we also encountered failures. Here is a summary of what we learned. Task scheduling failures and tasks scheduled at incorrect points in time may cause data loss or errors. We must handle every problem with an online task at the earliest opportunity, because each of these problems may cause further data problems downstream. Appropriate task scheduling policies ensure the correctness and timeliness of data output. We must set scheduling cycles based on our business requirements. For example, if daily output is required, the scheduling cycle must be set to one day.
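DataWorks resolves task dependencies itself, so this is not something you would write by hand; the sketch below only illustrates why correlated tasks need dependency-aware scheduling. It orders tasks with Kahn’s topological sort, so that every task runs after all of its upstream tasks, and it reports a dependency cycle as an error. The task names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def schedule_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return a run order in which every task runs after its upstream tasks (Kahn's algorithm)."""
    tasks = set(deps) | {u for ups in deps.values() for u in ups}
    indegree = {t: 0 for t in tasks}
    downstream: Dict[str, List[str]] = {t: [] for t in tasks}
    for task, ups in deps.items():
        for u in ups:
            indegree[task] += 1
            downstream[u].append(task)

    ready = deque(sorted(t for t in tasks if indegree[t] == 0))  # tasks with no pending upstream
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in sorted(downstream[t]):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected; some tasks can never be scheduled")
    return order


deps = {
    "dwd_orders": ["ods_orders"],
    "dwd_stock": ["ods_stock"],
    "dws_store_sales": ["dwd_orders"],
    "ads_restock_alert": ["dws_store_sales", "dwd_stock"],
}
order = schedule_order(deps)
print(order)
```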

Figure 15. Data O&M and governance — Monitor data quality

· The purpose of data quality monitoring is to ensure the correctness of data output.

· The monitoring objects include changes in table sizes, changes in the number of rows, changes in the enumerated values of a field (such as a new business type “take-out”), primary key conflicts (such as the same SKU in two rows), and invalid formats (such as invalid email formats).

· Abnormal values trigger alerts or interrupt data processing so that personnel on duty can handle exceptions as soon as possible.

You may think that, for data developers, task scheduling is the last step of a project or requirement. In most cases, however, it is not. Data Middle Office involves many commercial elements, and any problem in it may have significant impacts. A company has core systems at the group, department, and business line levels, as well as non-core systems. Different systems have different quality assurance requirements, and faults are assigned priorities such as P1, P2, P3, and P4 based on their severity. Data business systems require similar quality assurance.

Unlike common business systems, Data Middle Office relies on DataWorks to ensure the stability of the entire online big data business. DataWorks provides a data quality monitoring module, which plays an essential role in helping data teams detect problems in a timely manner. When business is affected, an error is immediately reported. However, detection is sometimes delayed, and in that case the data team may not learn of an error until business personnel contact them for troubleshooting. The purpose of data quality monitoring is to ensure the correctness of data output. The monitoring objects include not only changes in table sizes, row counts, and the enumerated values of fields, but also primary key conflicts and invalid formats. Abnormal values trigger alerts or interrupt data processing so that personnel on duty can handle exceptions as soon as possible.
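The four kinds of checks described above can be sketched as a plain function. This is not the DataWorks data quality API; the 30% row-count threshold, the field names (`sku`, `business_type`, `email`), and the loose email pattern are assumptions chosen for illustration.

```python
import re
from typing import List, Set

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose format check


def check_quality(rows: List[dict], prev_row_count: int, known_types: Set[str],
                  max_change_ratio: float = 0.3) -> List[str]:
    """Return human-readable issues; the caller decides whether to alert or stop downstream tasks."""
    issues = []

    # Row count compared with the previous run.
    if prev_row_count > 0:
        change = abs(len(rows) - prev_row_count) / prev_row_count
        if change > max_change_ratio:
            issues.append(f"row count changed by {change:.0%}")

    # Enumerated values never seen before, such as a new business type.
    new_types = {r["business_type"] for r in rows} - known_types
    if new_types:
        issues.append(f"new business_type values: {sorted(new_types)}")

    # Primary key conflicts: the same SKU in more than one row.
    seen, duplicates = set(), set()
    for r in rows:
        if r["sku"] in seen:
            duplicates.add(r["sku"])
        seen.add(r["sku"])
    if duplicates:
        issues.append(f"duplicate primary keys: {sorted(duplicates)}")

    # Invalid formats.
    bad_emails = [r["email"] for r in rows if r.get("email") and not EMAIL_RE.match(r["email"])]
    if bad_emails:
        issues.append(f"{len(bad_emails)} invalid email value(s)")

    return issues


rows = [
    {"sku": "FISH-01", "business_type": "in_store", "email": "a@example.com"},
    {"sku": "FISH-01", "business_type": "take-out", "email": "not-an-email"},
]
issues = check_quality(rows, prev_row_count=10, known_types={"in_store", "delivery"})
for issue in issues:
    print(issue)
```

A strong rule could raise an exception instead of returning the issue, which corresponds to interrupting data processing rather than only sending an alert.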

Figure 16. Data O&M and governance — Manage business baselines

· Baselines are used to ensure the timeliness of data output.

· Priorities of tasks determine the assurance strength on hardware resources, as well as the assurance workloads of O&M personnel on duty.

· We must manage all important tasks based on baselines and assign the highest priority level 8 to core tasks.

The large monitoring scale described in the preceding part leads to a large number of alerts. DataWorks provides a baseline management feature that can be used to manage the alerts. Like common business tasks, data business tasks also need to be tiered based on importance. We can use baselines to classify the tasks. According to my experience, baselines can ensure the timeliness of data output. Priorities of tasks determine the assurance strength on hardware resources, as well as the assurance workloads of O&M personnel on duty. To ensure the timeliness of data output, we must assign the highest priority level 8 to the most important tasks.

DataWorks provides a data reprocessing tool. When a baseline has an error or data output is not generated as expected, we can use the tool to reprocess data.

We can use the intelligent monitoring module of DataWorks to determine whether there is a risk that data output cannot be generated as expected. The intelligent monitoring module uses algorithms to make predictions based on the task status and the running periods of historical tasks. For example, it is expected that a task generates data output at 18:00 and another task generates data output at 24:00. If the intelligent monitoring module finds that the first task fails to generate data output before 19:00, it makes a prediction on the timeliness of the other task. If the prediction result shows that the data output cannot be generated at 24:00, the intelligent monitoring module immediately reports an alert. This way, O&M personnel on duty can take measures at their earliest opportunity to prevent a real delay from occurring. Intelligent monitoring and risk prediction are very useful for business stability.
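The prediction idea can be sketched with a simple stand-in for the real algorithm: predict each task’s finish time as the latest predicted finish of its upstream tasks plus its median historical runtime, never earlier than the current time, and alert when the prediction exceeds the baseline. DataWorks’ intelligent monitoring uses its own algorithms; the heuristic, the task names, and the runtimes below are assumptions that mirror the 18:00 and 24:00 example above.

```python
from statistics import median
from typing import Dict, List, Optional


def predict_finish(task: str, deps: Dict[str, List[str]], runtimes: Dict[str, List[int]],
                   scheduled_start: Dict[str, int], finished: Dict[str, int], now: int,
                   memo: Optional[Dict[str, int]] = None) -> int:
    """Predicted finish time of `task`, in minutes after midnight."""
    memo = {} if memo is None else memo
    if task in finished:
        return finished[task]
    if task not in memo:
        ready = scheduled_start.get(task, 0)
        for upstream in deps.get(task, []):
            ready = max(ready, predict_finish(upstream, deps, runtimes, scheduled_start,
                                              finished, now, memo))
        eta = ready + round(median(runtimes[task]))
        memo[task] = max(eta, now)  # an unfinished task cannot finish before the current time
    return memo[task]


def hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


deps = {"ads_daily_report": ["dwd_orders"]}
runtimes = {"dwd_orders": [55, 60, 65], "ads_daily_report": [320, 330, 340]}
scheduled_start = {"dwd_orders": 17 * 60}
baselines = {"ads_daily_report": 24 * 60}  # committed output time for the core task
now = 19 * 60  # dwd_orders, normally done by 18:00, has still not finished

memo: Dict[str, int] = {}
alerts = []
for task, deadline in baselines.items():
    eta = predict_finish(task, deps, runtimes, scheduled_start, {}, now, memo)
    if eta > deadline:
        alerts.append(f"{task}: predicted {hhmm(eta)}, baseline {hhmm(deadline)}")
print(alerts)  # ['ads_daily_report: predicted 24:30, baseline 24:00']
```

The alert fires at 19:00, five hours before the baseline is actually missed, which is the window the O&M personnel on duty need to intervene.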

Figure 17. Data O&M and governance — Govern data assets

· The main purpose of data governance is to optimize storage and computing, reduce costs, and increase resource utilization.

· Each technical team is responsible for a number of projects. Data governance requires collaboration with technical teams.

· Various approaches can be used to govern data assets. For example, we can take unneeded applications offline, manage the lifecycles of tables, and eliminate repeated computation and brute-force full-table scans.

Like data quality monitoring and baseline management, data governance also plays an important role in the stable and normal running of big data tasks and business. Alibaba Group is a data-driven company. One of its biggest milestones in digital transformation is that the hardware investments for data storage and computing exceed the hardware investments for business systems. This explains why the CTO of Alibaba Group always regards data governance as one of the top priorities.

DataWorks is the big data platform that involves the largest amount of data and the largest number of users in Alibaba Group. DataWorks provides a data asset module called UDAP, which can be used to view the overall resource utilization of a specific project, table, or user from multiple dimensions. UDAP also scores users based on their resource utilization. We can view the health score ranking of users in each business department. We can handle the lowest health scores first to increase the average score. UDAP also provides some data visualization tools that can be used to view the data governance effect. We must take note of the following points in data governance:

First of all, the main purpose of data governance is to optimize storage and computing, reduce costs, and increase resource utilization.

Second, each technical team is responsible for a number of projects. To govern data assets, we must collaborate with technical teams.

Third, we can use various approaches, such as taking unneeded applications offline, managing table lifecycles, and eliminating repeated computation and brute-force scans (queries that scan an entire table instead of only the partitions they need). Brute-force scanning wastes computing resources and must be prohibited.
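
MaxCompute tables can be given a lifecycle natively (the `LIFECYCLE` clause in MaxCompute SQL). The sketch below only illustrates the decision logic behind two of the approaches above, using hypothetical in-memory metadata rather than any DataWorks or UDAP API: partitions older than a lifecycle window become deletion candidates, and jobs that read every partition of a multi-partition table are flagged as likely brute-force scans. All class, function, and table names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Partition:
    name: str
    last_modified: date


@dataclass
class ScanJob:
    table: str
    partitions_scanned: int
    partitions_total: int


def expired_partitions(partitions: list[Partition], lifecycle_days: int, today: date) -> list[str]:
    """Partitions not modified within the lifecycle window can be dropped."""
    cutoff = today - timedelta(days=lifecycle_days)
    return [p.name for p in partitions if p.last_modified < cutoff]


def brute_force_scans(jobs: list[ScanJob], min_partitions: int = 2) -> list[str]:
    """Flag jobs that read every partition of a multi-partition table,
    which usually means the query lacks a partition filter."""
    return [
        j.table
        for j in jobs
        if j.partitions_total >= min_partitions
        and j.partitions_scanned == j.partitions_total
    ]


partitions = [
    Partition("ds=20210301", date(2021, 3, 1)),
    Partition("ds=20210410", date(2021, 4, 10)),
    Partition("ds=20210506", date(2021, 5, 6)),
]
print(expired_partitions(partitions, lifecycle_days=30, today=date(2021, 5, 7)))

jobs = [
    ScanJob("dwd_orders", partitions_scanned=365, partitions_total=365),
    ScanJob("dws_sales", partitions_scanned=1, partitions_total=365),
]
print(brute_force_scans(jobs))
```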

Some features of UDAP, such as the governance of duplicate tables and of repeated data development and integration tasks, are also available in the resource optimization module of DataWorks.
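
How DataWorks detects duplicate tables is not described here, so the following is a hypothetical sketch of one common technique: fingerprint each table by its schema plus an order-independent hash of its rows, then group tables that share a fingerprint. In practice the fingerprints would be computed by the compute engine rather than in memory; the table names and data are made up.

```python
import hashlib
from collections import defaultdict


def fingerprint(columns, rows):
    """Hash the schema plus an order-independent digest of the rows."""
    h = hashlib.sha256(repr(list(columns)).encode())
    row_digests = sorted(hashlib.sha256(repr(tuple(r)).encode()).hexdigest() for r in rows)
    for digest in row_digests:
        h.update(digest.encode())
    return h.hexdigest()


def duplicate_groups(tables):
    """Group table names whose schema and contents are identical."""
    groups = defaultdict(list)
    for name, (columns, rows) in tables.items():
        groups[fingerprint(columns, rows)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


schema = [("store_id", "string"), ("ds", "string"), ("gmv", "double")]
tables = {
    "dws_store_sales": (schema, [("s1", "20210506", 10.0), ("s2", "20210506", 8.5)]),
    "tmp_store_sales_copy": (schema, [("s2", "20210506", 8.5), ("s1", "20210506", 10.0)]),
    "dim_store": ([("store_id", "string"), ("city", "string")], [("s1", "Hangzhou")]),
}
print(duplicate_groups(tables))
```

Sorting the per-row digests makes the fingerprint ignore row order while still counting repeated rows, so a reordered copy of a table is detected but a table with different contents is not.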

Figure 18. Data O&M and governance — Manage data security

Data security is implemented at the platform (MaxCompute), project, table, and field levels.
Outsourced personnel must learn security rules and regulations, pass data security exams, receive special approval on data use, and sign confidentiality agreements.
Permissions are automatically revoked from employees who resign.

Data Middle Office also supports data security control. New laws governing networks and data, such as the E-Commerce Law and the Cybersecurity Law, are enacted almost every year in China. For example, the draft of the new Data Security Law was recently published, and retailers will soon need to comply with it. Compliance with these laws is particularly important for retailers.

As the largest data input and output platform of Alibaba Group, DataWorks provides many methods for controlling data security at various levels, such as the engine, project, table, and field levels. For example, each field is assigned a security level, and only department heads or users who have obtained approval from senior executives can use fields with high security levels. In addition, the Data Security Guard module of DataWorks provides a data masking feature to protect sensitive data, such as ID card numbers and mobile phone numbers. This feature masks sensitive data without affecting data collection or analysis.
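
The two controls described above can be sketched as follows. This is not the Data Security Guard API; the security-level scale, the `HIGH_SECURITY_LEVEL` threshold, the role name, and the masking widths are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

HIGH_SECURITY_LEVEL = 3  # hypothetical threshold on a hypothetical 1-4 scale


@dataclass(frozen=True)
class Field:
    name: str
    security_level: int


def can_use(field: Field, role: str, has_executive_approval: bool = False) -> bool:
    """High-level fields are limited to department heads or approved users."""
    if field.security_level < HIGH_SECURITY_LEVEL:
        return True
    return role == "department_head" or has_executive_approval


def mask(value: str, keep_head: int, keep_tail: int, char: str = "*") -> str:
    """Replace the middle of a sensitive value, keeping its length."""
    if len(value) <= keep_head + keep_tail:
        return char * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + char * hidden + value[len(value) - keep_tail:]


phone = Field("mobile_phone", security_level=4)
print(can_use(phone, role="analyst"))                               # False
print(can_use(phone, role="analyst", has_executive_approval=True))  # True
print(mask("13812345678", keep_head=3, keep_tail=4))                 # 138****5678
print(mask("110101199003071234", keep_head=6, keep_tail=4))          # 110101********1234
```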

Alibaba Group practices centralized data management, and its data management platforms are integrated with the organizational structure. When an employee resigns or transfers to a different post, their permissions are automatically revoked. This effectively prevents data security risks caused by personnel changes.
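
A minimal sketch of the revocation logic, assuming the organizational structure system pushes "resign" and "transfer" events to the permission system. The `PermissionStore` class, event names, and grant model are hypothetical stand-ins, not an Alibaba or DataWorks interface.

```python
from collections import defaultdict


class PermissionStore:
    """In-memory stand-in for a permission system synced with the org chart."""

    def __init__(self):
        # user -> set of (granting department, resource)
        self._grants = defaultdict(set)

    def grant(self, user, department, resource):
        self._grants[user].add((department, resource))

    def resources(self, user):
        return sorted(resource for _, resource in self._grants.get(user, set()))

    def on_personnel_change(self, user, event, new_department=None):
        """Handle an HR event pushed from the organizational structure system."""
        if event == "resign":
            self._grants.pop(user, None)  # revoke everything
        elif event == "transfer":
            # Keep only grants that belong to the new department.
            self._grants[user] = {
                g for g in self._grants.get(user, set()) if g[0] == new_department
            }
        else:
            raise ValueError(f"unknown event: {event}")


store = PermissionStore()
store.grant("alice", "sales", "dws_store_sales")
store.grant("alice", "finance", "ads_profit")
store.on_personnel_change("alice", "transfer", new_department="sales")
print(store.resources("alice"))  # ['dws_store_sales']
store.on_personnel_change("alice", "resign")
print(store.resources("alice"))  # []
```

Driving revocation from organizational events, rather than from periodic manual review, is what closes the window in which a departed employee still holds access.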

5. Use DataWorks-based Data Middle Office to support business

Figure 19. How does Data Middle Office support business?

In this part, I would like to discuss how a retailer can use Data Middle Office to run their business. It is not easy for retailers to uncover the value buried deep in their data. Most retailers only look at the data itself: they identify and analyze issues and make decisions based on the reports they have. However, the new retail market is growing explosively. For example, one retailer opens more than 100 stores a year and covers more than 200 cities in China. At this scale, simple data reports and data visualization are no longer enough to support the business. The retailer needs refined data management approaches, such as commodity category diagnosis and stock health checks, that proactively detect issues instead of merely providing reports for business personnel to analyze.
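
As an illustration of a stock health check that surfaces issues proactively, here is a hypothetical sketch that computes days of supply (units on hand divided by recent average daily sales) and reports only the SKUs outside a healthy band. The metric choice, the 3 to 30 day thresholds, and the SKU data are assumptions, not the retailer's actual rules.

```python
from statistics import mean


def days_of_supply(on_hand, daily_sales):
    """How many days the current stock lasts at the recent average sales rate."""
    avg = mean(daily_sales) if daily_sales else 0.0
    return float("inf") if avg == 0 else on_hand / avg


def stock_health(on_hand, daily_sales, min_days=3.0, max_days=30.0):
    if on_hand <= 0:
        return "out_of_stock"
    supply = days_of_supply(on_hand, daily_sales)
    if supply < min_days:
        return "understock"
    if supply > max_days:
        return "overstock"
    return "healthy"


inventory = {
    # sku: (units on hand, units sold on each of the last 3 days)
    "milk_1l": (40, [20, 25, 15]),
    "rice_5kg": (900, [10, 12, 8]),
    "eggs_12": (100, [10, 10, 10]),
    "cola_330ml": (0, [30, 28, 35]),
}
issues = {}
for sku, (on_hand, sales) in inventory.items():
    status = stock_health(on_hand, sales)
    if status != "healthy":
        issues[sku] = status  # only exceptions are pushed to business staff
print(issues)
```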

For example, the fresh food business is particularly susceptible to external factors, such as weather, holidays, or even traffic accidents. Fresh food retailers can develop predictive applications based on Data Middle Office to forecast fresh food sales and prevent losses caused by stock problems.

Retailers can configure a short prediction cycle for fresh food, such as one hour. They can also design simulation systems that predict or perceive risks when conditions change, such as a sudden change in the weather, and make adjustments accordingly. Many categories of fresh food must sell out on the day; otherwise, the food is no longer fresh. At large retailers, operators and salespersons cannot perceive these sales risks and make adjustments efficiently by hand.

Instead of identifying issues based only on data reports, retailers should focus on their business systems and develop BI- and AI-based applications on Data Middle Office. For example, if Data Middle Office predicts that some fresh food will not sell out on the day, it automatically applies a discount 3 hours before closing time. This process is driven by algorithms and requires no manual intervention. BI- and AI-based applications help retailers realize the real value of Data Middle Office. Retailers can design various data applications based on their business requirements, which enables data to truly empower the business.
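
A hypothetical sketch of the automatic markdown rule: at closing time minus 3 hours, sum the hourly sales forecast for the remaining hours, optionally scale it by a demand factor (for example, after a sudden weather change), and discount in proportion to the expected surplus, up to a cap. The forecast values, the proportional rule, and the 50% cap are assumptions; the article does not specify the algorithm.

```python
def projected_sales(hourly_forecast, current_hour, closing_hour, demand_factor=1.0):
    """Expected units sold from now until closing, scaled by a demand factor
    (for example, below 1.0 when a sudden rainstorm is forecast)."""
    base = sum(hourly_forecast.get(h, 0.0) for h in range(current_hour, closing_hour))
    return base * demand_factor


def markdown_rate(stock, hourly_forecast, current_hour, closing_hour,
                  lead_hours=3, max_discount=0.5, demand_factor=1.0):
    """Return the discount to apply (0.0 = none); only decides at closing - lead_hours."""
    if closing_hour - current_hour != lead_hours:
        return 0.0
    expected = projected_sales(hourly_forecast, current_hour, closing_hour, demand_factor)
    if stock <= expected:
        return 0.0  # forecast says the item sells out without help
    surplus_share = (stock - expected) / stock
    return round(min(max_discount, surplus_share), 2)


forecast = {19: 10, 20: 8, 21: 6}  # hypothetical hourly forecast for one SKU
print(markdown_rate(40, forecast, current_hour=19, closing_hour=22))   # 0.4
print(markdown_rate(20, forecast, current_hour=19, closing_hour=22))   # 0.0
print(markdown_rate(20, forecast, current_hour=19, closing_hour=22, demand_factor=0.7))  # 0.16
```

The last call shows how a weather-driven drop in expected demand turns a sell-through that looked safe into a small markdown, without anyone re-reading a report.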

Alibaba Tech

First hand and in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.

Written by Alibaba Tech
