The age of big data and AI is bringing smarter and smarter search engines that make the experience of searching the Internet ever more powerful and personal. Most devices now incorporate voice assistants that can perform Internet searches on their users’ behalf.
Also known as chatbots, voice assistants have proven a key platform for the development of intelligent human-computer interaction. As defined in this overview by a joint US-Netherlands research team, in simple human-computer interaction a machine merely acts on instructions dictated to it by humans, while in intelligent interaction the machine seeks to understand both the human communication and its context before responding accordingly.
Shenma web search (神马搜索) — Alibaba’s mobile web search engine — is an example of intelligent interaction applied in a smart web search context. Far from simply parsing speech and feeding it into a search engine, Shenma seeks to infer what the user wants from what they say and how they say it, and then draws on Alibaba’s vast underlying information services system to provide targeted and differentiated search results.
As well as representing a huge advancement in personal voice assistant technology, the Shenma infrastructure now supports more intelligent information services for the entire Alibaba Group.
Content-based Information Services
Information services are a combination of data, algorithms, and architecture. Alibaba’s information services architecture is outlined below.
As shown in the above figure, information services are divided into:
· Skills library
· Knowledge-Graph library
· Question and answer (QA) libraries
· Chat Corpus
This forms the basis for an intelligent interaction service which utilizes different libraries for different search queries.
The first stage of creating the industry skills library involved structuring search engine upgrades for over 100 vertical industries, ranging from large industries like big entertainment, big travel, and news and information, through medium industries like automobile, sports, and tourism, to small industries like the stock market, translation, and ancient poetry.
The second stage involved further structuring of skill upgrades, accurate query structuring, and the construction of multi-round dialogs.
The knowledge-graph library is the web knowledge graph of the entire Alibaba network. It provides outputs in the form of knowledge cards, entity recommendations, and precise questions and answers.
The QA libraries comprise:
· A community QA library built on UGC QAs at the one-billion-document level
· UPGC production that organizes, processes, and audits stock knowledge, improving the production efficiency and quality of QAs
· Machine cleaning of the community QA library and extension of the socialized production library, resulting in a higher-quality library
· Core library operations and mining procedures that purify the Internet environment and improve content quality
Across the industry, information services place particularly strong emphasis on the coverage and quality of QAs. The main challenges include semi-structured/unstructured data processing, content production modes, content-sensitivity issues, and user satisfaction.
The Shenma search engine uses a multi-level QA system named “MOPU” (Machine/OGC/PGC/UGC), which combines diversified production sources with process-oriented, scale-oriented, and sustainable production systems.
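The article does not detail how the MOPU levels interact, but a natural reading is a prioritized cascade over the four production levels. The following is a minimal sketch under that assumption; the lookup tables, confidence scores, and threshold are all illustrative, not Shenma's actual implementation.

```python
# Sketch of a MOPU-style multi-level QA cascade (Machine/OGC/PGC/UGC).
# The level names come from the article; the data layout and the
# confidence threshold are illustrative assumptions.

MOPU_LEVELS = ["machine", "ogc", "pgc", "ugc"]

def answer_query(query, libraries, min_confidence=0.5):
    """Try each production level in priority order; return the first
    answer whose confidence clears the threshold."""
    for level in MOPU_LEVELS:
        hit = libraries.get(level, {}).get(query)
        if hit and hit["confidence"] >= min_confidence:
            return level, hit["answer"]
    return None, None

libraries = {
    "machine": {"capital of france": {"answer": "Paris", "confidence": 0.9}},
    "ugc": {"best pizza topping": {"answer": "It depends!", "confidence": 0.6}},
}

print(answer_query("capital of france", libraries))   # ('machine', 'Paris')
print(answer_query("best pizza topping", libraries))  # ('ugc', 'It depends!')
```

A cascade like this lets higher-precision levels (machine-verified answers) shadow lower-precision community content for the same question.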
The combination of the skills library, knowledge library, QA library and a chat corpus provides the infrastructure for intelligent interaction.
Intelligent interaction analyzes user queries to identify which library is the best to procure search results from. For example, a student watching an NBA game may have queries on different aspects of the game, in which case their questions would be dealt with as follows:
· “How far are the Rockets leading by now?” → Skills library
· “Who invented basketball?” → Knowledge library
· “Can Harden enter the Hall of Fame?” → QA library
· “Let’s talk about NBA” → Chitchat library
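The routing of the four example queries above can be sketched with a toy rule-based classifier. The rules below are assumptions for demonstration only; a production router (such as Shenma's RouteBot) would use learned classifiers rather than keyword matching.

```python
# Illustrative keyword-based router for the NBA examples above.
# The rules are demonstration-only assumptions, not Shenma's logic.

def route(query):
    q = query.lower()
    if any(w in q for w in ("score", "leading", "how far")):
        return "skills"        # real-time task query
    if q.startswith(("who invented", "what is", "when did")):
        return "knowledge"     # factual, knowledge-graph question
    if q.startswith(("can ", "will ", "should ")):
        return "qa"            # opinion/community QA question
    return "chitchat"          # open-ended small talk

assert route("How far are the Rockets leading by now?") == "skills"
assert route("Who invented basketball?") == "knowledge"
assert route("Can Harden enter the Hall of Fame?") == "qa"
assert route("Let's talk about NBA") == "chitchat"
```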
The figure below shows the system architecture behind Shenma intelligent interaction. The content infrastructure described in the previous section is the second level from the bottom, with the bottom layer being the content itself.
Moving up, there are the engine and platform layers. The engine layer is responsible for constructing data and hosting computation, while the platform layer is responsible for closed-loop solutions built with the engine at their core (i.e., production, multi-tenant consumption, operations, and demand management).
Platform Layer: Arrival Platform
The “arrival platform” is a platform extension of the TaskBot engine that is used to solve problems such as debugging, consumption, and operations.
For external developers, the platform includes the BotFramework; for external callers, it is the entry point to the entire Shenma intelligent interaction; and for internal R&D, it provides the production and operation platform.
The architecture of the arrival platform and how it interacts with other layers of the system architecture is shown below.
The arrival platform consists of the following components:
· Open Conversational Platform
· Debug tools
· Statistical analysis
· Operation management
The skill open component incorporates two levels, associated with two roles:
1. The BotFramework level, which supports the ability to open skills
This is a skill-construction framework, benchmarked against api.ai, that enables external developers to build their own skills.
2. OpenAPI, which supports the content consumption ability
This is intelligent interaction conducted directly through the API: callers create applications, select skills/QAs, and consume the content directly.
Although the BotFramework has not yet been opened to external applications, many existing open-platform products are unable to meet the needs of developers: such systems merely accept a submitted corpus, configure certain contexts, and produce outputs, which cannot satisfy even the requirements of simple controls.
Shenma, on the other hand, built 300+ different intents under the 20+ skills completed in the first skills phase. The second phase adds processes for collecting, annotating, reviewing, modeling, and testing the corpus. Shenma’s focus is therefore primarily on creating real-world built-in skills that provide real value.
Debugging tools are used to produce built-in skills, playing the same role as the skill open component in delivering materials to the TaskBot engine. Here, however, the user is internal R&D, and the tools cover the full-link process from PRD to putting skills online: the online preparation of structured PRDs, demand management, corpus management, entity management, skill construction, skill training, skill verification, and skill release.
In order to ensure the adaptability of each skill, multiple scenes are supported through the skill set, including standard screenless devices, cellphone screens, and large screens. In addition, the built-in skill materials support not only entities, corpora, and scripts, but also the delivery of C++ dynamic libraries to allow different sorting and NLG strategies. Furthermore, putting skill construction online enforces the division of work between PD/RD/QA/operations and clarifies pipeline production.
The statistical analysis component is used for multi-dimensional point statistics, reports, and indicator analysis. The issues involved include production and consumption efficiency (guiding content production through statistics), content control and feedback, and the call volumes of overall and individual skills.
The operation management component is divided into two parts: content operation is the real-time intervention in key domains and modules, while application operation covers the addition, deletion, modification, querying, and training of applications/skills.
The engine layer consists of 1) the TaskBot engine, 2) the QABot engine, 3) the Chit-Chat Bot engine, and 4) the RouteBot engine.
1) TaskBot engine
The TaskBot engine is the core of skill construction and consumption. As part of the engine, offline computing constructs the external platforms’ materials into corresponding internal data, including entity dictionaries, classification models, intent recognition & slot plug-ins/patterns/models, NLG strategies and templates, DM script plug-ins, US sorting plug-ins, and webHook logic plug-ins.
Content management of the above data is undertaken by application/skill versioning. It is stateless and can be quickly transplanted, rolled back, and distributed.
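The stateless, versioned management described above can be sketched as an append-only version store with a movable live pointer. The class and method names below are illustrative assumptions, not Shenma's internal API.

```python
# Minimal sketch of stateless, versioned skill-data management with
# rollback. Storage layout and names are illustrative assumptions.

class SkillStore:
    def __init__(self):
        self.versions = []   # immutable history of data snapshots
        self.current = -1    # index of the live version

    def publish(self, data):
        """Distribute a new version; older snapshots stay intact,
        so the store can be transplanted or rolled back quickly."""
        self.versions.append(dict(data))
        self.current = len(self.versions) - 1
        return self.current

    def rollback(self, version):
        """Point the live pointer at an earlier snapshot."""
        if not 0 <= version < len(self.versions):
            raise ValueError("unknown version")
        self.current = version

    def live(self):
        return self.versions[self.current]

store = SkillStore()
store.publish({"intents": ["play_music"]})
store.publish({"intents": ["play_music", "weather"]})
store.rollback(0)
print(store.live())  # {'intents': ['play_music']}
```

Because each snapshot is immutable, distribution to online replicas reduces to copying a version and flipping the pointer, which is what makes rollback cheap.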
Scheduling is divided into data scheduling, environment management, and service management, and is responsible for data distribution from offline to online. A set of SDS engines contains multiple roles, each of which loads the corresponding data. Environment management handles the automated management of the iteration, verification, pre-release, and production environments. Service management handles operation and maintenance, including row and column division (rows divided by application flows, columns by skill consumption) and capacity extension and reduction as instances go on- and offline. Beyond these, the online engines are also a critical part of the TaskBot engine.
The SDS online engine is the core of the task-oriented dialog system. It accepts user queries, uses the DM as the control center and the NLU as the understanding center, recalls and ranks through the US, and outputs the result after packaging by the NLG.
An overview of the TaskBot engine illustrating how the online engine integrates with the other engines is shown below.
Dialog Manager (DM)
The dialog manager is the key part of the dialog system and is responsible for maintaining the dialog context, managing the dialog process, and ensuring dialog fluency. After a user input is processed by the NLU, information such as intent and slot positions is generated. The DM makes corresponding decisions and takes actions based on this data and the context of the current dialog, including calling the NLG module to generate natural language and obtaining additional information needed in the dialog through external service interfaces.
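A single DM decision step, as described above, can be sketched as: merge the NLU slots into the context, then either prompt for a missing slot or invoke an external service. The intent name, required-slot table, and action format below are illustrative assumptions.

```python
# Hedged sketch of one DM decision step: given NLU output (intent +
# slots) and the dialog context, either ask for a missing slot or
# call an external service. All names here are illustrative.

REQUIRED_SLOTS = {"book_flight": ["origin", "destination", "date"]}

def dm_step(nlu_result, context):
    intent = nlu_result["intent"]
    context.setdefault("slots", {}).update(nlu_result.get("slots", {}))
    missing = [s for s in REQUIRED_SLOTS.get(intent, [])
               if s not in context["slots"]]
    if missing:
        # In a real system the NLG module would phrase this prompt.
        return {"action": "ask", "slot": missing[0]}
    return {"action": "call_service", "intent": intent,
            "args": dict(context["slots"])}

ctx = {}
print(dm_step({"intent": "book_flight",
               "slots": {"origin": "SFO", "destination": "JFK"}}, ctx))
# {'action': 'ask', 'slot': 'date'}
print(dm_step({"intent": "book_flight",
               "slots": {"date": "2024-06-01"}}, ctx))
```

Note how the context accumulates slots across turns, which is what lets the second turn supply only the missing date.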
The DM manages dialogs by means of a task tree. Each node of the tree is an agent (query, execution, and response). Considering the versatility and extensibility of the dialog system, Shenma clearly separates the dialog engine from the domain-related parts in the design of the dialog management module, so that agents with different functions can be defined and different dialog scenarios implemented easily.
Dialog execution stack
The execution state of agents is maintained in the form of a stack and is controlled according to the context of the dialog process. The dialog stack pushes an agent onto the stack; the agent on top executes, selects an appropriate sub-agent to push, and execution then continues.
The dialog stack stores the context information of the dialog, corresponding to a specific dialog scenario. The agent on the top of the dialog stack can be visually understood as the focus of the dialog. The dialog stack, combined with the agent relationship tree and the topic agenda table, can achieve the tracking and management of the dialog focus, and maintain, switch, and backtrack the dialog topics flexibly.
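The stack behavior described above can be sketched in a few lines: push as topics open, treat the top as the dialog focus, and pop to backtrack. The agent representation (plain strings here) is an illustrative assumption.

```python
# Sketch of the dialog execution stack: agents are pushed as topics
# open, the top of the stack is the dialog focus, and finished agents
# are popped to backtrack. Agents are plain strings for illustration.

class DialogStack:
    def __init__(self):
        self._stack = []

    def push(self, agent):
        self._stack.append(agent)

    def focus(self):
        """The agent on top of the stack is the current dialog focus."""
        return self._stack[-1] if self._stack else None

    def pop(self):
        """Backtrack to the previous topic when the focus completes."""
        return self._stack.pop()

stack = DialogStack()
stack.push("weather")        # user asks about the weather
stack.push("weather.city")   # sub-agent collects the city
print(stack.focus())         # weather.city
stack.pop()                  # city collected, backtrack
print(stack.focus())         # weather
```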
Topic agenda table
This table is responsible for maintaining and managing the parameter information in the dialog process and is used to collect the user inputs desired by the system. The agenda is divided into multiple levels, where each level corresponds to an agent in the dialog stack. Therefore, for a given execution stack, the agenda table represents the inputs desired in that dialog scenario. Whether the user stays on a topic or shifts to another, the corresponding desired parameter can be found and updated.
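A minimal sketch of such an agenda table follows, with one level per stack agent and a matcher that checks the current focus first so that inputs for an earlier topic still land on the right level. The agent and slot names are illustrative assumptions.

```python
# Sketch of a topic agenda table: each level mirrors an agent on the
# dialog stack and lists the inputs that agent still expects.
# Agent and slot names are illustrative assumptions.

agenda = [
    {"agent": "book_flight", "expects": {"date": None}},
    {"agent": "choose_seat", "expects": {"seat": None}},  # current focus
]

def update_agenda(agenda, user_slots):
    """Match user input against expected parameters, deepest level
    (the current focus) first, so topic shifts still find a home."""
    for level in reversed(agenda):
        for slot, value in list(user_slots.items()):
            if slot in level["expects"] and level["expects"][slot] is None:
                level["expects"][slot] = value
                del user_slots[slot]
    return agenda

update_agenda(agenda, {"date": "Friday"})  # user shifts back to the date topic
print(agenda[0]["expects"])  # {'date': 'Friday'}
```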
The execution unit of the DM is a “script”: the script tree that the user constructs by dragging and dropping on the open platform or production platform is eventually compiled into a C++ .so to be loaded and executed. At present, through the combination of the DM and NLU, multi-round dialog capabilities such as ellipsis resolution, coreference resolution, topic transfer, and error handling have been completed for multiple skills.
The NLU has two different design concepts, one focusing on the BotFramework and the other on dialog products.
Focusing on the BotFramework means structuring the user query as a Domain/Intent/Slot result and returning it to the developer. Some BotFramework products require the user to decide whether to accept the result.
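A minimal sketch of this Domain/Intent/Slot structuring follows. The rule-based parse and the label names are demonstration-only assumptions; a real NLU would use trained classifiers and slot taggers.

```python
# Illustrative sketch of structuring a query as Domain/Intent/Slot,
# as a BotFramework-style NLU would return it to the developer.
# The regex rule and labels are assumptions for demonstration.

import re
from dataclasses import dataclass, field

@dataclass
class NLUResult:
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)

def parse(query):
    m = re.match(r"play (.+) by (.+)", query.lower())
    if m:
        return NLUResult(domain="music", intent="play_song",
                         slots={"song": m.group(1), "artist": m.group(2)})
    return NLUResult(domain="other", intent="unknown")

result = parse("Play Yesterday by The Beatles")
print(result.domain, result.intent, result.slots)
# music play_song {'song': 'yesterday', 'artist': 'the beatles'}
```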
Focusing on dialog products yields multi-dimensional NBest strategies that take both the NLU classification and the recalled results into consideration, which is particularly important in information-service scenarios. Recent developments here are the NBest upgrade of the entire SDS and the sub-NLU.
2) QABot engine
Within the industry, QAs can be divided by content dimension into structured data QAs, unstructured data QAs, and QA-pair-based QAs. From a technical point of view, the industry generally divides QA systems into retrieval-based systems and generation-based systems. The former constructs an information retrieval system over a large-scale dialog dataset and achieves reasonable responses to user questions by building effective question-matching and QA-relevance models. The latter attempts to construct an end-to-end deep learning model that automatically learns the semantic relationship between queries and responses from massive dialog data, so as to automatically generate a reply to any user question.
Currently, Shenma mainly focuses on retrieval-based QA systems built on massive data, which at the system level are divided into knowledge graph QA (KG-QA), DeepQA, and PairQA. These components all organize and refine existing knowledge, but differ in data sources and requirements, processing methods, matching methods, and coverage scenarios. An outline of this retrieval-based QA system is shown below.
KG-QA exhibits high accuracy, but limited coverage. It is powered by the knowledge graph, which is the core infrastructure of Shenma search engine. The knowledge graph is built using search big data, natural language processing, and deep learning technology.
PairQA and DeepQA
The unstructured DeepQA has wide coverage but a high noise rate. The socialized production of PairQA significantly improves productivity but requires suitable scenarios and questions. These challenges determine the difficulty of, and barriers to, QA at each stage of the process.
User query comprehension
Understanding user intents is a key part of the QA system, especially for DeepQA. NLP capabilities from the main search stack, including semantic extension, weight analysis, entity recognition, rewriting, and error correction, serve as the foundation. Question classification combines machine learning classification algorithms with manual methods.
Focus word recognition mainly accomplishes the precise positioning of information needs. The main background or object of a question and the content of its related topics reflect the descriptive role of the topic, such as entities, attributes, actions, and instances.
Information retrieval is responsible for retrieving relevant candidate information from the global corpus and passing it to the answer generation module. Different corpora and business scenarios require search methods in multiple forms.
At present, Shenma mainly uses inverted-index-based text retrieval and vector-based semantic retrieval. The former is the traditional approach of full-text search engines, advantageous for its simple implementation and high accuracy, but heavily dependent on the construction of the corpus database. The latter is a better fit for semantic search, with strong generalization ability but a relatively high false-trigger rate. Shenma combines the two mechanisms, playing to their respective advantages for different corpora and business scenarios.
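The combination of the two mechanisms can be sketched as a blended score over a keyword (inverted-index style) pass and a vector-similarity pass. The toy term-overlap scoring, bag-of-words "embedding", and blend weights below are all assumptions; production systems use real indexes and learned embeddings.

```python
# Sketch combining the two retrieval mechanisms described above:
# a keyword (inverted-index style) pass and a vector-similarity pass.
# Scoring, embeddings, and weights are toy assumptions.

import math
from collections import Counter

DOCS = ["basketball was invented by james naismith",
        "the rockets are an nba team",
        "paris is the capital of france"]

def keyword_score(query, doc):
    q, d = set(query.split()), Counter(doc.split())
    return sum(d[t] for t in q)   # simple term-overlap count

def vector(text):
    return Counter(text.split())  # bag-of-words "embedding"

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs=DOCS):
    """Blend both signals; the 0.5/0.5 weights are arbitrary."""
    scored = [(0.5 * keyword_score(query, d)
               + 0.5 * cosine(vector(query), vector(d)), d) for d in docs]
    return max(scored)[1]

print(retrieve("who invented basketball"))
# basketball was invented by james naismith
```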
Candidate answers from retrieval require further refinement, answer extraction, and confidence calculation to finally reach an accurate and concise answer. PairQA focuses on strict ordering and confidence calculation through machine learning models and methods such as CNN, DSSM, and GBDT. DeepQA targets unstructured documents and community corpora and requires deeper processing, including concise abstract extraction with a Bi-LSTM RNN model, cross-validation between answers to synonymous questions, and verification of answer relevance.
Corpus construction is the basis of QABot. Whether for QAs in a specific domain or an open domain, it is inseparable from the support of a high-quality corpus. Shenma implements a whole set of data mining and operational production processes for colloquial QAs, including open question mining, scene question mining, socialized answer production, and automatic extraction of high-quality answers.
3) Chit-Chat Bot engine
The most important aspect of this engine is the chit-chat corpus, which is formed from Internet crawling and OGC/UGC. The crawled corpus is filtered by a language model (LM) and tagged by domain, character, and emotion.
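The article does not say which LM is used for filtering; the sketch below shows the general idea with a toy unigram model trained on a small clean corpus, dropping crawled sentences whose average token log-probability falls below a threshold. The model, corpus, and threshold are all toy assumptions; production filtering would use a far stronger LM.

```python
# Hedged sketch of LM-based corpus filtering: score crawled sentences
# with a smoothed unigram LM trained on a clean seed corpus and drop
# low-probability (garbled/spammy) sentences. All values are toys.

import math
from collections import Counter

clean_corpus = "how are you today i am fine thanks what are you doing"
counts = Counter(clean_corpus.split())
total = sum(counts.values())

def avg_logprob(sentence, smoothing=0.5):
    toks = sentence.split()
    denom = total + smoothing * len(counts)
    return sum(math.log((counts[t] + smoothing) / denom)
               for t in toks) / len(toks)

def keep(sentence, threshold=-3.0):
    return avg_logprob(sentence) >= threshold

crawled = ["how are you doing", "zxq vvk qqq spam spam"]
print([s for s in crawled if keep(s)])  # ['how are you doing']
```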
Responses are obtained by means of a traditional retrieval-based model. End-to-end dialogue systems are still relatively new, and most such architectures are far from ready for industry deployment.
Shenma’s technical upgrade from search to intelligent interaction, a process which began in 2017, is now complete. In the course of developing Shenma web search, the team has built a new AI+ information service architecture, along with the accompanying algorithms, operations, and content system.