Chatbot Engine behind Alibaba’s AliMe Customer Service Bot
This article is part of the Academic Alibaba series and is taken from the paper entitled “AliMe: A Sequence-to-Sequence and Rerank-based Chatbot Engine”, accepted by ACL 2017. The full paper can be read here.
Chatbots are certainly having their moment. Amazon’s Alexa and Apple’s Siri lead a pack of thousands of chatbot applications being operated and developed by companies of all sizes. Unlike older, more rudimentary versions where users had to abide by simple and structured language, today’s chatbots allow users to employ natural text and speech (in some cases, even images) when interacting with them. They may not have reached the heights of Samantha in the film Her, but they are making major progress towards realizing that level of AI and AI interactions.
Introducing AliMe, Alibaba’s Chatbot
AliMe, Alibaba’s e-commerce chatbot, handles millions of customer queries a day, most of them in Chinese and a substantial number in English. Hundreds of thousands of these queries are highly conversational, calling for an open-domain chatbot engine that delivers a better user experience.
Most open-domain chatbots use information retrieval (IR) or generation models, both of which come with drawbacks. IR models retrieve answers from question/answer (QA) knowledge bases, while generation models produce answers with pre-trained sequence-to-sequence (seq2seq) models. IR models struggle with long-tail queries that have no close match in the QA knowledge base, and generation models do not always return comprehensible or consistent answers.
How AliMe Differs and Holds Up
As an open-domain chatbot, AliMe integrates a hybrid approach based on IR and seq2seq generation models. It uses an attentive seq2seq-based re-rank model to optimize the joint results, outperforming standard IR and generation-based chatbots.
When faced with a query, the chatbot first uses an IR model to retrieve a set of QA pairs as candidate answers, then re-ranks these candidates with an attentive seq2seq model. If the top candidate's score exceeds a set threshold, it is returned as the answer; otherwise, the answer is produced by the generation-based model.
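The retrieve, re-rank, and fall-back steps above can be sketched as follows. This is a minimal illustration, not Alibaba's implementation: the knowledge base, the word-overlap retrieval and scoring functions, and the threshold value are all simplified stand-ins for the real IR engine and attentive seq2seq models.

```python
CONFIDENCE_THRESHOLD = 0.25  # toy value; the real threshold is tuned empirically

# Toy QA knowledge base standing in for the IR index.
QA_KNOWLEDGE_BASE = {
    "where is my package": "You can track your package on the orders page.",
    "how do i get a refund": "Refunds can be requested within 15 days.",
}

def ir_retrieve(query, k=3):
    """Retrieve up to k candidate answers by naive word overlap."""
    q_words = set(query.lower().split())
    scored = []
    for question, answer in QA_KNOWLEDGE_BASE.items():
        overlap = len(q_words & set(question.split()))
        if overlap:
            scored.append((overlap, answer))
    scored.sort(reverse=True)
    return [answer for _, answer in scored[:k]]

def seq2seq_score(query, candidate):
    """Stand-in for the attentive seq2seq model's likelihood score."""
    q_words = set(query.lower().split())
    c_words = set(candidate.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def seq2seq_generate(query):
    """Stand-in for the generation model's decoded answer."""
    return "Sorry, could you tell me more about your issue?"

def answer(query):
    # Step 1: IR retrieval of candidate answers.
    candidates = ir_retrieve(query)
    if candidates:
        # Step 2: re-rank candidates with the seq2seq score.
        best = max(candidates, key=lambda c: seq2seq_score(query, c))
        # Step 3: accept the top candidate only above the threshold.
        if seq2seq_score(query, best) >= CONFIDENCE_THRESHOLD:
            return best
    # Step 4: fall back to generation for long-tail queries.
    return seq2seq_generate(query)
```

A matched query returns the retrieved answer, while a long-tail query with no confident candidate falls through to the generation model.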
To handle questions and answers of varying lengths, the Alibaba tech team adopted the bucketing mechanism implemented in TensorFlow. To speed up training, they applied softmax over a sampled subset of the vocabulary rather than the full vocabulary, a strategy inspired by importance sampling. In the decoding phase, they used beam search to maintain the top-k (k=10) output sequences at each step rather than greedy search, making generated answers more reasonable and consistent.
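The beam-search decoding described above can be illustrated with a minimal sketch. The toy next-token log-probability table below is an assumption for demonstration purposes only (AliMe's decoder operates over a neural model's vocabulary, with k=10; k=2 here for readability).

```python
import math

# Toy conditional next-token log-probabilities standing in for a seq2seq decoder.
LM = {
    "<s>": {"hi": math.log(0.6), "hello": math.log(0.4)},
    "hi": {"there": math.log(0.7), "</s>": math.log(0.3)},
    "hello": {"there": math.log(0.9), "</s>": math.log(0.1)},
    "there": {"</s>": math.log(1.0)},
}

def beam_search(k=2, max_len=5):
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        expanded = []
        for logp, seq in beams:
            if seq[-1] == "</s>":  # finished hypotheses carry over unchanged
                expanded.append((logp, seq))
                continue
            for tok, tok_logp in LM[seq[-1]].items():
                expanded.append((logp + tok_logp, seq + [tok]))
        # Keep only the k highest-scoring hypotheses at each step.
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:k]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    # Strip the start and end markers for readability.
    return [(" ".join(seq[1:-1]), logp) for logp, seq in beams]
```

With k=1 this reduces to greedy search; larger k keeps lower-scoring prefixes alive, which can lead to a higher-scoring complete sequence overall.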
To validate the approach, the team tested AliMe against an existing chatbot on a set of relevant test questions. AliMe performed better on 37.64% of the questions and worse on 18.84%.
What Comes Next
There’s still a long way to go before chatbots can rival the intelligence of those in science fiction and film. For now, the next step to explore is context, which determines multi-round interactions in dialog systems.
Currently, Alibaba uses a straightforward strategy for incorporating context. When faced with a question, if the IR model retrieves fewer than three answer candidates, the query is supplemented with the previous question and the concatenation is sent to the IR engine again. The tech team previously tried context-aware techniques such as context-sensitive and neural conversation models, but these tended to scale poorly for Alibaba's scenarios. In the meantime, the team is continuing to explore scalable context-aware methods, as well as personification, to make AliMe more engaging and relatable by demonstrating emotion and personality.
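The context-fallback strategy above can be sketched in a few lines. The `ir_search` function here is a hypothetical stub over a tiny keyword index, standing in for the real IR engine; the threshold of three candidates is the one stated above.

```python
MIN_CANDIDATES = 3

def ir_search(query):
    """Hypothetical IR engine: a stub over a tiny keyword index."""
    index = {
        "refund": ["Answer about refunds A",
                   "Answer about refunds B",
                   "Answer about refunds C"],
    }
    return [a for key, answers in index.items()
            if key in query.lower() for a in answers]

def retrieve_with_context(question, previous_question=None):
    candidates = ir_search(question)
    if len(candidates) < MIN_CANDIDATES and previous_question:
        # Too few hits: prepend the previous turn and query the IR engine again.
        candidates = ir_search(previous_question + " " + question)
    return candidates
```

A follow-up like "how long does it take", which retrieves nothing on its own, succeeds once the previous question about a refund is concatenated in front of it.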