Rising through the Ranks

Multi-scenario search result ranking is better at finding what users want, however they are looking for it

This article is part of the Academic Alibaba series and is taken from the paper entitled “Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning” by Jun Feng, Heng Li, Minlie Huang, Shichen Liu, Wenwu Ou, Zhirong Wang, and Xiaoyan Zhu, accepted by The Web Conference 2018. The full paper can be read here.


Most large-scale online platforms and apps incorporate multiple scenarios into their systems, including services such as search, advertising, and recommendation. Alibaba’s Taobao, for instance, is a huge Chinese e-commerce platform where users can find and buy products by searching, by browsing bookmarked goods, or by following recommendations.

A common feature of these services is a ranking strategy, the fundamental function that provides users with a ranked list of items. Each strategy focuses on a specific scenario, and the scenarios cover different aspects of how users interact with the platform.

Alibaba’s approach is to apply a new multi-scenario ranking optimization model, which has markedly improved ranking performance on Taobao compared with more traditional approaches.

Scenario Ranking

Take the following illustration as an example. It shows the competition between two sellers, A and B, selling snacks on a beach. The top figure shows the initial locations, where people in red buy snacks at A and people in blue buy snacks at B. The middle figure shows that when A moves right, A covers more customers and can make more sales. The bottom figure shows the optimal solution to this non-cooperative game, where the two sellers compete with each other and both end up at the center of the beach. Nevertheless, the total income of the two sellers is not optimal in this case, as the people in grey are beyond the reach of both sellers.

Competition between two sellers (A and B) selling snacks along a beach
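
The beach example can be reproduced with a few lines of arithmetic. The sketch below is a toy model and is not taken from the paper: the function name `beach_sales`, the even spread of customers, and the `reach` values are all assumptions made for illustration. It shows two things separately: that A gains share by edging toward B when customers will walk any distance, and that crowding the center leaves the ends of the beach unserved when they will not.

```python
def beach_sales(a, b, reach, customers=1000):
    """Share of customers buying from sellers at positions a and b on a beach [0, 1].

    Toy assumptions: customers are spread evenly, buy from the nearest
    seller, and do not buy at all if that seller is farther than `reach`.
    """
    sales_a = sales_b = 0.0
    for i in range(customers):
        x = (i + 0.5) / customers  # position of customer i
        dist_a, dist_b = abs(x - a), abs(x - b)
        if min(dist_a, dist_b) > reach:
            continue  # out of reach of both sellers (the "grey" customers)
        if dist_a < dist_b:
            sales_a += 1
        elif dist_b < dist_a:
            sales_b += 1
        else:  # same distance: split the sale
            sales_a += 0.5
            sales_b += 0.5
    return sales_a / customers, sales_b / customers


# Incentive to move: with unlimited reach, A gains by edging toward B.
start = beach_sales(0.25, 0.75, reach=1.0)
a_moves_right = beach_sales(0.40, 0.75, reach=1.0)

# Coverage: with a limited reach, crowding the center leaves the ends unserved.
both_at_center = beach_sales(0.5, 0.5, reach=0.25)
spread_out = beach_sales(0.25, 0.75, reach=0.25)

print("A's share before and after moving:", start[0], a_moves_right[0])
print("total sales, both at center:", sum(both_at_center))
print("total sales, spread out:", sum(spread_out))
```

Under these assumptions, the two sellers together serve only half of the beach when both stand at the center, and all of it when they spread out. That gap is the motivation for optimizing the sellers (or ranking scenarios) jointly rather than one at a time.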

Not a Team Player

By analyzing the user logs of millions of Taobao users, it was found that there was a 25% conversion from the main search to the in-shop search, while there was only a 9% conversion from the in-shop search to the main search. This type of scenario conversion also occurs amongst search, advertising, and recommendation scenarios, and shows the limitations of independent scenario optimization. Therefore, multi-scenario ranking optimization that collaborates across multiple scenarios is a much more effective approach.

Multi-Scenario Ranking Optimization

Returning to the previous example, multi-scenario ranking optimization reaches the optimal solution by placing both sellers where, together, they can serve the whole market, as shown below.

Competition between two sellers using multi-scenario ranking optimization

This model formulates multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem and incorporates the following:

· A communication component for passing messages.

· Several private actors (agents) that take ranking actions.

· A centralized critic for evaluating the overall performance of the co-working actors (agents).

In this model, each scenario is treated as an actor (agent). Actors collaborate by sharing a global action-value function (the centralized critic), which evaluates the overall reward of their joint behavior, and by passing messages that encode historical information across scenarios. The overall model architecture is illustrated below.

MA-RDPG model architecture overview
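
To make the three components more concrete, here is a minimal structural sketch in plain Python. It is not the paper's implementation: the class names (`Actor`, `Communication`, `CentralCritic`), the dimensions, the scenario names, and the single tanh layer used everywhere are assumptions chosen to keep the example short. In particular, the simple message update stands in for whatever history encoder the real model uses, and no training is shown.

```python
import math
import random


def dense_tanh(weights, inputs):
    """One dense layer with tanh activation: one output per weight row."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Actor:
    """Private policy for one scenario: (message, local observation) -> action."""

    def __init__(self, msg_dim, obs_dim, act_dim, rng):
        self.weights = random_matrix(act_dim, msg_dim + obs_dim, rng)

    def act(self, message, observation):
        return dense_tanh(self.weights, message + observation)


class Communication:
    """Shared message that folds each step's observation and action into history."""

    def __init__(self, msg_dim, obs_dim, act_dim, rng):
        self.weights = random_matrix(msg_dim, msg_dim + obs_dim + act_dim, rng)
        self.message = [0.0] * msg_dim

    def update(self, observation, action):
        self.message = dense_tanh(self.weights, self.message + observation + action)
        return self.message


class CentralCritic:
    """One action-value estimate Q(message, observation, action) shared by all actors."""

    def __init__(self, msg_dim, obs_dim, act_dim, rng):
        self.weights = random_matrix(1, msg_dim + obs_dim + act_dim, rng)

    def value(self, message, observation, action):
        return dense_tanh(self.weights, message + observation + action)[0]


rng = random.Random(0)
MSG_DIM, OBS_DIM, ACT_DIM = 4, 3, 2  # hypothetical sizes
actors = {
    "main_search": Actor(MSG_DIM, OBS_DIM, ACT_DIM, rng),
    "in_shop_search": Actor(MSG_DIM, OBS_DIM, ACT_DIM, rng),
}
comm = Communication(MSG_DIM, OBS_DIM, ACT_DIM, rng)
critic = CentralCritic(MSG_DIM, OBS_DIM, ACT_DIM, rng)

# One step: the active scenario acts on the shared message and its own observation.
observation = [0.2, -0.1, 0.7]
message_before = list(comm.message)
action = actors["main_search"].act(message_before, observation)
q_value = critic.value(message_before, observation, action)
message_after = comm.update(observation, action)
```

The design point the sketch tries to capture is that each actor only sees its own observation plus the shared message, while a single critic scores every actor's action, so all actors are pushed toward the same overall objective.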

The sequential process starts when a user enters a scenario and browses, clicks on, or buys goods. The search system (the model) then changes the ranking strategy by adjusting the ranking algorithm when the user navigates into a new scenario or issues a new request. This process is repeated until the user leaves the system, so the current ranking decision affects any future decisions.
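
One way to picture "adjusting the ranking algorithm" is as choosing a new set of weights over item features at each step of the session. The sketch below assumes exactly that; the feature names (`ctr`, `cvr`, `shop_quality`), the item values, and the hard-coded weights are all invented for illustration, and in a learned system the weights would come from the scenario's actor rather than from a table.

```python
def rank_items(items, weights):
    """Sort items by the weighted sum of their feature values, best first."""

    def score(item):
        return sum(weights[name] * value for name, value in item["features"].items())

    return sorted(items, key=score, reverse=True)


# Hypothetical candidate items with made-up feature values.
items = [
    {"id": "A", "features": {"ctr": 0.9, "cvr": 0.1, "shop_quality": 0.2}},
    {"id": "B", "features": {"ctr": 0.4, "cvr": 0.6, "shop_quality": 0.5}},
    {"id": "C", "features": {"ctr": 0.2, "cvr": 0.3, "shop_quality": 0.9}},
]

# A user session: each step is (scenario, weights chosen for that step).
session = [
    ("main_search", {"ctr": 1.0, "cvr": 0.2, "shop_quality": 0.0}),
    ("in_shop_search", {"ctr": 0.1, "cvr": 1.0, "shop_quality": 0.3}),
]

orders = {}
for scenario, weights in session:
    orders[scenario] = [item["id"] for item in rank_items(items, weights)]
    print(scenario, orders[scenario])
```

The same three items come back in a different order in each scenario, which is why a decision made in one scenario can change what the user sees, and does, in the next.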

MA-RDPG Ranking in Taobao

In the Taobao case study, the main search scenario learns to support the in-shop search scenario, and in doing so targets higher overall future rewards. With MA-RDPG (Multi-Agent Recurrent Deterministic Policy Gradient), the main search ranks items from a global perspective: it considers not only its own immediate rewards (i.e. a direct purchase) but also potential future purchases during an in-shop search.
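
The trade-off between immediate and future rewards can be illustrated with a discounted sum, the usual way reinforcement learning weighs rewards over time. The reward numbers and the discount factor below are made up for illustration and do not come from the paper; `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward received at step t is weighted by gamma ** t."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Each list is [reward in main search, reward in the following in-shop search].
# Greedy: main search maximizes its own immediate purchase, no in-shop follow-up.
greedy_rewards = [1.0, 0.0]
# Cooperative: main search accepts less now and sets up a later in-shop purchase.
cooperative_rewards = [0.6, 1.0]

greedy_total = discounted_return(greedy_rewards)
cooperative_total = discounted_return(cooperative_rewards)

print("greedy:", greedy_total)
print("cooperative:", cooperative_total)
```

With these illustrative numbers, the cooperative ranking earns less in the main search itself but more over the whole session, which is the behavior a shared, session-level objective is meant to encourage.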

Ranking is a fundamental issue in many applications, and an effective ranking strategy can significantly improve user experience and system performance. By using joint ranking optimization across multiple scenarios, Alibaba has improved performance over traditionally used algorithms for the e-commerce platform Taobao. Further work is needed across other domains to fully realize the potential of this approach.

Read the full paper here.

Written by Alibaba Tech
