It’s a Match! Optimizing Item Recommendations in Ecommerce

This article is part of the Academic Alibaba series and is taken from the paper entitled “Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba” by Jizhe Wang, Pipei Huang, Huan Zhao, Zhibo Zhang, Binqiang Zhao, and Dik Lun Lee, accepted by KDD. The full paper can be read here.

Recommendation, which aims at providing users with attention-grabbing items based on their preferences, is a key technology in Alibaba’s e-commerce site Taobao. The homepage of the Mobile Taobao app, shown below, is generated based on users’ past behaviors with recommendation techniques.

The highlighted areas (dashed) are personalized for one billion users in Taobao

Recommender systems (RS) seek to predict the preference a user would give to an item, so improving RS capabilities is key to driving better recommendations and boosting product sales. But with data from one billion users and two billion items to process, Taobao’s RS is already under tremendous stress, facing major challenges of scalability, data sparsity, and cold starts.

Alibaba’s tech team decided to focus on alleviating these problems while optimizing the first stage of the recommendation process. Their new RS model, dubbed Enhanced Graph Embedding with Side information (EGES), tackles the problems outlined above and pioneers a graph-embedding approach for recommendation, capturing higher-order similarities between items and making for more successful recommendations.

Item Matchmaking in RS

Taobao’s RS is divided into two stages. The first stage is matching, where the goal is to generate a candidate list of items similar to those the user has browsed previously. The second stage is ranking, where the candidate items are ordered for each user according to his or her preferences.

At the matching stage, many current RS approaches use collaborative filtering (CF) to compute item similarities based on the co-occurrence of items in users’ behavior history. Alibaba’s graph-embedding approach uses item graphs constructed from users’ behavior history. These capture the sequence of user behavior in relation to these items, rather than simply co-occurrence, shedding extra light on users’ item preferences.
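To make the difference from plain co-occurrence concrete, here is a minimal sketch of how a directed, weighted item graph could be built from behavior sequences. The function name `build_item_graph`, the toy sessions, and the choice to skip self-transitions are illustrative assumptions, not details taken from Alibaba's system.

```python
from collections import defaultdict


def build_item_graph(sessions):
    """Return {src: {dst: weight}}, counting consecutive item transitions.

    Each session is an ordered list of item IDs. An edge src -> dst gets
    weight +1 every time dst directly follows src in some session.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            if src != dst:  # assumption: ignore repeated views of one item
                graph[src][dst] += 1
    # Convert to plain dicts so lookups of unknown items raise KeyError.
    return {src: dict(dsts) for src, dsts in graph.items()}


# Hypothetical sessions for illustration only.
sessions = [
    ["D", "A", "B"],
    ["B", "E"],
    ["D", "E", "F"],
    ["D", "A"],
]
item_graph = build_item_graph(sessions)
print(item_graph["D"])  # {'A': 2, 'E': 1}
```

Because edges are directed and ordered by time, D → A and A → D are distinct, which is the sequence information that a symmetric co-occurrence count would lose.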

Using a user’s whole behavior history is impractical: the cost is high, and a user’s interests tend to drift over time. To counter this problem, the team limited the item graphs to behavior occurring within a specific time window (“session-based” behavior). They also eliminated noise from the dataset by removing user behavior deemed unintentional, spam, or erroneous.
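The windowing and noise-removal step can be sketched as follows. This is only an illustration: the one-hour window, the one-second dwell threshold, the event tuple layout, and the function name `split_into_sessions` are all assumptions chosen for the example, and the article does not state the exact rules used in production.

```python
def split_into_sessions(events, window_seconds=3600, min_dwell_seconds=1.0):
    """Group one user's events into time-window sessions of item IDs.

    events: iterable of (timestamp_seconds, item_id, dwell_seconds).
    A new session starts when an event falls more than `window_seconds`
    after the first event of the current session. Events with a very
    short dwell time are dropped as likely unintentional clicks.
    """
    sessions = []
    current = []
    window_start = None
    for ts, item, dwell in sorted(events):
        if dwell < min_dwell_seconds:
            continue  # assumed noise rule: skip near-instant bounces
        if window_start is None or ts - window_start > window_seconds:
            if current:
                sessions.append(current)
            current = []
            window_start = ts
        current.append(item)
    if current:
        sessions.append(current)
    return sessions


# Hypothetical click log for a single user.
events = [
    (0, "D", 12.0),
    (40, "X", 0.3),    # dropped: dwell below the threshold
    (300, "A", 8.0),
    (900, "B", 20.0),
    (7200, "B", 5.0),  # more than one hour after the session start
    (7500, "E", 9.0),
]
user_sessions = split_into_sessions(events)
print(user_sessions)  # [['D', 'A', 'B'], ['B', 'E']]
```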

From BGE to GES to EGES

After constructing the weighted item graph, the team used Base Graph Embedding (BGE) to learn the embeddings. BGE follows DeepWalk: first, random walks over the graph generate sequences of nodes; then, the Skip-Gram algorithm learns node embeddings that maximize the co-occurrence probability of two nodes in the obtained sequences.
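The two steps can be illustrated with a small, dependency-free sketch: weighted random walks over an item graph, followed by extraction of the (center, context) pairs that a Skip-Gram trainer would consume. The graph, the parameter values, and the function names `random_walks` and `skipgram_pairs` are hypothetical, and the embedding training itself is left out.

```python
import random


def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Generate walks where the next node is drawn in proportion to edge weight."""
    rng = random.Random(seed)
    # Include nodes that only appear as edge targets.
    nodes = sorted(set(graph) | {d for dsts in graph.values() for d in dsts})
    walks = []
    for _ in range(walks_per_node):
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1])
                if not neighbors:
                    break  # dead end: node has no outgoing edges
                targets = list(neighbors)
                weights = [neighbors[t] for t in targets]
                walk.append(rng.choices(targets, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


def skipgram_pairs(walk, window=2):
    """Return (center, context) pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical weighted item graph: {source: {target: weight}}.
item_graph = {
    "D": {"A": 2, "E": 1},
    "A": {"B": 1},
    "B": {"E": 1},
    "E": {"F": 1},
}
walks = random_walks(item_graph, walks_per_node=2, walk_length=4, seed=1)
training_pairs = [pair for walk in walks for pair in skipgram_pairs(walk)]
```

Items that frequently appear near each other in the walks produce many shared pairs, which is what pulls their embeddings together during Skip-Gram training.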

BGE cannot learn accurate embeddings for items with few or even no user interactions, so the team used item side information (category, price, shop, and so on) to mitigate the cold-start issue for such items, dubbing the modified system Graph Embedding with Side information (GES).

GES places items with similar side information closer together in the embedding space. For example, a person who likes a Nikon lens may also be interested in Canon camera equipment, because the two are similar in category and brand.
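One way to picture this is to treat an item's final vector as the plain average of its own embedding and the embeddings of its side information, which is how the paper describes GES as far as we understand it. The sketch below uses tiny hand-written three-dimensional vectors; the lookup table `side_embeddings`, the helper names, and all numbers are made up for illustration, and real embeddings would be learned rather than typed in.

```python
def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


# Hypothetical learned embeddings for side-information values.
side_embeddings = {
    ("category", "camera_lens"): [0.9, 0.1, 0.0],
    ("category", "t_shirt"): [0.0, 0.2, 0.9],
    ("shop", "photo_store"): [0.7, 0.3, 0.1],
    ("shop", "fashion_store"): [0.1, 0.1, 0.8],
}


def ges_embedding(item_vector, side_keys):
    """Average the item's own vector (if any) with its side-information vectors.

    Passing item_vector=None models a cold-start item that has no
    interaction history and therefore no learned item embedding.
    """
    vectors = [side_embeddings[key] for key in side_keys]
    if item_vector is not None:
        vectors = [item_vector] + vectors
    return mean_vector(vectors)


lens_side = [("category", "camera_lens"), ("shop", "photo_store")]
shirt_side = [("category", "t_shirt"), ("shop", "fashion_store")]

known_lens = ges_embedding([0.8, 0.2, 0.1], lens_side)
new_lens = ges_embedding(None, lens_side)    # cold-start item
new_shirt = ges_embedding(None, shirt_side)  # cold-start item

print(cosine(known_lens, new_lens) > cosine(known_lens, new_shirt))  # True
```

Even without any clicks, the new lens lands near the established lens because the two share a category and shop, which is the cold-start behavior the article describes.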

Similar items for ‘cold start’ items (note cat = category)

With GES, the problem remains that different kinds of side information contribute differently to the co-occurrence of items in users’ behaviors. For example, iPhone owners typically display greater Apple brand loyalty in their item purchases, while other users may buy different-branded clothes in the same Taobao store for convenience and economy. Enhanced GES (EGES), the final form of the team’s solution, combats this issue by using a weighted average layer to aggregate the embeddings of the side information related to the items.
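A weighted average layer of this kind can be sketched as below. The sketch assumes each item carries one raw score per embedding source (the item itself plus each kind of side information) and normalizes those scores with exponentials so the weights are positive and sum to one; this mirrors our reading of the paper, but the function names, vectors, and scores here are invented for the example, and in the real model the scores are learned.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def eges_embedding(vectors, raw_weights):
    """Weighted average of embedding vectors using softmax-normalized weights."""
    weights = softmax(raw_weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors))
        for i in range(dim)
    ]


# Hypothetical vectors for one item: [item itself, brand, shop].
vectors = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

# Equal scores reduce to the plain average used by GES.
uniform = eges_embedding(vectors, [0.0, 0.0, 0.0])

# A brand-heavy item: the brand vector dominates the aggregate.
brand_heavy = eges_embedding(vectors, [0.0, 3.0, 0.0])

print(softmax([0.0, 3.0, 0.0]))
```

With equal scores the layer falls back to the GES average, so EGES can be read as a generalization that lets each item decide how much its brand, shop, or category should count.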

Weights for different side information of various items (here “item” means the embedding of an item itself)

Game, Set and Match!

In online tests, EGES and GES consistently outperformed both BGE and Base (the item-based collaborative filtering baseline) in terms of click-through rate (CTR), while EGES also outperformed GES.

Online click-through rates of different methods over a seven-day period in November 2017

The full paper and results can be read here.

Alibaba Tech

First-hand and in-depth information about Alibaba’s latest technology → Facebook: “Alibaba Tech”. Twitter: “AlibabaTech”.



