Building Relationships in E-commerce — Why Our IDs Need Company
This article is part of the Academic Alibaba series and is taken from the paper entitled “Learning and Transferring IDs Representation in E-commerce” by Kui Zhao, Yuechuan Li, Zhaoqian Shuai, and Cheng Yang, accepted by KDD 2018. The full paper can be read here.
Many machine intelligence techniques have been developed for e-commerce, and one of the most essential is the representation of IDs. Here, IDs refer to the host of different actors and objects in the e-commerce ecosystem. Properly analyzing the connections between user IDs, item IDs, product IDs, store IDs, brand IDs, category IDs, and others could yield valuable insights into customer behavior and industry trends, making it possible to improve marketing and advertising, operations, and customer retention rates.
However, existing encoding-based methods for representing IDs (such as one-hot encoding) are inefficient: they suffer from sparsity problems due to their high dimensionality, and they cannot reflect the relationships among IDs, whether homogeneous or heterogeneous. To combat this, the Alibaba technical team propose an embedding-based framework to learn and transfer the representation of IDs.
Big Data, Big Problems
E-commerce has become an important part of our daily lives as online shopping has grown. However, the e-commerce business environment is much more dynamic and complex than that of traditional commerce, and in many ways it is still not fully understood. Given the wealth of data available on e-commerce activity, machine learning offers an effective way to analyze and comprehend e-commerce; however, IDs are a key aspect of this data goldmine that is currently poorly represented in machine learning methods.
Current ID representation methods have two main limitations. Firstly, they suffer from data sparsity problems due to the high dimensionality that comes with an enormous and growing number of IDs. The number of samples needed to fit statistical models increases exponentially as the number of IDs increases. Secondly, they cannot reflect the relationships among IDs, whether homogeneous or heterogeneous.
With the current methods, any two different item IDs (a homogeneous example) are a constant distance apart, regardless of whether the items are similar. Meanwhile, the relationship between an item ID and a store ID (a heterogeneous example) cannot be measured at all, since the two live in different spaces.
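The constant-distance problem can be seen in a few lines of Python. This is a minimal sketch that assumes the "current method" is one-hot encoding; the item names and the tiny three-item catalogue are hypothetical and chosen only for illustration.

```python
import math

def one_hot(index, vocab_size):
    """Return a one-hot vector of length vocab_size with a 1.0 at position index."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical catalogue: two similar items and one unrelated item.
item_index = {"red_sneaker": 0, "blue_sneaker": 1, "rice_cooker": 2}
vectors = {name: one_hot(i, len(item_index)) for name, i in item_index.items()}

# Distance between two similar items and between two unrelated items.
d_similar = euclidean(vectors["red_sneaker"], vectors["blue_sneaker"])
d_unrelated = euclidean(vectors["red_sneaker"], vectors["rice_cooker"])

# Both distances are the same, so the encoding carries no similarity signal.
print(d_similar, d_unrelated)
```

Under this encoding every pair of distinct IDs differs in exactly two positions, so the distance is the same for every pair, and the vector length grows with the number of IDs.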
Bringing IDs Together
The Alibaba team improve on these methods by using an embedding-based framework to learn and transfer representations for all types of IDs. They also consider the structural connections between item IDs and other types of IDs (as illustrated above). Through these connections, the information contained in item ID sequences can propagate to other types of IDs, and the representations of all types of IDs can be learned simultaneously.
In their proposed framework, all types of IDs are embedded into one space, where the relationships among IDs, both homogeneous and heterogeneous, can easily be measured. This makes the representations more convenient to use and control in real-world scenarios and across many applications.
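To illustrate what measuring relationships in one shared space could look like, here is a minimal sketch that compares IDs of different types with cosine similarity. The embedding values, the ID names, and the choice of cosine similarity are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: items and a store share the same 3-dimensional space.
embeddings = {
    ("item", "red_sneaker"): [0.9, 0.1, 0.0],
    ("item", "blue_sneaker"): [0.8, 0.2, 0.1],
    ("item", "rice_cooker"): [0.0, 0.2, 0.9],
    ("store", "shoe_store"): [0.85, 0.15, 0.05],
}

red = embeddings[("item", "red_sneaker")]
blue = embeddings[("item", "blue_sneaker")]
cooker = embeddings[("item", "rice_cooker")]
store = embeddings[("store", "shoe_store")]

# Homogeneous comparison: item vs. item.
sim_red_blue = cosine(red, blue)
sim_red_cooker = cosine(red, cooker)

# Heterogeneous comparison: item vs. store, possible because the space is shared.
sim_red_store = cosine(red, store)
sim_cooker_store = cosine(cooker, store)
```

With vectors like these, similar items score higher than dissimilar ones, and an item can be compared directly against a store, which separate per-type encodings do not allow.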
Cold starts are inevitable with new items: item IDs with no historical records are invisible to recommendation systems. To alleviate this, approximate embedding vectors are constructed for new item IDs by transferring the embedding vectors of seen IDs. Complicating matters further, a high proportion of users are new customers, which makes personalized recommendations especially challenging. Using the team’s method, embedding vectors of user IDs are constructed by aggregating embedding vectors of item IDs. This means that these vectors can be transferred from long-existing platforms like Alibaba’s Taobao onto emerging platforms to provide effective personalized recommendations for new users.
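Both transfer ideas can be sketched with simple averaging. The snippet below is an illustration under stated assumptions, not the paper's exact formulation: it assumes a plain unweighted mean as the aggregation, and the vectors, ID names, and the helper `mean_vector` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

# Hypothetical embeddings of IDs that already have history ("seen" IDs).
item_vecs = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "blue_sneaker": [0.8, 0.2, 0.1],
}
store_vecs = {"shoe_store": [0.85, 0.15, 0.05]}
brand_vecs = {"sporty_brand": [0.75, 0.25, 0.05]}

# User ID: aggregate the vectors of the items the user interacted with
# (assumption: unweighted mean).
user_history = ["red_sneaker", "blue_sneaker"]
user_vec = mean_vector([item_vecs[item] for item in user_history])

# New item ID with no history: approximate it from seen IDs it is connected to,
# here its store and brand (assumption: unweighted mean).
new_item_vec = mean_vector([store_vecs["shoe_store"], brand_vecs["sporty_brand"]])
```

Because the user vector is built only from item vectors, it can be computed for a user on a new platform as long as that user's item history exists elsewhere, which is the intuition behind transferring representations across platforms.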
The team are already planning improvements to their approach and intend to extend it to many other applications, such as search engines and advertising.