This article is part of the Academic Alibaba series and is taken from the paper entitled “Speeding Up the Metabolism in E-commerce by Reinforcement Mechanism Design” by Hua-Lin He, Chun-Xiang Pan, Qing Da, and An-Xiang Zeng, accepted by the ACM SIGIR Workshop on eCommerce 2018. The full paper can be read here.
Like traditional brick-and-mortar retail, e-commerce thrives on competition among merchants and vendors seeking to draw customers to their products. A unique feature of e-commerce, though, is that the platforms where online vendors compete — platforms like Jeff Bezos’ Amazon and Alibaba’s Taobao — are themselves interested parties, with resources they can award where they believe it will generate the most value.
Most notably, platforms give visibility to products they expect to sell well by allocating them more consumer impressions, specifically by prioritizing the display of products with high click-through rates (CTR). Because platforms earn a portion of each sale, allocating impressions effectively lets them maximize the revenue a product brings through their marketplace, not only immediately but, as Alibaba researchers are now showing, over its entire lifetime.
To enable a smarter mechanism for impression allocation, members of Alibaba’s tech team developed a new reinforcement learning mechanism that optimizes impression allocation at each phase of a product’s lifecycle: introduction, growth, maturity, and decline. Based on evaluations in a simulated e-commerce environment, their work shows that recognizing hot products early in their introduction and predicting their decline as they age can help platforms increase the “metabolism” of consumption by allocating impressions strategically over time.
Among other challenges, the team overcame an allocation bias against introduction-stage products by departing from conventional supervised learning models for CTR optimization. Under these models, the low volume of CTR data available for new products can be mistaken for a sign of low performance, making those products less likely to receive the impressions that could speed up their metabolism in the system. More generally, supervised learning approaches tend to improve short-term CTR while failing to account for many of the factors affecting long-term performance, such as changes in commercial activity over time and the properties of platforms themselves. Given the scale of platforms like Taobao, which host millions of customers, enterprises, and service providers, methods that oversimplify these variables tend not to scale well in real-world application scenarios.
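To make the low-volume problem concrete, here is a minimal sketch that is not taken from the paper. It assumes a standard Wilson score interval as a stand-in for "how much do we really know about this CTR", and the function name and the sample numbers are purely illustrative. Two products share the same observed CTR of 5%, but the new one has far fewer impressions behind that figure.

```python
import math

def wilson_interval(clicks, impressions, z=1.96):
    """Wilson score interval for a click-through rate (z=1.96 is roughly 95%)."""
    if impressions == 0:
        return (0.0, 1.0)  # no data yet: the CTR could be anything
    p = clicks / impressions
    denom = 1 + z * z / impressions
    center = (p + z * z / (2 * impressions)) / denom
    half = z * math.sqrt(
        p * (1 - p) / impressions + z * z / (4 * impressions ** 2)
    ) / denom
    return (center - half, center + half)

# Illustrative numbers only: both products have an observed CTR of 5%.
new_product = wilson_interval(clicks=1, impressions=20)
established = wilson_interval(clicks=500, impressions=10_000)

print("new product CTR interval:", new_product)
print("established CTR interval:", established)
```

The interval for the new product is many times wider than the one for the established product, so a model that ranks on the point estimate alone has little basis for treating the new product as a weak performer.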
For its new approach, the Alibaba team first developed a mathematical model of product lifecycle stages and the transitions between them, combining abstract economic theory with computational metrics. It then applied this model in a reinforcement learning framework designed to maximize both short-term and long-term returns for its simulated platform, incorporating first principal component-based permutation and a novel experience generation method to meet the scalability requirements of e-commerce scenarios.
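The article does not spell out how the permutation step works, so the following is only one plausible reading and not the authors' implementation. It assumes the goal is to put products into a canonical order, so that the same set of products always yields the same input regardless of how they were listed, by sorting them on their projection onto the first principal component. The function names, the use of power iteration, and the two sample features (CTR and impressions) are all assumptions made for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Approximate the leading eigenvector of the covariance matrix.

    Uses plain power iteration. It can fail to converge if the start
    vector happens to be orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all rows identical: no direction of variance
        v = [x / norm for x in w]
    # Eigenvectors have no preferred sign; fix one so the order is stable.
    pivot = max(range(d), key=lambda j: abs(v[j]))
    if v[pivot] < 0:
        v = [-x for x in v]
    return v, means

def canonical_order(rows):
    """Sort feature rows by their projection onto the first principal component."""
    v, means = first_principal_component(rows)
    def projection(r):
        return sum((r[j] - means[j]) * v[j] for j in range(len(v)))
    return sorted(rows, key=projection)

# Hypothetical features per product: [observed CTR, impressions so far]
products = [[0.02, 100], [0.05, 5000], [0.08, 20000], [0.01, 50]]
print(canonical_order(products))
```

Under this reading, shuffling the product list does not change what the learner sees, which is one way to keep a state built from very many products manageable.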
Driven essentially by trial and error, the mechanism allowed the platform to observe global information for all products, allocate impressions according to these observations and a given strategy, and then update itself with new attributes for products and their lifecycle positions. The mechanism then received feedback on the effectiveness of its actions, allowing it to adjust its strategy before repeating the procedure. With the ultimate goal of bringing products to the mature stage of their lifecycles as quickly as possible, the mechanism proved able to improve the long-term effectiveness of impression allocation and generate returns for the platform more efficiently than competing approaches.
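The observe, allocate, update, and adjust cycle described above can be sketched as a toy simulation. This is not the paper's algorithm: simple hill climbing on a single strategy parameter stands in for the reinforcement learning method, and every name, threshold, and number below (Product, explore_share, the stage cutoffs, the CTR range) is a hypothetical chosen for illustration. The decline stage is left out to keep the sketch short.

```python
import random
from dataclasses import dataclass

@dataclass
class Product:
    true_ctr: float          # hidden from the platform in a real system
    impressions: int = 0
    clicks: int = 0
    stage: str = "introduction"

def observe(products):
    """Global observation: (impressions, clicks, stage) for every product."""
    return [(p.impressions, p.clicks, p.stage) for p in products]

def allocate(observation, budget, explore_share):
    """Spread explore_share of the budget evenly; give the rest by smoothed CTR."""
    n = len(observation)
    scores = [(c + 1) / (i + 20) for i, c, _ in observation]
    total = sum(scores)
    return [int(budget * (explore_share / n + (1 - explore_share) * s / total))
            for s in scores]

def step(products, allocation, rng):
    """Simulate clicks, update product attributes and stages, return the reward."""
    reward = 0
    for p, k in zip(products, allocation):
        clicks = sum(rng.random() < p.true_ctr for _ in range(k))
        p.impressions += k
        p.clicks += clicks
        reward += clicks
        # Hypothetical stage rule based on cumulative impressions.
        if p.impressions >= 2000:
            p.stage = "maturity"
        elif p.impressions >= 500:
            p.stage = "growth"
    return reward

def run_episode(explore_share, seed, rounds=30, budget=1000):
    rng = random.Random(seed)
    products = [Product(true_ctr=rng.uniform(0.01, 0.10)) for _ in range(10)]
    total = 0
    for _ in range(rounds):
        allocation = allocate(observe(products), budget, explore_share)
        total += step(products, allocation, rng)
    return total, products

def tune(trials=20, seed=0):
    """Trial and error: keep a perturbed strategy only if the feedback improves."""
    rng = random.Random(seed)
    share = 0.5
    best, _ = run_episode(share, seed)
    for _ in range(trials):
        candidate = min(1.0, max(0.0, share + rng.uniform(-0.2, 0.2)))
        reward, _ = run_episode(candidate, seed)
        if reward > best:
            share, best = candidate, reward
    return share, best

share, clicks = tune()
print("tuned explore_share:", share, "total clicks:", clicks)
```

The point of the sketch is the shape of the loop rather than the update rule: each round, the strategy sees the whole catalogue, spends a fixed impression budget, and is judged on the clicks that come back.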