This article is part of the Academic Alibaba series and is taken from the paper entitled “Exact-K Recommendation via Maximal Clique Optimization” by Yu Gong, Yu Zhu, Lu Duan, Qingwei Liu, Ziyu Guan, Fei Sun, Wenwu Ou, and Kenny Q. Zhu. The full paper can be read here.
In online information retrieval, recommender systems play the vital role of narrowing down an enormous range of potential search results (or “k items”) by using metrics like click-through rates to predict their relevance. While in scrolling list scenarios like web searches this is best done with traditional top-k recommenders that rank items individually, anyone who has recently visited YouTube or a mobile-based shopping app has also interacted with another, more challenging recommendation model in which systems try to maximize the performance of a whole set of items displayed together as a card.
Now, in a novel effort to better define and tackle these challenges, researchers at Alibaba have proposed exact-k recommendation to effectively address the constraints of combinatorial optimization problems. Innovatively, the team’s work reduces the exact-k recommendation problem to a graph-based maximal clique optimization problem, which it then solves through a Graph Attention Networks (GAttN) model based on an encoder-decoder framework that learns the optimal distribution for k items in a card in an end-to-end manner.
Transferring the Problem, Adapting a Solution
Based on the requirement that a recommender choose the exact combination of items that will maximize the chance of a user clicking it, exact-k can be clearly defined in technical terms as a constrained combinatorial optimization problem. One way of depicting this problem is in the form of a graph in which each node represents a recommendable item, meaning that it can be transferred to the graph-based maximal clique optimization problem wherein a clique (i.e. set) is selected among the available nodes based on specific constraints that maximize performance under a given set of criteria.
To locate the highest performing clique in a graph of k nodes, Alibaba’s researchers developed a novel GAttN with attention mechanism, which they then trained with a Reinforcement Learning from Demonstrations (RLfD) approach that combines the respective advantages of behavior cloning and reinforcement learning. In an adaptation of traditional encoder-decoder frameworks that encode an input sentence and then decode and output sentence using recurrent neural networks, the model’s encoder produces representations of all input nodes in a graph while the decoder selects a clique from among these nodes. The model’s RLfD agent, based on concepts originally developed for robotics applications, significantly reduces the number of samples and total time required for training the process through a combination of Learning from Demonstrations and Learning from Rewards methods.
In experiments, researchers sought to determine whether the GAttN with RLfD method could outperform baseline methods in exact-k recommendations scenarios. Further, they sought to explore the precise workings of the GAttN framework and the effectiveness of RLfD optimization toward improving the training process.
Using two datasets based on the MovieLens movie rating dataset and one dataset built from real-world Taobao e-commerce data, the team compared models’ performance in terms of Hit Ratio and Precision metrics. Results indicate that the proposed model outperformed baselines by 4.7% in Hit Ratio and 7.7% in Precision. Analysis of the model’s components indicates that adjustment of the beam size (or number of solutions sought in each decoding time-step of the process) can greatly improve the GAttN’s performance in both metrics. Additionally, the team found that the RLfD component was able to ensure both the sufficiency and efficiency of the model’s training.
The full paper can be read here.