This article is part of the Academic Alibaba series and is taken from the WSDM 2019 paper entitled “Learning to Selectively Transfer: Reinforced Transfer Learning for Deep Text Matching” by Chen Qu, Feng Ji, Minghui Qiu, Liu Yang, Zhiyu Min, Haiqing Chen, Jun Huang and W. Bruce Croft. The full paper can be read here.
When venturing into uncharted territory, one peril is certainly the unknown. Just as formidable, though, is the danger of relying on knowledge gained in other environments, which may help in some cases and hurt in others. In machine learning, this danger is known as negative transfer.
In text matching systems, negative transfer is a real risk for Transfer Learning (TL) methods, which tackle the shortage of labelled data in one domain by importing data from resource-rich domains. Such transfer is not merely a shortcut to better performance. It is a practical necessity in everyday applications, given the profusion of small, category-specific domains (such as individual product types in e-commerce). Selecting which source data to transfer is one way to reduce the risk. However, existing source data selection methods are difficult to train jointly with recent deep transfer models, and this mismatch has so far limited efforts to integrate data selection effectively into TL.
To address this problem, researchers at Alibaba have proposed a novel reinforced data selector that chooses a subset of the source-domain data for optimizing the TL model. The TL model, in turn, provides reward feedback that is used to update the data selector. In experiments on paraphrase identification and natural language inference tasks, the resulting Reinforced Transfer Learning (RTL) model significantly improved on the performance of the TL model, with potential benefits for a broad range of applications, including document retrieval and question answering.
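The reward-driven update of the selector can be written as a policy-gradient objective. The formulation below is a sketch that assumes a REINFORCE-style update with a logistic keep/drop policy; the article only states that the selector is updated from reward feedback. The symbols $\theta$ (policy parameters), $s_i$ (state describing source sample $i$), $a_i$ (1 = keep, 0 = drop), $R$ (reward from the TL model), $b$ (a reward baseline) and $\sigma$ (the logistic sigmoid) are notation introduced here for illustration.

```latex
\begin{aligned}
\pi_\theta(a_i = 1 \mid s_i) &= \sigma(\theta^\top s_i), \qquad
\pi_\theta(a_i = 0 \mid s_i) = 1 - \sigma(\theta^\top s_i) \\
J(\theta) &= \mathbb{E}_{a \sim \pi_\theta}[R] \\
\nabla_\theta J(\theta) &\approx (R - b) \sum_i \nabla_\theta \log \pi_\theta(a_i \mid s_i)
  = (R - b) \sum_i \bigl(a_i - \sigma(\theta^\top s_i)\bigr) s_i
\end{aligned}
```

Subtracting the baseline $b$ leaves the expected gradient unchanged but reduces its variance, which matters when the reward comes from a noisy signal such as a model's validation performance.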
A Well-Reinforced Framework
The proposed RTL framework consists of three components, each handling a key subtask: a base model, a transfer learning model, and a reinforced data selector.
The base model, which performs the text matching itself, is a Decomposable Attention Model (DAM), chosen for its efficiency. Its three jointly trained components respectively align the words of the two input sentences, compare the aligned pieces, and aggregate the comparisons into a representation of the sentence pair.
The transfer learning model builds on the base model to leverage a large amount of source-domain data, using a deep neural network (DNN) framework with a fully-shared encoder, so that the same encoder processes both source- and target-domain samples.
Finally, the reinforced data selector chooses data from the source domain. Acting as an agent, it guards against negative transfer by deciding, according to a learned policy, whether to keep or drop each source sentence-pair sample. The TL model then evaluates these decisions and returns a reward for favorable selections, and the agent learns to maximize its expected total future reward.
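To make the align (attend), compare and aggregate steps concrete, here is a minimal NumPy sketch of a DAM-style forward pass. It is a simplification under stated assumptions: the class `TinyDAM` and its weight matrices `W_f`, `W_g` and `W_h` are hypothetical names, each sub-network is a single randomly initialised layer rather than a trained feed-forward network, and there is no embedding lookup or training loop. It illustrates the data flow, not the configuration used in the paper.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


class TinyDAM:
    """Hypothetical, heavily simplified DAM: attend -> compare -> aggregate.

    Each sub-network is one random ReLU/linear layer; the original model uses
    small trained feed-forward networks for all three steps.
    """

    def __init__(self, emb_dim, hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W_f = rng.normal(scale=0.1, size=(emb_dim, hidden))        # attend
        self.W_g = rng.normal(scale=0.1, size=(2 * emb_dim, hidden))    # compare
        self.W_h = rng.normal(scale=0.1, size=(2 * hidden, n_classes))  # aggregate

    def forward(self, a, b):
        # a: (len_a, emb_dim), b: (len_b, emb_dim) word embeddings of the two sentences
        # 1. Attend: alignment scores e_ij = F(a_i) . F(b_j)
        e = relu(a @ self.W_f) @ relu(b @ self.W_f).T   # (len_a, len_b)
        beta = softmax(e, axis=1) @ b       # part of b soft-aligned to each a_i
        alpha = softmax(e, axis=0).T @ a    # part of a soft-aligned to each b_j
        # 2. Compare: each token against its aligned counterpart
        v1 = relu(np.concatenate([a, beta], axis=1) @ self.W_g)   # (len_a, hidden)
        v2 = relu(np.concatenate([b, alpha], axis=1) @ self.W_g)  # (len_b, hidden)
        # 3. Aggregate: sum over tokens into a pair representation, then classify
        pair = np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])   # (2 * hidden,)
        probs = softmax(pair @ self.W_h, axis=0)                  # (n_classes,)
        return probs, pair


rng = np.random.default_rng(1)
model = TinyDAM(emb_dim=8, hidden=16, n_classes=3)
probs, pair = model.forward(rng.normal(size=(5, 8)), rng.normal(size=(7, 8)))
print(probs.shape, pair.shape)  # (3,) (32,)
```

Because alignment here is a dot-product attention with no recurrence, the work per sentence pair reduces to a handful of matrix products, which is the sort of property that makes a DAM-style model a lightweight base for transfer learning.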
Taken as a whole, the RTL framework has two parts, the reinforced data selector and the TL model (with the base model embedded in the TL model), which are learned jointly and interact closely during training.
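The joint learning of the selector and the TL model can be sketched as an alternating loop. The toy example below is a hypothetical setup, not the paper's implementation. A logistic-regression classifier stands in for the fully-shared TL model, 40% of the synthetic source labels are flipped to mimic samples that would cause negative transfer, the selector's state is assumed to be the current TL loss on each sample plus a bias term, and the reward is assumed to be the drop in target-validation loss produced by one TL update. The selector is a logistic keep/drop policy trained with REINFORCE and a moving-average baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 5
true_w = rng.normal(size=DIM)  # hypothetical labeling rule of the target domain


def make_data(n, flip_frac=0.0):
    """Synthetic binary matching data; a fraction of labels is flipped to mimic
    harmful source samples (hypothetical setup)."""
    x = rng.normal(size=(n, DIM))
    y = (x @ true_w > 0).astype(float)
    flip = rng.random(n) < flip_frac
    y[flip] = 1.0 - y[flip]
    return x, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(w, x, y):
    # Per-sample binary cross-entropy of the shared TL model
    p = np.clip(sigmoid(x @ w), 1e-7, 1 - 1e-7)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def state(w, x, y):
    # Hypothetical state: current TL loss on each sample plus a bias term
    return np.stack([log_loss(w, x, y), np.ones(len(y))], axis=1)


src_x, src_y = make_data(2000, flip_frac=0.4)  # large, partly harmful source
tgt_x, tgt_y = make_data(50)                   # small labelled target set
val_x, val_y = make_data(200)                  # target validation set

tl_w = np.zeros(DIM)   # one model shared by both domains ("fully shared")
sel_w = np.zeros(2)    # selector policy weights over the 2-d state
baseline, lr_tl, lr_sel = 0.0, 0.1, 0.5

for step in range(300):
    idx = rng.choice(len(src_y), size=32, replace=False)
    xb, yb = src_x[idx], src_y[idx]
    s = state(tl_w, xb, yb)
    p_keep = sigmoid(s @ sel_w)
    keep = (rng.random(len(idx)) < p_keep).astype(float)  # sample keep/drop actions

    # TL update on the kept source samples plus the target training data
    x_all = np.vstack([xb[keep == 1], tgt_x])
    y_all = np.concatenate([yb[keep == 1], tgt_y])
    before = log_loss(tl_w, val_x, val_y).mean()
    grad = x_all.T @ (sigmoid(x_all @ tl_w) - y_all) / len(y_all)
    tl_w -= lr_tl * grad
    # Assumed reward: decrease in target-validation loss from this update
    reward = before - log_loss(tl_w, val_x, val_y).mean()

    # REINFORCE: grad of log pi(a|s) is (a - p) * s for a Bernoulli-sigmoid policy
    advantage = reward - baseline
    sel_w += lr_sel * advantage * ((keep - p_keep)[:, None] * s).sum(axis=0)
    baseline = 0.9 * baseline + 0.1 * reward

val_acc = ((sigmoid(val_x @ tl_w) > 0.5) == val_y).mean()
print(f"target validation accuracy: {val_acc:.2f}, selector weights: {sel_w}")
```

The design choice to feed the selector the TL model's own loss reflects the idea that flipped samples tend to look costly once the shared model starts fitting the target data, so a policy of this kind can learn to drop them; with other state features or reward definitions the learned behaviour would differ.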
To evaluate the proposed model, the researchers designed paraphrase identification (PI) and natural language inference (NLI) experiments that simulate transfer from a relatively open domain to a relatively closed one. Both the full RTL framework and each of its components were compared against carefully chosen baseline models.
The results show that the proposed model achieved a statistically significant improvement over the strongest baseline on the PI task and outperformed the strongest baseline by a large margin on the NLI task. In future work, the researchers plan to explore more effective state representations and to adapt the method to other tasks.