Knowing Your Adversary: How Alibaba is Training Smarter Online Fraud Detectors
This article is part of the Academic Alibaba series and is taken from the paper entitled “Adversarial Detection with Model Interpretation” by Ninghao Liu, Hongxia Yang, and Xia Hu, accepted by KDD 2018. The full paper can be read here.
With a growing list of real-world applications, machine learning (ML) systems have recently gained traction as a promising tool for online fraud detection. Where most ML tasks process stationary data sets, however, fraud detection works against intelligent human actors who adapt when exposed, meaning it is the ML models that risk growing static while the data shifts around them.
Previous efforts to build detectors more resistant to adaptive adversaries have relied on methods such as enhanced classification and feature engineering, rote adversarial training, and deep neural networks. Each of these comes with its own limitations, but one major recurring issue is the “black box” conundrum: researchers typically cannot access detailed information on the inner workings of these methods, denying them critical insights that could guide further development.
Now, researchers at Alibaba have developed an approach to adversarial training grounded in research into the workings of ML models, applying knowledge of their mechanisms to generate more formidable adversaries for them to train against. As well as improving the overall robustness of detectors, the effort to generate challenging new adversaries with a minimum of computing resources has helped shed light on how real fraud perpetrators are likely to adapt from a given position after becoming detectable. Founded on the premise that spammers are fundamentally human agents with limited resources to spend as they adapt, the approach is showing how machines can be trained to anticipate the “direction” of future attacks based on the positioning of previous ones.
Cracking the Black Box
ML fraud detection systems rely on classifiers to filter content, flagging spammer instances with high or low confidence depending on the probability that they have been correctly classified. A classifier becomes vulnerable to attack when many spammer instances fall into its low-confidence regions, as even small changes to such evasion-prone (EP) samples on the part of spammers could lead to their misclassification as legitimate content.
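The idea of low-confidence, evasion-prone samples can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the toy data, the 0.5 decision threshold, and the 0.15 confidence margin are all assumptions chosen for clarity.

```python
# Hypothetical sketch: flag "evasion-prone" (EP) spam samples whose
# predicted spam probability sits only just above the decision threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy features: 200 legitimate posts around (-1, -1), 200 spam posts around (1, 1).
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

clf = LogisticRegression().fit(X, y)
p_spam = clf.predict_proba(X)[:, 1]

# Spam instances detected with low confidence (probability just above 0.5)
# are the EP seeds: a small feature change could flip their classification.
threshold, margin = 0.5, 0.15
ep_mask = (y == 1) & (p_spam > threshold) & (p_spam < threshold + margin)
ep_seeds = X[ep_mask]
print(f"{ep_mask.sum()} evasion-prone seeds out of {int(y.sum())} spam samples")
```

In a real detector the margin would be tuned to the classifier's calibration, but the principle is the same: EP samples are those closest to the decision boundary on the spam side.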
For research purposes, the Alibaba team treated these EP samples as “seeds” for generating formidable adversarial samples to use in detector training. They then sought to identify the directions seeds could be most readily influenced to escape the classification mechanism and thus bypass detection with minimal effort. Using these findings, they generated adversaries closely mimicking real malicious behavior, reducing the overall number of adversaries needed to train stronger detection mechanisms.
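For a linear classifier, the "direction of least effort" has a closed form: the cheapest way for a seed to escape detection is to move against the weight vector, perpendicular to the decision boundary. The sketch below illustrates this geometry with hypothetical toy data; it is not the paper's method, which handles more general models.

```python
# Hypothetical sketch: push an evasion-prone seed across a linear decision
# boundary with minimal effort, i.e. along the direction -w of the hyperplane
# w.x + b = 0, by just enough to flip its predicted label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
clf = LogisticRegression().fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
spam = X[y == 1]
scores = clf.decision_function(spam)
# Seed: the correctly detected spam sample closest to the boundary.
seed = spam[np.argmin(np.where(scores > 0, scores, np.inf))]

# Signed distance from the seed to the hyperplane, and the minimal step
# (plus a tiny overshoot) needed to cross it.
distance = (w @ seed + b) / np.linalg.norm(w)
adversary = seed + (-(distance + 0.01)) * w / np.linalg.norm(w)

print("seed label:", clf.predict([seed])[0], "-> adversary label:", clf.predict([adversary])[0])
```

Because the perturbation is the smallest one that evades the classifier, such generated adversaries mimic a resource-limited spammer, which is why few of them suffice for useful adversarial training.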
By using knowledge of the ML mechanism to analyze its response to each adversary introduced along its boundary, the team derived a local interpreter function specific to each instance. With this information, they updated the overall mechanism to better account for the evolution of individual spam threats introduced during the trials.
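A per-instance local interpreter can be approximated in the style of LIME-like surrogate models: perturb the instance, query the black-box detector, and fit a proximity-weighted linear model. The sketch below is an assumption-laden illustration of that general idea (the sampling width, surrogate choice, and `local_interpreter` helper are all invented for this example), not the function derived in the paper.

```python
# Hypothetical sketch of a local interpreter: approximate a black-box
# detector's behaviour around one instance with a weighted linear surrogate.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
detector = RandomForestClassifier(random_state=0).fit(X, y)

def local_interpreter(instance, n_samples=500, width=0.5):
    """Fit a linear surrogate to the detector in a neighbourhood of `instance`."""
    # Sample perturbations around the instance of interest.
    Z = instance + rng.normal(0, width, (n_samples, instance.size))
    p = detector.predict_proba(Z)[:, 1]
    # Weight samples by proximity so the fit stays local.
    weights = np.exp(-np.sum((Z - instance) ** 2, axis=1) / width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=weights)
    return surrogate.coef_  # local attribution of each feature

coefs = local_interpreter(np.zeros(2))  # interpret a point near the boundary
print("local feature attributions:", coefs)
```

The surrogate's coefficients indicate which features most readily move a nearby instance across the boundary, which is the information used to update the detector against that evolution.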
The new system identifies evasion-prone samples, analyzes potential evasion tactics, and updates the detection model accordingly.
Based on experiments with YelpReview and Twitter post data, careful adversary selection proved effective at generating more challenging attack simulations, yielding a wealth of information about weaknesses in classifiers during trials. These findings were then used to develop defensive strategies effective against a variety of attacks, strategies that could bolster detection frameworks in the future.
Moving forward, Alibaba is looking to extend its spam detection work to handle high-dimensional raw data and data sets with relational links between instances.