This article is part of the Academic Alibaba series and is taken from the paper entitled “Visualizing and Understanding Deep Neural Networks in CTR Prediction” by Lin Guo, Hui Ye, Wenbo Su, Henhuan Liu, Kai Sun, and Hang Xiang, accepted by SIGIR 2018. The full paper can be read here.
Deep learning using deep neural networks (DNNs) is a popular research area within the field of AI, and the technology already enjoys wide application in computer vision, natural language processing, e-commerce and other areas. But one curious aspect of this technology is that although its operational principles are well understood, it’s difficult to know exactly what a trained network is doing in practice between the input and output stages.
In other words, we know what to ask the system, and we can see its answer. But unless we probe a little further, we don’t quite know what it is thinking.
Pondering Photos, Phrases, and Click-through Rates
It is the layered architecture of DNNs that makes them so useful. Data is processed with the goal of extracting and identifying meaningful features at increasing levels of abstraction or generalization at each layer of the network. This enables them to draw conclusions on or make predictions about other data.
In image processing, this means identifying objects in new photographs, and in natural language processing this means the ability to decode and comprehend human speech. In the e-commerce industry, one way Alibaba is employing this technology is to make click-through rate (CTR) predictions. This means predicting the likelihood of users clicking on an ad or product recommendation, which is used to present more relevant ads and make better recommendations.
Given that the revenue of many multi-billion-dollar businesses relies heavily on the performance of their CTR prediction models, analyzing these models to illuminate their internal mechanisms is crucial to being able to refine them going forward. Recently, much work has been carried out in the fields of image processing and natural language processing on analyzing and visualizing the internal mechanisms of DNNs. Now, the Alibaba tech team has done the same for their CTR prediction model.
Their findings offer a wealth of information on how the system identifies and weighs features and feature groups. They also provide a means of assessing the system’s performance and diagnosing issues based on each step of the process, rather than purely on the accuracy of the final output.
System Performance by Layer
Alibaba collected data sets of around 150 million samples each from their e-commerce site over an eight-day period, plus an additional training set sampled on day one separately from the test set. To investigate the decay in the model’s performance, the model was evaluated daily from day one to day eight.
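The daily evaluation described above can be sketched in a few lines. This is a minimal illustration, not the team's code: it assumes AUC as the metric (the article does not name one), and the function names `auc` and `auc_by_day` and the `(day, label, score)` record layout are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    labels are 0/1 click indicators, scores are predicted CTRs.
    Tied scores share the average of their ranks.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one clicked and one non-clicked sample")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of samples [i, j) that share the same score.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_by_day(records):
    """Group (day, label, score) records by day and score each day separately.

    The record layout is an assumption made for this sketch.
    """
    by_day = defaultdict(lambda: ([], []))
    for day, label, score in records:
        by_day[day][0].append(label)
        by_day[day][1].append(score)
    return {day: auc(lbls, scs) for day, (lbls, scs) in sorted(by_day.items())}
```

Under these assumptions, a model trained on day one and scored this way on each later day would show decay as a downward drift in the per-day values.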
Among the various methods used by the team to analyze the system performance during the “hidden” stage of deep learning, a probe approach was implemented to measure the layer-wise performance.
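To make the probe idea concrete, here is a minimal sketch under stated assumptions: the probe is a logistic-regression classifier trained on each layer's frozen outputs, and its held-out AUC is taken as that layer's score. The article does not specify the probe type or metric, so both are assumptions, and all names here (`train_probe`, `probe_layers`, `layer_outputs`) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with batch gradient descent.

    features: list of equal-length float vectors (one layer's outputs).
    labels:   list of 0/1 click indicators.
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pairwise_auc(labels, scores):
    """AUC as the fraction of (clicked, non-clicked) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def probe_layers(layer_outputs, labels, train_fraction=0.5):
    """Train one probe per layer and report its held-out AUC.

    layer_outputs maps a layer name to that layer's output vectors,
    one per sample, in the same order as labels.
    """
    split = int(len(labels) * train_fraction)
    results = {}
    for name, feats in layer_outputs.items():
        w, b = train_probe(feats[:split], labels[:split])
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in feats[split:]]
        results[name] = pairwise_auc(labels[split:], scores)
    return results
```

Because the network itself is left untouched and only the small probe is trained, the score reflects how linearly separable clicked and non-clicked samples already are in each layer's output, which is one way to compare layers on an equal footing.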
Analysis and visualization of the outputs of each layer revealed some surprising insights. Deep learning theory suggests that, for a properly trained DNN model, the discriminative quality of a hidden layer’s output increases with the height of the layer. However, although layer 3 is an obvious improvement over layer 2 in the team’s visualization, the clicked points for layer 4 show no improvement in the degree of concentration over layer 3.
Naturally, this raises the question of whether the features extracted for making the final output are more or less predictive than those at one of the intermediary stages. It is exploring questions like this, the team concludes, that makes opening the “black box” of deep neural networks a worthwhile endeavor. Such analysis can provide a route to diagnosing issues such as underfitting/overfitting, gradient vanishing/explosion, and ineffective model structure, as well as to designing better model structures, training algorithms, and features.