This article is part of the Academic Alibaba series and is taken from the paper entitled “Attributed Network Representation Learning via Deep Neural Networks” by Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang, accepted by IJCAI 2018. The full paper can be read here.
Information networks, such as social media networks and the World Wide Web, are not just useful for the resources they store. Analyzing an information network with machine learning, a process known as network representation learning, can reveal a wealth of information about how the complex relationships between the network's nodes work.
A classic example of this in application is online advertisement targeting and recommendation. On Facebook, for instance, a user is often associated with personalized profile information including age, gender, education, and posted content. This data will then be used to give the user targeted advertisements and suggestions (such as groups to join).
However, network representation learning involves mountains of data and high computational complexity. Most research into network representation learning methods to date, therefore, has had to sacrifice information on either the network structure or the individual nodes in the interest of producing a scalable model.
Now, Alibaba’s tech team, in collaboration with researchers from Zhejiang University, China, and Simon Fraser University, Canada, has proposed a new unified framework called Attributed Network Representation Learning (ANRL). ANRL incorporates both the network structure and node attribute information in its analysis of the network, leading to keener insights into how network nodes interact.
Decoding the Node Neighborhood
A key part of the ANRL solution is the neighbor enhancement autoencoder, which retains better similarity between data samples in the representation space. An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction (the process of reducing the number of random variables under consideration, by obtaining a set of principal variables). The neighbor enhancement autoencoder consists of an encoder and a decoder, and the model reconstructs the target neighbors instead of the node itself.
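To make the idea of reconstructing neighbors rather than the node itself concrete, here is a minimal Python sketch. It assumes the target for a node is the plain average of its neighbors' attribute vectors; that aggregation rule, along with the names `average_neighbor_target` and `reconstruction_loss`, is a hypothetical choice for illustration and is not taken from the article.

```python
from typing import Dict, List

Vector = List[float]


def average_neighbor_target(node: int,
                            adjacency: Dict[int, List[int]],
                            attributes: Dict[int, Vector]) -> Vector:
    """Return the element-wise mean of the neighbors' attribute vectors.

    Assumption: plain averaging stands in for whatever neighbor
    aggregation the model actually uses.
    """
    neighbors = adjacency[node]
    if not neighbors:
        # Isolated node: fall back to its own attributes.
        return list(attributes[node])
    dim = len(attributes[node])
    return [sum(attributes[n][d] for n in neighbors) / len(neighbors)
            for d in range(dim)]


def reconstruction_loss(reconstructed: Vector, target: Vector) -> float:
    """Squared error between the decoder output and the neighbor target."""
    return sum((r - t) ** 2 for r, t in zip(reconstructed, target))


# Toy graph: node 0 is linked to nodes 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
attributes = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}

target = average_neighbor_target(0, adjacency, attributes)
# A conventional autoencoder would compare against attributes[0] instead.
loss = reconstruction_loss([0.5, 0.5], target)
```

The only difference from a conventional autoencoder in this sketch is the target passed to the loss: the decoder output is scored against the neighbor aggregate instead of the node's own attributes.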
The encoder transforms the input attributes into a representation that feeds two output branches. The left output branch is a decoder which reconstructs the target neighbors of its input samples. The right output branch predicts the associated graph context of the given inputs.
This approach has an advantage over traditional autoencoders: it better preserves proximity among nodes. Intuitively, the learned representations are more robust to variations, because forcing closely located nodes to reconstruct similar target neighbors constrains them to have similar representations. The decoder branch thus captures both node attributes and local network structure information, while the graph context branch captures global network structure. In this way, ANRL preserves node attributes, local network structure, and global network structure information in a unified framework.
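The two-branch design can be sketched as a single forward pass with a shared encoder. The following is an illustrative Python sketch, not the authors' implementation: the single-layer `tanh` encoder, the linear decoder, the negative-sampling form of the context loss, and the weighting parameter `alpha` are all assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def joint_loss(x: Vector, neighbor_target: Vector,
               context_vec: Vector, negative_vecs: List[Vector],
               w_enc: Matrix, w_dec: Matrix, alpha: float = 1.0) -> float:
    """Combine both branch losses for one node (hypothetical formulation)."""
    # Shared encoder: node attributes -> low-dimensional representation.
    h = [math.tanh(z) for z in matvec(w_enc, x)]

    # Branch 1 (decoder): reconstruct the target neighbors.
    reconstructed = matvec(w_dec, h)
    recon = sum((r - t) ** 2 for r, t in zip(reconstructed, neighbor_target))

    # Branch 2 (graph context): score a true context node high and
    # sampled negative nodes low.
    context = -math.log(sigmoid(dot(h, context_vec)))
    context += sum(-math.log(sigmoid(-dot(h, n))) for n in negative_vecs)

    return context + alpha * recon


# Toy call with all-zero weights, so the representation h is [0.0, 0.0].
zero_enc = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # 3 attributes -> 2 dims
zero_dec = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 2 dims -> 3 attributes
example_loss = joint_loss(
    x=[1.0, 0.0, 1.0],
    neighbor_target=[1.0, 0.0, 0.0],
    context_vec=[1.0, 1.0],
    negative_vecs=[[0.5, 0.5]],
    w_enc=zero_enc,
    w_dec=zero_dec,
)
```

Because both losses are computed from the same representation `h`, minimizing their sum pushes that representation to encode attribute, neighbor, and context information at once.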
Superior Link Prediction
The team tested ANRL against several state-of-the-art methods on multiple real-world datasets. The test format was link prediction tasks on three unlabeled datasets from Facebook, UNC, and UniID. The results are shown below.
The ANRL-based methods achieve significant improvements in AUC (area under the ROC curve) over the baselines on all three datasets. For instance, the tech team’s method achieved an AUC improvement of about 3.5% over the best-performing baseline on the UNC dataset. The team also observed that incorporating both node attributes and network structure information improved link prediction performance.
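For readers unfamiliar with how AUC is measured in link prediction, the sketch below scores node pairs by the cosine similarity of their embeddings and computes AUC as the fraction of (true edge, non-edge) pairs that are ranked correctly. The use of cosine similarity, the toy embeddings, and the function names are assumptions for illustration; the article does not specify the scoring function.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Pair = Tuple[int, int]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def link_prediction_auc(embeddings: Dict[int, Vector],
                        positive_edges: List[Pair],
                        negative_pairs: List[Pair]) -> float:
    """Probability that a true edge outscores a non-edge (ties count half)."""
    pos = [cosine(embeddings[u], embeddings[v]) for u, v in positive_edges]
    neg = [cosine(embeddings[u], embeddings[v]) for u, v in negative_pairs]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy embeddings: nodes 0 and 1 point one way, nodes 2 and 3 another.
embeddings = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}
held_out_edges = [(0, 1), (2, 3)]  # edges hidden during training
non_edges = [(0, 2), (1, 3)]       # sampled pairs with no edge

auc = link_prediction_auc(embeddings, held_out_edges, non_edges)
```

An AUC of 0.5 corresponds to random ranking, so improvements are read relative to that floor and to the baselines' scores.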
The team explains that one major reason for the performance lift is that their model takes both local and global network structure information into consideration. These experimental results on several real-world datasets show that the proposed ANRL outperforms representative state-of-the-art embedding approaches.