How CoLink Links Entities Between Different Knowledge Graphs

This article is part of the Academic Alibaba series and is based on the paper “CoLink: An Unsupervised Framework for User Identity Linkage” by Zexuan Zhong, Yong Cao, Mu Guo, and Zaiqing Nie, accepted at the 2018 AAAI Conference on Artificial Intelligence. The full paper can be read here.


Many entities appear in multiple knowledge graphs, each of which gives a different snapshot of the same entity. Users who want to understand these entities better can gain valuable insights by combining information from across these graphs.

Existing attempts to identify and link matching entities automatically have produced mixed results. Now, researchers from Alibaba AI Labs, Microsoft, and the University of Illinois have developed an approach called CoLink, which matches entities more accurately and comprehensively than existing systems.

For a system to identify matching entity profiles on two platforms, it must compare the entity information (“attributes”) on both and link the profiles whose attributes match. Previous systems have performed this task using string similarity functions, which score how closely two strings of text resemble each other.

Because entity attributes are formatted differently on different platforms (“ESCAL ENG” on one platform may correspond to “Escalation Engineer” on another), string similarity functions require an initial set of confirmed matches to train the system. “Unsupervised” approaches, in which the system collects its training data automatically, usually have to be tailored to the specific platforms involved and do not generalize to other platforms.
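To illustrate why differently formatted attributes are a problem, here is a minimal sketch of a string similarity function. It uses Python's standard difflib module as a stand-in; the article does not say which similarity functions earlier systems used, and the helper name string_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring letter case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


# Same job title, formatted differently on two platforms (example from the article).
abbreviated = string_similarity("ESCAL ENG", "Escalation Engineer")

# Same job title, differing only in letter case.
identical = string_similarity("Escalation Engineer", "escalation engineer")

print(f"abbreviated vs. full title: {abbreviated:.2f}")
print(f"case-only difference:       {identical:.2f}")
```

The case-only variant scores a perfect 1.0, while the abbreviated title scores noticeably lower even though both strings describe the same role. A pure string comparison cannot tell on its own that the two refer to the same thing.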

CoLink is the first general unsupervised solution. It uses a new framework with two methods of linking profiles and a co-training algorithm that coordinates them. The first is a familiar attribute-based method; the second is a relationship-based method, which identifies candidate pairs based on mutually related entities. In an iterative process, each method independently analyzes profiles and decides whether they should be linked, while the co-training algorithm uses high-quality matches from both models to retrain the models between iterations.

[Figure: The CoLink co-training algorithm]
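The co-training loop described above can be sketched on toy data. This is a simplified illustration under stated assumptions, not the paper's implementation: the two "models" are fixed scoring functions rather than trained classifiers, the growing set of shared links stands in for the retraining data, and all names, profiles, and the 0.8 threshold are hypothetical.

```python
from itertools import product

# Toy profiles on two hypothetical networks: a name plus a set of connections.
GRAPH_A = {
    "a1": {"name": "alice smith", "links": {"a2", "a3"}},
    "a2": {"name": "bob jones", "links": {"a1", "a3"}},
    "a3": {"name": "c. lee", "links": {"a1", "a2"}},
}
GRAPH_B = {
    "b1": {"name": "alice smith", "links": {"b2", "b3"}},
    "b2": {"name": "bob jones", "links": {"b1", "b3"}},
    "b3": {"name": "carol lee", "links": {"b1", "b2"}},
    "b4": {"name": "dave lee", "links": set()},
}


def attribute_score(a, b, linked):
    """Attribute-based stand-in: Jaccard overlap of name tokens."""
    tokens_a = set(GRAPH_A[a]["name"].split())
    tokens_b = set(GRAPH_B[b]["name"].split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def relationship_score(a, b, linked):
    """Relationship-based stand-in: share of a's neighbours already linked to b's neighbours."""
    neighbours_a = GRAPH_A[a]["links"]
    if not neighbours_a:
        return 0.0
    shared = sum(1 for n in neighbours_a if linked.get(n) in GRAPH_B[b]["links"])
    return shared / len(neighbours_a)


def co_train(threshold=0.8, max_iterations=10):
    """Alternate between both scorers, pooling their confident links each round."""
    linked = {}  # profile id in graph A -> profile id in graph B
    for _ in range(max_iterations):
        proposals = []
        for model in (attribute_score, relationship_score):
            for a, b in product(GRAPH_A, GRAPH_B):
                if a in linked or b in linked.values():
                    continue  # keep the linkage one-to-one
                score = model(a, b, linked)
                if score >= threshold:
                    proposals.append((score, a, b))
        if not proposals:
            break  # neither method found a new confident link
        for _score, a, b in sorted(proposals, reverse=True):
            if a not in linked and b not in linked.values():
                linked[a] = b
    return linked


links = co_train()
print(links)
```

In this toy run the attribute-based scorer links the two profiles whose names match exactly. The relationship-based scorer then links "c. lee" to "carol lee" because both are connected to the already-linked pairs, even though their names overlap too little for the attribute-based scorer to link them.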

Instead of a string similarity function, CoLink’s attribute-based method uses a machine translation algorithm to match entity attributes. This approach identifies matching attributes more successfully, and can even identify “implicit connections”: attributes that match because they contain the same information but bear little or no textual resemblance to each other.

The CoLink framework was tested against other unsupervised approaches, including SiGMa and Alias-disamb. The following table shows the results, with the overall performance calculated as the average of the accuracy and completeness of the match results.

[Table: results for CoLink and the other unsupervised approaches]
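As a rough illustration of how such an overall score can be computed, the sketch below treats "accuracy" as precision and "completeness" as recall, and combines them with a harmonic mean (the F1 score). All of this is an assumption: the article says only that the two are averaged, and the link sets shown are made up for the example.

```python
def precision_recall_f1(predicted, truth):
    """Score predicted links against known true links."""
    predicted, truth = set(predicted), set(truth)
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0  # "accuracy" of the links made
    recall = correct / len(truth) if truth else 0.0  # "completeness" of the links found
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0  # assumed way of averaging
    return precision, recall, f1


# Hypothetical output of a linking system and hypothetical ground truth.
predicted_links = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
true_links = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}

precision, recall, f1 = precision_recall_f1(predicted_links, true_links)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted links are correct and two of the four true links are found, so a system can score well on one measure and poorly on the other. A combined score rewards approaches that do well on both.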

CoLink yielded impressive results, outperforming the next-best approach by as much as 20%. These results show that CoLink offers a substantially more accurate and comprehensive method of linking entities across multiple knowledge graphs than was previously possible.


