From Zero to Hero: Shaking Up the Field of Zero-shot Learning

This article is part of the Academic Alibaba series and is based on the paper entitled “Transductive Unbiased Embedding for Zero-Shot Learning” by Jie Song, Chengchao Shen, Mingli Song, Yezhou Yang, and Yang Liu. The full paper can be read here.


Alibaba and its research partners are shaking up the field of zero-shot learning (ZSL) with their novel quasi-fully supervised learning (QFSL) model, which in tests considerably outperforms other models. In machine learning, zero-shot learning refers to the process by which a machine learns to recognize objects in an image without any labeled training examples of those objects to help in the classification. In other words, ZSL aims to help machines categorize objects that they have never seen before. Naturally, this poses a huge challenge for developers. Imagine, for example, trying to identify a snake in a photo without ever having seen one before. While this might seem an impossible task, a machine that is given a detailed description of a snake (long, legless, scaly) can match what it sees in the photo against that description and recognize the object. Essentially, this is how ZSL operates.
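The snake example can be made concrete with a small sketch. It assumes that some upstream model has already predicted attribute strengths for a photo, and it simply picks the class whose description is most similar. The attribute list, the class descriptions, the scores, and the `classify` helper are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

# Hypothetical attribute order: (long, legless, scaly, furry).
# Each class is described only by attributes, never by example images.
CLASS_DESCRIPTIONS = {
    "snake": (1.0, 1.0, 1.0, 0.0),
    "lizard": (1.0, 0.0, 1.0, 0.0),
    "cat": (0.0, 0.0, 0.0, 1.0),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(predicted_attributes, class_descriptions):
    """Return the class whose description best matches the predicted attributes."""
    return max(
        class_descriptions,
        key=lambda name: cosine_similarity(predicted_attributes, class_descriptions[name]),
    )


# Assumed output of an attribute predictor for a photo of an unseen animal.
predicted = (0.9, 0.8, 0.7, 0.1)
print(classify(predicted, CLASS_DESCRIPTIONS))
```

No snake image is needed at training time in this setup: only the description of the class has to be available when the prediction is made.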

Avoiding Bias

Bias towards source classes in a semantic embedding space

When there are few training images available, or indeed none, existing object recognition models struggle to make correct predictions, and ZSL was developed principally as a means to combat this growing problem.

QFSL: A New Solution

Overall architecture of the QFSL model

Most ZSL methods map input images to fixed anchor points in the embedding space during training, but the QFSL method also allows mapping between the input and other points. The labeled source data is projected to the points specified by the source classes in the shared semantic space, building a relationship between the visual and semantic embeddings. Meanwhile, the unlabeled target data is projected to other points, helping to alleviate the problem of bias towards source classes.
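One way to picture the two kinds of projection is as a training objective with two terms, sketched below. The sketch assumes that class scores are dot products between a projected image embedding and each class's semantic vector, that a softmax runs over source and target classes together, and that unlabeled target images are rewarded for placing probability mass on any target class. The function names, the anchor vectors, and the exact form of both terms are illustrative assumptions; the article does not specify them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(embedding, class_anchors):
    """Dot product between an image embedding and every class anchor."""
    return [sum(e * a for e, a in zip(embedding, anchor)) for anchor in class_anchors]


def source_loss(embedding, label, class_anchors):
    """Cross-entropy for a labeled source image: pull it towards its class anchor."""
    probs = softmax(class_scores(embedding, class_anchors))
    return -math.log(probs[label])


def target_bias_loss(embedding, target_indices, class_anchors):
    """Penalty for an unlabeled target image that lands on source classes.

    It is small whenever most probability mass falls on the target classes,
    whichever target class that happens to be.
    """
    probs = softmax(class_scores(embedding, class_anchors))
    return -math.log(sum(probs[i] for i in target_indices))


# Hypothetical semantic anchors: classes 0 and 1 are source, class 2 is target.
CLASS_ANCHORS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
TARGET_INDICES = [2]

near_source = (2.0, 0.1, 0.1)  # embedding that sits close to source class 0
near_target = (0.1, 0.1, 2.0)  # embedding that sits close to the target class

print(target_bias_loss(near_source, TARGET_INDICES, CLASS_ANCHORS))
print(target_bias_loss(near_target, TARGET_INDICES, CLASS_ANCHORS))
```

Under these assumptions the second value is smaller than the first, so minimizing the penalty pushes unlabeled target images away from the source anchors without requiring any labels for them.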

QFSL owes its name to its similarities with conventional fully-supervised classification, in which a multi-layer neural network and classifier are integrated together. In the training phase, QFSL recognizes data from both source and target classes, even if there is no labeled data for the target class. This feature is advantageous, as any available labeled data of a target class can be used in the future to train the model.

Looking Ahead

The best result is marked in bold, with second best in blue: QFSL clearly outperforms other methods

These promising results leave the door open for further research, and Alibaba and its research partners are investigating how other aspects of the semantic space, such as word vectors, can be exploited to influence results. Inductive ZSL is another research line to consider, to see whether it can solve the same problems as transductive ZSL.


Written by Alibaba Tech
