Like AI Cares: Helping Computers Detect Sarcasm

This article is part of the Academic Alibaba series and is taken from the WWW 2019 paper entitled “Sarcasm Detection with Self-matching Networks and Low-rank Bilinear Pooling” by Tao Xiong, Hongbo Zhu, Peiran Zhang, and Yihui Yang. The full paper can be read here.

How can you tell when someone is being sarcastic online? When you can’t rely on facial expressions, tone of voice, and other contextual information, identifying sarcasm can be difficult. If even humans struggle with this, imagine how challenging it is for computers. Accurate computer analysis of human language is impossible without knowing which expressions to take at face value. And considering how common sarcasm has become in online communication, computational linguists have their work cut out for them.

Conflicting Sentiments

One telltale sign of sarcasm is a clash of sentiments within a single sentence. Take this example:

I had the pleasure of being awakened by my neighbor’s screaming cockatoo.

“Pleasure” is a positive expression, but being awakened by a loud noise is something people generally don’t enjoy.

However, simply spotting conflicting sentiments in the same sentence isn’t enough to prove that the sentence is sarcastic. The composition of the sentence is also a key factor. For example, the following sentence also contains a positive expression and a negative concept, but obviously isn’t sarcastic:

I like all birds except my neighbor’s screaming cockatoo.
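To see why, here is a minimal sketch in Python of a naive detector that only checks whether positive and negative words co-occur. The toy sentiment lexicon is made up for illustration and is not from the paper; the point is that the heuristic flags both sentences above, even though only the first is sarcastic.

POSITIVE = {"pleasure", "like", "love", "great"}    # made-up toy lexicon
NEGATIVE = {"screaming", "awful", "hate", "noise"}  # made-up toy lexicon

def has_conflicting_sentiments(sentence: str) -> bool:
    # Flag a sentence when it contains at least one positive and one negative word.
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return bool(words & POSITIVE) and bool(words & NEGATIVE)

sarcastic = "I had the pleasure of being awakened by my neighbor's screaming cockatoo."
literal = "I like all birds except my neighbor's screaming cockatoo."

print(has_conflicting_sentiments(sarcastic))  # True
print(has_conflicting_sentiments(literal))    # True, so co-occurrence alone over-predicts sarcasm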

Putting It All Together

The model draws on two complementary types of information. The incongruity information comes from an innovative “self-matching network.” This network scores every pair of words in the input sentence, searching for potentially conflicting sentiments.
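As a rough illustration of this word-by-word matching idea (not the paper’s exact layer), the sketch below scores every pair of words in a sentence and keeps each word’s strongest cross-word interaction as its incongruity signal. PyTorch and random toy embeddings are assumed.

import torch

def self_match(embeddings: torch.Tensor) -> torch.Tensor:
    # embeddings: (seq_len, dim) word vectors for one sentence.
    seq_len, dim = embeddings.shape
    # Scaled dot-product score for every pair of words: (seq_len, seq_len).
    scores = embeddings @ embeddings.t() / dim ** 0.5
    # Mask out each word's match with itself so only cross-word pairs count.
    scores.fill_diagonal_(float("-inf"))
    # Keep each word's strongest interaction with any other word.
    return scores.max(dim=1).values  # (seq_len,)

toy_sentence = torch.randn(8, 50)      # 8 words, 50-dimensional toy embeddings
print(self_match(toy_sentence).shape)  # torch.Size([8])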

The compositional information comes from a bidirectional long short-term memory (LSTM) network, which is especially suited for analyzing sentence composition.
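For readers unfamiliar with this building block, here is a minimal sketch, assuming PyTorch and toy dimensions, of encoding a sentence with a bidirectional LSTM and pooling the per-word states into a single compositional vector.

import torch
import torch.nn as nn

embedding_dim, hidden_dim, seq_len = 50, 64, 8
bilstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)

tokens = torch.randn(1, seq_len, embedding_dim)  # one sentence of 8 toy word vectors
outputs, _ = bilstm(tokens)                      # (1, seq_len, 2 * hidden_dim)
sentence_vec = outputs.mean(dim=1)               # average the per-word states into one vector
print(sentence_vec.shape)                        # torch.Size([1, 128])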

Because the self-matching network also ends up capturing some compositional information, directly concatenating the outputs of the two networks is likely to introduce redundancy. For this reason, the two networks are instead fused using low-rank bilinear pooling, which controls for potentially redundant information without compromising the discriminative power of the self-matching network.
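The sketch below shows the general shape of low-rank bilinear pooling as a fusion layer. The dimensions, the tanh nonlinearity, and the toy input vectors are illustrative assumptions, not the paper’s exact configuration.

import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    # Fuses two vectors through low-rank projections instead of plain concatenation.
    def __init__(self, dim_x: int, dim_y: int, rank: int, out_dim: int):
        super().__init__()
        self.proj_x = nn.Linear(dim_x, rank, bias=False)
        self.proj_y = nn.Linear(dim_y, rank, bias=False)
        self.proj_out = nn.Linear(rank, out_dim)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # The element-wise product of the two low-rank projections approximates a
        # full bilinear interaction while keeping redundancy in check.
        joint = torch.tanh(self.proj_x(x)) * torch.tanh(self.proj_y(y))
        return self.proj_out(joint)

fuse = LowRankBilinearPooling(dim_x=8, dim_y=128, rank=32, out_dim=2)
incongruity = torch.randn(1, 8)    # toy stand-in for the self-matching output
composition = torch.randn(1, 128)  # toy stand-in for the bidirectional LSTM output
print(fuse(incongruity, composition).shape)  # torch.Size([1, 2]): sarcastic vs. not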

Sarcastic sentences reported by the self-matching network

The model’s performance is impressive. Tests on several publicly available datasets from sources such as Twitter and Reddit show that it outperforms existing baselines in precision, recall, F1 score (the harmonic mean of precision and recall), and accuracy, in some cases by as much as 10%.
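As a reminder of how these metrics relate, here is a minimal sketch that computes them with scikit-learn on made-up labels and predictions, not the paper’s data.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = sarcastic, 0 = not sarcastic (made-up labels)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

print("precision:", precision_score(y_true, y_pred))  # of sentences flagged sarcastic, how many truly are
print("recall:   ", recall_score(y_true, y_pred))     # of truly sarcastic sentences, how many are flagged
print("f1:       ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("accuracy: ", accuracy_score(y_true, y_pred))   # fraction of all sentences classified correctly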

All Done? Yeah, Right.

Not quite. The model still misreads some sentences, as the examples below show.

Two false negatives and a false positive reported by the model

The Long Road Ahead

Great job, geniuses — and we mean that in the sincerest way possible.

The full paper can be read here.
