KDD2019 Tutorial T11: Fake News Research: Theories, Detection Strategies, and Open Problems



T11: Fake News Research: Theories, Detection Strategies, and Open Problems

Reza Zafarani, Xinyi Zhou, Kai Shu, Huan Liu

Tutorial page: https://www.fake-news-tutorial.com/

Slide: https://docs.wixstatic.com/ugd/f31d05_9a90deb47a04427d98d35444f5b6fe45.pdf

(Reza Zafarani)


  • Research background

  • What is fake news

  • Related concepts

  • Fundamental theories

Research background

Fake news is now viewed as one of the greatest threats to democracy, justice, public trust, freedom of expression, journalism, and the economy.

Political aspects: fake news may have had an impact on the Brexit referendum

and on the 2016 US presidential election.

Economic aspects: the fake news story that Barack Obama was injured in an explosion wiped out $130 billion in stock value.

Social/psychological aspects

Fake news obtains public trust relatively easily.

Validity effect

Confirmation bias

Peer pressure

Why fake news has been attracting more public attention recently

Fake news can be created faster and more cheaply

The rise of social media

Social media accelerates the dissemination of fake news.

What is fake news, related concepts

Fake news is news that is intentionally and verifiably false

Fake news

Authenticity: false, intention: bad, news: yes

False news

Authenticity: false, intention: unknown, news: yes

Fundamental theories

Fundamental human cognition and behavior theories developed across various disciplines, such as philosophy, social science, and economics, provide invaluable insights for fake news studies.

Style-based fundamental theories

Propagation-based fundamental theories: studying fake news based on how it spreads.

“Fake news is incorrect but hard to correct”

User-based fundamental theories: studying fake news from the perspective of users: how users engage with fake news and the roles they play (or can play)

Fake news detection (outline)

  • Knowledge-based fake news detection

  • Style-based fake news detection

  • Propagation-based fake news detection

  • Credibility-based fake news detection

  • Fake-news datasets & tools

Knowledge-based fake news detection

It is also known as fact-checking.


Expert-based manual fact-checking

Fact-checkers: one or several domain experts

Crowd-sourced manual fact-checking

Fact-checkers: a large population of individuals

PolitiFact, The Washington Post Fact Checker, FactCheck, Snopes, TruthOrFiction, FullFact, HoaxSlayer

Expert-based manual fact-checking

Crowd-sourced manual fact-checking

Automatic fact-checking

How to represent “knowledge”

Stage 1: Fact extraction

T1: Entity resolution (deduplication/record linkage)

T2: Time recording to remove outdated knowledge

T3: Knowledge fusion to handle conflicts (often in open-source knowledge extraction)

T4: Credibility

T5: Knowledge inference/link prediction to infer new facts based on known ones

Relational machine learning: latent feature models, ..., Markov random fields

Stage 2: Fact checking: comparing knowledge between articles and knowledge graphs.
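As a rough illustration of the comparison step in stage 2, the sketch below represents knowledge as SPO (subject, predicate, object) triples and checks a claim against a toy knowledge graph. The `Triple` type, the `check_triple` helper, and the sample facts are hypothetical and not taken from the tutorial; the "conflicting" verdict also assumes every predicate is single-valued.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One piece of knowledge as a (subject, predicate, object) triple."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy knowledge graph; a real one would be built in stage 1.
KNOWLEDGE_GRAPH: Set[Triple] = {
    Triple("Barack Obama", "born_in", "Honolulu"),
    Triple("Honolulu", "located_in", "Hawaii"),
}


def check_triple(claim: Triple, kg: Set[Triple]) -> str:
    """Compare a claimed triple with the knowledge graph.

    Returns "supported" if the exact triple is known, "conflicting" if the
    graph holds a different object for the same subject and predicate
    (assumption: predicates are single-valued), and "unknown" otherwise.
    """
    if claim in kg:
        return "supported"
    for known in kg:
        if known.subject == claim.subject and known.predicate == claim.predicate:
            return "conflicting"
    return "unknown"


if __name__ == "__main__":
    print(check_triple(Triple("Barack Obama", "born_in", "Chicago"), KNOWLEDGE_GRAPH))
```

Triples that come back as "unknown" are the case where knowledge inference is needed, since the graph neither confirms nor contradicts them.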

Knowledge inference for unknown SPO triples: illustrative studies.

Shortest-path-based method

Discriminative path-based method
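To make the shortest-path idea concrete, here is a minimal sketch that treats the knowledge graph as an undirected graph and scores an unknown (subject, object) pair by how close the two entities are. The scoring rule `1 / hops` is a simplification chosen for illustration, not the weighting used in the studies the tutorial cites, and the graph contents and function names are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Set

# Hypothetical undirected knowledge graph as an adjacency map.
EDGES = [
    ("Barack Obama", "Honolulu"),
    ("Honolulu", "Hawaii"),
    ("Hawaii", "United States"),
    ("Paris", "France"),
]
GRAPH: Dict[str, Set[str]] = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def shortest_path_length(graph: Dict[str, Set[str]], source: str, target: str) -> Optional[int]:
    """Breadth-first search; returns the number of hops, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def truth_score(graph: Dict[str, Set[str]], subject: str, obj: str) -> float:
    """Illustrative score: 1.0 for a direct link, decaying as 1 / hops, 0.0 if no path."""
    hops = shortest_path_length(graph, subject, obj)
    if hops is None:
        return 0.0
    if hops == 0:
        return 1.0
    return 1.0 / hops


if __name__ == "__main__":
    print(truth_score(GRAPH, "Barack Obama", "Hawaii"))
    print(truth_score(GRAPH, "Barack Obama", "France"))
```

A short path between subject and object is taken as weak evidence for the claim; a discriminative path-based method would instead learn which kinds of paths are informative for a given predicate.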

Knowledge inference

(Xinyi Zhou)

Fake news detection

Fake news:

  • A survey of research

  • Detection methods

  • Opportunities

Style-based fake news detection

The good

It can detect fake news before propagation

It can detect “real” fake news...

The way to detect

Style representation

Style classification

Traditional ML: SVM, RF, XGBoost

DL framework

  • Multi-modal

  • Explainable representation

  • performance

Fake news early detection : A theory-driven model

  • Interpretability

  • Empirical relations

    Writing style

    Level: lexicon, feature: BoWs

    Level: syntax, feature: POS tags, CFGs

    Level: discourse, feature: POS tags, RRs

    Frequency: absolute? Standardized? Relative, by using TF-IDF.

  • Multi-modal

  • Event-invariant

    Input: image, text

    Decoder: fake news detector, event discriminator
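The lexicon-level style features mentioned above (bag-of-words counts turned into relative frequencies with TF-IDF) can be sketched in a few lines. This is a minimal version that assumes whitespace tokenization and the unsmoothed variant `tf * log(N / df)`; the function names and sample headlines are made up for illustration, and a real pipeline would more likely use an existing vectorizer.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Assumption: lowercase whitespace tokenization is good enough for a sketch."""
    return text.lower().split()


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per document.

    tf is the term count divided by document length; idf is log(N / df)
    with no smoothing, so a term present in every document gets weight 0.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    docs = ["shocking news shocking claim", "official news report"]
    for vector in tf_idf(docs):
        print(vector)
```

The resulting vectors are what a traditional classifier such as an SVM or random forest would take as input for style classification.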

Propagation-based Fake News Detection

The challenges

  • Algorithm transparency: writing style can be manipulated

  • Gold-standard datasets with reliable labels: multi-label, domain, language

  • Different types of fake news

  • Model explainability

The good: Massive auxiliary information can be utilized for comprehensive detection

News cascade

Homogeneous network

Stance Network
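For the news cascade representation, a common first step is to summarize each cascade with a few structural features. The sketch below assumes a cascade is stored as a hypothetical `parent_of` mapping (each re-share points to the post it was shared from) and computes size, depth, and maximum breadth; the feature set and names are illustrative rather than the tutorial's exact definition.

```python
from collections import Counter
from typing import Dict


def cascade_features(parent_of: Dict[str, str], root: str) -> Dict[str, int]:
    """Compute size, depth and breadth of a news cascade.

    parent_of maps each re-share to the node it was shared from.
    Assumption: the mapping forms a valid tree, i.e. every node leads
    back to root with no cycles.
    """
    depth_of = {root: 0}
    for node in parent_of:
        # Walk up until we reach a node whose depth is already known.
        path = []
        current = node
        while current not in depth_of:
            path.append(current)
            current = parent_of[current]
        base = depth_of[current]
        # Assign depths on the way back down, closest to the known node first.
        for offset, visited in enumerate(reversed(path), start=1):
            depth_of[visited] = base + offset

    nodes_per_level = Counter(depth_of.values())
    return {
        "size": len(depth_of),                    # number of nodes, root included
        "depth": max(depth_of.values()),          # longest chain of re-shares
        "breadth": max(nodes_per_level.values()), # widest level of the tree
    }


if __name__ == "__main__":
    shares = {"u1": "post", "u2": "post", "u3": "u1", "u4": "u3"}
    print(cascade_features(shares, "post"))
```

Features like these can be fed to a classifier, whereas the homogeneous and stance network representations model relations between many cascades or users at once.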

Credibility-based fake news detection

Headline Credibility & Clickbait detection

User credibility & Bot detection

  • Low > malicious users

  • User credibility score > susceptible users: unintentionally engage in fake news activity

  • High > insusceptible users: immune to fake news
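Read as a scale, the list above maps a user credibility score to one of three user types. A minimal sketch of that mapping is shown below, assuming the score is normalized to [0, 1] and that susceptible users sit between the two extremes; the cut-off values 0.3 and 0.7 are purely hypothetical and would have to be chosen or learned for a real dataset.

```python
def categorize_user(credibility: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a credibility score in [0, 1] to a user type.

    The thresholds are illustrative assumptions, not values from the tutorial.
    """
    if not 0.0 <= credibility <= 1.0:
        raise ValueError("credibility must be between 0 and 1")
    if credibility < low:
        return "malicious"       # low credibility
    if credibility > high:
        return "insusceptible"   # high credibility: immune to fake news
    return "susceptible"         # in between: may engage unintentionally


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, categorize_user(score))
```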

The challenges

  • Fake news early detection

  • Empirical relationship between fake news and clickbait

  • Assessing user intention in fake news activities

(Kai Shu)

Beyond News Contents: The Role of Social Context for Fake News Detection

Fake News Detection – Multi-Source: A typical news dissemination system on social media

  • Entity: publishers, news, social engagement.

Tri-Relationship Embedding (TriFN)

  • News contents embedding

  • Social contexts embedding

We jointly combine news content embedding and social context embedding for fake news detection

Datasets: FakeNewsNet with information for news contents, social context and ground truth labels from fact-checking websites


  • Social context information brings additional signals to fake news detection

  • It is important to capture the relations among publishers, news pieces, and users to detect fake news

  • The proposed TriFN framework is effective at modeling tri-relationships through heterogeneous network embedding

Unsupervised Fake News Detection: A Generative Approach

Unsupervised Fake News Detection

- A fake news detection method that models user opinions and user credibility


The hierarchical user engagement structure: We build a hierarchical user engagement structure for each news piece
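The interplay between user opinions and user credibility can be illustrated without any labels. The sketch below is not the generative model presented in the tutorial; it swaps in a much simpler iterative scheme (credibility-weighted voting, then re-estimating credibility from agreement) purely to show the idea. The data layout, the tie-breaking rule, and the function name are assumptions.

```python
from typing import Dict, Tuple

# opinions[news_id][user] is +1 if the user treats the news as true, -1 if fake.
Opinions = Dict[str, Dict[str, int]]


def estimate_truth(opinions: Opinions, n_iters: int = 10) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Alternate between two steps, with no ground-truth labels:

    1. label each news piece by a credibility-weighted vote of its users
       (ties are resolved as "true", an arbitrary assumption);
    2. set each user's credibility to the share of their opinions that
       agree with the current labels.
    """
    users = {user for votes in opinions.values() for user in votes}
    credibility = {user: 0.5 for user in users}  # start with equal trust
    truth: Dict[str, int] = {}

    for _ in range(n_iters):
        for news, votes in opinions.items():
            score = sum(credibility[user] * vote for user, vote in votes.items())
            truth[news] = 1 if score >= 0 else -1
        for user in users:
            rated = [(news, votes[user]) for news, votes in opinions.items() if user in votes]
            agreed = sum(1 for news, vote in rated if vote == truth[news])
            credibility[user] = agreed / len(rated)

    return truth, credibility


if __name__ == "__main__":
    sample = {
        "n1": {"a": 1, "b": 1, "c": -1},
        "n2": {"a": -1, "b": -1, "c": 1},
        "n3": {"a": 1, "c": -1},
    }
    print(estimate_truth(sample))
```

In this toy run, user c ends up with low credibility because their opinions keep disagreeing with the weighted consensus, which in turn settles the otherwise tied news piece n3.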

Deep Headline Generation for Clickbait Detection


Existing approaches: extracting hand-crafted linguistic features or building sophisticated predictive models such as deep neural networks


Scale: datasets with labels are often limited

Distribution: imbalanced distribution of clickbaits and non-clickbaits

Headline Generation from Documents

Goal: Generate stylized headlines that also preserve document contents

Model: Generator learning: a document autoencoder, a headline generator

Discriminator learning: a transfer discriminator, a style discriminator, a pair discriminator


We study the problem of generating clickbaits/non-clickbaits from original documents for clickbait detection

We propose a novel deep generative model with adversarial learning

Fake News Datasets & Tools

Data repository: FakeNewsNet, [Github], [Kaggle], [Paper]


KDD2019 Poster session

dEFEND: Explainable Fake News Detection

A new framework for the novel problem of explainable fake news detection

Achieves higher accuracy than state-of-the-art fake news detection methods

Discover explainable news sentences and user comments to understand