Fake News Detector: Building an AI-Powered Fact-Checking System

App Screenshot

Why Do We Need Fake News Detection?

Have you ever come across news that felt a little too dramatic?

  • You sense something is off but can’t pinpoint it.
  • You try to verify it, but it takes too much time and effort.
  • A few days later, you realize it was completely fake.

That’s the danger of fake news.

  • It wastes attention and time.
  • It shapes public opinion and sometimes even influences policy or markets.

So here’s the big question: Can AI help us fact-check news automatically?
Yes — and that’s exactly what this project is about.

Meet Fake News Detector, an AI-powered system that extracts key claims from news articles, searches for supporting evidence online, analyzes semantic relevance, and finally tells you whether the claim is true, false, or partially true.
The best part? It comes with a Streamlit web interface that runs locally with a single command.


Project Structure: What’s Inside the Codebase?

A common question from developers is:

“How is the system organized under the hood?”

Here’s the project layout:

fake-news-detector/
│
├── app.py                 # Main entry point (Streamlit app)
├── fact_checker.py        # Core fact-checking logic
├── auth.py                # User authentication
├── db_utils.py            # Database utilities
├── pdf_export.py          # Generate PDF reports
├── requirements.txt       # Dependencies
├── api.py                 # REST API endpoints
├── gunicorn.conf.py       # Gunicorn configuration
├── start_server.sh        # API startup script
├── .gitignore             # Git ignore rules
├── LICENSE                # Open-source license
├── README.md              # Documentation
│
├── test/                  # Tests
│   ├── api_test_page.html # API testing page
└── docs/                  # Documentation
    ├── images/            # Images for docs
    │   └── screenshot.png # App screenshot
    ├── api_doc.html       # API docs
    └── usage.md           # Detailed usage guide

At a glance:

  • app.py → the entry point, launching the Streamlit interface.
  • fact_checker.py → the “brain” where fact-checking happens.
  • auth.py → handles user authentication.
  • db_utils.py → utility functions for database interactions.
  • pdf_export.py → exports fact-checking results as PDF reports.
  • api.py → makes it easy to integrate with other systems.

Core Features: What Makes It Different?

Unlike a simple web scraper or search tool, Fake News Detector comes with some unique advantages:

  • 🔍 Automatic claim extraction → The system identifies verifiable claims directly from news text.
  • 🌐 Real-time web search → Uses DuckDuckGo to find supporting evidence.
  • 🤖 Semantic matching → Employs BGE-M3 embeddings to measure similarity between claims and evidence.
  • 📊 Evidence chunking → Long articles are split into chunks, making retrieval more accurate.
  • Fact-checking results → Provides clear verdicts: true, false, or partially true.
  • 🔄 Streaming interface → Displays the fact-checking process step by step in real time.

In short: it doesn’t just return a list of links. It gives you a verdict with evidence.


Quick Start Guide

Worried this might be complicated to set up? Don’t be.

1. Prerequisites

  • Python 3.12
  • Qwen2.5 model (or any LLM compatible with the OpenAI API)
  • BGE-M3 embedding model (local or via API)

2. Installation

# Clone repository
git clone https://github.com/yourusername/fake-news-detector.git
cd fake-news-detector

# Install dependencies
pip install -r requirements.txt

3. Configure the embedding model

In fact_checker.py, set the model path:

self.embedding_model = BGEM3FlagModel('/path/to/your/bge-m3/')

4. Launch the app

streamlit run app.py

Then open your browser at:
👉 http://localhost:8501


How to Use the App

Here’s what the workflow looks like:

  1. Input news text

    • Paste the article or headline into the text box.
  2. System workflow (automatic)

    • Extracts verifiable claims
    • Searches for evidence
    • Ranks evidence by semantic similarity
    • Generates a fact-check verdict
  3. Output results

    • Verdict: True, False, or Partially True
    • Supporting evidence snippets
    • Reasoning process

💡 Example:
Input: “Aliens discovered in City X.”

  • Claim extraction → “Aliens discovered in City X”
  • Web search → Finds scientific and local news reports
  • Semantic matching → No evidence supports the claim
  • Verdict → False

System Architecture: Step-by-Step Pipeline

Fake News Detector follows a pipeline architecture:

  1. Claim Extraction → LLM extracts fact-checkable claims.
  2. Search Stage → Queries DuckDuckGo for evidence.
  3. Relevance Scoring → BGE-M3 embeddings compute similarity.
  4. Evidence Processing → Splits long text and selects key passages.
  5. Judgment Stage → Outputs verdict with reasoning.

📊 Here’s a visual flowchart:

flowchart TD
    A[Input news text] --> B[Extract claims]
    B --> C[Search for evidence]
    C --> D[Semantic similarity scoring]
    D --> E[Chunk & process evidence]
    E --> F[Final verdict: True/False/Partial]
    F --> G[Display verdict + reasoning]

Tech Stack Overview

Component Technology Purpose
Frontend Streamlit Interactive web interface
LLM Qwen2.5-14B Claim extraction & reasoning
Embedding Model BGE-M3 Semantic similarity scoring
Search Engine DuckDuckGo API Evidence retrieval
Utilities NumPy, OpenAI-compatible API Data processing

This stack strikes a balance between accuracy and efficiency, while keeping everything developer-friendly.


FAQ: Common Questions

❓ Does it support non-English news?

Yes. Qwen2.5 is multilingual, and BGE-M3 embeddings support multiple languages, including Chinese.

❓ Can I run it on a server?

Absolutely. With gunicorn.conf.py and start_server.sh, you can deploy it as an API service.

❓ Does it store my queries?

By default, results can be saved via db_utils.py. You can also disable storage if privacy is a concern.

❓ Can I export fact-checking reports?

Yes. Use pdf_export.py to generate shareable PDF reports.

❓ How is this different from asking ChatGPT “Is this news true?”

Great question. Unlike ChatGPT, which may just give an answer, Fake News Detector provides an evidence chain and reasoning process, making it more transparent and verifiable.


How to Contribute

Since this project is open-sourced under MIT License, contributions are welcome:

  1. Fork the repo
  2. Create a new branch: git checkout -b feature/xxx
  3. Commit changes: git commit -m 'Add xxx feature'
  4. Push branch: git push origin feature/xxx
  5. Open a Pull Request

Future improvements could include:

  • More search engines (Google/Bing integration)
  • Multi-modal fact-checking (images, videos)
  • Improved UI/UX for end users
  • Knowledge graph integration for stronger reasoning

Project Links


Final Thoughts

Fake News Detector is not just another AI toy project — it’s a practical tool for:

  • Journalists and researchers verifying claims
  • Developers experimenting with LLM-based fact-checking
  • Anyone curious about applying AI to real-world misinformation problems

Its strengths lie in being automated, explainable, and extendable.
Instead of leaving you guessing, it shows both the verdict and the reasoning behind it.

If you’re passionate about fighting misinformation, this project is a great place to start. 🚀