Fake News Detector: Building an AI-Powered Fact-Checking System

Why Do We Need Fake News Detection?

Have you ever come across news that felt a little too dramatic?

You sense something is off but can’t pinpoint it.
You try to verify it, but it takes too much time and effort.
A few days later, you realize it was completely fake.

That’s the danger of fake news.

It wastes attention and time.
It shapes public opinion and sometimes even influences policy or markets.

So here’s the big question: Can AI help us fact-check news automatically?
Yes — and that’s exactly what this project is about.

Meet Fake News Detector, an AI-powered system that extracts key claims from news articles, searches for supporting evidence online, analyzes semantic relevance, and finally tells you whether the claim is true, false, or partially true.
The best part? It comes with a Streamlit web interface that runs locally with a single command.

Project Structure: What’s Inside the Codebase?

A common question from developers is:

“How is the system organized under the hood?”

Here’s the project layout:

fake-news-detector/
│
├── app.py                 # Main entry point (Streamlit app)
├── fact_checker.py        # Core fact-checking logic
├── auth.py                # User authentication
├── db_utils.py            # Database utilities
├── pdf_export.py          # Generate PDF reports
├── requirements.txt       # Dependencies
├── api.py                 # REST API endpoints
├── gunicorn.conf.py       # Gunicorn configuration
├── start_server.sh        # API startup script
├── .gitignore             # Git ignore rules
├── LICENSE                # Open-source license
├── README.md              # Documentation
│
├── test/                  # Tests
│   ├── api_test_page.html # API testing page
└── docs/                  # Documentation
    ├── images/            # Images for docs
    │   └── screenshot.png # App screenshot
    ├── api_doc.html       # API docs
    └── usage.md           # Detailed usage guide

At a glance:

app.py → the entry point, launching the Streamlit interface.
fact_checker.py → the “brain” where fact-checking happens.
auth.py → handles user authentication.
db_utils.py → utility functions for database interactions.
pdf_export.py → exports fact-checking results as PDF reports.
api.py → makes it easy to integrate with other systems.

Core Features: What Makes It Different?

Unlike a simple web scraper or search tool, Fake News Detector comes with some unique advantages:

🔍 Automatic claim extraction → The system identifies verifiable claims directly from news text.
🌐 Real-time web search → Uses DuckDuckGo to find supporting evidence.
🤖 Semantic matching → Employs BGE-M3 embeddings to measure similarity between claims and evidence.
📊 Evidence chunking → Long articles are split into chunks, making retrieval more accurate.
✅ Fact-checking results → Provides clear verdicts: true, false, or partially true.
🔄 Streaming interface → Displays the fact-checking process step by step in real time.

In short: it doesn’t just return a list of links. It gives you a verdict with evidence.

Quick Start Guide

Worried this might be complicated to set up? Don’t be.

1. Prerequisites

Python 3.12
Qwen2.5 model (or any LLM compatible with the OpenAI API)
BGE-M3 embedding model (local or via API)

2. Installation

# Clone repository
git clone https://github.com/yourusername/fake-news-detector.git
cd fake-news-detector

# Install dependencies
pip install -r requirements.txt

3. Configure the embedding model

In fact_checker.py, set the model path:

self.embedding_model = BGEM3FlagModel('/path/to/your/bge-m3/')

4. Launch the app

streamlit run app.py

Then open your browser at:
👉 http://localhost:8501

How to Use the App

Here’s what the workflow looks like:

Input news text
- Paste the article or headline into the text box.
System workflow (automatic)
- Extracts verifiable claims
- Searches for evidence
- Ranks evidence by semantic similarity
- Generates a fact-check verdict
Output results
- Verdict: True, False, or Partially True
- Supporting evidence snippets
- Reasoning process

💡 Example:
Input: “Aliens discovered in City X.”

Claim extraction → “Aliens discovered in City X”
Web search → Finds scientific and local news reports
Semantic matching → No evidence supports the claim
Verdict → False

System Architecture: Step-by-Step Pipeline

Fake News Detector follows a pipeline architecture:

Claim Extraction → LLM extracts fact-checkable claims.
Search Stage → Queries DuckDuckGo for evidence.
Relevance Scoring → BGE-M3 embeddings compute similarity.
Evidence Processing → Splits long text and selects key passages.
Judgment Stage → Outputs verdict with reasoning.

📊 Here’s a visual flowchart:

flowchart TD
    A[Input news text] --> B[Extract claims]
    B --> C[Search for evidence]
    C --> D[Semantic similarity scoring]
    D --> E[Chunk & process evidence]
    E --> F[Final verdict: True/False/Partial]
    F --> G[Display verdict + reasoning]

Tech Stack Overview

Component	Technology	Purpose
Frontend	Streamlit	Interactive web interface
LLM	Qwen2.5-14B	Claim extraction & reasoning
Embedding Model	BGE-M3	Semantic similarity scoring
Search Engine	DuckDuckGo API	Evidence retrieval
Utilities	NumPy, OpenAI-compatible API	Data processing

This stack strikes a balance between accuracy and efficiency, while keeping everything developer-friendly.

FAQ: Common Questions

❓ Does it support non-English news?

Yes. Qwen2.5 is multilingual, and BGE-M3 embeddings support multiple languages, including Chinese.

❓ Can I run it on a server?

Absolutely. With gunicorn.conf.py and start_server.sh, you can deploy it as an API service.

❓ Does it store my queries?

By default, results can be saved via db_utils.py. You can also disable storage if privacy is a concern.

❓ Can I export fact-checking reports?

Yes. Use pdf_export.py to generate shareable PDF reports.

❓ How is this different from asking ChatGPT “Is this news true?”

Great question. Unlike ChatGPT, which may just give an answer, Fake News Detector provides an evidence chain and reasoning process, making it more transparent and verifiable.

How to Contribute

Since this project is open-sourced under MIT License, contributions are welcome:

Fork the repo
Create a new branch: git checkout -b feature/xxx
Commit changes: git commit -m 'Add xxx feature'
Push branch: git push origin feature/xxx
Open a Pull Request

Future improvements could include:

More search engines (Google/Bing integration)
Multi-modal fact-checking (images, videos)
Improved UI/UX for end users
Knowledge graph integration for stronger reasoning

Project Links

GitHub: https://github.com/CaptainYifei/fake-news-detector
Gitee: https://gitee.com/love2eat/fake-news-detector

Final Thoughts

Fake News Detector is not just another AI toy project — it’s a practical tool for:

Journalists and researchers verifying claims
Developers experimenting with LLM-based fact-checking
Anyone curious about applying AI to real-world misinformation problems

Its strengths lie in being automated, explainable, and extendable.
Instead of leaving you guessing, it shows both the verdict and the reasoning behind it.

If you’re passionate about fighting misinformation, this project is a great place to start. 🚀

Fake News Detector: How AI-Powered Fact-Checking Combats Misinformation