
ClawFeed: How AI Curation Solves Information Overload for Busy Professionals


Core question: How can professionals stay informed without drowning in the daily flood of tweets, RSS feeds, and news alerts?

Information overload isn’t caused by too much news—it’s caused by too much noise. ClawFeed is an open-source AI-powered news aggregation system that doesn’t just move information from source to reader; it acts as a personal curator. By pulling from thousands of sources across Twitter, RSS, HackerNews, Reddit, and more, ClawFeed filters out the irrelevant and delivers structured summaries on schedules that match how you actually work: every 4 hours, daily, weekly, or monthly.


Why Traditional News Consumption Breaks Down

Core question: What is fundamentally wrong with how we currently consume news and social media?

The modern professional faces a paradox: accessing information has never been easier, but extracting value from it has never been harder. A typical knowledge worker might encounter hundreds of tweets, dozens of blog posts, and countless headlines each day. The traditional approach—subscribing to more sources—only amplifies the problem.

Consider the typical Twitter experience: thirty minutes of scrolling might yield three genuinely valuable insights, buried under marketing posts, hot takes, and outrage cycles. RSS readers present another challenge: they faithfully deliver every article from subscribed blogs, leaving the user to manually filter what matters. HackerNews and Reddit offer community-curated content, but the signal-to-noise ratio varies wildly by hour and by thread.

The root issue is that these tools treat all content equally. They don’t distinguish between a groundbreaking research announcement and a product marketing thread. They don’t recognize that a security vulnerability announcement needs immediate attention while a framework update can wait for the weekly review.

Author’s reflection: From “who’s talking” to “what matters”

I’ve observed that most people organize their information diet around sources rather than substance. They follow specific accounts or subscribe to specific blogs, assuming that loyalty to a source guarantees quality. But this confuses reputation with relevance. A renowned expert might post ten times about conference travel for every one technical insight. ClawFeed inverts this model: it defines what quality means first (through configurable rules), then finds it wherever it appears. This isn’t about automating laziness—it’s about reallocating cognitive resources from filtering to thinking. When you open your reader and see pre-digested insights rather than raw feeds, your mindset shifts from “catching up” to “engaging deeply.”


What ClawFeed Delivers: A Functional Overview

Core question: What specific capabilities does ClawFeed provide for managing information intake?

ClawFeed operates across three dimensions: multi-source aggregation, intelligent filtering, and structured output. Each dimension addresses a specific pain point in professional information consumption.

Time-Boxed Digest System

Unlike real-time notification systems that fragment attention, ClawFeed uses a time-boxing strategy with four digest frequencies:

| Frequency | Best For | Content Characteristics |
|-----------|----------|-------------------------|
| 4-hour | Breaking news, security alerts, market movements | Immediate, high-priority items only |
| Daily | Industry monitoring, trend tracking | Balanced breadth and depth, ideal for morning review |
| Weekly | Synthesis, pattern recognition | Aggregated themes, eliminates ephemeral noise |
| Monthly | Strategic planning, long-term trends | Macro perspective, filters short-term volatility |

This design matches cognitive rhythms rather than platform algorithms. The 4-hour digest suits security teams monitoring CVE announcements or traders tracking volatile markets. The daily digest fits commute reading. The weekly digest supports weekend deep-thinking sessions. You’re not expected to be “always on”—you’re empowered to engage at the right depth at the right time.
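For self-hosted installs, these schedules can be driven by ordinary cron. The crontab below is an illustrative sketch only: it assumes digests are created through the authenticated /api/digests endpoint shown later in this article, the "daily" type value comes from that example, and the other type strings and abbreviated payloads are placeholders to adapt to your installation.

```shell
# Illustrative crontab (sketch): host, API key, and non-"daily" type values are assumptions
0 */4 * * * curl -s -X POST http://localhost:8767/api/digests -H 'Authorization: Bearer YOUR_API_KEY' -H 'Content-Type: application/json' -d '{"type":"4h"}'
0 6 * * *   curl -s -X POST http://localhost:8767/api/digests -H 'Authorization: Bearer YOUR_API_KEY' -H 'Content-Type: application/json' -d '{"type":"daily"}'
0 7 * * 1   curl -s -X POST http://localhost:8767/api/digests -H 'Authorization: Bearer YOUR_API_KEY' -H 'Content-Type: application/json' -d '{"type":"weekly"}'
```

Keeping the schedule in cron rather than inside the application matches the project's lightweight-operation principle.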

Comprehensive Source Support

ClawFeed natively supports nine source types, covering the channels technical professionals use most:

| Source Type | Configuration Example | Typical Use Case |
|-------------|----------------------|------------------|
| twitter_feed | @karpathy | Track specific thought leaders |
| twitter_list | List URL | Monitor thematic collections of accounts |
| rss | Any RSS/Atom URL | Follow independent blogs or news sites |
| hackernews | Front page or Show HN | Capture tech community priorities |
| reddit | /r/MachineLearning | Track niche community discussions |
| github_trending | language=python | Discover emerging open-source tools |
| website | Any URL | Scrape sites without RSS feeds |
| digest_feed | Another user’s slug | Subscribe to community-curated digests |
| custom_api | JSON endpoint | Integrate internal systems or proprietary data |

Application scenario: Building a personal tech radar

Imagine you’re an independent developer focused on AI infrastructure. Here’s how you might configure ClawFeed:

  1. High-frequency monitoring: Add Twitter feeds for @karpathy, @ylecun, and other key researchers with 4-hour digests to catch breaking papers or commentary immediately.

  2. Community intelligence: Subscribe to HackerNews front page and /r/MachineLearning with daily digests to understand what the broader community finds valuable.

  3. Project discovery: Configure github_trending sources for Python and Rust languages with weekly digests to identify tools worth investigating.

  4. Deep research: Add RSS feeds from high-quality independent blogs with monthly digests, using the “Mark & Deep Dive” feature for longitudinal study.

Source Packs: Community-Powered Curation

Building a quality source list from scratch is time-intensive. ClawFeed introduces Source Packs—curated bundles of sources that users can share and install collectively. A “Machine Learning Researcher Starter Pack” might include twenty core Twitter accounts, ten conference blog RSS feeds, three essential Reddit communities, and default HackerNews and GitHub Trending configurations.

New users install these packs via a single API call, immediately inheriting community-validated information diets. This creates a network effect: experienced practitioners curate for newcomers, elevating the baseline quality of information across the user community.
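To make the idea concrete, a Source Pack is essentially a named bundle of source definitions. The JSON below is a hypothetical sketch of such a bundle: the twitter_feed username and rss url config keys mirror the API examples later in this article, while the reddit and hackernews config shapes, and the overall pack schema, are assumptions.

```json
{
  "name": "Machine Learning Researcher Starter Pack",
  "sources": [
    { "name": "Karpathy Feed",     "type": "twitter_feed", "config": { "username": "karpathy" } },
    { "name": "Lab Blog",          "type": "rss",          "config": { "url": "https://example.com/feed.xml" } },
    { "name": "r/MachineLearning", "type": "reddit",       "config": { "subreddit": "MachineLearning" } },
    { "name": "HN Front Page",     "type": "hackernews",   "config": {} }
  ]
}
```

Installing a pack would then create every listed source in one step rather than one API call per source.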

Mark & Deep Dive: From Consumption to Research

Reading is only the beginning. ClawFeed provides a complete workflow for knowledge management:

  1. Bookmarking: While reading digests, mark items of interest with a single action
  2. Annotation: Add personal notes capturing immediate thoughts or questions
  3. AI-powered analysis: Request expanded analysis of bookmarked content—summarization, technical extraction, cross-source comparison, or reading list generation

This transforms ClawFeed from a reader into a research assistant. When you’ve accumulated bookmarks on a specific topic, the system can analyze them collectively, revealing patterns invisible when items are viewed in isolation.


Architecture Choices: Design for Simplicity

Core question: How does ClawFeed’s technical architecture support its functional promises?

Understanding why ClawFeed is built the way it is helps evaluate whether it fits your context. The system follows three architectural principles: multi-tenancy, lightweight operation, and modular integration.

Storage: The SQLite Decision

Rather than deploying a distributed database cluster, ClawFeed uses SQLite. This choice reflects pragmatic engineering:

  • Zero configuration: No separate database server to install, manage, or monitor
  • Portability: The entire database is a single file, trivial to back up or migrate
  • Sufficient performance: For individual and small-team usage patterns, SQLite handles concurrency adequately
  • Minimal resource footprint: Runs comfortably on Raspberry Pi devices or low-cost VPS instances

Author’s reflection: The “good enough” principle in engineering

I’ve seen projects destroy themselves with premature optimization—introducing complex distributed systems to handle theoretical scale that never materializes. The operational burden of maintaining such architecture often exceeds the value it provides. ClawFeed’s SQLite choice represents honest engineering: it acknowledges that the primary use case is individuals and small teams, and it refuses to compromise present simplicity for hypothetical future scale. When scale genuinely becomes a constraint, migrating to PostgreSQL is straightforward. The cost of that future migration is far lower than the ongoing cost of over-engineering.

Deployment Flexibility: Four Integration Modes

ClawFeed offers four deployment patterns, each serving different user needs:

Mode 1: ClawHub (Managed)

clawhub install clawfeed

For users who want functionality without operational responsibility. ClawHub handles hosting, updates, and maintenance.

Mode 2: OpenClaw Skill

cd ~/.openclaw/skills/
git clone https://github.com/kevinho/clawfeed.git

OpenClaw is an AI agent framework. As a skill, ClawFeed becomes the agent’s information sense organ—cron jobs generate digests automatically, and the agent can query specific topics conversationally.

Mode 3: Zylos Skill

The same skill-installation pattern applies to the Zylos AI agent platform: clone the repository into its skills directory.

Mode 4: Standalone (Self-Hosted)

git clone https://github.com/kevinho/clawfeed.git
cd clawfeed
npm install

Full control for customization, privacy-sensitive deployments, or air-gapped environments.

Authentication: Graceful Degradation

Google OAuth support enables multi-user scenarios with personal bookmarks and source configurations. However, the application degrades elegantly: without OAuth configuration, it runs in read-only mode as a public information aggregation site. The same codebase serves both private personal use and public community services.


Installation and Configuration: From Zero to Operational

Core question: What are the exact steps to deploy ClawFeed on my own infrastructure?

The following procedures are derived directly from the source documentation and verified for accuracy.

Prerequisites

Ensure your system has:

  • Node.js (version 18 or higher recommended)
  • npm or yarn package manager
  • Git

Step 1: Obtain the Source

git clone https://github.com/kevinho/clawfeed.git
cd clawfeed

Step 2: Install Dependencies

npm install

Step 3: Environment Configuration

Copy the example configuration file:

cp .env.example .env

Edit .env with your specific values. Critical variables include:

| Variable | Purpose | Required | Default |
|----------|---------|----------|---------|
| GOOGLE_CLIENT_ID | OAuth client identification | No* | (none) |
| GOOGLE_CLIENT_SECRET | OAuth client credentials | No* | (none) |
| SESSION_SECRET | Encryption key for sessions | No* | (none) |
| API_KEY | Authentication for digest creation | No | (none) |
| DIGEST_PORT | HTTP server port | No | 8767 |
| ALLOWED_ORIGINS | CORS permitted origins | No | localhost |

*Required only for authentication features. Omit for read-only public deployment.
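Pulling the table together, a minimal .env for a private, authenticated deployment might look like the following. Every value is a placeholder; generate real secrets randomly (for example with `openssl rand -hex 32`).

```shell
# Placeholders only: substitute your own credentials and secrets
GOOGLE_CLIENT_ID=xxxxxxxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-oauth-client-secret
SESSION_SECRET=long-random-string
API_KEY=another-long-random-string
DIGEST_PORT=8767
ALLOWED_ORIGINS=https://yourdomain.com
```

For a read-only public deployment, omit the first four lines entirely.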

Step 4: Google OAuth Setup (Optional)

For multi-user authentication:

  1. Navigate to Google Cloud Console
  2. Create or select a project
  3. Enable the Google+ API
  4. Create OAuth 2.0 Client ID credentials
  5. Add authorized redirect URI: https://yourdomain.com/api/auth/callback
  6. Copy Client ID and Secret to .env

Step 5: Launch the Service

npm start

The API becomes available at http://127.0.0.1:8767.

For development with automatic reloading:

npm run dev

Step 6: Production Reverse Proxy

Example Caddy configuration:

handle /digest/api/* {
    uri strip_prefix /digest/api
    reverse_proxy localhost:8767
}

handle_path /digest/* {
    root * /path/to/clawfeed/web
    file_server
}

This routes API requests to the Node.js service while serving the SPA frontend statically.
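If your stack runs nginx rather than Caddy, an equivalent configuration is sketched below. This is an untested adaptation of the Caddy example above (same port, same web root), not configuration shipped with the project.

```nginx
location /digest/api/ {
    # strip the /digest/api prefix before proxying to the Node.js service
    rewrite ^/digest/api(/.*)$ $1 break;
    proxy_pass http://127.0.0.1:8767;
}

location /digest/ {
    # serve the SPA frontend as static files
    alias /path/to/clawfeed/web/;
    try_files $uri $uri/ /digest/index.html;
}
```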


Practical Operations: Sources to Feeds

Core question: How do I configure sources, generate digests, and publish them as subscribable feeds?

Scenario 1: Monitoring Specific Twitter Accounts

To track a thought leader like Andrej Karpathy:

  1. Identify the Twitter handle: @karpathy
  2. Create the source via API:
curl -X POST http://localhost:8767/api/sources \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "name": "Karpathy Feed",
    "type": "twitter_feed",
    "config": {
      "username": "karpathy"
    }
  }'
  3. Verify configuration using the detection endpoint:
curl "http://localhost:8767/api/sources/detect?url=https://twitter.com/karpathy"

Scenario 2: Migrating Existing RSS Subscriptions

Batch import from a list of URLs:

# Assuming rss_list.txt contains one URL per line.
# Note: each URL is spliced directly into a JSON string, so URLs containing
# double quotes or backslashes would need escaping first.
while IFS= read -r url; do
  [ -n "$url" ] || continue   # skip blank lines
  curl -X POST http://localhost:8767/api/sources \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d "{
      \"name\": \"Imported Feed\",
      \"type\": \"rss\",
      \"config\": {
        \"url\": \"$url\"
      }
    }"
done < rss_list.txt

Scenario 3: Creating and Retrieving Digests

Authenticated digest creation:

curl -X POST http://localhost:8767/api/digests \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "type": "daily",
    "sources": ["source_id_1", "source_id_2"]
  }'

Public digest retrieval (no authentication required):

# List recent daily digests, paginated
curl "http://localhost:8767/api/digests?type=daily&limit=20&offset=0"

# Retrieve specific digest details
curl http://localhost:8767/api/digests/:id

Scenario 4: Publishing Subscribable Feeds

ClawFeed exports digests in standard formats for integration with other tools:

| Format | Endpoint Pattern | Use Case |
|--------|------------------|----------|
| HTML | /feed/:slug | Direct browser consumption |
| JSON Feed | /feed/:slug.json | Modern reader integration |
| RSS 2.0 | /feed/:slug.rss | Legacy reader compatibility |

Example: A user with slug tech-lead-digest publishes at https://yourdomain.com/feed/tech-lead-digest.rss, which team members subscribe to in their preferred readers.

Application scenario: Team knowledge distribution

In a technical team, designate one member as information curator:

  1. Curator maintains high-quality source combinations (Source Packs)
  2. Daily digests automatically publish to /feed/team-engineering-daily.rss
  3. Team members subscribe individually—no need for each person to maintain source lists
  4. The team shares a common, filtered information baseline

Scenario 5: Bookmark Management and Analysis

Creating a bookmark with metadata:

curl -X POST http://localhost:8767/api/marks \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "url": "https://example.com/article",
    "title": "Architecture Decision Record",
    "note": "Novel approach to service mesh optimization, requires deeper analysis"
  }'

Retrieving personal bookmarks:

curl http://localhost:8767/api/marks \
  -H "Authorization: Bearer YOUR_TOKEN"

Removing a bookmark:

curl -X DELETE http://localhost:8767/api/marks/:id \
  -H "Authorization: Bearer YOUR_TOKEN"

Customization: Teaching AI Your Preferences

Core question: How can I adjust AI filtering logic and output formatting?

ClawFeed’s AI behavior is controlled through two editable template files.

Content Curation Rules

Edit templates/curation-rules.md to define what constitutes valuable content:

# Curation Rules

## Exclusion Criteria
- Marketing content: Contains "limited time," "click here," or promotional language
- Pure opinion: Emotional commentary without technical substance or data
- Duplicates: Content with >80% similarity to already-included items

## Inclusion Priorities
- Technical depth: Includes code examples, architecture diagrams, or performance benchmarks
- Timeliness: Addresses recent releases, security patches, or breaking changes
- Community validation: High engagement on HackerNews or relevant subreddits

## Scoring Dimensions
1. Information density (1-5): Unique insights per unit length
2. Actionability (1-5): Can readers act on this information?
3. Source authority (1-5): Recognized expertise in the domain
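To see how the exclusion criteria behave mechanically, here is a toy shell pre-filter over invented sample headlines. It implements only the two rules that plain text matching can express, the promotional-language exclusion and exact-duplicate removal; the ">80% similarity" fuzzy matching and the scoring dimensions are the AI's job, not grep's.

```shell
headlines='Rust 1.80 released
Limited time offer: click here for our course
Rust 1.80 released
New CVE announced in OpenSSL 3.3'

# Rule 1: drop promotional language (case-insensitive match on either phrase)
# Rule 2: drop exact duplicates (awk prints only the first occurrence of a line)
filtered=$(printf '%s\n' "$headlines" |
  grep -iv -e 'limited time' -e 'click here' |
  awk '!seen[$0]++')

printf '%s\n' "$filtered"
```

Run standalone, this prints the two surviving headlines; in a real pipeline the AI would then score the survivors on the three dimensions above.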

Output Format Templates

Edit templates/digest-prompt.md to customize presentation structure. Possibilities include:

  • Requiring “one-sentence summary,” “key points,” and “why this matters” for each entry
  • Grouping by topic rather than chronological order
  • Adding an “executive overview” paragraph at digest start
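Combined, those options might yield a digest-prompt.md fragment like the following. This is an illustrative sketch; the repository ships its own baseline template, and the exact prompt wording is yours to tune.

```markdown
Start the digest with a one-paragraph executive overview.
Group entries by topic, not by publication time.

For each entry, produce:
- **One-sentence summary**
- **Key points** (at most three bullets)
- **Why this matters** (one sentence aimed at a practitioner)
```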

Author’s reflection: Explicit curation as self-knowledge

Writing curation rules forces articulation of what you value. When you specify “exclude marketing content,” you’re defining what deserves your attention. When you prioritize “technical depth,” you’re acknowledging that surface-level coverage wastes your time. Most people never examine their information diet this explicitly—they complain about overload without defining what “quality” means. ClawFeed’s template system makes this introspection unavoidable and, ultimately, empowering.


User Interface and Experience

Core question: What frontend capabilities does ClawFeed provide?

The single-page application dashboard offers:

Theme Support

  • Dark mode: Reduced eye strain for evening reading
  • Light mode: Professional appearance for office environments
  • Persistent preference: Stored in browser localStorage

Internationalization

Native English and Chinese interface support, with automatic browser language detection and manual override options.

Responsive Design

Adaptive layouts for desktop and mobile browsers, supporting digest reading and bookmark management on phones and tablets.

Real-Time Preview

When configuring sources, preview content samples to verify correct setup before committing.




Development, Testing, and Contribution

Core question: How can I modify ClawFeed or contribute to its development?

Development Workflow

npm run dev

Enables file watching with automatic server restart on changes.

Testing Infrastructure

Complete end-to-end test suite:

cd test
./setup.sh      # Initialize test users and data
./e2e.sh        # Execute 66 end-to-end test cases
./teardown.sh   # Clean test artifacts

Architecture Documentation

Comprehensive design documentation in docs/ARCHITECTURE.md covers:

  • Multi-tenant data isolation strategies
  • Scalability analysis and constraints
  • Security considerations and threat modeling

Contribution Process

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/description
  3. Commit changes: git commit -m 'Add specific functionality'
  4. Push to fork: git push origin feature/description
  5. Open pull request for review

Action Checklist / Implementation Steps

  • [ ] Verify Node.js version 18+ installed
  • [ ] Clone repository and install npm dependencies
  • [ ] Copy .env.example to .env and configure variables
  • [ ] Decide on authentication model (OAuth multi-user vs. read-only public)
  • [ ] Start service and verify at http://localhost:8767
  • [ ] Create initial source (RSS or Twitter recommended for testing)
  • [ ] Generate first digest and validate output quality
  • [ ] Configure production reverse proxy
  • [ ] Set up automated digest generation via cron if needed
  • [ ] Customize curation rules and output templates to match preferences

One-Page Overview

| Aspect | Summary |
|--------|---------|
| Purpose | AI-powered news aggregation and summarization |
| Core Functions | Multi-source aggregation, intelligent filtering, scheduled digests, bookmark management |
| Source Types | Twitter, RSS, HackerNews, Reddit, GitHub Trending, websites, custom APIs (9 total) |
| Deployment Options | ClawHub managed, OpenClaw Skill, Zylos Skill, standalone self-hosted |
| Technology | Node.js runtime, SQLite database, SPA frontend |
| Authentication | Google OAuth (optional, falls back to read-only mode) |
| Output Formats | HTML, JSON Feed, RSS 2.0 |
| Customization | Editable filtering rules and output format templates |

Frequently Asked Questions

Q1: How does ClawFeed differ from traditional RSS readers like Feedly?

Traditional readers use an “inbox” model—all subscribed content appears chronologically for manual filtering. ClawFeed uses a “curation” model: AI applies your rules to pre-filter and summarize content. You receive processed knowledge products rather than raw feeds.

Q2: Will AI summarization miss critical details?

This trade-off is addressed through the “Mark & Deep Dive” workflow. Digests serve for rapid scanning and discovery; when you find valuable items, bookmark them for full AI analysis. Use digests as radar, not as complete reading.

Q3: Can I use ClawFeed for specific functions only, like bookmarking?

Yes. The modular design supports partial usage. Disable AI summarization by editing templates if you only want aggregation, or use only the bookmarking features as a read-later service with AI analysis.

Q4: Where is data stored?

Standalone deployments store all data (SQLite database) locally on your server, maintaining complete privacy. ClawHub hosted versions store data on ClawHub infrastructure. Skill modes depend on the host platform’s architecture.

Q5: How do I ensure AI filtering accuracy?

Accuracy depends on your curation-rules.md configuration. Start with permissive rules, observe several digest cycles, then tighten based on observed output. Different sources can have different rule strictness—strict for technical blogs, permissive for news feeds.

Q6: What languages does ClawFeed support?

The interface supports English and Chinese. AI summaries can process content in any language, but output language depends on your digest-prompt.md template configuration. Modify templates to request Chinese, English, or other language outputs.

Q7: Can ClawFeed integrate with Slack, Discord, or email?

Current versions provide RSS/JSON Feed outputs, which integrate via automation platforms like Zapier, IFTTT, or n8n. Native integrations are planned for future releases.

Q8: What is the licensing model?

ClawFeed is released under the MIT License, permitting free use, modification, and commercial deployment with attribution.


Conclusion

Effective information management means receiving the right information at the right time. ClawFeed doesn’t help you read more—it helps you read better. When you replace two hundred daily tweets with ten curated insights, you reclaim not just time but decision quality. Good decisions require high-quality inputs, not high-frequency stimulation.

In an era of tool abundance, choosing a simple, controllable information workflow that matches your cognitive patterns may matter more than adopting the latest technology.
