Building a Privacy-First AI Assistant for WeChat Information Management
Core question: How can you automatically extract todos, schedules, and valuable content from WeChat conversations while keeping sensitive data local?
WeChat has become the central nervous system for work, learning, and social coordination in many regions. Every day, countless action items, meeting arrangements, and valuable insights flow through chat windows—yet most of this information evaporates into the stream of new messages. This article presents a macOS-based automation system that uses an AI assistant to extract structured information from WeChat chats and push it to Discord for centralized management. The entire data extraction pipeline runs locally, ensuring that your private conversations never leave your machine unencrypted.
Why Automate WeChat Information Management?
Core question: What pain points does manual WeChat organization create?
Consider a typical workday: you’re coordinating across three project groups, a friend shared five valuable articles in private messages, and colleagues scheduled three meetings. By day’s end, you face hundreds of messages containing scattered tasks, fragmented meeting details, and reference materials buried in chat history.
The cognitive load is substantial. More critically, WeChat’s search functionality—while powerful—lacks semantic understanding. It matches keywords but cannot recognize that “let’s meet tomorrow afternoon at three” is a scheduling event, or distinguish casual discussion from actionable tasks requiring follow-up.
The value of automation lies in letting machines handle information triage and structuring, freeing human attention for decision-making and execution.
System Architecture: Local Extraction Plus AI Analysis
Core question: How does this solution balance efficiency with privacy protection?
The system employs a two-layer architecture: local data extraction followed by cloud-based AI analysis. This design preserves processing speed while ensuring sensitive information never reaches external services in raw form.
Architecture Overview
Phase One: Local Data Extraction (CLI Tools)
- Extract database decryption keys from WeChat process memory
- Decrypt SQLCipher 4-encrypted local databases
- Incrementally sync messages to a local SQLite database
- Output structured JSON files (todos, calendar events, content digests)
Phase Two: AI-Powered Analysis (Prompt-Driven)
- Read extracted JSON data
- Process through large language model APIs (Claude, GPT, etc.)
- Classify and organize according to customizable templates
- Push results to designated Discord channels

Architecture overview: CLI handles data extraction while prompt templates drive AI analysis
Privacy Protection Mechanisms
Author’s reflection: Why insist on local extraction?
When designing this system, I repeatedly weighed cloud processing against local processing. WeChat databases contain sensitive business discussions, personal matters, and unconfirmed decisions. Uploading raw data to cloud analysis services—even with privacy promises—introduces data breach risks and compliance concerns.
The solution establishes clear data boundaries:
- Original encrypted databases: never leave the local machine
- Decrypted chat summaries: only structured text extracts are sent to AI APIs
- Key extraction: requires sudo privileges but operates only in local memory
This design makes privacy risks manageable. You know precisely what data leaves your device, and that content is already summarized rather than raw conversation.
Core Capabilities: Three Types of Automated Extraction
Core question: What information types can the system automatically identify?
The system targets three common information management scenarios in daily work and learning.
Private Chat Todo Extraction
Application scenario: One-on-one conversations with colleagues, friends, or family frequently contain requests like “please handle this” or “don’t forget to do that.” These tasks scatter across multiple chats and are easily overlooked.
Extraction logic:
- Scan private messages for action-oriented vocabulary ("need to," "remember to," "don't forget," etc.)
- Identify task subjects, deadlines, and priority indicators
- Output contextualized todos with source references
Operational example:
You receive this message:
“Remember to send the report to Director Zhang before tomorrow afternoon, and also confirm whether Wednesday’s meeting time changed to 2 PM.”
The system extracts:
{
  "todos": [
    {
      "task": "Send report to Director Zhang",
      "deadline": "Tomorrow afternoon",
      "context": "From conversation with [Colleague A]",
      "priority": "high"
    },
    {
      "task": "Confirm if Wednesday meeting moved to 2 PM",
      "deadline": "Tomorrow afternoon",
      "context": "From conversation with [Colleague A]",
      "priority": "medium"
    }
  ]
}
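The keyword scan behind this kind of extraction can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual extractor: the `ACTION_WORDS` list, the function name, and the output fields are all assumptions for demonstration, and a real deployment would cover the Chinese equivalents of these phrases.

```python
import re

# Illustrative action vocabulary; the real extractor would be far
# broader and include the Chinese equivalents of these phrases.
ACTION_WORDS = ["remember to", "don't forget", "need to", "make sure"]

def scan_for_todos(message, contact):
    """Return candidate todos for sentences containing an action phrase."""
    todos = []
    for sentence in re.split(r"[.!?;]", message):
        lowered = sentence.lower()
        if any(phrase in lowered for phrase in ACTION_WORDS):
            todos.append({
                "task": sentence.strip(),
                "context": "From conversation with " + contact,
            })
    return todos
```

In practice this surface-level filter only produces candidates; the AI analysis phase then decides subjects, deadlines, and priorities from context.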
Schedule Extraction from Conversations
Application scenario: Group or private chats where people coordinate meeting times, arrange meetups, or plan events. These messages typically contain time, location, participants, and topic—though expressed flexibly.
Extraction logic:
- Recognize time expressions (absolute like "March 15" or relative like "next Wednesday")
- Extract location information (conference rooms, addresses, online links)
- Associate participants with meeting topics
- Output standardized calendar JSON importable to scheduling software
Operational example:
Group chat message:
“Let’s tentatively schedule the Q2 budget discussion for Friday at 3 PM in Conference Room 201, Building 3. Xiao Li and Xiao Wang will attend. If the time changes, mention it in the group beforehand.”
Extracted result:
{
  "events": [
    {
      "title": "Q2 Budget Discussion",
      "datetime": "This Friday 15:00",
      "location": "Conference Room 201, Building 3",
      "participants": ["Xiao Li", "Xiao Wang"],
      "source_chat": "Project Group",
      "flexible": true
    }
  ]
}
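Resolving a relative expression like "this Friday" to a concrete date is the mechanical part of this step. A minimal sketch, with the function name and dictionary chosen for illustration rather than taken from the project:

```python
from datetime import date, timedelta

WEEKDAYS = {"monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
            "friday": 4, "saturday": 5, "sunday": 6}

def next_weekday(name, today):
    """Resolve a weekday name to the next matching date (today counts)."""
    delta = (WEEKDAYS[name.lower()] - today.weekday()) % 7
    return today + timedelta(days=delta)

# Relative to Wednesday 2024-03-13, "this Friday" resolves to 2024-03-15.
```

The harder cases, such as "tomorrow afternoon" or "if the time changes," are exactly what the system delegates to the LLM rather than to rules like this.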
Group Chat Content Digest
Application scenario: Technical groups, industry channels, or study communities where members share valuable links, insights, and experiences. This high-density information gets buried by new messages quickly.
Extraction logic:
- Identify long-form text, links, code blocks, and file shares
- Assess content quality using heuristic signals (engagement metrics, sender history)
- Categorize by topic (technical articles, industry perspectives, tool recommendations)
- Generate content summaries while preserving original links
Operational example:
Technical group discussion:
[Alice shares a link] “This article clearly explains Rust’s ownership mechanism, especially the section on lifetimes—clearer than any documentation I’ve read. Attaching a related GitHub project with example code.”
Extracted result:
{
  "digest": [
    {
      "type": "Technical Article",
      "title": "Deep Dive into Rust Ownership",
      "summary": "Comprehensive explanation of Rust ownership with focus on lifetime concepts",
      "url": "https://example.com/rust-ownership",
      "shared_by": "Alice",
      "context": "Recommendation: clearer than official docs",
      "attachments": ["Related GitHub project link"],
      "timestamp": "2024-03-14 10:23"
    }
  ]
}
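The first pass of categorization can run on cheap surface features before any LLM call. The buckets and thresholds below are assumptions for illustration, not the project's actual heuristics:

```python
import re

def classify_share(message):
    """Rough topic bucket based on surface features of a shared message."""
    has_link = bool(re.search(r"https?://\S+", message))
    looks_like_code = "```" in message or "def " in message or "fn " in message
    if looks_like_code:
        return "Code Snippet"
    if has_link and len(message) > 80:
        return "Technical Article"  # link plus substantive commentary
    if has_link:
        return "Link Share"
    return "Discussion"
```

A pre-filter like this keeps the volume of text sent to the AI API small, which matters both for cost and for the privacy boundary described earlier.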

Features overview: Three types of information extraction and push
Technical Implementation: From Decryption to Sync
Core question: How does the system securely access WeChat data?
WeChat 4.0+ encrypts local databases using SQLCipher 4. Simply copying database files yields unreadable data—the system must first obtain decryption keys.
Key Extraction: Reading from Process Memory
WeChat loads database keys into memory during operation. The system uses a C-based memory scanning tool, find_all_keys_macos.c, to extract these keys from WeChat’s process space.
Operational steps:
# Compile the key extraction tool
gcc scripts/decrypt/find_all_keys_macos.c -o find_all_keys_macos
# Run extraction (requires sudo)
sudo ./find_all_keys_macos
Security note: Sudo is only used to access process memory. Extracted keys are stored in the local configuration file config.yaml and are never uploaded.
Author’s reflection: The “fragility” of key extraction and mitigation
A critical limitation of this approach is that WeChat keys change upon restart. Each time WeChat updates or restarts, you must re-extract keys. While refresh_decrypt.py performs HMAC verification before decryption and returns exit code 2 on failure to alert for re-extraction, this does add maintenance overhead.
This represents a necessary compromise between privacy protection and automation convenience. A “completely seamless” experience would require persistent key storage or cloud hosting—introducing greater security risks. The current approach at least makes users explicitly aware when keys are accessed, following the principle of least privilege.
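The exit-code-2 convention mentioned above can be handled by a small wrapper around the scheduled run. The script path comes from the article; the wrapper itself and its alerting behavior are an illustrative sketch:

```python
import subprocess
import sys

def refresh_or_alert(cmd=("python3", "scripts/refresh_decrypt.py")):
    """Run the incremental decrypt; exit code 2 means the key expired."""
    result = subprocess.run(list(cmd))
    if result.returncode == 2:
        # WeChat restarted and the old key is dead; a real setup might
        # push a Discord notification here instead of printing.
        print("Key expired - rerun: sudo ./find_all_keys_macos")
        return False
    return result.returncode == 0
```

Wiring this into the scheduled task means a stale key surfaces as an explicit alert rather than a silently empty sync.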
Full Decryption and Incremental Sync
First-time use requires full database decryption:
python3 scripts/decrypt/decrypt_db.py
Subsequent runs employ incremental synchronization—the key performance optimization. WeChat uses SQLite’s WAL (Write-Ahead Logging) mode, where new messages write to .db-wal files rather than modifying the main database directly.
Incremental decryption principle:
- Monitor .db-wal file changes
- Decrypt only new WAL frames (a typical 4 MB WAL patch processes in ~70 ms)
- Merge decrypted data into the local collector.db
This design allows frequent scheduled execution (e.g., every 15 minutes) without noticeable system impact.
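The change-detection step can be sketched as a simple stat-based check. This is an illustrative sketch, not the project's code; the hook name `run_incremental_decrypt` is hypothetical:

```python
import os

def wal_changed(wal_path, state):
    """True when the WAL file's size or mtime moved since the last check."""
    try:
        stat = os.stat(wal_path)
    except FileNotFoundError:
        return False
    signature = (stat.st_mtime_ns, stat.st_size)
    if state.get(wal_path) != signature:
        state[wal_path] = signature
        return True
    return False

# Scheduled-run sketch: only pay the decryption cost when the WAL moved.
# state = {}
# if wal_changed("msg.db-wal", state):
#     run_incremental_decrypt()  # hypothetical hook into refresh_decrypt.py
```

Comparing both mtime and size guards against fast successive writes that land within the same timestamp granularity.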
Message Synchronization and Structured Extraction
collector.py syncs decrypted messages to collector.db with preliminary cleaning. Three independent extraction scripts then process the different information types.
These scripts handle data extraction only, not AI analysis. They output standardized JSON for subsequent prompt template processing.
AI Analysis: Prompt-Driven Intelligent Organization
Core question: How do you customize AI analysis to your specific needs?
The AI analysis phase is entirely driven by prompt templates—no code modification required to adjust AI behavior. This “configuration over coding” design enables non-technical users to customize analysis logic.
Prompt Template Structure
The prompts/ directory contains three template files:
- todo-scan.md: Todo scanning instructions
- calendar-scan.md: Schedule scanning instructions
- digest.md: Content digest instructions
Templates use placeholders such as {{json_data}} so that extracted data and configuration can be injected dynamically at run time.
Customizing Analysis Logic: Example
Suppose you want todo extraction to prioritize deadlines over priority levels. Modify todo-scan.md:
# Todo Scanning Instructions
Analyze the following WeChat chat records and extract all todo items.
## Extraction Rules
1. Focus on explicit time markers (today, tomorrow, next [day], specific dates)
2. Mark as "unspecified" if no explicit time
3. Ignore declarative statements without actionable subjects
## Output Format
Each todo includes:
- task: Task description (verb-first, max 20 characters)
- deadline: Due date (YYYY-MM-DD format, empty if unspecified)
- context: Context summary (max 50 characters)
## Input Data
{{json_data}}
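Rendering such a template is a single substitution step. A minimal sketch, assuming only the {{json_data}} placeholder shown above; the function name is illustrative:

```python
import json
from pathlib import Path

def render_prompt(template_path, records):
    """Fill a prompt template's {{json_data}} slot with extracted records."""
    template = Path(template_path).read_text(encoding="utf-8")
    payload = json.dumps(records, ensure_ascii=False, indent=2)
    return template.replace("{{json_data}}", payload)
```

Because the template is plain markdown, iterating on extraction rules means editing a text file and re-running, with no code changes, which is exactly the "configuration over coding" property described above.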
Author’s reflection: The art and science of prompt engineering
When debugging these templates, I deeply appreciated that prompt engineering is both art and science. The same analysis task produces vastly different results depending on phrasing. Early versions used vague instructions like “extract tasks,” causing the AI to misidentify “discussing a topic” as a task. Changing to “extract specific actions the user needs to perform” significantly improved accuracy.
I recommend an iterative optimization strategy: test templates with small samples first, observe misclassification patterns, then gradually refine rules. This proves more efficient than attempting to write a “perfect template” in one shot.
Installation and Configuration: Complete Setup Guide
Core question: How do you build this automation system from scratch?
Environment Preparation
System requirements:
- macOS 13+ (validated on macOS 14/15)
- WeChat Desktop 4.0+ (running)
- Python 3.8+
- OpenClaw AI assistant
Dependency installation:
pip3 install pycryptodome zstandard pyyaml
Quick Installation
The simplest approach is letting your OpenClaw Agent handle installation automatically. Send this instruction to your agent:
Help me install the wx-echo skill: first git clone https://github.com/laolin5564/openclaw-wx-echo to ~/.openclaw/skills/wx-echo, then guide me through setup following SKILL.md
The agent automatically:
- Clones the code repository
- Compiles the key extraction tool
- Guides Discord webhook configuration
- Registers scheduled tasks
Manual Configuration Process
For manual setup, follow these steps:
Step 1: Clone the code
git clone https://github.com/laolin5564/openclaw-wx-echo.git ~/.openclaw/skills/wx-echo
cd ~/.openclaw/skills/wx-echo
Step 2: Extract keys
# Compile
gcc scripts/decrypt/find_all_keys_macos.c -o find_all_keys_macos
# Extract (WeChat must be running)
sudo ./find_all_keys_macos
Step 3: Configure config.yaml
Copy and edit the template:
cp config.example.yaml config.yaml
Key configuration items:
# Discord push settings
discord:
  webhook_url: "https://discord.com/api/webhooks/..."
  thread_id: "1234567890"  # Thread ID, optional

# Monitored group list
groups:
  - "Project Communication Group"
  - "Tech Sharing Group"

# LLM API configuration
llm:
  provider: "claude"  # or openai
  api_key: "sk-..."
  model: "claude-3-sonnet-20240229"
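A quick sanity check after editing config.yaml catches missing fields before the first run. The field names follow the template above; the validation helper itself is a sketch, not part of the project:

```python
# Required fields, mirroring the config.yaml template above.
REQUIRED = {
    "discord": ["webhook_url"],
    "llm": ["provider", "api_key", "model"],
}

def validate_config(config):
    """Return missing keys; an empty list means the config looks complete."""
    missing = []
    for section, keys in REQUIRED.items():
        block = config.get(section) or {}
        missing.extend(section + "." + key for key in keys if not block.get(key))
    return missing

# Typical use with PyYAML (already a dependency):
#   config = yaml.safe_load(open("config.yaml"))
#   problems = validate_config(config)
```

Failing fast here is cheaper than discovering a missing API key halfway through a scheduled run.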
Step 4: Initial full decryption
python3 scripts/decrypt/decrypt_db.py
Step 5: Test run
# Sync messages
python3 scripts/collector.py
# Extract todos
python3 scripts/extract_todos.py
# Push to Discord
python3 scripts/push_to_discord.py
Step 6: Register scheduled tasks
Use crontab or launchd to run every 15 minutes:
*/15 * * * * cd ~/.openclaw/skills/wx-echo && python3 scripts/refresh_decrypt.py && python3 scripts/collector.py && python3 scripts/extract_todos.py
Known Limitations and Mitigation Strategies
Core question: What pitfalls should you watch for when using this solution?
Author’s reflection: Trade-offs in technology selection
Choosing macOS exclusivity was a difficult decision. WeChat's database encryption differs on Windows, and memory structures vary significantly, making cross-platform maintenance costly. For a personal project, ensuring stability on a single platform beats grudgingly supporting multiple platforms with a degraded experience.
If you need Windows support, monitor the project’s Issue list where developers discuss Windows adaptation. The core approach remains similar—extract keys from Windows WeChat process memory—but implementation must handle Windows permission models and memory layout differences.
Practical Summary and One-Page Overview
Action Checklist
Initial setup:
- [ ] Confirm macOS 13+ and WeChat 4.0+ running
- [ ] Install Python dependencies: pip3 install pycryptodome zstandard pyyaml
- [ ] Clone code to ~/.openclaw/skills/wx-echo
- [ ] Compile and run key extraction: sudo ./find_all_keys_macos
- [ ] Copy and edit config.yaml
- [ ] Execute initial full decryption
- [ ] Test individual extraction scripts
- [ ] Register scheduled tasks
Ongoing maintenance:
- [ ] Re-extract keys after WeChat restart
- [ ] Monthly check of Discord webhook validity
- [ ] Regular backup of config.yaml and extracted JSON files
One-Page Overview
Frequently Asked Questions
Q1: Is this solution secure? Will my chat records leak?
A: Original encrypted databases are processed locally. Only structured summaries (not raw conversations) are sent to your configured AI API. Key extraction requires sudo but operates only in local memory and never uploads.
Q2: Why must I re-extract keys after WeChat restart?
A: WeChat 4.0+ generates new database keys on each launch; old keys become invalid. This is WeChat’s security mechanism that the solution cannot bypass, only adapt to through re-extraction.
Q3: Does it support Windows or Linux?
A: Currently macOS only. Windows requires different memory scanning and path handling logic. Community contributions for Windows adaptation are welcome.
Q4: Can I skip the slow initial historical sync?
A: Yes. Modify collector.py parameters to sync only recent days, or specify particular groups for batch processing.
Q5: What if Discord push fails?
A: Check that the Webhook URL in config.yaml is valid (Discord channel settings → Integrations → Webhook) and confirm network access to Discord.
Q6: Can I customize push to other platforms (Lark, DingTalk, Slack)?
A: Yes. Modify scripts/push_to_discord.py or create new push scripts, maintaining consistent input JSON format.
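The push step itself is a single HTTP POST, so swapping platforms mostly means swapping the payload shape. A minimal stdlib-only sketch; the function names are illustrative, not the project's actual scripts:

```python
import json
import urllib.request

def build_payload(text):
    """Discord caps message content at 2000 characters."""
    return json.dumps({"content": text[:2000]}).encode("utf-8")

def push_webhook(webhook_url, text):
    """POST a plain-text message to a Discord-style webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A Slack incoming webhook accepts the same POST shape but expects a {"text": ...} payload, so adapting to another platform is largely a change to build_payload.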
Q7: How do I optimize inaccurate AI analysis results?
A: Modify the corresponding template in the prompts/ directory to refine extraction rules. Test with small samples first, observe misclassification patterns, then adjust.
Q8: Scheduled tasks aren’t running after setup—how to troubleshoot?
A: Check refresh_decrypt.py exit code; 2 indicates key expiration requiring re-extraction. Review system logs or run scripts manually to observe error messages.
Conclusion: Tools as Tools, Humans as Creators
The core value of automating WeChat information management is not turning us into “message processing machines,” but liberating cognitive resources from tedious information triage. When todos, schedules, and valuable content automatically aggregate to a unified inbox, we can focus on what truly matters: deep thinking, creative problem-solving, and genuine human connection.
This solution’s design philosophy—local-first, privacy-controlled, prompt-driven—reflects an important trend in current AI application development: treating AI as a flexible “inference engine” rather than a black-box “universal assistant.” Through clear boundary delineation, we enjoy AI’s semantic understanding capabilities while maintaining complete data control.
Technology’s ultimate goal is serving people. May this solution help you better manage digital life, letting WeChat return to its essence of “connection” rather than becoming a source of “burden.”

