Building a Privacy-First AI Assistant for WeChat Information Management
Core question: How can you automatically extract todos, schedules, and valuable content from WeChat conversations while keeping sensitive data local?
WeChat has become the central nervous system for work, learning, and social coordination in many regions. Every day, countless action items, meeting arrangements, and valuable insights flow through chat windows—yet most of this information evaporates into the stream of new messages. This article presents a macOS-based automation system that uses an AI assistant to extract structured information from WeChat chats and push it to Discord for centralized management. The entire data extraction pipeline runs locally, ensuring that your private conversations never leave your machine unencrypted.
Why Automate WeChat Information Management?
Core question: What pain points does manual WeChat organization create?
Consider a typical workday: you’re coordinating across three project groups, a friend shared five valuable articles in private messages, and colleagues scheduled three meetings. By day’s end, you face hundreds of messages containing scattered tasks, fragmented meeting details, and reference materials buried in chat history.
The cognitive load is substantial. More critically, WeChat’s search functionality—while powerful—lacks semantic understanding. It matches keywords but cannot recognize that “let’s meet tomorrow afternoon at three” is a scheduling event, or distinguish casual discussion from actionable tasks requiring follow-up.
The value of automation lies in letting machines handle information triage and structuring, freeing human attention for decision-making and execution.
System Architecture: Local Extraction Plus AI Analysis
Core question: How does this solution balance efficiency with privacy protection?
The system employs a two-layer architecture: local data extraction followed by cloud-based AI analysis. This design preserves processing speed while ensuring sensitive information never reaches external services in raw form.
Architecture Overview
Phase One: Local Data Extraction (CLI Tools)
- Extract database decryption keys from WeChat process memory
- Decrypt SQLCipher 4-encrypted local databases
- Incrementally sync messages to a local SQLite database
- Output structured JSON files (todos, calendar events, content digests)
Phase Two: AI-Powered Analysis (Prompt-Driven)
- Read extracted JSON data
- Process through large language model APIs (Claude, GPT, etc.)
- Classify and organize according to customizable templates
- Push results to designated Discord channels

Architecture overview: CLI handles data extraction while prompt templates drive AI analysis
Privacy Protection Mechanisms
Author’s reflection: Why insist on local extraction?
When designing this system, I repeatedly weighed cloud processing against local processing. WeChat databases contain sensitive business discussions, personal matters, and unconfirmed decisions. Uploading raw data to cloud analysis services—even with privacy promises—introduces data breach risks and compliance concerns.
The solution establishes clear data boundaries:
- Original encrypted databases: never leave the local machine
- Decrypted chat summaries: only structured text extracts are sent to AI APIs
- Key extraction: requires sudo privileges but operates only in local memory
This design makes privacy risks manageable. You know precisely what data leaves your device, and that content is already summarized rather than raw conversation.
Core Capabilities: Three Types of Automated Extraction
Core question: What information types can the system automatically identify?
The system targets three common information management scenarios in daily work and learning.
Private Chat Todo Extraction
Application scenario: One-on-one conversations with colleagues, friends, or family frequently contain requests like “please handle this” or “don’t forget to do that.” These tasks scatter across multiple chats and are easily overlooked.
Extraction logic:
- Scan private messages for action-oriented vocabulary ("need to," "remember to," "don't forget," etc.)
- Identify task subjects, deadlines, and priority indicators
- Output contextualized todos with source references
Operational example:
You receive this message:
“Remember to send the report to Director Zhang before tomorrow afternoon, and also confirm whether Wednesday’s meeting time changed to 2 PM.”
The system extracts:
{
  "todos": [
    {
      "task": "Send report to Director Zhang",
      "deadline": "Tomorrow afternoon",
      "context": "From conversation with [Colleague A]",
      "priority": "high"
    },
    {
      "task": "Confirm if Wednesday meeting moved to 2 PM",
      "deadline": "Tomorrow afternoon",
      "context": "From conversation with [Colleague A]",
      "priority": "medium"
    }
  ]
}
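The keyword scan behind this kind of extraction can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual extractor: the `ACTION_WORDS` list, the function name, and the output fields are all assumptions for demonstration, and a real deployment would cover the Chinese equivalents of these phrases.

```python
import re

# Illustrative action vocabulary; the real extractor would be far
# broader and include the Chinese equivalents of these phrases.
ACTION_WORDS = ["remember to", "don't forget", "need to", "make sure"]

def scan_for_todos(message, contact):
    """Return candidate todos for sentences containing an action phrase."""
    todos = []
    for sentence in re.split(r"[.!?;]", message):
        lowered = sentence.lower()
        if any(phrase in lowered for phrase in ACTION_WORDS):
            todos.append({
                "task": sentence.strip(),
                "context": "From conversation with " + contact,
            })
    return todos
```

In practice this surface-level filter only produces candidates; the AI analysis phase then decides subjects, deadlines, and priorities from context.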
Schedule Extraction from Conversations
Application scenario: Group or private chats where people coordinate meeting times, arrange meetups, or plan events. These messages typically contain time, location, participants, and topic—though expressed flexibly.
Extraction logic:
- Recognize time expressions (absolute like "March 15" or relative like "next Wednesday")
- Extract location information (conference rooms, addresses, online links)
- Associate participants with meeting topics
- Output standardized calendar JSON importable to scheduling software
Operational example:
Group chat message:
“Let’s tentatively schedule the Q2 budget discussion for Friday at 3 PM in Conference Room 201, Building 3. Xiao Li and Xiao Wang will attend. If the time changes, mention it in the group beforehand.”
Extracted result:
{
  "events": [
    {
      "title": "Q2 Budget Discussion",
      "datetime": "This Friday 15:00",
      "location": "Conference Room 201, Building 3",
      "participants": ["Xiao Li", "Xiao Wang"],
      "source_chat": "Project Group",
      "flexible": true
    }
  ]
}
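Resolving a relative expression like "this Friday" to a concrete date is the mechanical part of this step. A minimal sketch, with the function name and dictionary chosen for illustration rather than taken from the project:

```python
from datetime import date, timedelta

WEEKDAYS = {"monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
            "friday": 4, "saturday": 5, "sunday": 6}

def next_weekday(name, today):
    """Resolve a weekday name to the next matching date (today counts)."""
    delta = (WEEKDAYS[name.lower()] - today.weekday()) % 7
    return today + timedelta(days=delta)

# Relative to Wednesday 2024-03-13, "this Friday" resolves to 2024-03-15.
```

The harder cases, such as "tomorrow afternoon" or "if the time changes," are exactly what the system delegates to the LLM rather than to rules like this.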
Group Chat Content Digest
Application scenario: Technical groups, industry channels, or study communities where members share valuable links, insights, and experiences. This high-density information gets buried by new messages quickly.
Extraction logic:
- Identify long-form text, links, code blocks, and file shares
- Assess content quality using heuristic signals (engagement metrics, sender history)
- Categorize by topic (technical articles, industry perspectives, tool recommendations)
- Generate content summaries while preserving original links
Operational example:
Technical group discussion:
[Alice shares a link] “This article clearly explains Rust’s ownership mechanism, especially the section on lifetimes—clearer than any documentation I’ve read. Attaching a related GitHub project with example code.”
Extracted result:
{
  "digest": [
    {
      "type": "Technical Article",
      "title": "Deep Dive into Rust Ownership",
      "summary": "Comprehensive explanation of Rust ownership with focus on lifetime concepts",
      "url": "https://example.com/rust-ownership",
      "shared_by": "Alice",
      "context": "Recommendation: clearer than official docs",
      "attachments": ["Related GitHub project link"],
      "timestamp": "2024-03-14 10:23"
    }
  ]
}
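The first pass of categorization can run on cheap surface features before any LLM call. The buckets and thresholds below are assumptions for illustration, not the project's actual heuristics:

```python
import re

def classify_share(message):
    """Rough topic bucket based on surface features of a shared message."""
    has_link = bool(re.search(r"https?://\S+", message))
    looks_like_code = "```" in message or "def " in message or "fn " in message
    if looks_like_code:
        return "Code Snippet"
    if has_link and len(message) > 80:
        return "Technical Article"  # link plus substantive commentary
    if has_link:
        return "Link Share"
    return "Discussion"
```

A pre-filter like this keeps the volume of text sent to the AI API small, which matters both for cost and for the privacy boundary described earlier.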

Features overview: Three types of information extraction and push
Technical Implementation: From Decryption to Sync
Core question: How does the system securely access WeChat data?
WeChat 4.0+ encrypts local databases using SQLCipher 4. Simply copying database files yields unreadable data—the system must first obtain decryption keys.
Key Extraction: Reading from Process Memory
WeChat loads database keys into memory during operation. The system uses a C-based memory scanning tool, find_all_keys_macos.c, to extract these keys from WeChat’s process space.
Operational steps:
# Compile the key extraction tool
gcc scripts/decrypt/find_all_keys_macos.c -o find_all_keys_macos
# Run extraction (requires sudo)
sudo ./find_all_keys_macos
Security note: Sudo is only used to access process memory. Extracted keys are stored in the local configuration file config.yaml and are never uploaded.
Author’s reflection: The “fragility” of key extraction and mitigation
A critical limitation of this approach is that WeChat keys change upon restart. Each time WeChat updates or restarts, you must re-extract keys. While refresh_decrypt.py performs HMAC verification before decryption and returns exit code 2 on failure to alert for re-extraction, this does add maintenance overhead.
This represents a necessary compromise between privacy protection and automation convenience. A “completely seamless” experience would require persistent key storage or cloud hosting—introducing greater security risks. The current approach at least makes users explicitly aware when keys are accessed, following the principle of least privilege.
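The exit-code-2 convention mentioned above can be handled by a small wrapper around the scheduled run. The script path comes from the article; the wrapper itself and its alerting behavior are an illustrative sketch:

```python
import subprocess
import sys

def refresh_or_alert(cmd=("python3", "scripts/refresh_decrypt.py")):
    """Run the incremental decrypt; exit code 2 means the key expired."""
    result = subprocess.run(list(cmd))
    if result.returncode == 2:
        # WeChat restarted and the old key is dead; a real setup might
        # push a Discord notification here instead of printing.
        print("Key expired - rerun: sudo ./find_all_keys_macos")
        return False
    return result.returncode == 0
```

Wiring this into the scheduled task means a stale key surfaces as an explicit alert rather than a silently empty sync.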
Full Decryption and Incremental Sync
First-time use requires full database decryption:
python3 scripts/decrypt/decrypt_db.py
Subsequent runs employ incremental synchronization—the key performance optimization. WeChat uses SQLite’s WAL (Write-Ahead Logging) mode, where new messages write to .db-wal files rather than modifying the main database directly.
Incremental decryption principle:
- Monitor .db-wal file changes
- Decrypt only new WAL frames (a typical 4 MB WAL patch processes in ~70 ms)
- Merge decrypted data into the local collector.db
This design allows frequent scheduled execution (e.g., every 15 minutes) without noticeable system impact.
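The change-detection step can be sketched as a simple stat-based check. This is an illustrative sketch, not the project's code; the hook name `run_incremental_decrypt` is hypothetical:

```python
import os

def wal_changed(wal_path, state):
    """True when the WAL file's size or mtime moved since the last check."""
    try:
        stat = os.stat(wal_path)
    except FileNotFoundError:
        return False
    signature = (stat.st_mtime_ns, stat.st_size)
    if state.get(wal_path) != signature:
        state[wal_path] = signature
        return True
    return False

# Scheduled-run sketch: only pay the decryption cost when the WAL moved.
# state = {}
# if wal_changed("msg.db-wal", state):
#     run_incremental_decrypt()  # hypothetical hook into refresh_decrypt.py
```

Comparing both mtime and size guards against fast successive writes that land within the same timestamp granularity.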
Message Synchronization and Structured Extraction
collector.py syncs decrypted messages to collector.db with preliminary cleaning. Three independent extraction scripts then process the different information types.
These scripts handle data extraction only, not AI analysis. They output standardized JSON for subsequent prompt template processing.
AI Analysis: Prompt-Driven Intelligent Organization
Core question: How do you customize AI analysis to your specific needs?
The AI analysis phase is entirely driven by prompt templates—no code modification required to adjust AI behavior. This “configuration over coding” design enables non-technical users to customize analysis logic.
Prompt Template Structure
The prompts/ directory contains three template files:
- todo-scan.md: Todo scanning instructions
- calendar-scan.md: Schedule scanning instructions
- digest.md: Content digest instructions
Templates use placeholders such as {{json_data}} so that extracted data and configuration can be injected dynamically at run time.
Customizing Analysis Logic: Example
Suppose you want todo extraction to prioritize deadlines over priority levels. Modify todo-scan.md:
# Todo Scanning Instructions
Analyze the following WeChat chat records and extract all todo items.
## Extraction Rules
1. Focus on explicit time markers (today, tomorrow, next [day], specific dates)
2. Mark as "unspecified" if no explicit time
3. Ignore declarative statements without actionable subjects
## Output Format
Each todo includes:
- task: Task description (verb-first, max 20 characters)
- deadline: Due date (YYYY-MM-DD format, empty if unspecified)
- context: Context summary (max 50 characters)
## Input Data
{{json_data}}
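Rendering such a template is a single substitution step. A minimal sketch, assuming only the {{json_data}} placeholder shown above; the function name is illustrative:

```python
import json
from pathlib import Path

def render_prompt(template_path, records):
    """Fill a prompt template's {{json_data}} slot with extracted records."""
    template = Path(template_path).read_text(encoding="utf-8")
    payload = json.dumps(records, ensure_ascii=False, indent=2)
    return template.replace("{{json_data}}", payload)
```

Because the template is plain markdown, iterating on extraction rules means editing a text file and re-running, with no code changes, which is exactly the "configuration over coding" property described above.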
Author’s reflection: The art and science of prompt engineering
When debugging these templates, I deeply appreciated that prompt engineering is both art and science. The same analysis task produces vastly different results depending on phrasing. Early versions used vague instructions like “extract tasks,” causing the AI to misidentify “discussing a topic” as a task. Changing to “extract specific actions the user needs to perform” significantly improved accuracy.
I recommend an iterative optimization strategy: test templates with small samples first, observe misclassification patterns, then gradually refine rules. This proves more efficient than attempting to write a “perfect template” in one shot.
Installation and Configuration: Complete Setup Guide
Core question: How do you build this automation system from scratch?
Environment Preparation
System requirements:
- macOS 13+ (validated on macOS 14/15)
- WeChat Desktop 4.0+ (running)
- Python 3.8+
- OpenClaw AI assistant
Dependency installation:
pip3 install pycryptodome zstandard pyyaml
Quick Installation
The simplest approach is letting your OpenClaw Agent handle installation automatically. Send this instruction to your agent:
Help me install the wx-echo skill: first git clone https://github.com/laolin5564/openclaw-wx-echo to ~/.openclaw/skills/wx-echo, then guide me through setup following SKILL.md
The agent automatically:
- Clones the code repository
- Compiles the key extraction tool
- Guides Discord webhook configuration
- Registers scheduled tasks
Manual Configuration Process
For manual setup, follow these steps:
Step 1: Clone the code
git clone https://github.com/laolin5564/openclaw-wx-echo.git ~/.openclaw/skills/wx-echo
cd ~/.openclaw/skills/wx-echo
Step 2: Extract keys
# Compile
gcc scripts/decrypt/find_all_keys_macos.c -o find_all_keys_macos
# Extract (WeChat must be running)
sudo ./find_all_keys_macos
Step 3: Configure config.yaml
Copy and edit the template:
cp config.example.yaml config.yaml
Key configuration items:
# Discord push settings
discord:
  webhook_url: "https://discord.com/api/webhooks/..."
  thread_id: "1234567890"  # Thread ID, optional

# Monitored group list
groups:
  - "Project Communication Group"
  - "Tech Sharing Group"

# LLM API configuration
llm:
  provider: "claude"  # or openai
  api_key: "sk-..."
  model: "claude-3-sonnet-20240229"
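A quick sanity check after editing config.yaml catches missing fields before the first run. The field names follow the template above; the validation helper itself is a sketch, not part of the project:

```python
# Required fields, mirroring the config.yaml template above.
REQUIRED = {
    "discord": ["webhook_url"],
    "llm": ["provider", "api_key", "model"],
}

def validate_config(config):
    """Return missing keys; an empty list means the config looks complete."""
    missing = []
    for section, keys in REQUIRED.items():
        block = config.get(section) or {}
        missing.extend(section + "." + key for key in keys if not block.get(key))
    return missing

# Typical use with PyYAML (already a dependency):
#   config = yaml.safe_load(open("config.yaml"))
#   problems = validate_config(config)
```

Failing fast here is cheaper than discovering a missing API key halfway through a scheduled run.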
Step 4: Initial full decryption
python3 scripts/decrypt/decrypt_db.py
Step 5: Test run
# Sync messages
python3 scripts/collector.py
# Extract todos
python3 scripts/extract_todos.py
# Push to Discord
python3 scripts/push_to_discord.py
Step 6: Register scheduled tasks
Use crontab or launchd to run every 15 minutes:
*/15 * * * * cd ~/.openclaw/skills/wx-echo && python3 scripts/refresh_decrypt.py && python3 scripts/collector.py && python3 scripts/extract_todos.py
Known Limitations and Mitigation Strategies
Core question: What pitfalls should you watch for when using this solution?
Author’s reflection: Trade-offs in technology selection
Choosing macOS exclusivity was a difficult decision. WeChat's database encryption differs on Windows, and memory structures vary significantly, making cross-platform maintenance costly. For a personal project, ensuring stability on a single platform beats grudgingly supporting multiple platforms with a degraded experience.
If you need Windows support, monitor the project’s Issue list where developers discuss Windows adaptation. The core approach remains similar—extract keys from Windows WeChat process memory—but implementation must handle Windows permission models and memory layout differences.
Practical Summary and One-Page Overview
Action Checklist
Initial setup:
- [ ] Confirm macOS 13+ and WeChat 4.0+ running
- [ ] Install Python dependencies: pip3 install pycryptodome zstandard pyyaml
- [ ] Clone code to ~/.openclaw/skills/wx-echo
- [ ] Compile and run key extraction: sudo ./find_all_keys_macos
- [ ] Copy and edit config.yaml
- [ ] Execute initial full decryption
- [ ] Test individual extraction scripts
- [ ] Register scheduled tasks
Ongoing maintenance:
- [ ] Re-extract keys after WeChat restart
- [ ] Monthly check of Discord webhook validity
- [ ] Regular backup of config.yaml and extracted JSON files
One-Page Overview
Frequently Asked Questions
Q1: Is this solution secure? Will my chat records leak?
A: Original encrypted databases are processed locally. Only structured summaries (not raw conversations) are sent to your configured AI API. Key extraction requires sudo but operates only in local memory and never uploads.
Q2: Why must I re-extract keys after WeChat restart?
A: WeChat 4.0+ generates new database keys on each launch; old keys become invalid. This is WeChat’s security mechanism that the solution cannot bypass, only adapt to through re-extraction.
Q3: Does it support Windows or Linux?
A: Currently macOS only. Windows requires different memory scanning and path handling logic. Community contributions for Windows adaptation are welcome.
Q4: Can I skip the slow initial historical sync?
A: Yes. Modify collector.py parameters to sync only recent days, or specify particular groups for batch processing.
Q5: What if Discord push fails?
A: Check that the Webhook URL in config.yaml is valid (Discord channel settings → Integrations → Webhook) and confirm network access to Discord.
Q6: Can I customize push to other platforms (Lark, DingTalk, Slack)?
A: Yes. Modify scripts/push_to_discord.py or create new push scripts, maintaining consistent input JSON format.
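The push step itself is a single HTTP POST, so swapping platforms mostly means swapping the payload shape. A minimal stdlib-only sketch; the function names are illustrative, not the project's actual scripts:

```python
import json
import urllib.request

def build_payload(text):
    """Discord caps message content at 2000 characters."""
    return json.dumps({"content": text[:2000]}).encode("utf-8")

def push_webhook(webhook_url, text):
    """POST a plain-text message to a Discord-style webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A Slack incoming webhook accepts the same POST shape but expects a {"text": ...} payload, so adapting to another platform is largely a change to build_payload.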
Q7: How do I optimize inaccurate AI analysis results?
A: Modify the corresponding template in the prompts/ directory to refine extraction rules. Test with small samples first, observe misclassification patterns, then adjust.
Q8: Scheduled tasks aren’t running after setup—how to troubleshoot?
A: Check refresh_decrypt.py exit code; 2 indicates key expiration requiring re-extraction. Review system logs or run scripts manually to observe error messages.
Conclusion: Tools as Tools, Humans as Creators
The core value of automating WeChat information management is not turning us into “message processing machines,” but liberating cognitive resources from tedious information triage. When todos, schedules, and valuable content automatically aggregate to a unified inbox, we can focus on what truly matters: deep thinking, creative problem-solving, and genuine human connection.
This solution’s design philosophy—local-first, privacy-controlled, prompt-driven—reflects an important trend in current AI application development: treating AI as a flexible “inference engine” rather than a black-box “universal assistant.” Through clear boundary delineation, we enjoy AI’s semantic understanding capabilities while maintaining complete data control.
Technology’s ultimate goal is serving people. May this solution help you better manage digital life, letting WeChat return to its essence of “connection” rather than becoming a source of “burden.”

