mirror of https://github.com/jackwener/wx-cli.git
# wx-cli Architecture Analysis

## Overview

**wx-cli** is a cross-platform Rust CLI tool for extracting and querying local WeChat 4.x data. It decrypts SQLCipher-encrypted databases, caches decrypted copies with mtime-aware invalidation, and provides a daemon-based IPC architecture for fast repeated queries.

**Key characteristics:**
- Single binary, zero runtime dependencies
- Cross-platform: macOS, Linux, Windows
- Millisecond response times via daemon caching
- AI-friendly output (YAML by default, JSON optional)
- All data processed locally, no network calls

---

## High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│  wx (CLI client)                                                    │
│  src/cli/mod.rs - clap-based command parsing                        │
│  Commands: init, sessions, history, search, contacts, export,       │
│            unread, members, new-messages, stats, favorites, sns-*   │
└────────────────────────────┬────────────────────────────────────────┘
                             │ IPC (Unix socket / Windows named pipe)
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│  wx-daemon (background process)                                     │
│  src/daemon/mod.rs - tokio async runtime                            │
│                                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐       │
│  │ DbCache      │  │ Names        │  │ IPC Server           │       │
│  │ (mtime-aware)│  │ (contact map)│  │ (JSON line protocol) │       │
│  │ src/daemon/  │  │ src/daemon/  │  │ src/daemon/          │       │
│  │ cache.rs     │  │ query.rs     │  │ server.rs            │       │
│  └──────────────┘  └──────────────┘  └──────────────────────┘       │
│                                                                     │
│  On startup:                                                        │
│  1. Load config + keys from ~/.wx-cli/                              │
│  2. Pre-warm: decrypt session.db, sns.db, load contacts             │
│  3. Listen on socket/pipe for requests                              │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Crypto Layer                                                       │
│  src/crypto/mod.rs + wal.rs                                         │
│                                                                     │
│  - SQLCipher 4 page decryption (AES-256-CBC)                        │
│  - WAL (Write-Ahead Log) application                                │
│  - Streaming decryption (page-by-page, avoids full-file load)       │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Scanner Layer                                                      │
│  src/scanner/{macos,linux,windows}.rs                               │
│                                                                     │
│  Platform-specific memory scanners:                                 │
│  - macOS: Mach VM API (task_for_pid, mach_vm_region, mach_vm_read)  │
│  - Linux: /proc/<pid>/mem + /proc/<pid>/maps                        │
│  - Windows: CreateToolhelp32Snapshot + ReadProcessMemory            │
│                                                                     │
│  Pattern: x'<64hex_key><32hex_salt>' in WeChat process memory       │
└─────────────────────────────────────────────────────────────────────┘
```

---

## Module Breakdown

### 1. Entry Point (`src/main.rs`)

```rust
fn main() {
    // The same binary serves both roles; the env var selects which one runs.
    if std::env::var("WX_DAEMON_MODE").is_ok() {
        daemon::run(); // Background daemon mode
    } else {
        cli::run(); // CLI client mode
    }
}
```

A single binary acts as both client and daemon. The daemon is started by launching the same binary with `WX_DAEMON_MODE=1` set in its environment.

---

### 2. CLI Layer (`src/cli/`)

**`mod.rs`** - Command definitions via clap derive macros:
- 17 subcommands, including Init, Sessions, History, Search, Contacts, Export, Unread, Members, NewMessages, Stats, Favorites, SnsNotifications, SnsFeed, SnsSearch, Daemon
- Each command dispatches to a dedicated module (e.g., `history::cmd_history`)
- All commands share a `--json` flag that toggles the output format

**`transport.rs`** - IPC client:
- `ensure_daemon()` - auto-starts the daemon if it is not running
- `send()` - JSON line protocol over Unix socket / Windows named pipe
- Timeout handling (15s startup, 120s request)
- Permission preflight check for the `~/.wx-cli/` directory

**Command modules** (`sessions.rs`, `history.rs`, etc.):
- Parse CLI args → build IPC `Request`
- Send to daemon → receive `Response`
- Format output (YAML/JSON) via `output.rs`

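The auto-start step in `ensure_daemon()` can be pictured with the standard library alone. The following is a minimal sketch, not the project's implementation: `daemon_command` is a hypothetical helper, and detaching stdio is an assumption. Only the `WX_DAEMON_MODE` variable comes from `src/main.rs`. A caller would pass `std::env::current_exe()?` and call `.spawn()` on the result.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Build the command that relaunches `exe` in daemon mode.
/// Hypothetical helper: only the env var name is taken from the document.
fn daemon_command(exe: &Path) -> Command {
    let mut cmd = Command::new(exe);
    cmd.env("WX_DAEMON_MODE", "1") // checked by main() to pick daemon mode
        // Assumption: the daemon should not hold the client's terminal open.
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null());
    cmd
}
```
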
---

### 3. Daemon Layer (`src/daemon/`)

**`mod.rs`** - Daemon lifecycle:
```rust
async fn async_run() -> Result<()> {
    // 1. Create ~/.wx-cli/ + cache/ directories
    // 2. Write PID file
    // 3. Setup signal handlers (SIGTERM/SIGINT)
    // 4. Load config + keys
    // 5. Initialize DbCache (mtime-aware decryption cache)
    // 6. Pre-warm: load contacts, decrypt session.db + sns.db
    // 7. Start IPC server (blocking loop)
}
```

**`cache.rs`** - DbCache (critical performance component):
- `HashMap<String, CacheEntry>` in-memory cache
- `CacheEntry`: `{ db_mtime, wal_mtime, decrypted_path }`
- **mtime-aware invalidation**: re-decrypt only when the `.db` or `.db-wal` mtime changes
- Persistent mtime records in `~/.wx-cli/cache/_mtimes.json`
- Cache reuse on daemon restart (avoids re-decryption)
- Uses the MD5 hash of `rel_key` as the cache filename

**`server.rs`** - IPC server:
- Unix: `tokio::net::UnixListener` on `~/.wx-cli/daemon.sock`
- Windows: `interprocess` named pipe `\\.\pipe\wx-cli-daemon`
- One connection per request, JSON line protocol
- `dispatch()` routes `Request` → query functions

**`query.rs`** - Query implementations (~1500 lines):
- `Names` struct: contact name cache + MD5→username lookup + verify_flags
- `chat_type_of()`: classify as `private`/`group`/`official_account`/`folded`
- Query functions: `q_sessions`, `q_history`, `q_search`, `q_contacts`, `q_unread`, `q_members`, `q_new_messages`, `q_stats`, `q_favorites`, `q_sns_*`
- Message parsing: zstd decompression, XML extraction (appmsg, sysmsg, revokemsg)
- Uses `spawn_blocking` for SQLite queries (rusqlite is sync)

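The mtime-aware invalidation rule in `cache.rs` reduces to comparing two recorded timestamps against the files on disk. Here is a minimal sketch under stated assumptions: the field types, the `Option` used for a missing WAL, and the helper names `mtime_of` and `is_fresh` are illustrative, not taken from the source.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Mirrors the documented `CacheEntry` fields; concrete types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct CacheEntry {
    db_mtime: SystemTime,
    wal_mtime: Option<SystemTime>, // None when no `.db-wal` file existed
    decrypted_path: PathBuf,
}

/// Read a file's mtime, mapping "file not found" to `None`.
fn mtime_of(path: &Path) -> io::Result<Option<SystemTime>> {
    match fs::metadata(path) {
        Ok(meta) => meta.modified().map(Some),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// True when neither the database nor its WAL changed since `entry` was recorded.
/// A `false` result means the cached copy must be decrypted again.
fn is_fresh(entry: &CacheEntry, db_path: &Path, wal_path: &Path) -> io::Result<bool> {
    let db_now = mtime_of(db_path)?;
    let wal_now = mtime_of(wal_path)?;
    Ok(db_now == Some(entry.db_mtime) && wal_now == entry.wal_mtime)
}
```
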
---

### 4. Crypto Layer (`src/crypto/`)

**`mod.rs`** - SQLCipher 4 decryption:
```rust
// Constants
const PAGE_SZ: usize = 4096;
const SALT_SZ: usize = 16;
const RESERVE_SZ: usize = 80; // IV(16) + HMAC(64)

// Key operations (signatures only, bodies omitted)
fn decrypt_page(enc_key: &[u8; 32], page_data: &[u8], pgno: u32) -> Vec<u8>
fn full_decrypt(db_path: &Path, out_path: &Path, enc_key: &[u8; 32])
```

**Algorithm:**
- AES-256-CBC decryption
- IV located in the reserved area at the end of each page, at offset `PAGE_SZ - RESERVE_SZ`
- Page 1 special handling: skip the 16-byte SALT, write the SQLite magic header
- Other pages: decrypt `[0..PAGE_SZ-RESERVE_SZ]`
- Streaming (page-by-page) to avoid loading the full file into memory

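The page layout described above can be checked with plain offset arithmetic, without any cipher code. The sketch below only computes where the ciphertext and IV sit inside one encrypted page; `page_layout` and `IV_SZ` are illustrative names, and the actual AES-256-CBC call is left out.

```rust
use std::ops::Range;

const PAGE_SZ: usize = 4096;
const SALT_SZ: usize = 16;
const RESERVE_SZ: usize = 80; // IV (16) + HMAC (64)
const IV_SZ: usize = 16;

/// Byte ranges inside one encrypted page: (ciphertext, IV).
/// `pgno` is 1-based, as in SQLite.
fn page_layout(pgno: u32) -> (Range<usize>, Range<usize>) {
    // Page 1 begins with the plaintext salt, so its ciphertext starts later.
    let start = if pgno == 1 { SALT_SZ } else { 0 };
    let reserve_start = PAGE_SZ - RESERVE_SZ;
    (start..reserve_start, reserve_start..reserve_start + IV_SZ)
}
```
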
**`wal.rs`** - WAL application:
- WAL header: 32 bytes (magic, format, page_sz, ckpt_seq, salt1/2, cksum1/2)
- Frame: 24-byte header + PAGE_SZ data
- Frame matching via salt1/2 validation
- Random-write to decrypted DB at `(pgno-1) * PAGE_SZ`

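The frame walk can be sketched from the layout above plus the SQLite WAL format, in which all header fields are big-endian 32-bit integers. This is an illustration under assumptions: `matching_frames` is a hypothetical name, it stops at the first salt mismatch (SQLite's own recovery does the same, while the source does not say whether wx-cli stops or skips), and it only locates frames. In the real flow each page still has to be decrypted with `enc_key` before it is written to the target offset.

```rust
const PAGE_SZ: usize = 4096;
const WAL_HDR_SZ: usize = 32;
const FRAME_HDR_SZ: usize = 24;

/// Read a big-endian u32 at `off`.
fn be_u32(buf: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
}

/// For each frame whose salts match the WAL header, return
/// (page number, offset of the page data in `wal`, target offset in the DB file).
fn matching_frames(wal: &[u8]) -> Vec<(u32, usize, u64)> {
    let mut out = Vec::new();
    if wal.len() < WAL_HDR_SZ {
        return out;
    }
    // salt1 and salt2 live at bytes 16..24 of the WAL header.
    let (salt1, salt2) = (be_u32(wal, 16), be_u32(wal, 20));
    let mut off = WAL_HDR_SZ;
    while off + FRAME_HDR_SZ + PAGE_SZ <= wal.len() {
        let pgno = be_u32(wal, off);
        // Frame header: pgno, db size, salt1, salt2, cksum1, cksum2.
        if be_u32(wal, off + 8) != salt1 || be_u32(wal, off + 12) != salt2 {
            break; // stale frame from an earlier WAL generation
        }
        if pgno > 0 {
            let target = (pgno as u64 - 1) * PAGE_SZ as u64;
            out.push((pgno, off + FRAME_HDR_SZ, target));
        }
        off += FRAME_HDR_SZ + PAGE_SZ;
    }
    out
}
```
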
---

### 5. Scanner Layer (`src/scanner/`)

**Common interface** (`mod.rs`):
```rust
pub struct KeyEntry {
    db_name: String,  // relative path
    enc_key: String,  // 64-char hex (32 bytes)
    salt: String,     // 32-char hex (16 bytes)
}

pub fn scan_keys(db_dir: &Path) -> Result<Vec<KeyEntry>>  // platform-specific
pub fn read_db_salt(path: &Path) -> Option<String>
pub fn collect_db_salts(db_dir: &Path) -> Vec<(String, String)>
```

**Pattern searched**: `x'<96 hex chars>'` = 64-char key + 32-char salt

**macOS** (`macos.rs`):
- `task_for_pid` → get Mach task port (requires root + ad-hoc signed WeChat)
- `mach_vm_region` → enumerate VM regions
- `mach_vm_read` → read 2MB chunks
- Filter: `VM_PROT_READ | VM_PROT_WRITE` regions only
- Deduplication by (key, salt) pair

**Linux** (`linux.rs`):
- `/proc/<pid>/comm` → find `wechat`/`weixin` process
- `/proc/<pid>/maps` → parse `rw-` regions
- `/proc/<pid>/mem` → seek + read
- Same chunk/dedup strategy

**Windows** (`windows.rs`):
- `CreateToolhelp32Snapshot` → find `Weixin.exe`
- `OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION)`
- `VirtualQueryEx` → enumerate `MEM_COMMIT + PAGE_READWRITE` regions
- `ReadProcessMemory` → chunk read

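All three scanners look for the same `x'<96 hex chars>'` pattern once a memory chunk has been read. A minimal single-buffer sketch of that search follows, assuming results are lowercased and deduplicated by (key, salt); `find_key_candidates` is a hypothetical name, and a pattern that straddles two chunks is not handled here.

```rust
use std::collections::HashSet;

const HEX_LEN: usize = 96; // 64-char key + 32-char salt

/// Scan `buf` for `x'<96 hex chars>'` and return unique
/// (key_hex, salt_hex) pairs in first-seen order, lowercased.
fn find_key_candidates(buf: &[u8]) -> Vec<(String, String)> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let total = 2 + HEX_LEN + 1; // leading x' + hex digits + closing '
    let mut i = 0;
    while i + total <= buf.len() {
        let hex = &buf[i + 2..i + 2 + HEX_LEN];
        let is_match = buf[i] == b'x'
            && buf[i + 1] == b'\''
            && buf[i + 2 + HEX_LEN] == b'\''
            && hex.iter().all(|b| b.is_ascii_hexdigit());
        if is_match {
            // All bytes are ASCII hex digits, so byte slicing is safe.
            let text = String::from_utf8_lossy(hex).to_ascii_lowercase();
            let pair = (text[..64].to_string(), text[64..].to_string());
            if seen.insert(pair.clone()) {
                out.push(pair);
            }
            i += total;
        } else {
            i += 1;
        }
    }
    out
}
```
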
---

### 6. IPC Protocol (`src/ipc.rs`)

**Request** (tagged enum):
```rust
// Field types are omitted here for brevity.
pub enum Request {
    Ping,
    Sessions { limit: usize },
    History { chat, limit, offset, since, until, msg_type },
    Search { keyword, chats, limit, since, until, msg_type },
    Contacts { query, limit },
    Unread { limit, filter },
    Members { chat },
    NewMessages { state, limit },
    Stats { chat, since, until },
    Favorites { limit, fav_type, query },
    SnsNotifications { limit, since, until, include_read },
    SnsFeed { limit, since, until, user },
    SnsSearch { keyword, limit, since, until, user },
}
```

**Response**:
```rust
pub struct Response {
    ok: bool,
    error: Option<String>,
    data: Value,  // flattened JSON
}
```

Protocol: newline-delimited JSON, one request per connection.

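One request per connection over newline-delimited JSON amounts to writing a single line and reading a single line back. The sketch below shows that exchange over any `Read + Write` stream, which in practice would be the Unix socket or named pipe. `round_trip` is a hypothetical helper, and the wire shape of a request (for example `{"cmd":"ping"}`) is an assumption, since the source does not show the serialized form.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Send one JSON document as a single line, then read one line back.
fn round_trip<S: Read + Write>(stream: S, request_json: &str) -> io::Result<String> {
    // A raw newline inside the payload would split the request in two.
    if request_json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "request must be a single line",
        ));
    }
    let mut reader = BufReader::new(stream);
    // BufReader still exposes the underlying stream for writing.
    let writer = reader.get_mut();
    writer.write_all(request_json.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()?;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```
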
---

### 7. Config Layer (`src/config.rs`)

**Config struct**:
```rust
pub struct Config {
    db_dir: PathBuf,          // WeChat db_storage path
    keys_file: PathBuf,       // all_keys.json
    decrypted_dir: PathBuf,   // (unused, cache dir used instead)
    wechat_process: String,   // process name for scanner
}
```

**Paths**:
- `cli_dir()`: `~/.wx-cli/`
- `sock_path()`: `~/.wx-cli/daemon.sock`
- `cache_dir()`: `~/.wx-cli/cache/`
- `mtime_file()`: `~/.wx-cli/cache/_mtimes.json`

**Auto-detection** (`auto_detect_db_dir()`):
- macOS: `~/Library/Containers/com.tencent.xinWeChat/.../xwechat_files/*/db_storage`
- Linux: `~/Documents/xwechat_files/*/db_storage` + legacy path
- Windows: `%APPDATA%/Tencent/xwechat/config/*.ini` → parse data root

---

## Data Flow

### Init Flow (`wx init`)

```
1. Auto-detect db_dir → scan for db_storage directory
2. collect_db_salts(db_dir) → (salt_hex, rel_path) list
3. scan_keys(db_dir) → memory scan → (key_hex, salt_hex) candidates
4. Match: salt_hex == db_salt → KeyEntry { db_name, enc_key, salt }
5. Write ~/.wx-cli/config.json + ~/.wx-cli/all_keys.json
```

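Step 4 of the init flow is a join on the salt: each database file contributes its salt, each memory hit contributes a key and salt, and equal salts pair them up. A minimal sketch, assuming both sides use the same hex casing and the tuple orders shown in the steps; `match_keys` is a hypothetical name, and `KeyEntry` is redeclared so the block stands alone.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct KeyEntry {
    db_name: String, // relative path
    enc_key: String, // 64-char hex
    salt: String,    // 32-char hex
}

/// `db_salts`: (salt_hex, rel_path) per database file.
/// `candidates`: (key_hex, salt_hex) pairs found in process memory.
fn match_keys(db_salts: &[(String, String)], candidates: &[(String, String)]) -> Vec<KeyEntry> {
    // Index the memory hits by salt so each database needs one lookup.
    let key_by_salt: HashMap<&str, &str> = candidates
        .iter()
        .map(|(key, salt)| (salt.as_str(), key.as_str()))
        .collect();
    db_salts
        .iter()
        .filter_map(|(salt, rel_path)| {
            key_by_salt.get(salt.as_str()).map(|key| KeyEntry {
                db_name: rel_path.clone(),
                enc_key: key.to_string(),
                salt: salt.clone(),
            })
        })
        .collect()
}
```
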
### Query Flow (e.g., `wx history "张三"`)

```
1. CLI: parse args → Request::History { chat: "张三", limit: 50 }
2. transport::ensure_daemon() → start if not alive
3. transport::send(Request) → Unix socket/pipe → daemon
4. daemon::dispatch(Request) → q_history()
   a. resolve_username("张三") → "wxid_xxx" (fuzzy match against Names)
   b. find_msg_tables(db, names, username) → [(db_path, "Msg_<md5>")]
   c. spawn_blocking: SQLite query on decrypted db_path
   d. decompress_message (zstd) + fmt_content (XML parsing)
5. Response::ok(json!{ chat, messages, ... })
6. CLI: output.rs → YAML/JSON formatting
```

### Decryption Flow (DbCache::get)

```
1. Check in-memory cache: if the entry's db_mtime and wal_mtime both match → return cached path
2. mtime mismatch or missing entry → spawn_blocking decrypt:
   a. crypto::full_decrypt(db_path, out_path, enc_key)
   b. If .db-wal exists: wal::apply_wal(wal_path, out_path, enc_key)
3. Update cache entry + persist mtimes to _mtimes.json
4. Return decrypted path for SQLite query
```

---

## Database Schema Knowledge

**session/session.db**:
- `SessionTable`: username, unread_count, summary, last_timestamp, last_msg_type, last_msg_sender

**contact/contact.db**:
- `contact`: username, nick_name, remark, verify_flag
- `chat_room`: id, owner (for group info)
- `chatroom_member`: room_id, member_id (joined with contact)

**message/message_N.db**:
- `Msg_<md5(username)>`: local_id, local_type, create_time, real_sender_id, message_content, WCDB_CT_message_content
- `Name2Id`: rowid → user_name (sender lookup)
- `WCDB_CT_message_content` = 4 means `message_content` is zstd-compressed

**sns/sns.db**:
- `sns_notification`: type (like/comment), from_nickname, content, feed_preview
- `sns_feed_xml`: author, contentDesc, media XML, createTime

**favorite/favorite.db**:
- `fav_db_item`: local_id, type, update_time, content, fromusr

---

## Performance Optimizations

1. **mtime-aware caching**: Only re-decrypt when the source `.db` or `.db-wal` file changes
2. **Pre-warming**: Decrypt session.db + sns.db and load contacts on daemon start
3. **Arc-wrapped Names**: Contact cache shared via `Arc`, cloned in O(1)
4. **spawn_blocking**: Sync SQLite ops run off the async runtime
5. **Streaming decrypt**: Page-by-page, no full file in memory
6. **WAL handling**: WAL frames are patched into the decrypted copy, so writes that WeChat has not yet checkpointed are visible to queries
7. **MD5 table lookup**: `Msg_<md5>` → username via precomputed hash map

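The O(1) clone in item 3 is a property of `Arc`: cloning bumps a reference count and leaves the underlying map untouched. A small sketch with plain threads standing in for the daemon's tasks follows; the `Names` field and the `lookup_in_workers` helper are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Stand-in for the daemon's `Names` cache; the single field is an assumption.
struct Names {
    display: HashMap<String, String>,
}

/// Resolve each username on its own thread, sharing one `Names` instance.
fn lookup_in_workers(names: Arc<Names>, usernames: Vec<String>) -> Vec<Option<String>> {
    let handles: Vec<_> = usernames
        .into_iter()
        .map(|user| {
            // Arc::clone copies a pointer and bumps a counter; the map is not copied.
            let names = Arc::clone(&names);
            thread::spawn(move || names.display.get(&user).cloned())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```
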
---

## Security Considerations

1. **Root/Admin required**: Memory scan needs elevated privileges
2. **No secrets logged**: Keys are written to `~/.wx-cli/all_keys.json`, never echoed
3. **Socket permissions**: Unix socket mode 0600
4. **Local-only**: All IPC runs over a local Unix socket or named pipe, with no network exposure
5. **User consent implied**: Only decrypts the user's own WeChat data

---

## Error Handling Patterns

- `anyhow::Result` throughout
- Context messages for chain debugging
- Graceful degradation: missing tables → fallback paths
- Preflight checks (e.g., `~/.wx-cli` writable before daemon spawn)
- Signal handlers for clean shutdown (socket/PID file cleanup)

---

## Cross-Platform Notes

| Platform | Scanner API | IPC | Privilege | DB Path |
|----------|-------------|-----|-----------|---------|
| macOS | Mach VM | Unix socket | sudo + codesign | ~/Library/Containers/... |
| Linux | /proc/pid/mem | Unix socket | sudo | ~/Documents/xwechat_files |
| Windows | ToolHelp + ReadProcessMemory | Named pipe | Admin | %APPDATA%/Tencent/xwechat |

---

## Testing Coverage

- `src/crypto/mod.rs`: hex encoding, salt reading, recursive collection
- `src/scanner/macos.rs`: pattern matching (uppercase, dedup, embedded, edge cases)
- Unit tests for helper functions; integration tests would require live WeChat

---

## Extension Points

1. **New commands**: Add a variant to the `cli/mod.rs` enum, a dispatch arm, and a `query.rs` function
2. **New message types**: Extend the `fmt_type()` + `fmt_content()` parsers
3. **New DB sources**: Add to the DbCache key list + query functions
4. **Output formats**: Extend the `output.rs` formatter

---

## File Structure Summary

```
src/
├── main.rs              # Entry point (daemon/CLI switch)
├── config.rs            # Config loading + auto-detect
├── ipc.rs               # Request/Response protocol types
├── cli/
│   ├── mod.rs           # clap command definitions + dispatch
│   ├── transport.rs     # IPC client + daemon lifecycle
│   ├── output.rs        # YAML/JSON formatting
│   ├── init.rs          # wx init implementation
│   ├── sessions.rs      # etc. (thin wrappers around IPC)
│   └── daemon_cmd.rs    # daemon status/stop/logs
├── daemon/
│   ├── mod.rs           # daemon entry + async_run
│   ├── cache.rs         # DbCache (mtime-aware decryption cache)
│   ├── server.rs        # IPC server (Unix/Windows)
│   └── query.rs         # All query implementations
├── crypto/
│   ├── mod.rs           # SQLCipher page decryption
│   └── wal.rs           # WAL application
└── scanner/
    ├── mod.rs           # common interface + salt collection
    ├── macos.rs         # Mach VM memory scanner
    ├── linux.rs         # /proc scanner
    └── windows.rs       # Windows API scanner
```

---

## Dependencies

**Core crates:**
- `clap` (derive) - CLI parsing
- `tokio` (full) - async runtime
- `serde`/`serde_json` - serialization
- `rusqlite` (bundled) - SQLite queries
- `aes`/`cbc`/`hmac`/`sha2`/`pbkdf2` - crypto primitives
- `zstd` - message decompression
- `chrono` - timestamp formatting
- `anyhow` - error handling
- `dirs` - home directory
- `md5` - table name hashing
- `regex` - `Msg_<md5>` pattern matching

**Platform-specific:**
- Unix: `libc` (setsid, signal handling)
- Windows: `windows` crate (process/memory APIs), `interprocess` (named pipes)

---

## Summary

wx-cli is a well-architected Rust project demonstrating:
- Clean separation of CLI/daemon/crypto/scanner layers
- Async-first daemon with sync-offload for SQLite
- Smart caching strategy (mtime-based invalidation)
- Cross-platform memory scanning for SQLCipher key extraction
- AI-friendly output design (YAML default, JSON optional)
- Comprehensive command coverage for WeChat local data