Compare commits

...

25 Commits

Author SHA1 Message Date
jakevin 08af894594
fix(biz-articles): read all biz_message shards (#81) 2026-05-19 14:19:02 +08:00
jakevin 94fcc36ffe
feat(attachments): expose stable group sender identity (#77)
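In group chats, when two members share the same nickname, `q_attachments`
previously emitted only the `sender` field (taken from the group card), so
JSON consumers could not tell who sent an image.

Following #68, append `sender_username / sender_contact_display /
sender_group_nickname` to each attachment row, reusing the
`add_sender_identity` / `sender_username` helpers introduced in PR #68, so
the field semantics stay identical across the 4 existing outputs
(history / search / new-messages / stats.top_senders) and attachments.

Changes:
- The `q_attachments` tuple grows from 7 to 8 fields (adds a stable wxid)
- `sender_username` is computed once more inside spawn_blocking; per-row cost is O(1)
- The JSON build step calls `add_sender_identity`, matching existing behavior:
  the three fields are omitted for non-group chats or when the wxid cannot be resolved

Tests / docs:
- Add `attachment_row_gets_stable_group_sender_identity_via_helper`, which
  pins down "two same-named members can be told apart by sender_username" and
  "no fake fields are appended for non-group chats / unknown senders"
- README and SKILL.md document the new fields in both the `attachments`
  section and the "sender selection strategy" section at the top, noting
  that the fields are omitted when the wxid cannot be resolved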

closes #23
2026-05-19 01:44:03 +08:00
jackwener 0612789d19 Merge pull request #68 from t0m1sacat/kael/sender-identity
fix: expose stable group sender identity
2026-05-19 01:14:58 +08:00
jackwener f8550ae74d Merge pull request #63 from Icy-Cat/feat/windows-mydocument-keyword
feat(windows): resolve MyDocument: token in Weixin data-root ini
2026-05-19 01:14:58 +08:00
jackwener 5f87ce6348 Merge pull request #62 from Icy-Cat/fix/init-error-shows-config-path
fix(init): show config.json path in auto-detect error
2026-05-19 01:14:58 +08:00
jackwener ed95812332 Merge pull request #76 from Suda202/fix/group-nickname-field
fix(members): ignore non-card fields for group nicknames
2026-05-19 01:14:58 +08:00
suda be1a174226 fix(members): ignore non-card fields for group nicknames 2026-05-18 23:18:20 +08:00
kael c34f5f8fe2 fix: expose stable group sender identity 2026-05-16 08:46:37 +08:00
jackwener 739b66a4b1 chore(release): bump version to 0.3.0 2026-05-16 02:23:46 +08:00
jackwener b5edaf7177 feat(meta): expose freshness coverage in query output 2026-05-16 02:22:03 +08:00
jackwener 9f6a2cfba3 review: restore cache mode coverage and rationale comments 2026-05-15 22:33:32 +08:00
jackwener 76024901e9 feat(meta): expose freshness coverage in query output 2026-05-15 22:08:46 +08:00
jakevin 12740afb53
docs(macos): document codesign side-effect popup (#64)
* docs(macos): document codesign side-effect popup ("微信" 想访问其他 App 的数据)

After `codesign --force --deep --sign - /Applications/WeChat.app`, macOS
treats the re-signed WeChat as a different code identity from the
original. When WeChat then accesses its own container / cache / app-group
data (notably triggered when opening 公众号 articles), macOS fires the
"'微信' 想访问其他 App 的数据" popup.

This is a known side-effect of the current macOS invasive init path,
not a "wx-cli is reading other apps' data" issue and not a 公众号-only
problem — 公众号 is just a high-frequency trigger surface because of
WebView / cache access.

Document this in 3 places per agreed scope:
- README.md macOS init: add "副作用提示" callout linking to the guide
- docs/macos-permission-guide.md: new §六 with first-principles
  explanation, mitigation options, and long-term direction
- src/cli/init.rs: print a short macOS-only warning at the end of
  `wx init` so users see it right when they finish the invasive setup

* review: stop overstating the trade-off and condition the init warning

Per codex review on PR #64:

1. src/cli/init.rs warning was unconditional but the wording presumed
   the user had taken the ad-hoc re-sign path. If init goes through the
   tier 2 path (Apple-signed WeChat + GUI Terminal + Developer Tools TCC
   authorization), the warning would mis-fire. Reword conditionally and
   point to the GitHub URL of the doc instead of a relative path that
   release-binary / npm-installed users won't have on disk.

2. docs/macos-permission-guide.md §六 and the matching README callout
   said "restoring official WeChat = giving up macOS memory-scan". This
   contradicts the same guide's §一 实测表 which shows
   "Apple 签名 + 本机 Terminal sudo = ✅". Restoring the official
   signature only gives up the default re-sign path; the local-Terminal
   + Developer-Tools route still works on Apple-signed WeChat. Only
   SSH + Apple-signed WeChat actually requires re-signing.

* review (round 2): caveat empirical gap + drop emoji

Self-review found two issues both LGTMs missed:

1. The "tier 2 仍走通" claim (README + §六) leans on §一 实测表 row
   "Apple 签名 + 本机 Terminal sudo = ✅". But that data only covers
   macOS 10.15 (Catalina) and 11.1 (Big Sur). macOS 14/15 — the exact
   versions where the popup behavior originates — were never tested
   for that path in this project. Add an explicit caveat instead of
   silently extrapolating across major macOS versions.

2. `init.rs` warning used a ⚠️ emoji prefix, which violates the
   project + global "no emojis in files unless requested" rule. README
   and the rest of init.rs have no emoji. Replace with `[macOS]`.
2026-05-15 15:47:15 +08:00
Icy-Cat b58ae5468d feat(windows): resolve MyDocument: token in Weixin data-root ini
The data-root ini under %APPDATA%\Tencent\xwechat\config\*.ini is
observed to contain either a plain absolute path (e.g. D:\WeChatFiles)
or the literal token 'MyDocument:'. The token form is not a real
filesystem path, so detect_db_dir_impl() — which previously did
PathBuf::from(content).is_dir() — silently failed on it, even though
the user's Weixin data was sitting in their (possibly relocated)
Documents folder.

Empirically the token denotes 'the calling user's Documents folder'.
We resolve it via SHGetKnownFolderPath(FOLDERID_Documents), which
honours the standard Windows shell-folder redirect (HKCU User Shell
Folders\Personal), so users who moved Documents to e.g. D:\Documents
now auto-detect correctly.

Plain absolute paths still pass through unchanged.

Adds Win32_UI_Shell + Win32_System_Com features to the windows crate
(needed for SHGetKnownFolderPath and CoTaskMemFree).
2026-05-15 11:53:35 +08:00
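
The resolution rule described in the commit above can be sketched roughly as follows. This is an illustration, not the code from `detect_db_dir_impl()`: the function name `resolve_data_root` and the `documents_dir` parameter are hypothetical, trimming the ini content is an assumption, and the real change obtains the Documents folder through `SHGetKnownFolderPath(FOLDERID_Documents)` instead of receiving it as an argument.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: turn the content of the data-root ini into a directory.
/// `documents_dir` stands in for the result of the Windows known-folder lookup.
pub fn resolve_data_root(ini_content: &str, documents_dir: &Path) -> PathBuf {
    // Assumption: surrounding whitespace / line endings are not significant.
    let content = ini_content.trim();
    if content == "MyDocument:" {
        // The token is not a filesystem path; it denotes the user's Documents folder.
        documents_dir.to_path_buf()
    } else {
        // Plain absolute paths pass through unchanged.
        PathBuf::from(content)
    }
}
```

Passing the Documents folder in as a parameter keeps the sketch platform-independent; the caller would still need to check `is_dir()` on the result, as the original code did.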
Icy-Cat 7451ce5684 fix(init): show config.json path in auto-detect error
When auto_detect_db_dir() fails, the error told the user to edit
config.json without saying where that file lives. On Windows that is
%USERPROFILE%\.wx-cli\config.json, which is non-obvious.

Use the config_path already computed at the top of cmd_init() so the
error message includes the absolute path, plus a concrete example of
the db_dir shape.
2026-05-15 11:49:40 +08:00
jackwener 52cc39a55c chore(release): bump version to 0.2.0
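Main additions:
- `wx attachments` / `wx extract`: decrypt and extract V2 image attachments from local chat data (macOS / Windows)
- `DbCache` incremental WAL reuse: the daemon request path drops from a ~120s full decrypt per request to < 1s (typical WAL)

See #57 / #58 for the full changelog.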
2026-05-14 21:38:05 +08:00
jackwener 6424a2162b fix(cache): reuse decrypted db across wal-only updates (#58) 2026-05-14 19:37:22 +08:00
jackwener e9f65ba71b review: preserve wal incremental reuse across restart 2026-05-14 19:35:36 +08:00
jackwener b032b8be04 fix(cache): apply WAL incrementally instead of full re-decrypting on WAL mtime change
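DbCache previously ran full_decrypt whenever the mtime of either .db or .db-wal changed. While
writing messages, WeChat keeps appending to the WAL (when no checkpoint happens), so every
attachments/extract request re-decrypted the 1.8GB message_0.db (measured at ~120s per request).

Switch to three hit paths:
  1. db_mt and wal_mt both unchanged → return the cached path directly
  2. db_mt unchanged, wal_mt changed → **apply the WAL once more** on top of the cached output
     (apply_wal is idempotent: old frames redo the same page writes, new frames take effect on top)
  3. db_mt changed → full decrypt + apply WAL (the old path)

Result: a typical WAL (< 10MB) drops from ~120s to < 1s; even a large 100MB WAL takes only ~7s.
SQLite never spontaneously ends up with "main db unchanged + WAL emptied", so path 2 needs no special edge-case handling.

Tests cover all three paths:
  - exact_mtime_hit_skips_decrypt
  - wal_only_change_uses_incremental_path
  - db_mtime_change_triggers_full_decrypt
They are told apart by whether full_decrypt rewrote the cached file to a multiple of PAGE_SZ.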
2026-05-14 19:24:02 +08:00
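
The three hit paths in the commit message above amount to a small decision over two mtimes. The sketch below only illustrates that decision; `CacheAction`, `Mtimes`, and `decide` are hypothetical names, mtimes are simplified to plain integers, and treating a missing cache entry as a full decrypt is an assumption rather than something the commit states.

```rust
/// Which of the three paths a cache lookup takes.
#[derive(Debug, PartialEq, Eq)]
pub enum CacheAction {
    /// Path 1: neither mtime changed; return the cached file as-is.
    ReuseCached,
    /// Path 2: only the WAL mtime changed; re-apply the WAL on the cached output.
    ApplyWalIncrementally,
    /// Path 3: the main db mtime changed; full decrypt, then apply the WAL.
    FullDecrypt,
}

/// Hypothetical record of the mtimes seen when the cache entry was built.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mtimes {
    pub db_mt: u64,
    pub wal_mt: u64,
}

pub fn decide(cached: Option<Mtimes>, current: Mtimes) -> CacheAction {
    match cached {
        // Assumption: with nothing cached yet, only the full path is possible.
        None => CacheAction::FullDecrypt,
        Some(c) if c.db_mt != current.db_mt => CacheAction::FullDecrypt,
        Some(c) if c.wal_mt != current.wal_mt => CacheAction::ApplyWalIncrementally,
        Some(_) => CacheAction::ReuseCached,
    }
}
```

Checking the main db mtime first matters: when both files changed, the cached output is no longer a valid base for replaying WAL frames, so the full path has to win.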
jackwener ff96f957b7 feat(attachment): support image extraction from local chat data (#57) 2026-05-14 19:11:13 +08:00
jackwener b63589b368 review: tighten attachment extraction scope 2026-05-14 19:10:03 +08:00
jackwener 7feacc6371 fix(daemon): drop redundant `ok` from extract payload (collides with Response.ok)
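Response uses #[serde(flatten)] to splice the Value returned by q_* into `{ok, error, ...data}`,
so adding another `"ok": true` inside q_extract writes two keys with the same name on the wire.
On the CLI side, `serde_json::from_str::<Response>` fails with "duplicate field `ok`", which
surfaces as "extract failed / failed to parse daemon response" even though the daemon has
already decoded the image. No other q_* inserts `ok` (biz_articles / sessions / history, etc.), so this keeps them consistent.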
2026-05-14 18:48:46 +08:00
jackwener 2d88c9542d feat(attachment): wire wx attachments / wx extract end-to-end
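Wire the V1 (legacy XOR + V1 fixed-AES) and platform-specific V2 (macOS / Windows) image
decryption all the way through to the CLI:

- ipc: add two Request variants, Attachments / Extract
- daemon/server: dispatch routes to query::q_attachments / q_extract
- daemon/cache: make DbCache::db_dir() public so the resolver can derive wxchat_base
- daemon/query: q_attachments walks the Msg_<chat> tables, filters by
  (local_type & 0xFFFFFFFF) IN (...), sorts globally by ts DESC, then paginates and
  returns an opaque attachment_id; q_extract decodes the attachment_id → queries
  message_resource.db → locates the local .dat → dispatches to the v1/v2 decoder by
  magic → writes the file. The bridge uses ImageKeyMaterial.{aes_key, xor_key}
  (codex measured xor_key=0xa2 on a real account, so 0x88 must not be hardcoded)
- cli: add the `wx attachments` / `wx extract` subcommands, with flag style matching the
  existing history / biz-articles
- README + SKILL: add an attachment extraction section covering the three decode tiers
  and how the V2 image key is derived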
2026-05-14 18:40:57 +08:00
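
The `q_attachments` flow described above (mask `local_type`, filter, sort by timestamp descending, then paginate) can be sketched in memory as follows. This is not the daemon's query code: `Row` and `page_attachments` are hypothetical, and the real implementation reads from the `Msg_<chat>` tables rather than from a `Vec`.

```rust
/// Hypothetical, simplified message row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub local_id: i64,
    pub local_type: i64,
    pub ts: i64,
}

/// Keep rows whose low 32 bits of `local_type` are in `wanted`, sort newest first,
/// then return one page.
pub fn page_attachments(
    mut rows: Vec<Row>,
    wanted: &[u64],
    offset: usize,
    limit: usize,
) -> Vec<Row> {
    // The high 32 bits carry flags, so compare only the masked value.
    rows.retain(|r| wanted.contains(&((r.local_type as u64) & 0xFFFF_FFFF)));
    // Sort globally (ts DESC) before paginating, so pages are stable across shards.
    rows.sort_by(|a, b| b.ts.cmp(&a.ts));
    rows.into_iter().skip(offset).take(limit).collect()
}
```

Sorting before `skip` / `take` is the important ordering: paginating each shard separately and merging afterwards would return the wrong rows for any page after the first.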
jackwener bf8d0d934a feat(attachment): implement V2 image key providers 2026-05-14 18:34:38 +08:00
jackwener 14fdfde1d3 feat(attachment): scaffold module + V1 decoders + resource resolver
Lays down the skeleton for chat attachment (聊天附件) extraction. This commit
introduces the `attachment` module with:

- `attachment_id`: opaque base64url(json) round-trip handle for CLI/IPC. Carries
  `(chat, local_id, create_time, kind)`; `local_id` alone is not unique
  (up to 7 records sharing one local_id were observed within a single chat), so create_time is required for
  disambiguation.
- `decoder/`: dispatch by 6B header magic. Three branches:
  - `V2_MAGIC` → AES-128-ECB + raw + XOR (need image AES key)
  - `V1_MAGIC` → AES-128-ECB with fixed key `cfcd208495d565ef` (= md5("0")[:16])
  - else → legacy single-byte XOR with magic auto-detect
  Manual ECB + PKCS7 unpad to avoid pulling in another crate.
- `resolver`: `message_resource.db` lookup chain
  `username → ChatName2Id.rowid → MessageResourceInfo.packed_info → md5`
  + on-disk `.dat` selection (full > _h > _t) under
  `<wxchat_base>/msg/attach/<md5(chat)>/<YYYY-MM>/Img/<md5>[_t|_h].dat`.
  Honors `message_local_type % 2^32` to strip the high flag bits, and orders by
  `message_create_time DESC` to handle local_id reuse.
- `image_key/`: stub trait + macOS / Windows placeholders. To be filled by
  codex with the V2 image key extraction (kvcomm + brute-force on macOS, memory
  scan on Windows).

V1 decoder ships with 6 unit tests covering every supported magic + the BMP
extra validation; resolver ships with packed_info parser + dat-file selection
tests; v2 decoder ships with header validation tests. 21 tests pass.

`cargo check` and `cargo check --target x86_64-pc-windows-gnu` both clean.
2026-05-14 18:25:32 +08:00
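
The on-disk `.dat` selection order mentioned in the commit above (full > `_h` > `_t`) can be illustrated with a short sketch. The function name `select_dat` and the injected `exists` callback are hypothetical; the actual resolver presumably checks the filesystem directly under the `Img` directory.

```rust
use std::path::{Path, PathBuf};

/// Pick the best `.dat` variant for a resource md5 inside an `Img` directory,
/// preferring the full file, then `_h`, then `_t`.
/// `exists` is injected so the sketch runs without touching the disk.
pub fn select_dat(
    img_dir: &Path,
    md5: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    ["", "_h", "_t"]
        .iter()
        .map(|suffix| img_dir.join(format!("{}{}.dat", md5, suffix)))
        .find(|candidate| exists(candidate.as_path()))
}
```

In real use the callback would simply be `|p| p.is_file()`; keeping it as a parameter makes the priority order easy to test in isolation.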
41 changed files with 5945 additions and 858 deletions

9
Cargo.lock generated
View File

@@ -105,6 +105,12 @@ version = "1.5.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
 
+[[package]]
+name = "base64"
+version = "0.22.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "72b3254f16251a8381aa12e40e3c4d2f0199f8c6508fbecb9d91f575e0fbb8c6"
+
 [[package]]
 name = "bitflags"
 version = "2.11.1"
@@ -1307,10 +1313,11 @@ checksum = "d7249219f66ced02969388cf2bb044a09756a083d0fab1e566056b04d9fbcaa5"
 
 [[package]]
 name = "wx-cli"
-version = "0.1.11"
+version = "0.3.0"
 dependencies = [
  "aes",
  "anyhow",
+ "base64",
  "cbc",
  "chrono",
  "clap",

View File

@@ -1,6 +1,6 @@
 [package]
 name = "wx-cli"
-version = "0.1.11"
+version = "0.3.0"
 edition = "2021"
 description = "WeChat 4.x (macOS/Linux) local data CLI — decrypt SQLCipher DBs, query chat history, watch new messages"
 license = "Apache-2.0"
@@ -50,6 +50,9 @@ dirs = "5"
 # MD5 (contact table name Msg_<md5>)
 md5 = "0.7"
 
+# Attachment ID encoding (base64url)
+base64 = "0.22"
+
 # Regular expressions
 regex = "1"
 roxmltree = "0.20"
@@ -68,6 +71,8 @@ windows = { version = "0.58", features = [
     "Win32_System_Threading",
     "Win32_Foundation",
     "Win32_System_Memory",
+    "Win32_System_Com",
+    "Win32_UI_Shell",
 ] }
 
 [profile.release]

View File

@@ -36,7 +36,7 @@ npx skills add jackwener/wx-cli -g
 - **Zero-dependency install**: a single Rust binary, installed with one command
 - **Millisecond-level responses**: a background daemon keeps decrypted databases cached and reuses them while mtime is unchanged
-- **AI-friendly**: YAML output by default, which saves tokens and is easier to read; `--json` switches to JSON (convenient for `jq` and similar tools)
+- **AI-friendly**: `history` / `search` / `sessions` / `new-messages` / `stats` / `attachments` return a `{..., meta}` wrapper by default, so agents can consume freshness / source information directly
 - **Fully local**: data never leaves the machine; decryption happens on the fly, with no full pre-decryption needed
 
 ---
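@@ -121,6 +121,8 @@ sudo wx init
 >
 > After re-signing, macOS re-validates TCC privacy grants against the new code signature, so the old records become invalid. If you skip `tccutil reset`, WeChat permissions such as screenshots / video calls / microphone may "look enabled but actually be denied". See the [macOS permission and signing guide](docs/macos-permission-guide.md#五重签名后微信权限-silent-失效).
 
+> **Side-effect notice**: after the ad-hoc re-sign above, macOS will fairly often show the `"微信" 想访问其他 App 的数据` popup ("WeChat" wants to access data from other apps), especially when opening 公众号 (official account) articles in WeChat. This is a known side effect of the current macOS invasive init path: re-signing changes WeChat's code identity, so when it accesses its own original container / cache data the system treats it as "cross-app access". Clicking "Allow" usually only lets the current WeChat process through; to stop the popup entirely you have to restore the official WeChat. Doing so only gives up **the current default path that depends on re-signing**; it does **not mean giving up memory-scan**: in a local GUI Terminal, once Terminal.app has the "Developer Tools" TCC grant, the flow should still work against Apple-signed WeChat (empirical coverage is limited to Catalina / Big Sur; macOS 14+ has not been tested in this project). Only the SSH remote + Apple-signed WeChat combination strictly requires re-signing. See the [macOS permission and signing guide §六](docs/macos-permission-guide.md#六微信-想访问其他-app-的数据-弹窗).
+
 **Linux**
 ```bash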
@@ -166,6 +168,23 @@ wx search "会议" --in "工作群" --since 2026-01-01
 In group chats, `last_sender`, `sender`, and the `top_senders` of `stats` prefer the group nickname (group card). If the local database has no group nickname for the member, they fall back to the contact remark, WeChat nickname, or username.
 
+`history` / `search` / `new-messages` / `attachments`, as well as `stats.top_senders`, also carry a stable identity triple in group-chat context:
+- `sender_username`: the stable wxid, used to tell apart two members with the same nickname
+- `sender_contact_display`: the display name from the contact list (remark > nickname > wxid as a last resort)
+- `sender_group_nickname`: the group card itself (the same source as `sender`, so machine readers need no extra parsing)
+
+When the wxid cannot be resolved (id2u misses and the legacy `wxid_xxx:\n...` prefix is also absent), these three fields are omitted, so that fabricated empty fields do not pollute downstream filtering.
+
+`history` / `search` / `sessions` / `unread` / `new-messages` / `stats` / `attachments` now all carry `meta`:
+- `status`: `ok` / `possibly_stale` / `possibly_stale_unknown_shards` / `windowed`
+- `unknown_shards`: `message_N.db` shards that exist on disk but for which the daemon currently has no key; when non-empty, run `wx init --force` first
+- `chat_latest_timestamp` / `chat_latest_db`: the time and source shard of the newest message in the data hit by this query
+- `session_last_timestamp`: the latest time WeChat itself records in `session.db`; if it is clearly ahead of `chat_latest_timestamp`, the result may be missing messages
+
+By default, human users see an actionable warning on stderr, while agents / scripts can read `meta` from stdout directly. Passing `--with-meta` additionally returns `per_shard_latest` / `cache_mode_per_shard`, and the hidden flag `--debug-source` also includes the real `shard_paths`.
+
 Quoted messages show both the current reply and the quoted original in `history` / `search` / `new-messages` output:
 
 ```text
@@ -198,7 +217,7 @@ wx sns-search "婚礼" --user "李四" --since 2023-01-01
 ### Official account articles
 
-Official account article pushes live in a separate `biz_message_0.db`; query them with `biz-articles`:
+Official account article pushes live in separate `biz_message_*.db` shards; query them with `biz-articles`:
 
 ```bash
 wx biz-articles # the 50 most recent
@@ -211,6 +230,35 @@ wx biz-articles --json | jq '.[].url' # consume URLs downstream
 Each entry returns: `account` / `account_username` / `title` / `url` / `digest` / `cover_url` / `time` / `timestamp` / `recv_time_str`. Multi-article pushes are expanded into multiple rows.
 
+### Attachment extraction (images)
+
+Attachment payloads in chats are stored as `.dat` files under `xwechat_files/<wxid>/msg/attach/...`; recovering the original image requires decoding with the md5 from the message's `message_resource.db` plus a platform-specific image key.
+
+```bash
+# 1) List the image attachments in a conversation to get the opaque attachment_id
+wx attachments "张三"
+wx attachments "AI群" --kind image -n 100
+wx attachments "AI群" --since 2026-04-01 --until 2026-04-15
+
+# 2) Decrypt a single attachment_id and write it out (keeping an extension such as .jpg / .mp4 is recommended)
+wx extract <attachment_id> -o ~/Desktop/photo.jpg
+wx extract <attachment_id> -o /tmp/x.jpg --overwrite
+```
+
+`attachments` output rows carry: `attachment_id` / `kind` / `type` / `local_id` / `timestamp` / `time`; in group chats they also carry `sender` plus the stable identity triple `sender_username` / `sender_contact_display` / `sender_group_nickname` (same semantics as `history` / `search` / `new-messages`: `sender_username` is the wxid, used to reliably tell apart two same-named members; the three fields are omitted when the wxid cannot be resolved). `kind` is currently always `image`; the command keeps the name `attachments` so that extending to other attachment types later does not break the CLI.
+
+The `extract` report carries: `md5` / `dat_path` / `dat_size` / `output` / `output_size` / `format` (the image format actually detected: jpg / png / gif / webp / hevc, etc.) / `decoder` (the decoder actually used: `legacy_xor` / `v1_aes` / `v2`).
+
+Supported decode tiers:
+- **legacy XOR**: early single-byte XOR with no magic; the key is inferred automatically by probing the format from the file's first bytes
+- **V1 fixed-AES**: `07 08 V1 08 07`, AES-128-ECB + fixed key `cfcd208495d565ef`
+- **V2 AES + XOR**: `07 08 V2 08 07`, AES-128-ECB + raw + XOR; the AES key is derived per platform
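
The legacy XOR tier relies on the fact that a single-byte XOR key can be recovered from a known plaintext header. A minimal sketch of that idea follows; it is an assumption about how such a probe can work, not the project's `v1_xor` decoder, and the names `MAGICS`, `guess_xor_key`, and `xor_decode` as well as the three-format subset are illustrative.

```rust
/// Known plaintext magics to probe for (a small illustrative subset).
const MAGICS: [(&str, &[u8]); 3] = [
    ("jpg", &[0xFF, 0xD8, 0xFF]),
    ("png", &[0x89, 0x50, 0x4E, 0x47]),
    ("gif", b"GIF"),
];

/// Guess the single-byte XOR key: for each magic, derive the key from the first
/// byte and accept it only if the remaining magic bytes decode correctly too.
pub fn guess_xor_key(dat: &[u8]) -> Option<(u8, &'static str)> {
    for &(format, magic) in MAGICS.iter() {
        if dat.len() < magic.len() {
            continue;
        }
        let key = dat[0] ^ magic[0];
        if magic.iter().zip(dat).all(|(m, d)| (d ^ key) == *m) {
            return Some((key, format));
        }
    }
    None
}

/// XOR every byte with the key; applying it twice restores the input.
pub fn xor_decode(dat: &[u8], key: u8) -> Vec<u8> {
    dat.iter().map(|b| b ^ key).collect()
}
```

Checking the whole magic rather than just the first byte is what keeps the guess from accepting an arbitrary key: the first byte always "matches" by construction, so only the following bytes carry evidence.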
+
+V2 image key extraction:
+- **macOS**: `kvcomm` cache (take the uin from the `key_<uin>_*.statistic` filename → `md5(str(uin) + wxid)[:16]`) + brute-force fallback (enumerate 2^24 values for `md5(str(uin))[:4] == wxid_suffix`); xor_key = `uin & 0xff`, **not a hardcoded 0x88**
+- **Windows**: scan `Weixin.exe` memory for `[A-Za-z0-9]{32|16}` candidates and verify them against the V2 template ciphertext block
+- **Linux**: nothing available upstream; a V2 .dat is reported as unsupported
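
The macOS path starts from a file name and ends with a one-byte XOR key. The sketch below covers only those two steps, under stated assumptions: `uin_from_filename` and `xor_key_from_uin` are hypothetical names, the `u64` width for the uin is a guess, and the AES key derivation (`md5(str(uin) + wxid)[:16]`) is left out because it needs an MD5 implementation outside the standard library.

```rust
/// Extract `<uin>` from a file name shaped like `key_<uin>_*.statistic`.
pub fn uin_from_filename(name: &str) -> Option<u64> {
    let rest = name.strip_prefix("key_")?.strip_suffix(".statistic")?;
    // Everything up to the next underscore is taken as the uin.
    let (uin, _tail) = rest.split_once('_')?;
    uin.parse().ok()
}

/// xor_key = uin & 0xff, i.e. the low byte of the uin rather than a constant.
pub fn xor_key_from_uin(uin: u64) -> u8 {
    (uin & 0xff) as u8
}
```

Because the key is the low byte of an account-specific number, two accounts will generally need different XOR keys, which is why a hardcoded value only works by coincidence.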
+
 ### Contacts & groups
 
 ```bash
@@ -247,12 +295,14 @@ wx export "AI群" --since 2026-01-01 --format json
 ### Output format
 
-YAML is the default output, which saves tokens and is easier to read; `--json` switches to JSON (convenient for `jq` and similar tools)
+YAML is the default output; `--json` switches to JSON. For agents, the stdout of `history` / `search` / `sessions` / `new-messages` / `stats` / `attachments` is now a wrapper rather than a bare array
 
 ```bash
 wx sessions --json
-wx search "关键词" --json | jq '.[0].content'
+wx search "关键词" --json | jq '.results[0].content'
 wx new-messages --json
+wx history "张三" --json | jq '.meta'
+wx history "张三" --json --with-meta | jq '.meta.cache_mode_per_shard'
 ```
 
 ### Daemon management

View File

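@@ -159,6 +159,31 @@ wx search "会议" --in "工作群" --since 2026-01-01
 In group chat messages, `last_sender`, `sender`, and `stats.top_senders` prefer to show the group nickname (group card). If the local database has no group nickname, they fall back to the contact remark, WeChat nickname, or username.
 
+`history` / `search` / `new-messages` / `attachments` and `stats.top_senders` also emit a stable identity triple in group context: `sender_username` (the stable wxid, used to tell apart same-named members) / `sender_contact_display` (remark > nickname > wxid as a last resort) / `sender_group_nickname` (the group card, equivalent to the source of `sender`, which avoids extra string parsing). When the wxid cannot be resolved, these three fields are omitted, so that empty strings do not pollute downstream filtering.
+
+The stdout of `sessions` / `unread` / `history` / `search` / `new-messages` / `stats` / `attachments` is now uniformly a wrapper:
+
+```json
+{
+  "messages": [...],
+  "meta": {
+    "status": "ok",
+    "unknown_shards": [],
+    "chat_latest_timestamp": 1715750400,
+    "chat_latest_db": "message/message_2.db",
+    "session_last_timestamp": 1715760000
+  }
+}
+```
+
+Where:
+- `status = possibly_stale_unknown_shards`: a new `message_N.db` that the daemon does not recognize has appeared on disk; run `wx init --force` first
+- `status = possibly_stale`: the latest time recorded in `session.db` is clearly ahead of the newest message found by this query, so the result may be missing messages
+- `status = windowed`: this query was already a windowed/filtered partial view and should not be treated as the "complete latest state"
+- `--with-meta`: additionally returns `per_shard_latest` / `cache_mode_per_shard`
+- `--debug-source`: on top of `--with-meta`, also exposes the real `shard_paths`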
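
For an agent written in Rust, the `status` values listed above map naturally onto a small set of reactions. The sketch below is one possible consumer-side reading of those rules, not part of wx-cli: the `Advice` enum and `advice_for` function are hypothetical, and mapping `ok` to "no action" is an interpretation rather than something the documentation spells out.

```rust
/// What a consumer might do after reading `meta.status`.
#[derive(Debug, PartialEq, Eq)]
pub enum Advice {
    /// `ok`: nothing to do (assumed meaning).
    NoAction,
    /// `possibly_stale_unknown_shards`: run `wx init --force` first.
    RunInitForce,
    /// `possibly_stale`: the result may be missing messages.
    MayMissMessages,
    /// `windowed`: a partial view, not the complete latest state.
    PartialView,
    /// Any value this sketch does not know about.
    UnknownStatus,
}

pub fn advice_for(status: &str) -> Advice {
    match status {
        "ok" => Advice::NoAction,
        "possibly_stale_unknown_shards" => Advice::RunInitForce,
        "possibly_stale" => Advice::MayMissMessages,
        "windowed" => Advice::PartialView,
        _ => Advice::UnknownStatus,
    }
}
```

Keeping an explicit `UnknownStatus` arm means a newer daemon that adds status values degrades to a visible "unknown" instead of being silently treated as fresh.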
+
 Quoted messages (appmsg `type=57`) are expanded into two lines in `history` / `search` / `new-messages` output: the first line is the current reply, and the second line starts with `↳` and shows the quoted original, for example:
 
 ```text
@@ -217,7 +242,7 @@ wx sns-search "婚礼" --user "李四" --since 2023-01-01 -n 50
 ### Official account articles
 
-Official account article pushes live in a separate `biz_message_0.db`, apart from the regular `message_0.db`:
+Official account article pushes live in separate `biz_message_*.db` shards, apart from the regular `message_0.db`:
 
 ```bash
 # the 50 most recent (default)
@@ -242,6 +267,34 @@ wx biz-articles --since 2026-05-10 --json | jq '.[].url'
 Fields returned per entry: `account` / `account_username` (`gh_*`) / `title` / `url` (an `mp.weixin.qq.com` link) / `digest` / `cover_url` / `time` + `timestamp` (article publish time) / `recv_time_str` + `recv_time` (time WeChat received the push). Multi-article pushes are expanded into multiple rows.
 
+### Attachment extraction (images)
+
+Images in chats are stored encrypted (`.dat`) under `xwechat_files/<wxid>/msg/attach/...`; decoding requires the md5 from the message's `message_resource.db` plus a platform-specific image key. It takes two steps:
+
+```bash
+# 1) List the image attachments first to get the opaque attachment_id
+wx attachments "张三"
+wx attachments "AI群" --kind image -n 100
+wx attachments "AI群" --since 2026-04-01 --until 2026-04-15
+
+# 2) Use the attachment_id to decrypt a single resource and write it to the given path
+wx extract <attachment_id> -o ~/Desktop/photo.jpg
+wx extract <attachment_id> -o /tmp/x.jpg --overwrite
+```
+
+`attachments` output rows carry: `attachment_id` / `kind` (currently always `image`) / `type` / `local_id` / `timestamp` / `time`; in group chats they also carry `sender` and the stable identity triple (as described above). The command keeps the name `attachments` so that extending to other attachment types later does not break the CLI.
+
+The `extract` report carries: `md5` / `dat_path` / `dat_size` / `output` / `output_size` / `format` (the image format actually detected: jpg / png / gif / webp / hevc, etc.) / `decoder` (the decoder actually used: `legacy_xor` / `v1_aes` / `v2`).
+
+Supported decode tiers:
+- **legacy XOR**: early single-byte XOR with no magic; the key is inferred automatically by probing the format from the file's first bytes
+- **V1 fixed-AES**: `07 08 V1 08 07`, AES-128-ECB + fixed key `cfcd208495d565ef`
+- **V2 AES + XOR**: `07 08 V2 08 07`, AES-128-ECB + raw + XOR; the AES key is derived per platform
+
+V2 image key extraction (automatic on macOS / Windows; not yet supported on Linux):
+- macOS: `kvcomm` cache (take the uin from the `key_<uin>_*.statistic` filename → `md5(str(uin) + wxid)[:16]`) + brute-force fallback; `xor_key = uin & 0xff`
+- Windows: scan `Weixin.exe` memory for `[A-Za-z0-9]{32|16}` candidates and verify them against the V2 template ciphertext block
+
 ### Favorites and statistics
 
 ```bash
@@ -287,8 +340,10 @@ wx daemon logs --follow
 ```bash
 wx sessions --json
 wx new-messages --json
-wx search "关键词" --json
-wx history "张三" --json -n 50
+wx search "关键词" --json | jq '.results[0]'
+wx history "张三" --json -n 50 | jq '.messages[0]'
+wx history "张三" --json | jq '.meta'
+wx history "张三" --json --with-meta | jq '.meta.cache_mode_per_shard'
 ```
 
 The CHAT argument accepts a nickname, remark name, or WeChat ID (fuzzy match). When unsure of the exact name, search with `wx contacts --query` first.

View File

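@@ -272,3 +272,50 @@ TeamIdentifier=not set
 ```
 
 The most direct functional check: use features such as screenshots, video calls, and the microphone in WeChat, re-authorize once by clicking "Allow" in the GUI prompt, and they work normally afterwards.
+
+---
+
+## 六、The `"微信" 想访问其他 App 的数据` popup
+
+### Symptom
+
+After running `wx init` and applying the ad-hoc re-sign to `/Applications/WeChat.app`, macOS will fairly often show the following popup while you use WeChat:
+
+```
+"微信" would like to access data from other apps.
+Keeping app data separate makes it easier to manage your privacy and security.
+[ Don't Allow ] [ Allow ]
+```
+
+The most common trigger is **opening 公众号 (official account) articles in WeChat**, but that is only a high-frequency trigger surface, not the root cause.
+
+### Root cause (first principles)
+
+This popup is macOS Ventura+ / 14 / 15 protecting against **cross-identity access to app data containers**: the current process ("微信") is reading data left behind by an app with a different code identity.
+
+To let `task_for_pid` obtain WeChat's task port and read the raw key from process memory, our current macOS approach asks the user to run:
+
+```bash
+codesign --force --deep --sign - /Applications/WeChat.app
+```
+
+This step replaces WeChat's official Apple signature with an ad-hoc identity. To the user it is still "WeChat"; to the macOS security model, **WeChat before re-signing** and **WeChat after re-signing** are no longer the same app identity.
+
+Afterwards, when the (re-signed) WeChat accesses its original `~/Library/Containers/com.tencent.xinWeChat/...`, caches, app group data and so on, the system sees "a new identity reading container data left behind by the old identity" and shows this dialog under its privacy-protection policy. The webview / cookie / cache paths used by official account articles happen to hit this access path, so "the popup appears as soon as an official account article is opened" is very easy to reproduce, but **the real issue is not the official account page**; it is code identity + container access.
+
+> Note: this is **not** "wx-cli secretly reading other apps' data". The wx-cli process itself only has read-only access to the WeChat container, but **requiring the user to re-sign WeChat** is itself the direct cause of this kind of popup. So this is a known side effect of the current macOS invasive init path, not a system behavior unrelated to wx-cli.
+
+### Mitigation
+
+Short-term mitigation:
+- Clicking "Allow" usually only lets through **the current** WeChat process; the permission resets the next time WeChat starts, so the popup may appear again
+- The grant generally leaves no explicit toggle in System Settings, because it is bound to a dynamic code identity
+
+To stop the popup entirely:
+- Restore `/Applications/WeChat.app` to its official signature (reinstall the official WeChat package) and do not run `codesign --force --deep --sign -` again
+- This only gives up **the current default path that depends on ad-hoc re-signing**; it does not mean giving up macOS memory-scan. In a local GUI Terminal, after granting Terminal.app the "Developer Tools" TCC permission, `task_for_pid` should still work against the Apple-signed (hardened runtime) WeChat; see "Apple 签名 + 本机 Terminal sudo = ✅" in the §一 measured-results table
+- ⚠️ Note on test coverage: the two data points behind "Apple 签名 + 本机 Terminal sudo ✅" in the §一 measured-results table only cover macOS 10.15 (Catalina) and 11.1 (Big Sur); whether the path still works on macOS 14 (Sonoma) / 15 (Sequoia) **has not been tested in this project**. If init fails after you restore the official signature this way, go back to the re-sign path and accept the popup side effect described in this section
+- The genuinely constrained scenario is SSH remote + Apple-signed WeChat: `sshd` cannot obtain the Developer Tools TCC grant, and only then is the re-sign path required
+
+Long-term direction:
+- The real fix for this side effect is to redesign `wx init` into three tiers, `safe → assisted → invasive fallback`: leave WeChat untouched by default, and only fall back to ad-hoc re-signing when the first two are not viable, printing the full list of side effects first and asking the user to confirm explicitly. Until then, this is a known trade-off.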

View File

@@ -1,6 +1,6 @@
 {
   "name": "@jackwener/wx-cli-darwin-arm64",
-  "version": "0.1.11",
+  "version": "0.3.0",
   "description": "wx-cli binary for macOS arm64",
   "os": ["darwin"],
   "cpu": ["arm64"],

View File

@@ -1,6 +1,6 @@
 {
   "name": "@jackwener/wx-cli-darwin-x64",
-  "version": "0.1.11",
+  "version": "0.3.0",
   "description": "wx-cli binary for macOS x64",
   "os": ["darwin"],
   "cpu": ["x64"],

View File

@@ -1,6 +1,6 @@
 {
   "name": "@jackwener/wx-cli-linux-arm64",
-  "version": "0.1.11",
+  "version": "0.3.0",
   "description": "wx-cli binary for Linux arm64",
   "os": ["linux"],
   "cpu": ["arm64"],

View File

@@ -1,6 +1,6 @@
 {
   "name": "@jackwener/wx-cli-linux-x64",
-  "version": "0.1.11",
+  "version": "0.3.0",
   "description": "wx-cli binary for Linux x64",
   "os": ["linux"],
   "cpu": ["x64"],

View File

@@ -1,6 +1,6 @@
 {
   "name": "@jackwener/wx-cli-win32-x64",
-  "version": "0.1.11",
+  "version": "0.3.0",
   "description": "wx-cli binary for Windows x64",
   "os": ["win32"],
   "cpu": ["x64"],

View File

@@ -1,6 +1,6 @@
 {
   "name": "@jackwener/wx-cli",
-  "version": "0.1.11",
+  "version": "0.3.0",
   "description": "Query your local WeChat data from the command line. Designed for LLM agent tool calls.",
   "bin": {
     "wx": "bin/wx.js"
@@ -13,11 +13,11 @@
     "install.js"
   ],
   "optionalDependencies": {
-    "@jackwener/wx-cli-darwin-arm64": "0.1.11",
-    "@jackwener/wx-cli-darwin-x64": "0.1.11",
-    "@jackwener/wx-cli-linux-x64": "0.1.11",
-    "@jackwener/wx-cli-linux-arm64": "0.1.11",
-    "@jackwener/wx-cli-win32-x64": "0.1.11"
+    "@jackwener/wx-cli-darwin-arm64": "0.3.0",
+    "@jackwener/wx-cli-darwin-x64": "0.3.0",
+    "@jackwener/wx-cli-linux-x64": "0.3.0",
+    "@jackwener/wx-cli-linux-arm64": "0.3.0",
+    "@jackwener/wx-cli-win32-x64": "0.3.0"
   },
   "engines": { "node": ">=14" },
   "keywords": ["wechat", "cli", "wx", "llm", "ai", "sqlite", "sqlcipher"],

View File

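@@ -0,0 +1,153 @@
//! Opaque attachment ID: a round-trip handle shared across CLI / IPC.
//!
//! Encoding: `base64url_no_pad(serde_json(payload))`.
//! Why base64url(json) rather than compact bit-packing:
//! - phase 1 favors stability and does not invent a binary protocol
//! - adding fields later (`resource_md5` / `decoder_hint`, etc.) does not break old CLIs
//! - debugging is just `base64 -d | jq` to inspect the fields
//!
//! ⚠️ WeChat reuses `local_id` within a single chat (up to 7 rows sharing one local_id
//! were observed in the same chat), so the `(chat, local_id, create_time)` triple is
//! the minimal set needed to locate a resource row.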

use anyhow::{anyhow, Context, Result};
use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum AttachmentKind {
    Image,
    Video,
    File,
    Voice,
}

impl AttachmentKind {
    /// Derive the attachment kind from message.local_type (covers only the kinds phase 1 cares about).
    /// The high 32 bits are version/session flags, so mask down to the low 32 bits first.
    pub fn from_local_type(local_type: i64) -> Option<Self> {
        let lo = (local_type as u64) & 0xFFFF_FFFF;
        match lo {
            3 => Some(AttachmentKind::Image),
            34 => Some(AttachmentKind::Voice),
            43 => Some(AttachmentKind::Video),
            // type=49 is appmsg, and only subtype=6 inside it is a file; be lenient here and
            // return File, leaving it to the resolver to decide from the appmsg subtype
            // whether the attachment can actually be extracted
            49 => Some(AttachmentKind::File),
            _ => None,
        }
    }

    pub fn as_str(&self) -> &'static str {
        match self {
            AttachmentKind::Image => "image",
            AttachmentKind::Video => "video",
            AttachmentKind::File => "file",
            AttachmentKind::Voice => "voice",
        }
    }
}

/// Attachment ID payload (serialized, then base64url-encoded).
///
/// `v` is the version field, so a future schema change can branch for compatibility. Currently v=1.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AttachmentId {
    /// payload schema version
    pub v: u32,
    /// Conversation username (used both to look up chat_id in ChatName2Id and to build the attach path)
    pub chat: String,
    /// local_id of the message row
    pub local_id: i64,
    /// create_time of the message row (unix seconds), used to disambiguate local_id reuse within a chat
    pub create_time: i64,
    /// Attachment kind
    pub kind: AttachmentKind,
    /// Optional hint: the N of the message_N.db that holds the message. When present the resolver
    /// can skip the shard scan; when absent the resolver scans everything via the `find_msg_tables` logic
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub db: Option<u8>,
}

impl AttachmentId {
    pub fn encode(&self) -> Result<String> {
        let json = serde_json::to_vec(self).context("序列化 AttachmentId")?;
        Ok(URL_SAFE_NO_PAD.encode(json))
    }

    pub fn decode(s: &str) -> Result<Self> {
        let bytes = URL_SAFE_NO_PAD
            .decode(s.trim())
            .map_err(|e| anyhow!("attachment_id 不是合法 base64url: {}", e))?;
        let id: AttachmentId =
            serde_json::from_slice(&bytes).context("attachment_id payload 非合法 JSON")?;
        if id.v != 1 {
            return Err(anyhow!("不支持的 attachment_id 版本 v={}", id.v));
        }
        Ok(id)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn round_trip_minimal() {
        let id = AttachmentId {
            v: 1,
            chat: "wxid_abc".to_string(),
            local_id: 12345,
            create_time: 1_715_678_901,
            kind: AttachmentKind::Image,
            db: None,
        };
        let s = id.encode().unwrap();
        let back = AttachmentId::decode(&s).unwrap();
        assert_eq!(back.chat, id.chat);
        assert_eq!(back.local_id, id.local_id);
        assert_eq!(back.create_time, id.create_time);
        assert_eq!(back.kind, id.kind);
        assert_eq!(back.db, id.db);
    }

    #[test]
    fn round_trip_with_db_hint() {
        let id = AttachmentId {
            v: 1,
            chat: "1234@chatroom".to_string(),
            local_id: 42,
            create_time: 1,
            kind: AttachmentKind::Image,
            db: Some(2),
        };
        let s = id.encode().unwrap();
        assert!(!s.contains('=')); // base64url no-pad
        let back = AttachmentId::decode(&s).unwrap();
        assert_eq!(back.db, Some(2));
    }

    #[test]
    fn local_type_mask_high_bits() {
        // image push path in monitor_web.py: the high bits carry flags, the low 32 bits are 3
        let high_flag = (0xDEAD_BEEFu64 << 32) as i64 | 3;
        assert_eq!(
            AttachmentKind::from_local_type(high_flag),
            Some(AttachmentKind::Image)
        );
    }

    #[test]
    fn rejects_unknown_version() {
        let id = AttachmentId {
            v: 99,
            chat: "x".to_string(),
            local_id: 0,
            create_time: 0,
            kind: AttachmentKind::Image,
            db: None,
        };
        let s = id.encode().unwrap();
        assert!(AttachmentId::decode(&s).is_err());
    }
}
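
To make the "base64url, no padding" part of the `attachment_id` encoding concrete, here is a standard-library-only sketch of the encoder side. It is not what the module uses (the file above relies on the `base64` crate's `URL_SAFE_NO_PAD` engine); `base64url_no_pad` and `ALPHABET` are illustrative names written out only to show what the encoding does to the JSON bytes.

```rust
/// URL-safe base64 alphabet: `+` and `/` are replaced by `-` and `_`.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as base64url without trailing `=` padding.
pub fn base64url_no_pad(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() * 4 + 2) / 3);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // A chunk of k bytes yields k + 1 characters; padding is simply omitted.
        for i in 0..=chunk.len() {
            let idx = (n >> (18 - 6 * i)) & 0x3F;
            out.push(ALPHABET[idx as usize] as char);
        }
    }
    out
}
```

Dropping the padding and using `-` / `_` is what lets the ID be pasted as a bare CLI argument without shell quoting or URL escaping.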

View File

@@ -0,0 +1,122 @@
//! `.dat` file decoding: dispatch to a concrete decoder based on the 6-byte header magic.
//!
//! Three tiers:
//! | header[0..6]              | decoder  | notes                                             |
//! |---------------------------|----------|---------------------------------------------------|
//! | `07 08 V2 08 07`          | `v2`     | AES-128-ECB + XOR hybrid, needs the image AES key |
//! | `07 08 V1 08 07`          | `v1_aes` | fixed AES key `cfcd208495d565ef`                  |
//! | (other, usually no magic) | `v1_xor` | legacy single-byte XOR, magic auto-detected       |
//!
//! The decision point lives in `dispatch`, so upper layers (`resolver` / the CLI extract command)
//! deal with a single entry point.
use anyhow::{anyhow, Result};
pub mod v1_xor;
pub mod v2;
/// 完整 V2 magic`\x07\x08V2\x08\x07`
pub const V2_MAGIC: [u8; 6] = [0x07, 0x08, b'V', b'2', 0x08, 0x07];
/// 完整 V1 magic`\x07\x08V1\x08\x07`
pub const V1_MAGIC: [u8; 6] = [0x07, 0x08, b'V', b'1', 0x08, 0x07];
/// 解码后的产物 + 探测出的图片格式
#[derive(Debug)]
pub struct DecodedImage {
pub data: Vec<u8>,
/// Inferred image extension (no leading dot), determined by the magic bytes, e.g. "jpg" / "png" / "gif" / "webp" /
/// "tif" / "bmp" / "hevc" (wxgf container) / "bin" (unrecognized).
pub format: &'static str,
/// Decoder name ("legacy_xor" / "v1_aes" / "v2"), used in CLI debug output.
pub decoder: &'static str,
}
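/// V2 image AES key supplied by the caller (codex's `image_key` module is responsible for obtaining it).
/// When it is absent, a V2 file yields `Err`, so the caller gets a specific error message to act on.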
#[derive(Debug, Clone, Copy, Default)]
pub struct V2KeyMaterial<'a> {
pub aes_key: Option<&'a [u8; 16]>,
/// XOR key: WeChat 4.x uses 0x88 by default (set by `with_aes`); can be overridden.
/// Note that the derived `Default` yields 0, not 0x88.
pub xor_key: u8,
}
impl<'a> V2KeyMaterial<'a> {
pub fn with_aes(key: &'a [u8; 16]) -> Self {
Self { aes_key: Some(key), xor_key: 0x88 }
}
}
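/// Dispatch to the matching decoder based on the header magic of `dat_bytes`.
///
/// `v2_key.aes_key` is consumed only when the file carries the V2 magic; the V1 path
/// substitutes its fixed AES key but still uses `v2_key.xor_key`.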
pub fn dispatch(dat_bytes: &[u8], v2_key: V2KeyMaterial<'_>) -> Result<DecodedImage> {
if dat_bytes.len() >= 6 {
let head: &[u8; 6] = dat_bytes[..6].try_into().unwrap();
if head == &V2_MAGIC {
return v2::decode(dat_bytes, v2_key);
}
if head == &V1_MAGIC {
// V1 fixed-AES: the fixed key is the first 16 hex characters of md5("0"), i.e. "cfcd208495d565ef".
let fixed_key: [u8; 16] = *b"cfcd208495d565ef";
return v2::decode(
dat_bytes,
V2KeyMaterial { aes_key: Some(&fixed_key), xor_key: v2_key.xor_key },
)
.map(|mut d| {
d.decoder = "v1_aes";
d
});
}
}
if dat_bytes.is_empty() {
return Err(anyhow!("empty .dat file"));
}
v1_xor::decode(dat_bytes)
}
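/// Detect the image format extension from the head of the decrypted byte stream.
///
/// Matches upstream `decode_image.py::detect_image_format`, with added detection of wxgf (raw HEVC stream),
/// because the output of V2 decoding may itself be a wxgf container.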
pub fn detect_image_format(bytes: &[u8]) -> &'static str {
if bytes.len() >= 4 && &bytes[..4] == b"wxgf" {
return "hevc";
}
if bytes.len() >= 3 && bytes[..3] == [0xFF, 0xD8, 0xFF] {
return "jpg";
}
if bytes.len() >= 4 && bytes[..4] == [0x89, 0x50, 0x4E, 0x47] {
return "png";
}
if bytes.len() >= 3 && &bytes[..3] == b"GIF" {
return "gif";
}
if bytes.len() >= 12 && &bytes[..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
return "webp";
}
if bytes.len() >= 4 && bytes[..4] == [0x49, 0x49, 0x2A, 0x00] {
return "tif";
}
if bytes.len() >= 2 && &bytes[..2] == b"BM" {
return "bmp";
}
"bin"
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn detect_basic_formats() {
assert_eq!(detect_image_format(&[0xFF, 0xD8, 0xFF, 0xE0]), "jpg");
assert_eq!(detect_image_format(&[0x89, 0x50, 0x4E, 0x47]), "png");
assert_eq!(detect_image_format(b"GIF89a"), "gif");
assert_eq!(detect_image_format(b"BM\0\0\0\0\0\0\0\0\0\0\0\0"), "bmp");
let mut webp = b"RIFF\0\0\0\0WEBP".to_vec();
webp.extend_from_slice(&[0; 4]);
assert_eq!(detect_image_format(&webp), "webp");
assert_eq!(detect_image_format(&[0x49, 0x49, 0x2A, 0x00]), "tif");
assert_eq!(detect_image_format(b"wxgfXXXX"), "hevc");
assert_eq!(detect_image_format(&[0, 0, 0, 0]), "bin");
}
}
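The three-tier decision made by `dispatch` can be isolated from the actual decoding. The sketch below only classifies a buffer by its header. `DatKind` and `classify` are illustrative names that do not appear in the diff, and the magic constants are copied from the module above.

```rust
const V2_MAGIC: [u8; 6] = [0x07, 0x08, b'V', b'2', 0x08, 0x07];
const V1_MAGIC: [u8; 6] = [0x07, 0x08, b'V', b'1', 0x08, 0x07];

/// Illustrative classification result; not a type from the diff.
#[derive(Debug, PartialEq, Eq)]
enum DatKind {
    V2,
    V1Aes,
    LegacyXor,
    Empty,
}

/// Decide which decoder a `.dat` buffer would be routed to.
fn classify(dat: &[u8]) -> DatKind {
    if dat.is_empty() {
        DatKind::Empty
    } else if dat.starts_with(&V2_MAGIC) {
        DatKind::V2
    } else if dat.starts_with(&V1_MAGIC) {
        DatKind::V1Aes
    } else {
        // Anything without a V1/V2 magic falls through to the legacy XOR path,
        // which then has to recognise an image magic on its own.
        DatKind::LegacyXor
    }
}
```

Keeping classification separate from decoding would make the routing testable without any key material; whether that split is worth it in this codebase is a judgment call.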

@ -0,0 +1,166 @@
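//! Legacy single-byte XOR decoder (old `.dat` files without a magic header).
//!
//! Algorithm: recover the XOR key from a known image magic: `key = file[0] ^ magic[0]`.
//! Then check `file[i] ^ key == magic[i]` with that same key; the key is accepted only if every byte matches.
//!
//! Priority (descending magic length, to avoid false positives from short magics):
//! PNG (4) > GIF (4) > TIF (4) > WEBP (4, RIFF) > JPG (3) > BMP (2, needs extra validation)
//!
//! BMP has only a 2-byte magic, so false positives are likely; the BMP file header fields
//! `bf_size` (offset 2, u32 LE) and `bf_offset` (offset 10, u32 LE) serve as an extra sanity check:
//! - `|bf_size - file_size| < 1024` (tolerates small padding differences)
//! - `14 <= bf_offset <= 1078` (largest palette 256*4 + 14-byte file header = 1038, plus some slack;
//!   1078 is also exactly 14 + a 40-byte BITMAPINFOHEADER + 1024)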
use anyhow::{anyhow, Result};
use super::{detect_image_format, DecodedImage};
const PNG: &[u8] = &[0x89, 0x50, 0x4E, 0x47];
const GIF: &[u8] = &[0x47, 0x49, 0x46, 0x38];
const TIF: &[u8] = &[0x49, 0x49, 0x2A, 0x00];
const WEBP_RIFF: &[u8] = &[0x52, 0x49, 0x46, 0x46];
const JPG: &[u8] = &[0xFF, 0xD8, 0xFF];
const BMP: &[u8] = &[0x42, 0x4D];
/// Try one fixed magic against `header`; returns `Some(key)` if and only if every byte matches.
fn try_magic(header: &[u8], magic: &[u8]) -> Option<u8> {
if header.len() < magic.len() {
return None;
}
let key = header[0] ^ magic[0];
for i in 1..magic.len() {
if header[i] ^ key != magic[i] {
return None;
}
}
Some(key)
}
/// Detect the XOR key. Returns `None` on failure (the caller decides whether that is an error).
pub fn detect_key(file_bytes: &[u8]) -> Option<u8> {
if file_bytes.len() < 4 {
return None;
}
let header = &file_bytes[..file_bytes.len().min(16)];
// Try the magics of 3 or more bytes first.
for magic in [PNG, GIF, TIF, WEBP_RIFF, JPG] {
if let Some(k) = try_magic(header, magic) {
return Some(k);
}
}
// Try BMP last (only a 2-byte magic, so it needs extra validation).
if let Some(k) = try_magic(header, BMP) {
if header.len() >= 14 {
// Decode the 14-byte BMP file header.
let mut dec = [0u8; 14];
for i in 0..14 {
dec[i] = header[i] ^ k;
}
let bmp_size = u32::from_le_bytes([dec[2], dec[3], dec[4], dec[5]]);
let bmp_offset = u32::from_le_bytes([dec[10], dec[11], dec[12], dec[13]]);
let file_size = file_bytes.len() as u32;
// Allow up to 1024 bytes of padding difference; the offset must be within a plausible range.
if file_size.abs_diff(bmp_size) < 1024 && (14..=1078).contains(&bmp_offset) {
return Some(k);
}
}
}
None
}
/// XOR-decode the entire `.dat` contents.
pub fn decode(file_bytes: &[u8]) -> Result<DecodedImage> {
let key =
detect_key(file_bytes).ok_or_else(|| anyhow!("legacy XOR: unrecognized image magic (key detection failed)"))?;
let data: Vec<u8> = file_bytes.iter().map(|b| b ^ key).collect();
let format = detect_image_format(&data);
if format == "bin" {
return Err(anyhow!("legacy XOR: derived key=0x{:02x} but the output magic is unrecognized", key));
}
Ok(DecodedImage { data, format, decoder: "legacy_xor" })
}
#[cfg(test)]
mod tests {
use super::*;
/// XOR-encrypt a plaintext with a single-byte key to simulate a `.dat` file.
fn xor_encrypt(plain: &[u8], key: u8) -> Vec<u8> {
plain.iter().map(|b| b ^ key).collect()
}
#[test]
fn detect_jpg_key() {
let plain = vec![0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46];
let enc = xor_encrypt(&plain, 0x3C);
assert_eq!(detect_key(&enc), Some(0x3C));
}
#[test]
fn detect_png_key() {
let mut plain = vec![0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
plain.extend_from_slice(&[0; 16]);
let enc = xor_encrypt(&plain, 0xA5);
assert_eq!(detect_key(&enc), Some(0xA5));
}
#[test]
fn detect_gif_key() {
let mut plain = b"GIF89a".to_vec();
plain.extend_from_slice(&[0; 16]);
let enc = xor_encrypt(&plain, 0x77);
assert_eq!(detect_key(&enc), Some(0x77));
}
#[test]
fn detect_webp_riff_key() {
let mut plain = b"RIFF\x00\x00\x00\x00WEBP".to_vec();
plain.extend_from_slice(&[0; 8]);
let enc = xor_encrypt(&plain, 0x12);
assert_eq!(detect_key(&enc), Some(0x12));
}
#[test]
fn detect_tif_key() {
let mut plain = vec![0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00];
plain.extend_from_slice(&[0; 16]);
let enc = xor_encrypt(&plain, 0xC3);
assert_eq!(detect_key(&enc), Some(0xC3));
}
#[test]
fn detect_bmp_with_valid_header() {
// BMP 14B header: 'BM' + size(u32 LE) + reserved(2*u16) + offset(u32 LE)
let mut plain = Vec::new();
plain.extend_from_slice(b"BM");
plain.extend_from_slice(&100u32.to_le_bytes()); // file_size = 100
plain.extend_from_slice(&[0; 4]); // reserved
plain.extend_from_slice(&54u32.to_le_bytes()); // pixel data offset = 54
plain.resize(100, 0); // the whole file is 100 bytes, matching file_size
let enc = xor_encrypt(&plain, 0x55);
assert_eq!(detect_key(&enc), Some(0x55));
}
#[test]
fn reject_random_bytes() {
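// All-zero file: every magic fails on its second byte, because for a constant input
// `header[1] ^ key` equals `magic[0]`, and no magic here starts with two identical bytes
// (for BMP: key = 0x42 ^ 0 = 0x42, then 0 ^ 0x42 = 0x42 != 0x4D), so the file must be rejected.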
let bytes = vec![0u8; 100];
assert_eq!(detect_key(&bytes), None);
}
#[test]
fn decode_round_trip_jpg() {
let mut plain = vec![0xFF, 0xD8, 0xFF, 0xE0];
plain.extend_from_slice(b"JFIF padding here");
let enc = xor_encrypt(&plain, 0xAB);
let out = decode(&enc).unwrap();
assert_eq!(out.format, "jpg");
assert_eq!(out.decoder, "legacy_xor");
assert_eq!(out.data, plain);
}
}
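A worked instance of the key-recovery rule `key = file[0] ^ magic[0]` may help when reading `try_magic`. The sketch below restates the check as one iterator expression; `recover_key` is an illustrative name, not a function from the diff, and behaves like `try_magic` for non-empty magics.

```rust
/// Recover a single-byte XOR key by assuming `header` starts with `magic`.
/// Returns `None` when any byte disagrees with that assumption.
fn recover_key(header: &[u8], magic: &[u8]) -> Option<u8> {
    if magic.is_empty() || header.len() < magic.len() {
        return None;
    }
    // The first byte fixes the key; the remaining bytes must confirm it.
    let key = header[0] ^ magic[0];
    header
        .iter()
        .zip(magic)
        .all(|(h, m)| *h ^ key == *m)
        .then_some(key)
}
```

For a JPEG header `FF D8 FF E0` encrypted with key `0x3C`, the file starts with `C3 E4 C3 DC`: the first byte gives `0xC3 ^ 0xFF = 0x3C`, and `0xE4 ^ 0x3C = 0xD8` confirms it.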

@ -0,0 +1,130 @@
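//! V2 `.dat` decoding: three concatenated segments, `AES-128-ECB (PKCS7) + raw + XOR`.
//!
//! File layout (from upstream `decode_image.py::v2_decrypt_file`):
//! `[6B magic V2/V1] [4B aes_size LE] [4B xor_size LE] [1B padding]`
//! `[aligned_aes_size bytes AES-ECB ciphertext]`
//! `[len - 15 - aligned_aes_size - xor_size bytes raw_data (unencrypted)]`
//! `[xor_size bytes XOR (single-byte key)]`
//!
//! `aligned_aes_size`: `aes_size` rounded up to a multiple of 16; when `aes_size` is already
//! a multiple of 16, PKCS7 appends one more full block of padding, so another 16 is added. Equivalent to
//! `aes_size + (16 - aes_size % 16)`.
//!
//! ⚠️ codex is to land the full V2 implementation plus the image key module here. For now this module only provides a
//! `decode` entry-point skeleton so that the v1_aes path (fixed key) and dispatch compile together.
//! With `aes_key=None` it returns an error carrying a specific diagnostic.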
use anyhow::{anyhow, bail, Result};
use super::{detect_image_format, DecodedImage, V2KeyMaterial, V1_MAGIC, V2_MAGIC};
const HEADER_SIZE: usize = 15;
pub fn decode(file_bytes: &[u8], key: V2KeyMaterial<'_>) -> Result<DecodedImage> {
if file_bytes.len() < HEADER_SIZE {
bail!("V2 .dat: file too short ({} < {} bytes)", file_bytes.len(), HEADER_SIZE);
}
let magic: &[u8; 6] = file_bytes[..6].try_into().unwrap();
if magic != &V2_MAGIC && magic != &V1_MAGIC {
bail!("V2 .dat: header magic matches neither V1 nor V2");
}
let aes_key = key.aes_key.ok_or_else(|| {
anyhow!("V2 .dat: image AES key required (codex's image_key module has not populated it yet)")
})?;
let aes_size = u32::from_le_bytes(file_bytes[6..10].try_into().unwrap()) as usize;
let xor_size = u32::from_le_bytes(file_bytes[10..14].try_into().unwrap()) as usize;
// PKCS7 alignment: if aes_size is not a multiple of 16, round up; if it is, add one more full block.
let aligned_aes_size = aes_size + (16 - (aes_size % 16));
let aes_end = HEADER_SIZE.checked_add(aligned_aes_size).ok_or_else(|| anyhow!("AES segment length overflow"))?;
if aes_end > file_bytes.len() {
bail!(
"V2 .dat: header declares aes_size={} (aligned={}), which exceeds the file length {}",
aes_size,
aligned_aes_size,
file_bytes.len()
);
}
let raw_end = file_bytes.len().checked_sub(xor_size).ok_or_else(|| {
anyhow!("V2 .dat: header declares xor_size={}, which exceeds the file length {}", xor_size, file_bytes.len())
})?;
if aes_end > raw_end {
bail!(
"V2 .dat: aes_end={} > raw_end={} (AES/XOR segments overlap)",
aes_end,
raw_end
);
}
// === AES-128-ECB decryption + PKCS7 unpad ===
let aes_data = &file_bytes[HEADER_SIZE..aes_end];
let dec_aes = aes_ecb_decrypt_pkcs7(aes_key, aes_data)?;
// === Raw segment (unencrypted) ===
let raw_data = &file_bytes[aes_end..raw_end];
// === XOR segment ===
let xor_data: Vec<u8> = file_bytes[raw_end..].iter().map(|b| b ^ key.xor_key).collect();
let mut out = Vec::with_capacity(dec_aes.len() + raw_data.len() + xor_data.len());
out.extend_from_slice(&dec_aes);
out.extend_from_slice(raw_data);
out.extend_from_slice(&xor_data);
let format = detect_image_format(&out);
if format == "bin" {
bail!("V2 .dat: AES decryption succeeded but the output magic is unrecognized (the key may be wrong)");
}
Ok(DecodedImage { data: out, format, decoder: "v2" })
}
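/// AES-128-ECB decryption + PKCS7 unpad. Returns `Err` on failure, never a partial result.
///
/// No third-party ECB crate is pulled in: ECB is just block-by-block, so it is run by hand.
/// The last step of this function strictly validates the PKCS7 padding: the length must be in 1..=16
/// and every trailing byte must equal that length.
fn aes_ecb_decrypt_pkcs7(key: &[u8; 16], cipher: &[u8]) -> Result<Vec<u8>> {
use aes::cipher::{generic_array::GenericArray, BlockDecrypt, KeyInit};
if cipher.is_empty() || cipher.len() % 16 != 0 {
bail!("AES input length {} is not a non-zero multiple of 16", cipher.len());
}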
let aes = aes::Aes128::new(key.into());
let mut out = Vec::with_capacity(cipher.len());
for chunk in cipher.chunks_exact(16) {
let mut block = GenericArray::clone_from_slice(chunk);
aes.decrypt_block(&mut block);
out.extend_from_slice(&block);
}
let pad = *out.last().ok_or_else(|| anyhow!("AES PKCS7: empty output"))? as usize;
if pad == 0 || pad > 16 || pad > out.len() {
bail!("AES PKCS7: invalid padding length {}", pad);
}
let tail = &out[out.len() - pad..];
if !tail.iter().all(|&b| b as usize == pad) {
bail!("AES PKCS7: inconsistent padding bytes");
}
out.truncate(out.len() - pad);
Ok(out)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn rejects_short_file() {
let r = decode(&[0u8; 4], V2KeyMaterial::default());
assert!(r.is_err());
}
#[test]
fn rejects_v2_without_key() {
let mut buf = V2_MAGIC.to_vec();
buf.extend_from_slice(&[0u8; HEADER_SIZE - 6]);
let r = decode(&buf, V2KeyMaterial::default());
let err = r.unwrap_err().to_string();
assert!(err.contains("AES key"), "{}", err);
}
}
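The segment arithmetic in `decode` can be checked without any AES code. The sketch below computes the three byte ranges (AES ciphertext, raw, XOR) from the header fields, using the same alignment rule `aes_size + (16 - aes_size % 16)`. `aligned_aes_size` and `segments` are illustrative helpers that do not exist in the diff.

```rust
use std::ops::Range;

/// 6-byte magic + 4-byte aes_size + 4-byte xor_size + 1 padding byte.
const HEADER_SIZE: usize = 15;

/// PKCS7-aligned ciphertext length: always at least one byte of padding,
/// so an exact multiple of 16 grows by a full block.
fn aligned_aes_size(aes_size: usize) -> Option<usize> {
    aes_size.checked_add(16 - aes_size % 16)
}

/// Byte ranges of the (AES, raw, XOR) segments, or `None` if the header
/// fields are inconsistent with the file length.
fn segments(
    file_len: usize,
    aes_size: usize,
    xor_size: usize,
) -> Option<(Range<usize>, Range<usize>, Range<usize>)> {
    let aes_end = HEADER_SIZE.checked_add(aligned_aes_size(aes_size)?)?;
    let raw_end = file_len.checked_sub(xor_size)?;
    if aes_end > raw_end {
        return None; // AES and XOR segments would overlap
    }
    Some((HEADER_SIZE..aes_end, aes_end..raw_end, raw_end..file_len))
}
```

A 33-byte file with `aes_size = 0` and `xor_size = 2` splits into `15..31` (one padded AES block), an empty raw segment `31..31`, and the XOR tail `31..33`.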

@ -0,0 +1,11 @@
use anyhow::{bail, Result};
use super::{ImageKeyMaterial, ImageKeyProvider};
pub struct LinuxImageKeyProvider;
impl ImageKeyProvider for LinuxImageKeyProvider {
fn get_key(&self, _wxid: &str) -> Result<ImageKeyMaterial> {
bail!("Linux V2 image key is not implemented yet; use legacy/V1 images for now, or mark this as unsupported in the README")
}
}

@ -0,0 +1,423 @@
//! macOS V2 image AES key extraction.
//!
//! Main path: take the uin from the `key_<uin>_*.statistic` file name, then derive the AES key as
//! `md5(str(uin) + normalize(wxid)).hex()[:16]`.
//!
//! Fallback: use `md5(str(uin))[:4] == wxid_suffix` and `uin & 0xff == xor_key`
//! to shrink the search space to 2^24, then validate each candidate AES key against V2 templates.
use anyhow::{bail, Context, Result};
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use crate::config;
use super::{
attach_root_for_db_dir, configured_db_dir_for_wxid, derive_xor_key_from_v2_dat,
find_v2_template_ciphertexts, join_components, normalize_wxid, verify_aes_key, wxid_from_db_dir,
ImageKeyMaterial, ImageKeyProvider,
};
pub struct MacosImageKeyProvider {
configured_db_dir: Result<PathBuf, String>,
cache: Mutex<HashMap<String, ImageKeyMaterial>>,
}
impl MacosImageKeyProvider {
pub fn from_current_config() -> Self {
let configured_db_dir = config::load_config()
.map(|cfg| cfg.db_dir)
.map_err(|err| err.to_string());
Self {
configured_db_dir,
cache: Mutex::new(HashMap::new()),
}
}
}
impl ImageKeyProvider for MacosImageKeyProvider {
fn get_key(&self, wxid: &str) -> Result<ImageKeyMaterial> {
let cache_key = normalize_wxid(wxid);
if let Some(found) = self.cache.lock().unwrap().get(&cache_key).copied() {
return Ok(found);
}
let configured_db_dir = self
.configured_db_dir
.as_ref()
.map_err(|err| anyhow::anyhow!("failed to read config.db_dir: {}", err))?;
let db_dir = configured_db_dir_for_wxid(configured_db_dir, wxid);
let attach_dir = attach_root_for_db_dir(&db_dir);
let key = derive_key_for_paths(&db_dir, &attach_dir)?;
self.cache.lock().unwrap().insert(cache_key, key);
Ok(key)
}
}
fn derive_key_for_paths(db_dir: &Path, attach_dir: &Path) -> Result<ImageKeyMaterial> {
let templates = find_v2_template_ciphertexts(attach_dir, 3, 64)?;
if templates.is_empty() {
bail!("no V2 template files found under {}", attach_dir.display());
}
if let Some(found) = find_via_kvcomm(db_dir, &templates)? {
return Ok(found);
}
let (wxid_full, wxid_norm, suffix) =
extract_wxid_parts(db_dir).context("db_dir has no 4-character hex wxid suffix usable for the fallback")?;
let (xor_key, _votes, _total) = derive_xor_key_from_v2_dat(attach_dir, 10, 3)?
.context("too few V2 .dat samples to vote on xor_key")?;
for wxid in preferred_wxid_candidates(&wxid_full, &wxid_norm) {
if let Some(aes_key) = bruteforce_aes_key(xor_key, &suffix, wxid, &templates)? {
return Ok(ImageKeyMaterial { aes_key, xor_key });
}
}
bail!("macOS V2 image key derivation failed")
}
fn find_via_kvcomm(db_dir: &Path, templates: &[[u8; 16]]) -> Result<Option<ImageKeyMaterial>> {
let Some(kvcomm_dir) = find_existing_kvcomm_dir(db_dir) else {
return Ok(None);
};
let codes = collect_kvcomm_codes(&kvcomm_dir)?;
if codes.is_empty() {
return Ok(None);
}
let wxids = collect_wxid_candidates(db_dir);
if wxids.is_empty() {
return Ok(None);
}
for wxid in wxids {
for code in &codes {
let candidate = derive_image_key_material(*code, &wxid);
if verify_aes_key(&candidate.aes_key, templates) {
return Ok(Some(candidate));
}
}
}
Ok(None)
}
fn derive_image_key_material(code: u32, wxid: &str) -> ImageKeyMaterial {
let xor_key = (code & 0xFF) as u8;
let digest = format!("{:x}", md5::compute(format!("{}{}", code, wxid)));
let mut aes_key = [0u8; 16];
aes_key.copy_from_slice(&digest.as_bytes()[..16]);
ImageKeyMaterial { aes_key, xor_key }
}
fn collect_wxid_candidates(db_dir: &Path) -> Vec<String> {
let Some(raw) = wxid_from_db_dir(db_dir) else {
return Vec::new();
};
let mut out = vec![raw.clone()];
let normalized = normalize_wxid(&raw);
if normalized != raw {
out.push(normalized);
}
out
}
fn extract_wxid_parts(db_dir: &Path) -> Option<(String, String, String)> {
let raw = wxid_from_db_dir(db_dir)?;
let idx = raw.rfind('_')?;
let suffix = &raw[idx + 1..];
if suffix.len() != 4 || !suffix.bytes().all(|byte| byte.is_ascii_hexdigit()) {
return None;
}
Some((raw.clone(), normalize_wxid(&raw), suffix.to_ascii_lowercase()))
}
fn preferred_wxid_candidates<'a>(raw: &'a str, normalized: &'a str) -> Vec<&'a str> {
if raw == normalized {
vec![raw]
} else {
vec![normalized, raw]
}
}
fn derive_kvcomm_dir_candidates(db_dir: &Path) -> Vec<PathBuf> {
let parts: Vec<String> = db_dir
.components()
.map(|component| component.as_os_str().to_string_lossy().into_owned())
.collect();
let mut candidates = Vec::new();
if let Some(idx) = parts.iter().position(|part| part == "xwechat_files") {
let documents_root = join_components(&parts[..idx]);
candidates.push(documents_root.join("app_data/net/kvcomm"));
candidates.push(documents_root.join("xwechat/net/kvcomm"));
if idx >= 1 {
let container_root = join_components(&parts[..idx - 1]);
candidates.push(
container_root
.join("Library/Application Support/com.tencent.xinWeChat/xwechat/net/kvcomm"),
);
candidates.push(
container_root.join("Library/Application Support/com.tencent.xinWeChat/net/kvcomm"),
);
}
}
if let Some(home) = dirs::home_dir() {
candidates.push(
home.join("Library/Containers/com.tencent.xinWeChat/Data/Documents/app_data/net/kvcomm"),
);
}
let mut dedup = Vec::new();
for candidate in candidates {
if !dedup.contains(&candidate) {
dedup.push(candidate);
}
}
dedup
}
fn find_existing_kvcomm_dir(db_dir: &Path) -> Option<PathBuf> {
derive_kvcomm_dir_candidates(db_dir)
.into_iter()
.find(|path| path.is_dir())
}
fn collect_kvcomm_codes(kvcomm_dir: &Path) -> Result<Vec<u32>> {
let mut codes = std::collections::BTreeSet::new();
for entry in std::fs::read_dir(kvcomm_dir)? {
let entry = entry?;
let Some(name) = entry.file_name().to_str().map(|value| value.to_string()) else {
continue;
};
let Some(rest) = name.strip_prefix("key_") else {
continue;
};
let Some((code, _)) = rest.split_once('_') else {
continue;
};
if let Ok(code) = code.parse::<u32>() {
codes.insert(code);
}
}
Ok(codes.into_iter().collect())
}
fn bruteforce_aes_key(
xor_key: u8,
suffix_hex: &str,
wxid: &str,
templates: &[[u8; 16]],
) -> Result<Option<[u8; 16]>> {
let suffix = hex_prefix_to_bytes(suffix_hex)?;
let workers = std::thread::available_parallelism()
.map(|count| count.get())
.unwrap_or(1)
.max(1);
let total = 1u32 << 24;
let chunk = total / workers as u32;
let stop = Arc::new(AtomicBool::new(false));
let (tx, rx) = mpsc::channel();
let wxid = Arc::new(wxid.as_bytes().to_vec());
let templates = Arc::new(templates.to_vec());
std::thread::scope(|scope| {
for idx in 0..workers {
let start = idx as u32 * chunk;
let end = if idx + 1 == workers {
total
} else {
(idx as u32 + 1) * chunk
};
let stop = Arc::clone(&stop);
let tx = tx.clone();
let wxid = Arc::clone(&wxid);
let templates = Arc::clone(&templates);
scope.spawn(move || {
for upper in start..end {
if stop.load(Ordering::Relaxed) {
break;
}
let uin = (upper << 8) | xor_key as u32;
let uin_ascii = uin.to_string();
let digest = md5::compute(uin_ascii.as_bytes());
if digest.0[0] != suffix[0] || digest.0[1] != suffix[1] {
continue;
}
let mut input = Vec::with_capacity(uin_ascii.len() + wxid.len());
input.extend_from_slice(uin_ascii.as_bytes());
input.extend_from_slice(&wxid);
let aes_hex = format!("{:x}", md5::compute(input));
let mut aes_key = [0u8; 16];
aes_key.copy_from_slice(&aes_hex.as_bytes()[..16]);
if verify_aes_key(&aes_key, &templates) {
stop.store(true, Ordering::Relaxed);
let _ = tx.send(aes_key);
break;
}
}
});
}
});
drop(tx);
Ok(rx.try_iter().next())
}
fn hex_prefix_to_bytes(hex: &str) -> Result<[u8; 2]> {
if hex.len() != 4 {
bail!("wxid suffix is not 4 hex characters: {}", hex);
}
let hi = u8::from_str_radix(&hex[..2], 16)?;
let lo = u8::from_str_radix(&hex[2..], 16)?;
Ok([hi, lo])
}
#[cfg(test)]
mod tests {
use super::{derive_key_for_paths, find_existing_kvcomm_dir};
use super::collect_wxid_candidates;
use crate::attachment::image_key::normalize_wxid;
use aes::cipher::{generic_array::GenericArray, BlockEncrypt, KeyInit};
use aes::Aes128;
use std::fs;
use std::path::Path;
fn temp_dir(label: &str) -> std::path::PathBuf {
let mut dir = std::env::temp_dir();
dir.push(format!(
"wx-cli-image-key-macos-{}-{:?}",
label,
std::thread::current().id()
));
let _ = fs::remove_dir_all(&dir);
fs::create_dir_all(&dir).unwrap();
dir
}
fn write_v2_template(path: &Path, aes_key: &[u8; 16], xor_key: u8, plaintext: &[u8; 16]) {
let cipher = Aes128::new(aes_key.into());
let mut block = GenericArray::clone_from_slice(plaintext);
cipher.encrypt_block(&mut block);
let mut data = Vec::new();
data.extend_from_slice(&crate::attachment::decoder::V2_MAGIC);
data.extend_from_slice(&0u32.to_le_bytes());
data.extend_from_slice(&0u32.to_le_bytes());
data.push(0);
data.extend_from_slice(&block);
data.push(0);
data.push(0xD9 ^ xor_key);
fs::create_dir_all(path.parent().unwrap()).unwrap();
fs::write(path, data).unwrap();
}
#[test]
fn normalize_wxid_matches_expected_shapes() {
assert_eq!(normalize_wxid("wxid_abc_def"), "wxid_abc");
assert_eq!(normalize_wxid("your_wxid_a1b2"), "your_wxid");
assert_eq!(normalize_wxid("plain"), "plain");
}
#[test]
fn kvcomm_path_detection_works() {
let dir = temp_dir("kvcomm");
let db_dir = dir.join(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/your_wxid_a1b2/db_storage",
);
let kvcomm = dir.join(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/app_data/net/kvcomm",
);
fs::create_dir_all(&db_dir).unwrap();
fs::create_dir_all(&kvcomm).unwrap();
assert_eq!(find_existing_kvcomm_dir(&db_dir), Some(kvcomm));
let _ = fs::remove_dir_all(dir);
}
#[test]
fn derives_key_via_kvcomm() {
let dir = temp_dir("via-kvcomm");
let db_dir = dir.join(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/your_wxid_a1b2/db_storage",
);
let attach = dir.join(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/your_wxid_a1b2/msg/attach/chat/2026-05/Img",
);
let kvcomm = dir.join(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/app_data/net/kvcomm",
);
fs::create_dir_all(&db_dir).unwrap();
fs::create_dir_all(&kvcomm).unwrap();
fs::write(kvcomm.join("key_42_x.statistic"), b"").unwrap();
let digest = format!("{:x}", md5::compute("42your_wxid"));
let mut aes_key = [0u8; 16];
aes_key.copy_from_slice(&digest.as_bytes()[..16]);
write_v2_template(
&attach.join("sample_t.dat"),
&aes_key,
42,
b"\xFF\xD8\xFFtemplate-001!",
);
let derived = derive_key_for_paths(&db_dir, db_dir.parent().unwrap().join("msg/attach").as_path())
.unwrap();
assert_eq!(derived.aes_key, aes_key);
assert_eq!(derived.xor_key, 42);
let _ = fs::remove_dir_all(dir);
}
#[test]
fn derives_key_via_bruteforce_fallback() {
let dir = temp_dir("via-fallback");
let suffix = format!("{:x}", md5::compute("42"))
.chars()
.take(4)
.collect::<String>();
let raw_wxid = format!("mywxid_{}", suffix);
let db_dir = dir.join(format!(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/{}/db_storage",
raw_wxid
));
let attach = dir.join(format!(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/{}/msg/attach/chat/2026-05/Img",
raw_wxid
));
fs::create_dir_all(&db_dir).unwrap();
let digest = format!("{:x}", md5::compute("42mywxid"));
let mut aes_key = [0u8; 16];
aes_key.copy_from_slice(&digest.as_bytes()[..16]);
for idx in 0..3 {
write_v2_template(
&attach.join(format!("sample{}_t.dat", idx)),
&aes_key,
42,
b"\xFF\xD8\xFFtemplate-001!",
);
}
let derived = derive_key_for_paths(&db_dir, db_dir.parent().unwrap().join("msg/attach").as_path())
.unwrap();
assert_eq!(derived.aes_key, aes_key);
assert_eq!(derived.xor_key, 42);
let _ = fs::remove_dir_all(dir);
}
#[test]
fn collects_raw_and_normalized_wxid() {
let dir = temp_dir("wxid");
let db_dir = dir.join(
"Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/your_wxid_a1b2/db_storage",
);
fs::create_dir_all(&db_dir).unwrap();
let wxids = collect_wxid_candidates(&db_dir);
assert_eq!(wxids, vec!["your_wxid_a1b2".to_string(), "your_wxid".to_string()]);
let _ = fs::remove_dir_all(dir);
}
}
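Two small pieces of the macOS path are easy to exercise on their own: extracting the uin from a `key_<uin>_*.statistic` file name, and building brute-force candidates whose low byte is pinned to the XOR key. The sketch below mirrors `collect_kvcomm_codes` and the `(upper << 8) | xor_key` expression in `bruteforce_aes_key`. `parse_kvcomm_code` and `candidate_uin` are illustrative names, and the MD5 step is left out so the block stays dependency-free.

```rust
/// Extract the numeric code from a kvcomm file name such as
/// `key_42_x.statistic`. Returns `None` for any other shape.
fn parse_kvcomm_code(file_name: &str) -> Option<u32> {
    let rest = file_name.strip_prefix("key_")?;
    let (code, _) = rest.split_once('_')?;
    code.parse().ok()
}

/// Build the candidate uin for one brute-force step: the low 8 bits are
/// fixed to `xor_key`, leaving 2^24 values of `upper` to enumerate.
fn candidate_uin(upper: u32, xor_key: u8) -> u32 {
    (upper << 8) | xor_key as u32
}
```

Because every candidate satisfies `uin & 0xff == xor_key` by construction, the loop only needs the 16-bit MD5 prefix check before it tries the more expensive AES validation.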

@ -0,0 +1,342 @@
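//! V2 image AES key extraction (platform-specific).
//!
//! Paths:
//! - macOS: derived from disk (take the uin from the `key_<uin>_*.statistic` file name → `md5(str(uin) + wxid)[:16]`),
//!   plus a brute-force fallback (enumerate 2^24 candidates with `md5(str(uin))[:4] == wxid_suffix`)
//! - Windows: scan `Weixin.exe` memory for `[a-zA-Z0-9]{32}` candidates and validate them against known AES
//!   ciphertext blocks (already implemented upstream in `find_image_key.py` / `find_image_key.c`)
//! - Linux: nothing upstream; not implemented for now, and a V2 .dat returns an unsupported error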
#[cfg(target_os = "linux")]
pub mod linux;
#[cfg(target_os = "macos")]
pub mod macos;
#[cfg(target_os = "windows")]
pub mod windows;
use anyhow::Result;
use regex::bytes::Regex;
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::OnceLock;
use crate::attachment::decoder::{detect_image_format, V2_MAGIC};
/// A V2 image actually needs two pieces of material:
/// - a 16-byte ASCII AES key
/// - an XOR key (on macOS it comes from `uin & 0xff`, so it cannot always be hard-coded as 0x88)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ImageKeyMaterial {
pub aes_key: [u8; 16],
pub xor_key: u8,
}
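/// Interface for extracting the V2 image key of a single wxid.
///
/// Implementors are responsible for caching across calls (on one machine, the image key for a given wxid
/// is usually stable as long as WeChat is not restarted).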
pub trait ImageKeyProvider {
fn get_key(&self, wxid: &str) -> Result<ImageKeyMaterial>;
fn get_aes_key(&self, wxid: &str) -> Result<[u8; 16]> {
Ok(self.get_key(wxid)?.aes_key)
}
fn get_xor_key(&self, wxid: &str) -> Result<u8> {
Ok(self.get_key(wxid)?.xor_key)
}
}
/// Default implementation for the current platform.
pub fn default_provider() -> Option<Box<dyn ImageKeyProvider + Send + Sync>> {
#[cfg(target_os = "macos")]
{
return Some(Box::new(macos::MacosImageKeyProvider::from_current_config()));
}
#[cfg(target_os = "windows")]
{
return Some(Box::new(windows::WindowsImageKeyProvider::from_current_config()));
}
#[cfg(target_os = "linux")]
{
return Some(Box::new(linux::LinuxImageKeyProvider));
}
#[cfg(not(any(target_os = "macos", target_os = "windows", target_os = "linux")))]
{
None
}
}
pub(crate) fn configured_db_dir_for_wxid(configured_db_dir: &Path, requested_wxid: &str) -> PathBuf {
if requested_wxid.trim().is_empty() {
return configured_db_dir.to_path_buf();
}
let configured_leaf = wxid_from_db_dir(configured_db_dir);
if let Some(leaf) = configured_leaf.as_deref() {
if same_wxid(leaf, requested_wxid) {
return configured_db_dir.to_path_buf();
}
}
xwechat_files_root(configured_db_dir)
.map(|root| root.join(requested_wxid).join("db_storage"))
.unwrap_or_else(|| configured_db_dir.to_path_buf())
}
pub(crate) fn wxid_from_db_dir(db_dir: &Path) -> Option<String> {
let mut components = db_dir
.components()
.map(|component| component.as_os_str().to_string_lossy().into_owned());
while let Some(component) = components.next() {
if component == "xwechat_files" {
return components.next();
}
}
None
}
pub(crate) fn xwechat_files_root(db_dir: &Path) -> Option<PathBuf> {
let parts: Vec<_> = db_dir
.components()
.map(|component| component.as_os_str().to_string_lossy().into_owned())
.collect();
let idx = parts.iter().position(|part| part == "xwechat_files")?;
Some(join_components(&parts[..=idx]))
}
pub(crate) fn normalize_wxid(raw: &str) -> String {
let raw = raw.trim();
if raw.is_empty() {
return String::new();
}
if let Some(stripped) = raw.strip_prefix("wxid_") {
let head = stripped.split('_').next().unwrap_or(stripped);
return format!("wxid_{}", head);
}
if let Some((base, suffix)) = raw.rsplit_once('_') {
if suffix.len() == 4 && suffix.bytes().all(|byte| byte.is_ascii_hexdigit()) {
return base.to_string();
}
}
raw.to_string()
}
pub(crate) fn same_wxid(a: &str, b: &str) -> bool {
a == b || normalize_wxid(a) == normalize_wxid(b)
}
pub(crate) fn join_components(parts: &[String]) -> PathBuf {
let mut out = if parts.first().map(|part| part.is_empty()).unwrap_or(false) {
PathBuf::from("/")
} else {
PathBuf::new()
};
for part in parts {
if part.is_empty() {
continue;
}
out.push(part);
}
out
}
pub(crate) fn attach_root_for_db_dir(db_dir: &Path) -> PathBuf {
db_dir
.parent()
.map(|base| base.join("msg").join("attach"))
.unwrap_or_else(|| PathBuf::from("msg/attach"))
}
pub(crate) fn find_v2_template_ciphertexts(
attach_dir: &Path,
max_templates: usize,
max_files: usize,
) -> Result<Vec<[u8; 16]>> {
if !attach_dir.is_dir() {
return Ok(Vec::new());
}
let mut out = collect_templates_with_suffix(attach_dir, "_t.dat", max_templates, max_files)?;
if out.is_empty() {
out = collect_templates_with_suffix(attach_dir, ".dat", max_templates, max_files)?;
}
Ok(out)
}
pub(crate) fn derive_xor_key_from_v2_dat(
attach_dir: &Path,
sample: usize,
min_samples: usize,
) -> Result<Option<(u8, usize, usize)>> {
if !attach_dir.is_dir() {
return Ok(None);
}
let mut votes = Vec::new();
visit_files(attach_dir, &mut |path| -> Result<bool> {
let Some(name) = path.file_name().and_then(|value| value.to_str()) else {
return Ok(false);
};
if !name.ends_with(".dat") {
return Ok(false);
}
let meta = fs::metadata(path)?;
if meta.len() < 0x20 {
return Ok(false);
}
let bytes = fs::read(path)?;
if bytes.starts_with(&V2_MAGIC) {
let last = *bytes.last().unwrap();
votes.push(last ^ 0xD9);
if votes.len() >= sample {
return Ok(true);
}
}
Ok(false)
})?;
if votes.len() < min_samples {
return Ok(None);
}
let mut counts = [0usize; 256];
for vote in &votes {
counts[*vote as usize] += 1;
}
let (xor_key, top_votes) = counts
.iter()
.enumerate()
.max_by_key(|(_, count)| *count)
.map(|(idx, count)| (idx as u8, *count))
.expect("votes is non-empty");
Ok(Some((xor_key, top_votes, votes.len())))
}
pub(crate) fn verify_aes_key(aes_key: &[u8; 16], templates: &[[u8; 16]]) -> bool {
!templates.is_empty()
&& templates
.iter()
.all(|template| decrypt_template_block(aes_key, template).is_some())
}
pub(crate) fn ascii_alnum_candidates<'a>(buf: &'a [u8], len: usize) -> Vec<&'a [u8]> {
let re = match len {
16 => regex16(),
32 => regex32(),
_ => return Vec::new(),
};
re.find_iter(buf)
.filter_map(|matched| {
let start = matched.start();
let end = matched.end();
let left_ok = start == 0 || !buf[start - 1].is_ascii_alphanumeric();
let right_ok = end == buf.len() || !buf[end].is_ascii_alphanumeric();
(left_ok && right_ok).then_some(&buf[start..end])
})
.collect()
}
fn collect_templates_with_suffix(
dir: &Path,
suffix: &str,
max_templates: usize,
max_files: usize,
) -> Result<Vec<[u8; 16]>> {
let mut out = Vec::new();
let mut seen = HashSet::new();
let mut examined = 0usize;
visit_files(dir, &mut |path| -> Result<bool> {
let Some(name) = path.file_name().and_then(|value| value.to_str()) else {
return Ok(false);
};
if !name.ends_with(suffix) {
return Ok(false);
}
examined += 1;
let bytes = fs::read(path)?;
if bytes.len() >= 0x1F && bytes.starts_with(&V2_MAGIC) {
let template: [u8; 16] = bytes[0x0F..0x1F].try_into().unwrap();
if seen.insert(template) {
out.push(template);
if out.len() >= max_templates {
return Ok(true);
}
}
}
Ok(examined >= max_files && !out.is_empty())
})?;
Ok(out)
}
fn visit_files<F>(dir: &Path, f: &mut F) -> Result<bool>
where
F: FnMut(&Path) -> Result<bool>,
{
let mut entries: Vec<PathBuf> = fs::read_dir(dir)?
.flatten()
.map(|entry| entry.path())
.collect();
entries.sort();
for path in entries {
if path.is_dir() {
if visit_files(&path, f)? {
return Ok(true);
}
continue;
}
if f(&path)? {
return Ok(true);
}
}
Ok(false)
}
fn decrypt_template_block(aes_key: &[u8; 16], ciphertext: &[u8; 16]) -> Option<&'static str> {
use aes::cipher::{generic_array::GenericArray, BlockDecrypt, KeyInit};
let cipher = aes::Aes128::new(aes_key.into());
let mut block = GenericArray::clone_from_slice(ciphertext);
cipher.decrypt_block(&mut block);
let block: [u8; 16] = block.as_slice().try_into().ok()?;
let format = detect_image_format(&block);
(format != "bin").then_some(format)
}
fn regex16() -> &'static Regex {
static RE: OnceLock<Regex> = OnceLock::new();
RE.get_or_init(|| Regex::new(r"[A-Za-z0-9]{16}").unwrap())
}
fn regex32() -> &'static Regex {
static RE: OnceLock<Regex> = OnceLock::new();
RE.get_or_init(|| Regex::new(r"[A-Za-z0-9]{32}").unwrap())
}
#[cfg(test)]
mod tests {
use super::{ascii_alnum_candidates, normalize_wxid, same_wxid};
#[test]
fn regex_candidates_respect_boundaries() {
let buf = b"xx 0123456789ABCDef yy";
let hits = ascii_alnum_candidates(buf, 16);
assert_eq!(hits, vec![&buf[3..19]]);
}
#[test]
fn regex_candidates_ignore_embedded_runs() {
let buf = b"x0123456789ABCDefz";
assert!(ascii_alnum_candidates(buf, 16).is_empty());
}
#[test]
fn wxid_normalization_matches_expected_forms() {
assert_eq!(normalize_wxid("wxid_abc_def"), "wxid_abc");
assert_eq!(normalize_wxid("your_wxid_a1b2"), "your_wxid");
assert!(same_wxid("your_wxid_a1b2", "your_wxid"));
}
}
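`derive_xor_key_from_v2_dat` infers the XOR key by majority vote over `last_byte ^ 0xD9`. The constant is presumably the final byte of the JPEG end-of-image marker `FF D9`; that reading is an assumption, since the source does not say so. The sketch below isolates the voting step. `vote_xor_key` is an illustrative name, and it takes the collected trailing bytes directly instead of walking the attachment directory.

```rust
/// Majority vote for the XOR key, given the last byte of each sampled
/// V2 `.dat` file. Returns `(key, votes_for_key)`, or `None` when there
/// are fewer than `min_samples` samples.
fn vote_xor_key(last_bytes: &[u8], min_samples: usize) -> Option<(u8, usize)> {
    if last_bytes.is_empty() || last_bytes.len() < min_samples {
        return None;
    }
    let mut counts = [0usize; 256];
    for &b in last_bytes {
        // Assumes the plaintext ends in 0xD9 (JPEG EOI), so b ^ 0xD9 is the key.
        counts[(b ^ 0xD9) as usize] += 1;
    }
    counts
        .iter()
        .enumerate()
        .max_by_key(|(_, count)| **count)
        .map(|(key, count)| (key as u8, *count))
}
```

Voting tolerates a few samples that are not JPEGs or are truncated, at the cost of needing several V2 files before a key can be proposed.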

@ -0,0 +1,238 @@
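//! Windows V2 image AES key extraction.
//!
//! Scan the memory of the `Weixin.exe` process for matches of `[A-Za-z0-9]{32}` / `[A-Za-z0-9]{16}`,
//! then validate each candidate against a V2 template AES block to keep false positives under control.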
use anyhow::{bail, Context, Result};
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;
use std::sync::Mutex;
use windows::Win32::Foundation::{CloseHandle, HANDLE};
use windows::Win32::System::Diagnostics::Debug::ReadProcessMemory;
use windows::Win32::System::Diagnostics::ToolHelp::{
CreateToolhelp32Snapshot, Process32First, Process32Next, PROCESSENTRY32, TH32CS_SNAPPROCESS,
};
use windows::Win32::System::Memory::{
VirtualQueryEx, MEMORY_BASIC_INFORMATION, MEM_COMMIT, PAGE_EXECUTE_READWRITE,
PAGE_EXECUTE_WRITECOPY, PAGE_GUARD, PAGE_NOCACHE, PAGE_NOACCESS, PAGE_READWRITE,
PAGE_WRITECOMBINE, PAGE_WRITECOPY,
};
use windows::Win32::System::Threading::{OpenProcess, PROCESS_QUERY_INFORMATION, PROCESS_VM_READ};
use crate::config;
use super::{
ascii_alnum_candidates, attach_root_for_db_dir, configured_db_dir_for_wxid,
derive_xor_key_from_v2_dat, find_v2_template_ciphertexts, verify_aes_key, ImageKeyMaterial,
ImageKeyProvider,
};
const CHUNK_SIZE: usize = 2 * 1024 * 1024;
const MAX_REGION_SIZE: usize = 50 * 1024 * 1024;
pub struct WindowsImageKeyProvider {
configured_db_dir: Result<PathBuf, String>,
cache: Mutex<HashMap<String, ImageKeyMaterial>>,
}
impl WindowsImageKeyProvider {
pub fn from_current_config() -> Self {
let configured_db_dir = config::load_config()
.map(|cfg| cfg.db_dir)
.map_err(|err| err.to_string());
Self {
configured_db_dir,
cache: Mutex::new(HashMap::new()),
}
}
}
impl ImageKeyProvider for WindowsImageKeyProvider {
fn get_key(&self, wxid: &str) -> Result<ImageKeyMaterial> {
let cache_key = wxid.trim().to_string();
if let Some(found) = self.cache.lock().unwrap().get(&cache_key).copied() {
return Ok(found);
}
let configured_db_dir = self
.configured_db_dir
.as_ref()
.map_err(|err| anyhow::anyhow!("failed to read config.db_dir: {}", err))?;
let db_dir = configured_db_dir_for_wxid(configured_db_dir, wxid);
let attach_dir = attach_root_for_db_dir(&db_dir);
let key = derive_key_for_paths(&attach_dir)?;
self.cache.lock().unwrap().insert(cache_key, key);
Ok(key)
}
}
fn derive_key_for_paths(attach_dir: &std::path::Path) -> Result<ImageKeyMaterial> {
let templates = find_v2_template_ciphertexts(attach_dir, 3, 64)?;
if templates.is_empty() {
bail!("no V2 template file found under {}", attach_dir.display());
}
let xor_key = derive_xor_key_from_v2_dat(attach_dir, 10, 3)?
.map(|(key, _, _)| key)
.unwrap_or(0x88);
let pid = find_wechat_pid().context("Weixin.exe process not found; make sure WeChat is running")?;
let process = unsafe {
OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, false, pid)
.context("OpenProcess failed; try running as administrator")?
};
let aes_key = scan_memory_for_key(process, &templates);
unsafe {
let _ = CloseHandle(process);
}
Ok(ImageKeyMaterial {
aes_key: aes_key?,
xor_key,
})
}
fn find_wechat_pid() -> Option<u32> {
let snapshot = unsafe { CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0).ok()? };
let mut entry = PROCESSENTRY32 {
dwSize: std::mem::size_of::<PROCESSENTRY32>() as u32,
..Default::default()
};
unsafe {
if Process32First(snapshot, &mut entry).is_err() {
let _ = CloseHandle(snapshot);
return None;
}
loop {
let name =
std::ffi::CStr::from_ptr(entry.szExeFile.as_ptr() as *const i8).to_string_lossy();
if name.eq_ignore_ascii_case("Weixin.exe") {
let pid = entry.th32ProcessID;
let _ = CloseHandle(snapshot);
return Some(pid);
}
if Process32Next(snapshot, &mut entry).is_err() {
break;
}
}
let _ = CloseHandle(snapshot);
}
None
}
fn scan_memory_for_key(process: HANDLE, templates: &[[u8; 16]]) -> Result<[u8; 16]> {
let mut seen = HashSet::<[u8; 16]>::new();
let mut address = 0usize;
loop {
let mut mbi = MEMORY_BASIC_INFORMATION::default();
let ret = unsafe {
VirtualQueryEx(
process,
Some(address as *const _),
&mut mbi,
std::mem::size_of::<MEMORY_BASIC_INFORMATION>(),
)
};
if ret == 0 {
break;
}
let base = mbi.BaseAddress as usize;
let size = mbi.RegionSize;
if mbi.State == MEM_COMMIT && is_candidate_page(mbi.Protect.0) && size <= MAX_REGION_SIZE {
if let Some(aes_key) = scan_region(process, base, size, templates, &mut seen)? {
return Ok(aes_key);
}
}
address = base.saturating_add(size);
if address == 0 {
break;
}
}
bail!("no verifiable V2 AES key found in the Windows process memory")
}
fn scan_region(
process: HANDLE,
base: usize,
size: usize,
templates: &[[u8; 16]],
seen: &mut HashSet<[u8; 16]>,
) -> Result<Option<[u8; 16]>> {
let overlap = 31usize;
let mut offset = 0usize;
while offset < size {
let chunk_size = std::cmp::min(CHUNK_SIZE, size - offset);
let addr = base + offset;
let mut buf = vec![0u8; chunk_size];
let mut bytes_read = 0usize;
let ok = unsafe {
ReadProcessMemory(
process,
addr as *const _,
buf.as_mut_ptr() as *mut _,
chunk_size,
Some(&mut bytes_read),
)
.is_ok()
};
if ok && bytes_read > 0 {
buf.truncate(bytes_read);
if let Some(key) = scan_candidate_buffer(&buf, templates, seen) {
return Ok(Some(key));
}
}
offset += if chunk_size > overlap {
chunk_size - overlap
} else {
chunk_size
};
}
Ok(None)
}
fn scan_candidate_buffer(
buf: &[u8],
templates: &[[u8; 16]],
seen: &mut HashSet<[u8; 16]>,
) -> Option<[u8; 16]> {
for candidate in ascii_alnum_candidates(buf, 32) {
let mut key = [0u8; 16];
key.copy_from_slice(&candidate[..16]);
if seen.insert(key) && verify_aes_key(&key, templates) {
return Some(key);
}
}
for candidate in ascii_alnum_candidates(buf, 16) {
let mut key = [0u8; 16];
key.copy_from_slice(candidate);
if seen.insert(key) && verify_aes_key(&key, templates) {
return Some(key);
}
}
None
}
fn is_candidate_page(protect: u32) -> bool {
if protect == PAGE_NOACCESS.0 || (protect & PAGE_GUARD.0) != 0 {
return false;
}
let base = protect & !(PAGE_GUARD.0 | PAGE_NOCACHE.0 | PAGE_WRITECOMBINE.0);
matches!(
base,
value if value == PAGE_READWRITE.0
|| value == PAGE_WRITECOPY.0
|| value == PAGE_EXECUTE_READWRITE.0
|| value == PAGE_EXECUTE_WRITECOPY.0
)
}


@ -0,0 +1,28 @@
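//! Chat attachment extraction pipeline (local decoding of image / video / voice / file payloads).
//!
//! The full chain:
//! message_N.db (Msg_<md5>) → message_resource.db (ChatName2Id + MessageResourceInfo)
//! → extract the md5 from the packed_info protobuf → xwechat_files/<wxid>/msg/attach/.../Img/<md5>[_t|_h].dat
//! → dispatch on magic (legacy XOR / V1 fixed-AES / V2 AES+XOR) → write out the actual image
//!
//! Module layout:
//! - `attachment_id`: opaque ID passed across IPC / CLI, base64url(json)
//! - `resolver`: looks up message_resource.db from an `attachment_id` and locates the local .dat
//! - `decoder`: dispatches on the file magic to a concrete decoder (V1 / V2, etc.)
//! - `image_key`: V2 image AES key extraction (macOS / Windows)
//!
//! The V2 / image_key modules are landed by codex; empty stubs go in first so that V1 / resolver / CLI are not blocked.
// This module is enabled incrementally across several PRs/commits:
// 1) land the attachment_id / decoder / resolver / image_key skeletons (this commit)
// 2) wire them together through IPC + CLI + daemon route (later commit)
// 3) platform implementations of image_key (later codex commit)
// Between step 1 and step 2 much of the public API is still unreferenced; #[allow(dead_code)] suppresses the noise.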
#![allow(dead_code)]
pub mod attachment_id;
pub mod decoder;
pub mod resolver;
pub mod image_key;
pub use attachment_id::{AttachmentId, AttachmentKind};


@ -0,0 +1,439 @@
//! Translates an `AttachmentId` into a local `.dat` path.
//!
//! Flow:
//! 1. `chat` username → `ChatName2Id.rowid` (resource database)
//! 2. `(chat_id, local_id)`, matched on the exact `message_create_time` first and otherwise
//! via `ORDER BY message_create_time DESC LIMIT 1` → `MessageResourceInfo.packed_info`
//! 3. Extract the 32-byte ASCII hex MD5 from `packed_info` (protobuf)
//! 4. Look for the matching file under `<wxchat_base>/msg/attach/<md5(chat)>/<YYYY-MM>/Img/<md5>[_t|_h].dat`
//! and pick one by the priority full > _h > _t
//!
//! `<wxchat_base>` is known to the daemon (the parent directory of `db_dir`); the path layout differs by platform:
//! - Linux: `~/Documents/xwechat_files/<wxid>`
//! - macOS: `~/Library/Containers/com.tencent.xinWeChat/Data/Documents/xwechat_files/<wxid>`
//! ⚠️ The msg/attach/... subtree layout still needs to be verified with a real account; the upstream docstring only covers Windows
//! - Windows: `<root>\xwechat_files\<wxid>` (root is read from `%APPDATA%\Tencent\xwechat\config\*.ini`)
use anyhow::{anyhow, Context, Result};
use chrono::TimeZone;
use rusqlite::Connection;
use std::path::{Path, PathBuf};
use super::AttachmentId;
/// Resolution result for a single attachment: its resource database row plus the file found under the local attach tree.
#[derive(Debug, Clone)]
pub struct ResolvedAttachment {
pub id: AttachmentId,
/// Resource MD5 extracted from `packed_info` (lowercase hex)
pub md5: String,
/// Matched local .dat path (one picked by the priority full > _h > _t)
pub dat_path: PathBuf,
/// File size (bytes)
pub size: u64,
}
/// Schema lookup only (does not search for the local .dat).
/// Used to fill the `md5` field when listing via `wx attachments`; the file may not exist locally at all.
#[derive(Debug, Clone)]
pub struct AttachmentMetadata {
pub md5: String,
}
/// Looks up the file md5 in message_resource.db by `(chat, local_id)`.
///
/// The caller passes the path of an already-decrypted `message_resource.db` (prepared by the daemon's `DBCache`).
/// Synchronous; the caller runs it inside `spawn_blocking`.
pub fn lookup_md5_blocking(
resource_db_path: &Path,
chat: &str,
local_id: i64,
create_time: i64,
msg_local_type_lo32: i64,
) -> Result<Option<AttachmentMetadata>> {
let conn = Connection::open_with_flags(
resource_db_path,
rusqlite::OpenFlags::SQLITE_OPEN_READ_ONLY | rusqlite::OpenFlags::SQLITE_OPEN_URI,
)
.with_context(|| format!("failed to open message_resource.db {:?}", resource_db_path))?;
// 1) ChatName2Id: user_name -> rowid
let chat_id: Option<i64> = conn
.query_row(
"SELECT rowid FROM ChatName2Id WHERE user_name = ?1",
[chat],
|row| row.get(0),
)
.ok();
let Some(chat_id) = chat_id else {
return Ok(None);
};
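// 2) MessageResourceInfo:
// local_id is reused within one chat, so match on the exact create_time first;
// if the resource database timestamp does not line up exactly with message_N.db, fall back to "latest row with the same local_id/type".
// The high 32 bits of message_local_type are a version/session flag; only the low 32 bits are the real type.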
let packed_exact: Option<Vec<u8>> = conn
.query_row(
"SELECT packed_info FROM MessageResourceInfo
WHERE chat_id = ?1
AND message_local_id = ?2
AND (message_local_type = ?3 OR message_local_type % 4294967296 = ?3)
AND message_create_time = ?4
ORDER BY rowid DESC
LIMIT 1",
rusqlite::params![chat_id, local_id, msg_local_type_lo32, create_time],
|row| row.get(0),
)
.ok();
let packed: Option<Vec<u8>> = packed_exact.or_else(|| conn
.query_row(
"SELECT packed_info FROM MessageResourceInfo
WHERE chat_id = ?1
AND message_local_id = ?2
AND (message_local_type = ?3 OR message_local_type % 4294967296 = ?3)
ORDER BY message_create_time DESC
LIMIT 1",
rusqlite::params![chat_id, local_id, msg_local_type_lo32],
|row| row.get(0),
)
.ok());
let Some(blob) = packed else {
return Ok(None);
};
Ok(extract_md5_from_packed_info(&blob).map(|md5| AttachmentMetadata { md5 }))
}
/// Extracts the 32-byte ASCII hex md5 from `MessageResourceInfo.packed_info` (protobuf).
///
/// Main path: search for the 4-byte marker `12 22 0a 20` (field=2 LEN, length=34, sub field=1 LEN, length=32),
/// which is immediately followed by 32 bytes of ASCII hex.
/// Fallback: scan the whole blob for 32 consecutive valid hex characters.
pub fn extract_md5_from_packed_info(blob: &[u8]) -> Option<String> {
const MARKER: &[u8; 4] = &[0x12, 0x22, 0x0A, 0x20];
// Main path
if let Some(pos) = find_subslice(blob, MARKER) {
let start = pos + MARKER.len();
if start + 32 <= blob.len() {
if let Ok(s) = std::str::from_utf8(&blob[start..start + 32]) {
if s.chars().all(|c| c.is_ascii_hexdigit()) {
return Some(s.to_ascii_lowercase());
}
}
}
}
// Fallback: 32 consecutive valid hex bytes
if blob.len() >= 32 {
for start in 0..=blob.len() - 32 {
let chunk = &blob[start..start + 32];
if let Ok(s) = std::str::from_utf8(chunk) {
if s.chars().all(|c| c.is_ascii_hexdigit()) {
return Some(s.to_ascii_lowercase());
}
}
}
}
None
}
/// Simple subslice scan (avoids pulling in a memchr/memmem dependency; the blob is usually < 1KB).
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
if needle.is_empty() || needle.len() > haystack.len() {
return None;
}
haystack
.windows(needle.len())
.position(|w| w == needle)
}
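/// Looks for the file under `<attach_root>/<md5(chat)>/<YYYY-MM>/Img/<md5>[_t|_h].dat`.
///
/// Priority: full > `_h` (HD thumbnail) > `_t` (thumbnail). Returns the best match,
/// or None if nothing is found.
///
/// `attach_root` = `<wxchat_base>/msg/attach`.
/// `create_time` is used to locate the `<YYYY-MM>` subdirectory first; if that misses, the search falls back to scanning every month,
/// because WeChat's `YYYY-MM` directory is sometimes off by one month from the message time (files are archived by receive time).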
pub fn find_dat_file(
attach_root: &Path,
chat: &str,
file_md5: &str,
create_time: i64,
) -> Option<PathBuf> {
let chat_hash = format!("{:x}", md5::compute(chat.as_bytes()));
let chat_dir = attach_root.join(&chat_hash);
if !chat_dir.is_dir() {
return None;
}
// Step 1: try the create_time month plus the month before and after (3 candidate directories)
let candidates_ym: Vec<String> = three_month_candidates(create_time);
for ym in &candidates_ym {
let img_dir = chat_dir.join(ym).join("Img");
if let Some(p) = pick_best_in_img_dir(&img_dir, file_md5) {
return Some(p);
}
}
// Step 2, fallback: scan every month subdirectory of chat_dir
let entries = std::fs::read_dir(&chat_dir).ok()?;
let mut all_months: Vec<PathBuf> = entries
.filter_map(|e| e.ok())
.map(|e| e.path())
.filter(|p| p.is_dir())
.collect();
// The 3 candidates already tried could be skipped, but the cost is tiny; keep the full scan
all_months.sort();
for month_dir in all_months {
let img_dir = month_dir.join("Img");
if let Some(p) = pick_best_in_img_dir(&img_dir, file_md5) {
return Some(p);
}
}
None
}
fn pick_best_in_img_dir(img_dir: &Path, file_md5: &str) -> Option<PathBuf> {
if !img_dir.is_dir() {
return None;
}
let full = img_dir.join(format!("{}.dat", file_md5));
if full.is_file() {
return Some(full);
}
let hd = img_dir.join(format!("{}_h.dat", file_md5));
if hd.is_file() {
return Some(hd);
}
let thumb = img_dir.join(format!("{}_t.dat", file_md5));
if thumb.is_file() {
return Some(thumb);
}
None
}
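fn three_month_candidates(unix_ts: i64) -> Vec<String> {
    use chrono::Datelike;
    let dt = match chrono::Local.timestamp_opt(unix_ts, 0).single() {
        Some(d) => d,
        None => return Vec::new(),
    };
    // Work on a zero-based month index instead of adding/subtracting 31 days:
    // day arithmetic skips a month near month edges (e.g. Jan 31 + 31 days
    // lands in March, and Mar 1 - 31 days lands in January).
    let idx = dt.year() * 12 + dt.month0() as i32;
    [idx - 1, idx, idx + 1]
        .iter()
        .map(|i| format!("{:04}-{:02}", i.div_euclid(12), i.rem_euclid(12) + 1))
        .collect()
}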
/// Joins `<wxchat_base>` (the parent directory of `db_storage`) into `<base>/msg/attach`.
pub fn attach_root_for(wxchat_base: &Path) -> PathBuf {
wxchat_base.join("msg").join("attach")
}
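/// Full flow: uses the `attachment_id` to get the md5, then finds the .dat. On failure returns an `Err` with specific diagnostics.
///
/// `resource_db_path` is supplied by the daemon (already decrypted by DBCache);
/// `attach_root` is assembled by the caller (`attach_root_for(wxchat_base)`).
/// Synchronous; the caller runs it inside `spawn_blocking`.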
pub fn resolve_blocking(
id: &AttachmentId,
resource_db_path: &Path,
attach_root: &Path,
) -> Result<ResolvedAttachment> {
let lo32_type: i64 = match id.kind {
super::AttachmentKind::Image => 3,
super::AttachmentKind::Voice => 34,
super::AttachmentKind::Video => 43,
super::AttachmentKind::File => 49,
};
let meta = lookup_md5_blocking(
resource_db_path,
&id.chat,
id.local_id,
id.create_time,
lo32_type,
)?
.ok_or_else(|| {
anyhow!(
"no resource row in message_resource.db for chat={} local_id={} type={} (the message may not be an attachment, or the resource database is not synced yet)",
id.chat,
id.local_id,
lo32_type
)
})?;
let dat_path = find_dat_file(attach_root, &id.chat, &meta.md5, id.create_time).ok_or_else(
|| {
anyhow!(
"local .dat not found (md5={} chat={} create_time={}); WeChat may not have downloaded the attachment yet, or it has been cleaned up",
meta.md5,
id.chat,
id.create_time
)
},
)?;
let size = std::fs::metadata(&dat_path).map(|m| m.len()).unwrap_or(0);
Ok(ResolvedAttachment { id: id.clone(), md5: meta.md5, dat_path, size })
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn extract_md5_main_path() {
// Build a blob that contains the 12 22 0a 20 marker
let mut blob = vec![0xAA, 0xBB, 0xCC];
blob.extend_from_slice(&[0x12, 0x22, 0x0A, 0x20]);
blob.extend_from_slice(b"deadbeefcafebabe1234567890abcdef");
blob.extend_from_slice(&[0xFF, 0xFF]);
assert_eq!(
extract_md5_from_packed_info(&blob),
Some("deadbeefcafebabe1234567890abcdef".to_string())
);
}
#[test]
fn extract_md5_fallback_no_marker() {
// No marker, but the blob contains 32 valid hex bytes
let mut blob = vec![0xFF, 0x00];
blob.extend_from_slice(b"00112233445566778899aabbccddeeff");
blob.extend_from_slice(&[0x01]);
assert_eq!(
extract_md5_from_packed_info(&blob),
Some("00112233445566778899aabbccddeeff".to_string())
);
}
#[test]
fn extract_md5_uppercase_normalized_to_lower() {
let mut blob = vec![0x12, 0x22, 0x0A, 0x20];
blob.extend_from_slice(b"DEADBEEFCAFEBABE1234567890ABCDEF");
// Upstream / CI / local file md5 values are all lowercase; force lowercase so a case mismatch cannot cause a miss
assert_eq!(
extract_md5_from_packed_info(&blob),
Some("deadbeefcafebabe1234567890abcdef".to_string())
);
}
#[test]
fn extract_md5_returns_none_on_garbage() {
let blob = vec![0; 16];
assert!(extract_md5_from_packed_info(&blob).is_none());
}
#[test]
fn lookup_md5_prefers_exact_create_time_over_latest_reuse() {
let dir = tempdir_for_test();
let db_path = dir.join("message_resource.db");
let conn = Connection::open(&db_path).unwrap();
conn.execute(
"CREATE TABLE ChatName2Id (user_name TEXT)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO ChatName2Id (rowid, user_name) VALUES (1, 'room@chatroom')",
[],
)
.unwrap();
conn.execute(
"CREATE TABLE MessageResourceInfo (
chat_id INTEGER,
message_local_id INTEGER,
message_local_type INTEGER,
message_create_time INTEGER,
packed_info BLOB
)",
[],
)
.unwrap();
let old_blob = {
let mut blob = vec![0x12, 0x22, 0x0A, 0x20];
blob.extend_from_slice(b"11111111111111111111111111111111");
blob
};
let new_blob = {
let mut blob = vec![0x12, 0x22, 0x0A, 0x20];
blob.extend_from_slice(b"22222222222222222222222222222222");
blob
};
conn.execute(
"INSERT INTO MessageResourceInfo
(chat_id, message_local_id, message_local_type, message_create_time, packed_info)
VALUES (?1, ?2, ?3, ?4, ?5)",
rusqlite::params![1i64, 7i64, 3i64, 1000i64, old_blob],
)
.unwrap();
conn.execute(
"INSERT INTO MessageResourceInfo
(chat_id, message_local_id, message_local_type, message_create_time, packed_info)
VALUES (?1, ?2, ?3, ?4, ?5)",
rusqlite::params![1i64, 7i64, 3i64, 2000i64, new_blob],
)
.unwrap();
let old = lookup_md5_blocking(&db_path, "room@chatroom", 7, 1000, 3)
.unwrap()
.unwrap();
let new = lookup_md5_blocking(&db_path, "room@chatroom", 7, 2000, 3)
.unwrap()
.unwrap();
assert_eq!(old.md5, "11111111111111111111111111111111");
assert_eq!(new.md5, "22222222222222222222222222222222");
}
#[test]
fn three_month_candidates_includes_prev_curr_next() {
// 2025-08-15 (mid-month) → 2025-07, 2025-08, 2025-09
let ts = chrono::Local
.with_ymd_and_hms(2025, 8, 15, 12, 0, 0)
.unwrap()
.timestamp();
let v = three_month_candidates(ts);
assert!(v.contains(&"2025-07".to_string()));
assert!(v.contains(&"2025-08".to_string()));
assert!(v.contains(&"2025-09".to_string()));
}
#[test]
fn pick_best_prefers_full_then_h_then_t() {
let tmp = tempdir_for_test();
let img = tmp.join("Img");
std::fs::create_dir_all(&img).unwrap();
let md5 = "abcd1234";
std::fs::write(img.join(format!("{}_t.dat", md5)), b"thumb").unwrap();
std::fs::write(img.join(format!("{}_h.dat", md5)), b"hd").unwrap();
// With only _t / _h present, _h wins
assert_eq!(
pick_best_in_img_dir(&img, md5).unwrap().file_name().unwrap(),
format!("{}_h.dat", md5).as_str()
);
// Once full is added, full wins
std::fs::write(img.join(format!("{}.dat", md5)), b"full").unwrap();
assert_eq!(
pick_best_in_img_dir(&img, md5).unwrap().file_name().unwrap(),
format!("{}.dat", md5).as_str()
);
}
fn tempdir_for_test() -> PathBuf {
let pid = std::process::id();
let nanos = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_nanos();
let p = std::env::temp_dir().join(format!("wx-cli-attach-test-{}-{}", pid, nanos));
std::fs::create_dir_all(&p).unwrap();
p
}
}
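
The marker `12 22 0a 20` used by `extract_md5_from_packed_info` is ordinary protobuf framing: each tag byte encodes `(field_number << 3) | wire_type`, and wire type 2 means length-delimited. The sketch below rebuilds the marker from those parts so the 34 = 2 + 32 length arithmetic is visible. The helper names `decode_tag` and `build_marker` are hypothetical, and the sketch only covers field numbers below 16, where a tag fits in a single byte.

```rust
/// Splits a single-byte protobuf tag into (field_number, wire_type).
/// Only valid for field numbers below 16 (one-byte tags).
fn decode_tag(tag: u8) -> (u8, u8) {
    (tag >> 3, tag & 0x07)
}

/// Hypothetical helper: assembles the 4-byte marker for an outer
/// length-delimited field that wraps one inner length-delimited field
/// holding `md5_len` bytes.
fn build_marker(outer_field: u8, inner_field: u8, md5_len: u8) -> [u8; 4] {
    const WIRE_LEN: u8 = 2; // protobuf wire type for length-delimited data
    let outer_tag = (outer_field << 3) | WIRE_LEN;
    let inner_tag = (inner_field << 3) | WIRE_LEN;
    // Outer payload = inner tag byte + inner length byte + the md5 bytes.
    let outer_len = 2 + md5_len;
    [outer_tag, outer_len, inner_tag, md5_len]
}

fn main() {
    let marker = build_marker(2, 1, 32);
    println!("{:02x?}", marker);
    println!("{:?}", decode_tag(marker[0]));
}
```

Matching on the full 4-byte sequence rather than on the inner `0a 20` alone makes the main path much less likely to hit an unrelated 32-byte string, which is why the plain hex scan is kept only as a fallback.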


@ -0,0 +1,41 @@
use anyhow::Result;
use super::history::{parse_time, parse_time_end};
use super::output::{emit_warnings, print_response, OutputOpts};
use super::transport;
use crate::ipc::Request;
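/// `wx attachments`: lists attachment messages in the given chat (defaults to image; several kinds may be selected).
///
/// Emits one `attachment_id` per row; message_resource.db and the local .dat are only read and decoded
/// once an ID is passed to `wx extract`. This step queries only the `Msg_<chat>` table, so even a group chat with thousands of messages returns almost instantly.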
pub fn cmd_attachments(
chat: String,
kinds: Vec<String>,
limit: usize,
offset: usize,
since: Option<String>,
until: Option<String>,
opts: OutputOpts,
) -> Result<()> {
let since_ts = since.as_deref().map(parse_time).transpose()?;
let until_ts = until.as_deref().map(parse_time_end).transpose()?;
let (with_meta, debug_source) = opts.request_flags();
// An empty Vec<String> from the CLI means the default (image); pass None and let the daemon decide the fallback.
let kinds_param = if kinds.is_empty() { None } else { Some(kinds) };
let req = Request::Attachments {
chat,
kinds: kinds_param,
limit,
offset,
since: since_ts,
until: until_ts,
with_meta,
debug_source,
};
let resp = transport::send(req)?;
emit_warnings(&resp.data);
print_response(&resp.data, &opts)
}
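
`cmd_attachments` turns each optional time bound into a timestamp with `since.as_deref().map(parse_time).transpose()?`. The chain converts `Option<String>` to `Option<&str>`, parses it into `Option<Result<_>>`, and then flips that into `Result<Option<_>>`, so an absent flag stays `None` while a malformed one becomes an error. A minimal standalone sketch of the same pattern follows; it assumes a plain integer parser in place of `parse_time`, and `parse_bound` is a hypothetical name.

```rust
use std::num::ParseIntError;

/// Illustrative stand-in for the `since` / `until` handling:
/// `None` stays `None`, valid input becomes `Some(value)`,
/// and invalid input surfaces as an `Err`.
fn parse_bound(raw: Option<String>) -> Result<Option<i64>, ParseIntError> {
    raw.as_deref() // Option<String> -> Option<&str>, no clone
        .map(|s| s.trim().parse::<i64>()) // Option<Result<i64, _>>
        .transpose() // Result<Option<i64>, _>
}

fn main() {
    println!("{:?}", parse_bound(Some("1700000000".to_string())));
    println!("{:?}", parse_bound(None));
}
```

The benefit over a manual `match` is that a single `?` at the call site covers the error case without the caller having to unwrap the option first.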


@ -1,7 +1,8 @@
-use anyhow::Result;
-use crate::ipc::Request;
-use super::transport;
 use super::history::{parse_time, parse_time_end};
+use super::output::{emit_warnings, warning_block_markdown, warning_block_text, OutputOpts};
+use super::transport;
+use crate::ipc::Request;
+use anyhow::Result;
 pub fn cmd_export(
     chat: String,
@ -10,9 +11,11 @@ pub fn cmd_export(
     limit: usize,
     format: String,
     output: Option<String>,
+    opts: OutputOpts,
 ) -> Result<()> {
     let since_ts = since.as_deref().map(parse_time).transpose()?;
     let until_ts = until.as_deref().map(parse_time_end).transpose()?;
+    let (with_meta, debug_source) = opts.request_flags();
     let req = Request::History {
         chat,
@ -21,24 +24,42 @@ pub fn cmd_export(
         since: since_ts,
         until: until_ts,
         msg_type: None,
+        with_meta,
+        debug_source,
     };
     let resp = transport::send(req)?;
-    let messages = resp.data["messages"].as_array().cloned().unwrap_or_default();
+    emit_warnings(&resp.data);
+    let messages = resp.data["messages"]
+        .as_array()
+        .cloned()
+        .unwrap_or_default();
     let chat_name = resp.data["chat"].as_str().unwrap_or("").to_string();
     let is_group = resp.data["is_group"].as_bool().unwrap_or(false);
     let count = messages.len();
     let text = match format.as_str() {
         "json" => serde_json::to_string_pretty(&resp.data)?,
+        "yaml" => serde_yaml::to_string(&resp.data)?,
         "txt" => {
             let group_str = if is_group { "[group]" } else { "" };
-            let mut lines = vec![format!("=== {}{} ({} messages) ===\n", chat_name, group_str, count)];
+            let mut lines = vec![format!(
+                "=== {}{} ({} messages) ===\n",
+                chat_name, group_str, count
+            )];
+            if let Some(warn) = warning_block_text(&resp.data) {
+                lines.push(warn);
+                lines.push(String::new());
+            }
             for m in &messages {
                 let time = m["time"].as_str().unwrap_or("");
                 let sender = m["sender"].as_str().unwrap_or("");
                 let content = m["content"].as_str().unwrap_or("");
-                let sender_str = if !sender.is_empty() { format!("{}: ", sender) } else { String::new() };
+                let sender_str = if !sender.is_empty() {
+                    format!("{}: ", sender)
+                } else {
+                    String::new()
+                };
                 lines.push(format!("[{}] {}{}", time, sender_str, content));
             }
             lines.join("\n")
@ -50,11 +71,18 @@ pub fn cmd_export(
                 format!("# {}{}", chat_name, group_str),
                 format!("\n> Exported {} messages\n", count),
             ];
+            if let Some(warn) = warning_block_markdown(&resp.data) {
+                lines.push(warn);
+            }
             for m in &messages {
                 let time = m["time"].as_str().unwrap_or("");
                 let sender = m["sender"].as_str().unwrap_or("");
                 let content = m["content"].as_str().unwrap_or("").replace('\n', "\n> ");
-                let sender_md = if !sender.is_empty() { format!("**{}**: ", sender) } else { String::new() };
+                let sender_md = if !sender.is_empty() {
+                    format!("**{}**: ", sender)
+                } else {
+                    String::new()
+                };
                 lines.push(format!("### {}\n\n{}{}\n", time, sender_md, content));
             }
             lines.join("\n")

src/cli/extract.rs 100644

@ -0,0 +1,25 @@
use anyhow::Result;
use crate::ipc::Request;
use super::output::{print_value, resolve};
use super::transport;
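/// `wx extract`: decrypts the resource behind a single `attachment_id` and writes it to the given path.
///
/// On the daemon side: parse the `attachment_id` → look up the file md5 in `message_resource.db` →
/// find the .dat under `<wxchat_base>/msg/attach/...` → dispatch on magic to the v1/v2 decoder →
/// write out the real image/file.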
pub fn cmd_extract(
attachment_id: String,
output: String,
overwrite: bool,
json: bool,
) -> Result<()> {
let req = Request::Extract {
attachment_id,
output,
overwrite,
};
let resp = transport::send(req)?;
print_value(&resp.data, &resolve(json))
}


@ -1,7 +1,7 @@
-use anyhow::Result;
-use crate::ipc::Request;
+use super::output::{emit_warnings, print_response, OutputOpts};
 use super::transport;
-use super::output::{resolve, print_value};
+use crate::ipc::Request;
+use anyhow::Result;
 pub fn cmd_history(
     chat: String,
@ -10,37 +10,51 @@ pub fn cmd_history(
     since: Option<String>,
     until: Option<String>,
     msg_type: Option<String>,
-    json: bool,
+    opts: OutputOpts,
 ) -> Result<()> {
     let since_ts = since.as_deref().map(parse_time).transpose()?;
     let until_ts = until.as_deref().map(parse_time_end).transpose()?;
     let type_val = msg_type.as_deref().and_then(parse_msg_type);
+    let (with_meta, debug_source) = opts.request_flags();
-    let req = Request::History { chat, limit, offset, since: since_ts, until: until_ts, msg_type: type_val };
+    let req = Request::History {
+        chat,
+        limit,
+        offset,
+        since: since_ts,
+        until: until_ts,
+        msg_type: type_val,
+        with_meta,
+        debug_source,
+    };
     let resp = transport::send(req)?;
+    emit_warnings(&resp.data);
-    let msgs = resp.data.get("messages")
-        .cloned()
-        .unwrap_or(serde_json::Value::Array(vec![]));
-    print_value(&msgs, &resolve(json))
+    print_response(&resp.data, &opts)
 }
 pub fn parse_time(s: &str) -> Result<i64> {
     use chrono::{Local, TimeZone};
     for fmt in &["%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M"] {
         if let Ok(dt) = chrono::NaiveDateTime::parse_from_str(s, fmt) {
-            return Local.from_local_datetime(&dt).single()
+            return Local
+                .from_local_datetime(&dt)
+                .single()
                 .map(|d| d.timestamp())
                 .ok_or_else(|| anyhow::anyhow!("ambiguous local time: {}", s));
         }
     }
     if let Ok(d) = chrono::NaiveDate::parse_from_str(s, "%Y-%m-%d") {
         let dt = d.and_hms_opt(0, 0, 0).unwrap();
-        return Local.from_local_datetime(&dt).single()
+        return Local
+            .from_local_datetime(&dt)
+            .single()
             .map(|d| d.timestamp())
             .ok_or_else(|| anyhow::anyhow!("ambiguous local time: {}", s));
     }
-    anyhow::bail!("cannot parse time '{}'; supported formats: YYYY-MM-DD / YYYY-MM-DD HH:MM / YYYY-MM-DD HH:MM:SS", s)
+    anyhow::bail!(
+        "cannot parse time '{}'; supported formats: YYYY-MM-DD / YYYY-MM-DD HH:MM / YYYY-MM-DD HH:MM:SS",
+        s
+    )
 }
 pub fn parse_time_end(s: &str) -> Result<i64> {
@ -48,7 +62,9 @@ pub fn parse_time_end(s: &str) -> Result<i64> {
     if s.len() == 10 {
         if let Ok(d) = chrono::NaiveDate::parse_from_str(s, "%Y-%m-%d") {
             let dt = d.and_hms_opt(23, 59, 59).unwrap();
-            return Local.from_local_datetime(&dt).single()
+            return Local
+                .from_local_datetime(&dt)
+                .single()
                 .map(|d| d.timestamp())
                 .ok_or_else(|| anyhow::anyhow!("ambiguous local time: {}", s));
         }


@ -35,8 +35,13 @@ pub fn cmd_init(force: bool) -> Result<()> {
     // Step 1: detect db_dir
     println!("Detecting the WeChat data directory...");
-    let db_dir = config::auto_detect_db_dir()
-        .context("Could not auto-detect the WeChat data directory\nPlease edit the db_dir field in config.json manually")?;
+    let db_dir = config::auto_detect_db_dir().with_context(|| format!(
+        "Could not auto-detect the WeChat data directory\n\
+         db_dir :\n \
+         {}\n\
+         db_dir : <data_root>\\xwechat_files\\<wxid>\\db_storage",
+        config_path.display()
+    ))?;
     println!("Found data directory: {}", db_dir.display());
     // Step 2: scan for keys (requires root/sudo)
@ -97,6 +102,19 @@ pub fn cmd_init(force: bool) -> Result<()> {
     println!("Initialization complete; commands such as wx sessions / wx history are now available");
+    #[cfg(target_os = "macos")]
+    {
+        eprintln!();
+        eprintln!("[macOS] Side-effect notice:");
+        eprintln!("  If you got init to work by ad-hoc re-signing /Applications/WeChat.app,");
+        eprintln!("  macOS may later show a prompt that \"微信\" (WeChat) wants to access data from other apps");
+        eprintln!("  (most often when opening official-account articles inside WeChat). This happens because");
+        eprintln!("  ad-hoc re-signing changes WeChat's code identity; wx-cli is not reading other apps' data.");
+        eprintln!("  Full explanation: https://github.com/jackwener/wx-cli/blob/main/docs/macos-permission-guide.md#六微信-想访问其他-app-的数据-弹窗");
+        eprintln!("  (If your WeChat still carries the official signature and init worked through GUI Terminal + Developer Tools");
+        eprintln!("  authorization, this prompt will not appear and you can ignore this notice.)");
+    }
     Ok(())
 }


@ -1,22 +1,25 @@
-mod init;
+pub mod attachments;
 pub mod biz_articles;
-pub mod sessions;
-pub mod history;
-pub mod search;
 pub mod contacts;
-pub mod export;
 pub mod daemon_cmd;
-pub mod transport;
-pub mod output;
-pub mod unread;
+pub mod export;
+pub mod extract;
+pub mod favorites;
+pub mod history;
+mod init;
 pub mod members;
 pub mod new_messages;
-pub mod stats;
-pub mod favorites;
-pub mod sns_notifications;
+pub mod output;
+pub mod search;
+pub mod sessions;
 pub mod sns_feed;
+pub mod sns_notifications;
 pub mod sns_search;
+pub mod stats;
+pub mod transport;
+pub mod unread;
+use self::output::OutputOpts;
 use anyhow::Result;
 use clap::{Parser, Subcommand};
@ -24,6 +27,12 @@ use clap::{Parser, Subcommand};
 #[derive(Parser)]
 #[command(name = "wx", version = env!("CARGO_PKG_VERSION"), about = "wx: local WeChat data CLI")]
 pub struct Cli {
+    /// Return heavier freshness/source metadata (e.g. per-shard latest, cache modes)
+    #[arg(long, global = true)]
+    with_meta: bool,
+    /// Expose the real shard paths in meta (for debugging)
+    #[arg(long, global = true, hide = true)]
+    debug_source: bool,
     #[command(subcommand)]
     command: Commands,
 }
@ -262,6 +271,44 @@ enum Commands {
         #[arg(long)]
         json: bool,
     },
+    /// List image attachments in a chat; returns opaque attachment_id values
+    Attachments {
+        /// Chat name (contact display name / wxid / @chatroom username all work)
+        chat: String,
+        /// Kind (currently only image is supported)
+        #[arg(long = "kind", value_name = "KIND",
+              value_parser = ["image", "img"])]
+        kinds: Vec<String>,
+        /// Number of items to show
+        #[arg(short = 'n', long, default_value = "50")]
+        limit: usize,
+        /// Pagination offset
+        #[arg(long, default_value = "0")]
+        offset: usize,
+        /// Start time YYYY-MM-DD
+        #[arg(long)]
+        since: Option<String>,
+        /// End time YYYY-MM-DD
+        #[arg(long)]
+        until: Option<String>,
+        /// Output JSON (default: YAML)
+        #[arg(long)]
+        json: bool,
+    },
+    /// Decrypt the resource behind a single attachment_id and write it to the given file path
+    Extract {
+        /// Opaque ID emitted by `wx attachments` (a base64url string)
+        attachment_id: String,
+        /// Output file path (absolute or relative to the current working directory; keeping an extension such as .jpg is recommended)
+        #[arg(short = 'o', long)]
+        output: String,
+        /// Overwrite the target if it already exists
+        #[arg(long)]
+        overwrite: bool,
+        /// Output JSON (default: YAML)
+        #[arg(long)]
+        json: bool,
+    },
     /// Manage wx-daemon
     Daemon {
         #[command(subcommand)]
@ -295,40 +342,184 @@ pub fn run() {
 }
 fn dispatch(cli: Cli) -> Result<()> {
+    let base_with_meta = cli.with_meta;
+    let base_debug_source = cli.debug_source;
     match cli.command {
         Commands::Init { force } => init::cmd_init(force),
-        Commands::Sessions { limit, json } => sessions::cmd_sessions(limit, json),
-        Commands::History { chat, limit, offset, since, until, msg_type, json } => {
-            history::cmd_history(chat, limit, offset, since, until, msg_type, json)
-        }
-        Commands::Search { keyword, chats, limit, since, until, msg_type, json } => {
-            search::cmd_search(keyword, chats, limit, since, until, msg_type, json)
-        }
+        Commands::Sessions { limit, json } => sessions::cmd_sessions(
+            limit,
+            OutputOpts {
+                json,
+                with_meta: base_with_meta,
+                debug_source: base_debug_source,
+            },
+        ),
+        Commands::History {
+            chat,
+            limit,
+            offset,
+            since,
+            until,
+            msg_type,
+            json,
+        } => history::cmd_history(
+            chat,
+            limit,
+            offset,
+            since,
+            until,
+            msg_type,
+            OutputOpts {
+                json,
+                with_meta: base_with_meta,
+                debug_source: base_debug_source,
+            },
+        ),
+        Commands::Search {
+            keyword,
+            chats,
+            limit,
+            since,
+            until,
+            msg_type,
+            json,
+        } => search::cmd_search(
+            keyword,
+            chats,
+            limit,
+            since,
+            until,
+            msg_type,
+            OutputOpts {
+                json,
+                with_meta: base_with_meta,
+                debug_source: base_debug_source,
+            },
+        ),
         Commands::Contacts { query, limit, json } => contacts::cmd_contacts(query, limit, json),
-        Commands::Export { chat, since, until, limit, format, output } => {
-            export::cmd_export(chat, since, until, limit, format, output)
+        Commands::Export {
+            chat,
+            since,
+            until,
+            limit,
+            format,
+            output,
+        } => {
+            let export_json = format == "json";
+            export::cmd_export(
+                chat,
+                since,
+                until,
+                limit,
+                format,
+                output,
+                OutputOpts {
+                    json: export_json,
+                    with_meta: base_with_meta,
+                    debug_source: base_debug_source,
+                },
+            )
         }
-        Commands::Unread { limit, filter, json } => unread::cmd_unread(limit, filter, json),
+        Commands::Unread {
+            limit,
+            filter,
+            json,
+        } => unread::cmd_unread(
+            limit,
+            filter,
+            OutputOpts {
+                json,
+                with_meta: base_with_meta,
+                debug_source: base_debug_source,
+            },
+        ),
         Commands::Members { chat, json } => members::cmd_members(chat, json),
Commands::NewMessages { limit, json } => new_messages::cmd_new_messages(limit, json), Commands::NewMessages { limit, json } => new_messages::cmd_new_messages(
Commands::Stats { chat, since, until, json } => { limit,
stats::cmd_stats(chat, since, until, json) OutputOpts {
} json,
Commands::Favorites { limit, fav_type, query, json } => { with_meta: base_with_meta,
favorites::cmd_favorites(limit, fav_type, query, json) debug_source: base_debug_source,
} },
Commands::SnsNotifications { limit, since, until, include_read, json } => { ),
sns_notifications::cmd_sns_notifications(limit, since, until, include_read, json) Commands::Stats {
} chat,
Commands::SnsFeed { limit, since, until, user, json } => { since,
sns_feed::cmd_sns_feed(limit, since, until, user, json) until,
} json,
Commands::SnsSearch { keyword, limit, since, until, user, json } => { } => stats::cmd_stats(
sns_search::cmd_sns_search(keyword, limit, since, until, user, json) chat,
} since,
Commands::BizArticles { limit, account, since, until, unread, json } => { until,
biz_articles::cmd_biz_articles(limit, account, since, until, unread, json) OutputOpts {
} json,
with_meta: base_with_meta,
debug_source: base_debug_source,
},
),
Commands::Favorites {
limit,
fav_type,
query,
json,
} => favorites::cmd_favorites(limit, fav_type, query, json),
Commands::SnsNotifications {
limit,
since,
until,
include_read,
json,
} => sns_notifications::cmd_sns_notifications(limit, since, until, include_read, json),
Commands::SnsFeed {
limit,
since,
until,
user,
json,
} => sns_feed::cmd_sns_feed(limit, since, until, user, json),
Commands::SnsSearch {
keyword,
limit,
since,
until,
user,
json,
} => sns_search::cmd_sns_search(keyword, limit, since, until, user, json),
Commands::BizArticles {
limit,
account,
since,
until,
unread,
json,
} => biz_articles::cmd_biz_articles(limit, account, since, until, unread, json),
Commands::Attachments {
chat,
kinds,
limit,
offset,
since,
until,
json,
} => attachments::cmd_attachments(
chat,
kinds,
limit,
offset,
since,
until,
OutputOpts {
json,
with_meta: base_with_meta,
debug_source: base_debug_source,
},
),
Commands::Extract {
attachment_id,
output,
overwrite,
json,
} => extract::cmd_extract(attachment_id, output, overwrite, json),
Commands::Daemon { cmd } => daemon_cmd::cmd_daemon(cmd), Commands::Daemon { cmd } => daemon_cmd::cmd_daemon(cmd),
} }
} }


@ -1,8 +1,8 @@
use super::output::{emit_warnings, print_response, OutputOpts};
use super::transport;
use crate::ipc::Request;
use anyhow::Result; use anyhow::Result;
use std::collections::HashMap; use std::collections::HashMap;
use crate::ipc::Request;
use super::transport;
use super::output::{resolve, print_value};
fn state_file() -> std::path::PathBuf { fn state_file() -> std::path::PathBuf {
dirs::home_dir() dirs::home_dir()
@ -18,7 +18,8 @@ fn load_state() -> Option<HashMap<String, i64>> {
let data = std::fs::read_to_string(state_file()).ok()?; let data = std::fs::read_to_string(state_file()).ok()?;
let v: serde_json::Value = serde_json::from_str(&data).ok()?; let v: serde_json::Value = serde_json::from_str(&data).ok()?;
// Old format (only a timestamp field) has no sessions key → return None to trigger the first-run logic // Old format (only a timestamp field) has no sessions key → return None to trigger the first-run logic
let map: HashMap<String, i64> = v.get("sessions")? let map: HashMap<String, i64> = v
.get("sessions")?
.as_object()? .as_object()?
.iter() .iter()
.filter_map(|(k, v)| v.as_i64().map(|ts| (k.clone(), ts))) .filter_map(|(k, v)| v.as_i64().map(|ts| (k.clone(), ts)))
@ -33,17 +34,27 @@ fn save_state(new_state: &HashMap<String, i64>) -> Result<()> {
if let Some(parent) = path.parent() { if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent)?; std::fs::create_dir_all(parent)?;
} }
std::fs::write(&path, serde_json::to_string(&serde_json::json!({ "sessions": new_state }))?)?; std::fs::write(
&path,
serde_json::to_string(&serde_json::json!({ "sessions": new_state }))?,
)?;
Ok(()) Ok(())
} }
pub fn cmd_new_messages(limit: usize, json: bool) -> Result<()> { pub fn cmd_new_messages(limit: usize, opts: OutputOpts) -> Result<()> {
let state = load_state(); let state = load_state();
let resp = transport::send(Request::NewMessages { state, limit })?; let (with_meta, debug_source) = opts.request_flags();
let resp = transport::send(Request::NewMessages {
state,
limit,
with_meta,
debug_source,
})?;
// Save the new_state returned by the daemon // Save the new_state returned by the daemon
if let Some(obj) = resp.data.get("new_state").and_then(|v| v.as_object()) { if let Some(obj) = resp.data.get("new_state").and_then(|v| v.as_object()) {
let map: HashMap<String, i64> = obj.iter() let map: HashMap<String, i64> = obj
.iter()
.filter_map(|(k, v)| v.as_i64().map(|ts| (k.clone(), ts))) .filter_map(|(k, v)| v.as_i64().map(|ts| (k.clone(), ts)))
.collect(); .collect();
if !map.is_empty() { if !map.is_empty() {
@ -51,8 +62,6 @@ pub fn cmd_new_messages(limit: usize, json: bool) -> Result<()> {
} }
} }
let messages = resp.data.get("messages") emit_warnings(&resp.data);
.cloned() print_response(&resp.data, &opts)
.unwrap_or(serde_json::Value::Array(vec![]));
print_value(&messages, &resolve(json))
} }


@ -1,12 +1,31 @@
use chrono::{Local, TimeZone};
/// Output format /// Output format
pub enum Fmt { pub enum Fmt {
Yaml, Yaml,
Json, Json,
} }
#[derive(Clone, Copy, Debug)]
pub struct OutputOpts {
pub json: bool,
pub with_meta: bool,
pub debug_source: bool,
}
impl OutputOpts {
pub fn request_flags(self) -> (bool, bool) {
(self.with_meta || self.debug_source, self.debug_source)
}
}
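The flag derivation in `request_flags` can be read as "asking for debug sources implies asking for metadata". A minimal standalone sketch of that rule follows; it mirrors the struct in this diff and assumes nothing beyond the three boolean fields shown there.

```rust
/// Standalone copy of the options struct from the diff, for illustration.
#[derive(Clone, Copy, Debug)]
pub struct OutputOpts {
    pub json: bool,
    pub with_meta: bool,
    pub debug_source: bool,
}

impl OutputOpts {
    /// Returns `(with_meta, debug_source)` as forwarded in the request.
    /// `debug_source` forces `with_meta` on, because the shard paths it
    /// exposes live inside the metadata block.
    pub fn request_flags(self) -> (bool, bool) {
        (self.with_meta || self.debug_source, self.debug_source)
    }
}
```

Under this rule, passing only `--debug-source` still yields `(true, true)`, while passing only `--with-meta` yields `(true, false)`.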
/// YAML by default; JSON when --json is passed /// YAML by default; JSON when --json is passed
pub fn resolve(json: bool) -> Fmt { pub fn resolve(json: bool) -> Fmt {
if json { Fmt::Json } else { Fmt::Yaml } if json {
Fmt::Json
} else {
Fmt::Yaml
}
} }
pub fn print_value(value: &serde_json::Value, fmt: &Fmt) -> anyhow::Result<()> { pub fn print_value(value: &serde_json::Value, fmt: &Fmt) -> anyhow::Result<()> {
@ -16,3 +35,95 @@ pub fn print_value(value: &serde_json::Value, fmt: &Fmt) -> anyhow::Result<()> {
} }
Ok(()) Ok(())
} }
pub fn print_response(data: &serde_json::Value, opts: &OutputOpts) -> anyhow::Result<()> {
print_value(data, &resolve(opts.json))
}
pub fn emit_warnings(data: &serde_json::Value) {
for line in warning_lines(data) {
eprintln!("[wx] 警告:{}", line);
}
}
pub fn warning_lines(data: &serde_json::Value) -> Vec<String> {
let mut lines = Vec::new();
let meta = match data.get("meta") {
Some(v) if v.is_object() => v,
_ => return lines,
};
let unknown_shards: Vec<String> = meta
.get("unknown_shards")
.and_then(|v| v.as_array())
.map(|arr| {
arr.iter()
.filter_map(|v| v.as_str().map(|s| s.to_string()))
.collect()
})
.unwrap_or_default();
if !unknown_shards.is_empty() {
lines.push(format!(
"磁盘上发现 daemon 不认识的分片 {},结果可能不完整;运行 `wx init --force` 重新提取密钥。",
unknown_shards.join(", ")
));
}
let status = meta.get("status").and_then(|v| v.as_str()).unwrap_or("");
if status == "possibly_stale" || status == "possibly_stale_unknown_shards" {
let session_ts = meta.get("session_last_timestamp").and_then(|v| v.as_i64());
let chat_ts = meta.get("chat_latest_timestamp").and_then(|v| v.as_i64());
if let (Some(session_ts), Some(chat_ts)) = (session_ts, chat_ts) {
let subject = data
.get("chat")
.and_then(|v| v.as_str())
.or_else(|| data.get("username").and_then(|v| v.as_str()))
.unwrap_or("当前查询");
lines.push(format!(
"session.db 显示 '{}' 最新到 {},但本次扫描只到 {},结果可能过期或不完整。",
subject,
fmt_meta_ts(session_ts),
fmt_meta_ts(chat_ts),
));
}
}
lines
}
pub fn warning_block_text(data: &serde_json::Value) -> Option<String> {
let lines = warning_lines(data);
if lines.is_empty() {
return None;
}
Some(
lines
.into_iter()
.map(|line| format!("[wx] 警告:{}", line))
.collect::<Vec<_>>()
.join("\n"),
)
}
pub fn warning_block_markdown(data: &serde_json::Value) -> Option<String> {
let lines = warning_lines(data);
if lines.is_empty() {
return None;
}
let mut out = String::from("> [!WARNING]\n");
for line in lines {
out.push_str("> ");
out.push_str(&line);
out.push('\n');
}
Some(out)
}
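The Markdown variant wraps each warning line in a blockquote under a `[!WARNING]` marker. A minimal sketch of just that formatting step follows, assuming the lines have already been collected; the name `warning_block_from_lines` is hypothetical and takes plain strings instead of a `serde_json::Value` so the snippet needs no external crates.

```rust
/// Hypothetical helper: formats pre-collected warning lines as a
/// Markdown alert block, or returns `None` when there is nothing to report.
pub fn warning_block_from_lines(lines: &[String]) -> Option<String> {
    if lines.is_empty() {
        return None;
    }
    let mut out = String::from("> [!WARNING]\n");
    for line in lines {
        // Every warning becomes one quoted line inside the alert.
        out.push_str("> ");
        out.push_str(line);
        out.push('\n');
    }
    Some(out)
}
```

Returning `None` for an empty list lets a caller skip the block entirely instead of emitting an empty alert.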
fn fmt_meta_ts(ts: i64) -> String {
Local
.timestamp_opt(ts, 0)
.single()
.map(|dt| dt.format("%Y-%m-%d %H:%M:%S").to_string())
.unwrap_or_else(|| ts.to_string())
}


@ -1,8 +1,8 @@
use anyhow::Result; use super::history::{parse_msg_type, parse_time, parse_time_end};
use crate::ipc::Request; use super::output::{emit_warnings, print_response, OutputOpts};
use super::transport; use super::transport;
use super::history::{parse_time, parse_time_end, parse_msg_type}; use crate::ipc::Request;
use super::output::{resolve, print_value}; use anyhow::Result;
pub fn cmd_search( pub fn cmd_search(
keyword: String, keyword: String,
@ -11,12 +11,13 @@ pub fn cmd_search(
since: Option<String>, since: Option<String>,
until: Option<String>, until: Option<String>,
msg_type: Option<String>, msg_type: Option<String>,
json: bool, opts: OutputOpts,
) -> Result<()> { ) -> Result<()> {
let since_ts = since.as_deref().map(parse_time).transpose()?; let since_ts = since.as_deref().map(parse_time).transpose()?;
let until_ts = until.as_deref().map(parse_time_end).transpose()?; let until_ts = until.as_deref().map(parse_time_end).transpose()?;
let type_val = msg_type.as_deref().and_then(parse_msg_type); let type_val = msg_type.as_deref().and_then(parse_msg_type);
let chats_opt = if chats.is_empty() { None } else { Some(chats) }; let chats_opt = if chats.is_empty() { None } else { Some(chats) };
let (with_meta, debug_source) = opts.request_flags();
let req = Request::Search { let req = Request::Search {
keyword, keyword,
@ -25,11 +26,11 @@ pub fn cmd_search(
since: since_ts, since: since_ts,
until: until_ts, until: until_ts,
msg_type: type_val, msg_type: type_val,
with_meta,
debug_source,
}; };
let resp = transport::send(req)?; let resp = transport::send(req)?;
let results = resp.data.get("results") emit_warnings(&resp.data);
.cloned() print_response(&resp.data, &opts)
.unwrap_or(serde_json::Value::Array(vec![]));
print_value(&results, &resolve(json))
} }


@ -1,12 +1,15 @@
use anyhow::Result; use super::output::{emit_warnings, print_response, OutputOpts};
use crate::ipc::Request;
use super::transport; use super::transport;
use super::output::{resolve, print_value}; use crate::ipc::Request;
use anyhow::Result;
pub fn cmd_sessions(limit: usize, json: bool) -> Result<()> { pub fn cmd_sessions(limit: usize, opts: OutputOpts) -> Result<()> {
let resp = transport::send(Request::Sessions { limit })?; let (with_meta, debug_source) = opts.request_flags();
let data = resp.data.get("sessions") let resp = transport::send(Request::Sessions {
.cloned() limit,
.unwrap_or(serde_json::Value::Array(vec![])); with_meta,
print_value(&data, &resolve(json)) debug_source,
})?;
emit_warnings(&resp.data);
print_response(&resp.data, &opts)
} }


@ -1,18 +1,25 @@
use anyhow::Result;
use crate::ipc::Request;
use super::transport;
use super::history::{parse_time, parse_time_end}; use super::history::{parse_time, parse_time_end};
use super::output::{resolve, print_value}; use super::output::{emit_warnings, print_response, OutputOpts};
use super::transport;
use crate::ipc::Request;
use anyhow::Result;
pub fn cmd_stats( pub fn cmd_stats(
chat: String, chat: String,
since: Option<String>, since: Option<String>,
until: Option<String>, until: Option<String>,
json: bool, opts: OutputOpts,
) -> Result<()> { ) -> Result<()> {
let since_ts = since.as_deref().map(parse_time).transpose()?; let since_ts = since.as_deref().map(parse_time).transpose()?;
let until_ts = until.as_deref().map(parse_time_end).transpose()?; let until_ts = until.as_deref().map(parse_time_end).transpose()?;
let (with_meta, debug_source) = opts.request_flags();
let resp = transport::send(Request::Stats { chat, since: since_ts, until: until_ts })?; let resp = transport::send(Request::Stats {
print_value(&resp.data, &resolve(json)) chat,
since: since_ts,
until: until_ts,
with_meta,
debug_source,
})?;
emit_warnings(&resp.data);
print_response(&resp.data, &opts)
} }


@ -1,18 +1,22 @@
use anyhow::Result; use super::output::{emit_warnings, print_response, OutputOpts};
use crate::ipc::Request;
use super::transport; use super::transport;
use super::output::{resolve, print_value}; use crate::ipc::Request;
use anyhow::Result;
pub fn cmd_unread(limit: usize, filter: Vec<String>, json: bool) -> Result<()> { pub fn cmd_unread(limit: usize, filter: Vec<String>, opts: OutputOpts) -> Result<()> {
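// Empty or containing "all" means no filtering; other values were already validated by the clap value_parser and are passed straight through to the daemon. // Empty or containing "all" means no filtering; other values were already validated by the clap value_parser and are passed straight through to the daemon.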
let filter_vec = if filter.is_empty() || filter.iter().any(|s| s == "all") { let filter_vec = if filter.is_empty() || filter.iter().any(|s| s == "all") {
None None
} else { } else {
Some(filter) Some(filter)
}; };
let resp = transport::send(Request::Unread { limit, filter: filter_vec })?; let (with_meta, debug_source) = opts.request_flags();
let data = resp.data.get("sessions") let resp = transport::send(Request::Unread {
.cloned() limit,
.unwrap_or(serde_json::Value::Array(vec![])); filter: filter_vec,
print_value(&data, &resolve(json)) with_meta,
debug_source,
})?;
emit_warnings(&resp.data);
print_response(&resp.data, &opts)
} }


@ -320,9 +320,11 @@ fn detect_db_dir_impl() -> Option<PathBuf> {
let path = entry.path(); let path = entry.path();
if path.extension().map(|e| e == "ini").unwrap_or(false) { if path.extension().map(|e| e == "ini").unwrap_or(false) {
if let Ok(content) = std::fs::read_to_string(&path) { if let Ok(content) = std::fs::read_to_string(&path) {
let data_root = content.trim().to_string(); let Some(data_root) = resolve_windows_data_root(content.trim()) else {
if PathBuf::from(&data_root).is_dir() { continue;
let pattern = PathBuf::from(&data_root).join("xwechat_files"); };
if data_root.is_dir() {
let pattern = data_root.join("xwechat_files");
if let Ok(entries2) = std::fs::read_dir(&pattern) { if let Ok(entries2) = std::fs::read_dir(&pattern) {
for entry2 in entries2.flatten() { for entry2 in entries2.flatten() {
let storage = entry2.path().join("db_storage"); let storage = entry2.path().join("db_storage");
@ -340,6 +342,72 @@ fn detect_db_dir_impl() -> Option<PathBuf> {
candidates.into_iter().next_back() candidates.into_iter().next_back()
} }
/// Resolve the data-root path that Weixin writes to its `*.ini` file under
/// `%APPDATA%\Tencent\xwechat\config\`.
///
/// Observed forms in the wild:
/// - A plain absolute path, e.g. `D:\WeChatFiles`.
/// - The literal token `MyDocument:` (sometimes with a trailing slash),
/// which is not a real filesystem path. Empirically this denotes
/// "the current user's Documents folder"; users who relocated
/// Documents to e.g. `D:\Documents` saw auto-detect fail silently
/// because `PathBuf::from("MyDocument:").is_dir()` is false.
///
/// We accept either form. For the `MyDocument:` token we resolve via
/// `SHGetKnownFolderPath(FOLDERID_Documents)`, which respects the standard
/// shell-folder redirect at
/// `HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders\Personal`.
#[cfg(target_os = "windows")]
fn resolve_windows_data_root(content: &str) -> Option<PathBuf> {
let trimmed = content.trim();
// Strip an optional trailing slash so `MyDocument:\` and `MyDocument:/` also match.
let stripped = trimmed
.strip_suffix(['\\', '/'])
.unwrap_or(trimmed);
if stripped.eq_ignore_ascii_case("MyDocument:") {
return known_documents_dir();
}
Some(PathBuf::from(trimmed))
}
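The token matching above can be exercised on any platform if the Documents lookup is factored out. The sketch below is a hypothetical variant, not the code in this diff: `resolve_data_root` takes the Documents directory as a parameter instead of calling `SHGetKnownFolderPath`, which is an assumption made purely so the matching rule can be tested in isolation.

```rust
use std::path::PathBuf;

/// Hypothetical, platform-independent variant of the resolver.
/// `documents_dir` stands in for the result of the Win32 known-folder lookup.
pub fn resolve_data_root(content: &str, documents_dir: Option<PathBuf>) -> Option<PathBuf> {
    let trimmed = content.trim();
    // Strip at most one trailing slash so `MyDocument:\` and `MyDocument:/` match.
    let stripped = trimmed.strip_suffix(['\\', '/']).unwrap_or(trimmed);
    if stripped.eq_ignore_ascii_case("MyDocument:") {
        // The token is not a real path; defer to the Documents folder.
        return documents_dir;
    }
    // Anything else is treated as a literal path and passed through unchanged.
    Some(PathBuf::from(trimmed))
}
```

When the token is present but the Documents folder cannot be resolved, the function returns `None`, which matches the `continue` in the caller shown earlier in this diff.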
#[cfg(target_os = "windows")]
fn known_documents_dir() -> Option<PathBuf> {
use std::ffi::OsString;
use std::os::windows::ffi::OsStringExt;
use windows::Win32::Foundation::HANDLE;
use windows::Win32::System::Com::CoTaskMemFree;
use windows::Win32::UI::Shell::{
FOLDERID_Documents, SHGetKnownFolderPath, KF_FLAG_DEFAULT,
};
// SAFETY: standard Win32 known-folder API. SHGetKnownFolderPath either returns
// a heap-allocated PWSTR that the caller must free with CoTaskMemFree, or an
// error — in which case the out-pointer is not allocated. We free on every
// success path. Passing a null token (HANDLE::default()) means "the calling
// user", which is exactly what we want.
unsafe {
let pwstr =
SHGetKnownFolderPath(&FOLDERID_Documents, KF_FLAG_DEFAULT, HANDLE::default()).ok()?;
if pwstr.0.is_null() {
return None;
}
// Walk the NUL-terminated wide string to compute its length.
let mut len = 0usize;
while *pwstr.0.add(len) != 0 {
len += 1;
}
let slice = std::slice::from_raw_parts(pwstr.0, len);
let os_str = OsString::from_wide(slice);
CoTaskMemFree(Some(pwstr.0 as *const _));
let path = PathBuf::from(os_str);
if path.as_os_str().is_empty() {
None
} else {
Some(path)
}
}
}
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))] #[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))]
fn detect_db_dir_impl() -> Option<PathBuf> { fn detect_db_dir_impl() -> Option<PathBuf> {
None None
@ -351,6 +419,8 @@ mod tests {
config_path_in_dir, default_config_path, find_existing_config_path, home_config_path, config_path_in_dir, default_config_path, find_existing_config_path, home_config_path,
resolve_cli_home, resolve_cli_home,
}; };
#[cfg(target_os = "windows")]
use super::{known_documents_dir, resolve_windows_data_root};
use std::fs; use std::fs;
use std::path::PathBuf; use std::path::PathBuf;
use std::time::{SystemTime, UNIX_EPOCH}; use std::time::{SystemTime, UNIX_EPOCH};
@ -409,4 +479,24 @@ mod tests {
let path = default_config_path(Some(&cwd), Some(&exe), Some(&home)); let path = default_config_path(Some(&cwd), Some(&exe), Some(&home));
assert_eq!(path, cwd.join("config.json")); assert_eq!(path, cwd.join("config.json"));
} }
#[cfg(target_os = "windows")]
#[test]
fn resolve_windows_data_root_passes_through_absolute_path() {
let p = resolve_windows_data_root("D:\\WeChatFiles").unwrap();
assert_eq!(p, PathBuf::from("D:\\WeChatFiles"));
}
#[cfg(target_os = "windows")]
#[test]
fn resolve_windows_data_root_recognises_mydocument_keyword() {
// Should match the keyword exactly (case-insensitive, with or without trailing slash)
// and resolve to a non-empty Documents path via SHGetKnownFolderPath.
let docs = known_documents_dir().expect("Documents known folder must resolve");
for keyword in ["MyDocument:", "mydocument:", "MyDocument:\\", "MyDocument:/"] {
let resolved = resolve_windows_data_root(keyword)
.unwrap_or_else(|| panic!("keyword {keyword:?} should resolve"));
assert_eq!(resolved, docs, "keyword {keyword:?}");
}
}
} }


@ -23,6 +23,40 @@ struct CacheEntry {
decrypted_path: PathBuf, decrypted_path: PathBuf,
} }
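/// Which path `DbCache::get_with_mode()` actually took when resolving rel_key this time.
///
/// latency tier:
/// - `CacheHit`: ~0ms, only returns the existing decrypted artifact
/// - `WalIncremental`: typically <10s, only applies the WAL incrementally on the cached DB
/// - `FullDecrypt`: the slowest path, possibly ~120s on a large database
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheMode {
/// Path 1: neither the main `.db` nor the WAL changed; direct cache hit.
CacheHit,
/// Path 2: the main `.db` is unchanged and only the WAL changed; apply incrementally on the cached DB.
WalIncremental,
/// Path 3: the main `.db` changed or the cache missed; run a full decrypt again.
FullDecrypt,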
}
impl CacheMode {
/// Hand-pinned snake_case strings, so that deriving `Serialize` directly on the enum
/// in the future cannot silently change the wire format.
pub fn as_str(self) -> &'static str {
match self {
CacheMode::CacheHit => "cache_hit",
CacheMode::WalIncremental => "wal_incremental",
CacheMode::FullDecrypt => "full_decrypt",
}
}
}
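One way to see why the strings are pinned by hand is to pair `as_str` with an inverse and check that the two round-trip. The sketch below is illustrative only: `parse` is a hypothetical helper that does not appear in this diff, and the enum is a standalone copy.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheMode {
    CacheHit,
    WalIncremental,
    FullDecrypt,
}

impl CacheMode {
    /// Wire strings, fixed explicitly rather than derived.
    pub fn as_str(self) -> &'static str {
        match self {
            CacheMode::CacheHit => "cache_hit",
            CacheMode::WalIncremental => "wal_incremental",
            CacheMode::FullDecrypt => "full_decrypt",
        }
    }

    /// Hypothetical inverse of `as_str`, e.g. for a consumer of the meta block.
    pub fn parse(s: &str) -> Option<CacheMode> {
        match s {
            "cache_hit" => Some(CacheMode::CacheHit),
            "wal_incremental" => Some(CacheMode::WalIncremental),
            "full_decrypt" => Some(CacheMode::FullDecrypt),
            _ => None,
        }
    }
}
```

Because both directions are spelled out, renaming a variant does not change what goes over the wire unless the string literal is edited too.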
#[derive(Debug, Clone)]
pub struct CacheResolve {
pub path: PathBuf,
pub mode: CacheMode,
}
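/// mtime-aware cache of decrypted databases /// mtime-aware cache of decrypted databases
/// ///
/// When the mtime of the database file (.db) or the WAL file (.db-wal) changes, /// When the mtime of the database file (.db) or the WAL file (.db-wal) changes,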
@ -30,30 +64,43 @@ struct CacheEntry {
pub struct DbCache { pub struct DbCache {
db_dir: PathBuf, db_dir: PathBuf,
cache_dir: PathBuf, cache_dir: PathBuf,
mtime_file: PathBuf,
all_keys: HashMap<String, String>, // rel_key -> enc_key(hex) all_keys: HashMap<String, String>, // rel_key -> enc_key(hex)
inner: Arc<Mutex<HashMap<String, CacheEntry>>>, inner: Arc<Mutex<HashMap<String, CacheEntry>>>,
} }
impl DbCache { impl DbCache {
pub async fn new( pub async fn new(db_dir: PathBuf, all_keys: HashMap<String, String>) -> Result<Self> {
Self::with_dirs(db_dir, config::cache_dir(), config::mtime_file(), all_keys).await
}
/// Inject `cache_dir` / `mtime_file` (used by tests, and reused by the production `new()`)
pub(crate) async fn with_dirs(
db_dir: PathBuf, db_dir: PathBuf,
cache_dir: PathBuf,
mtime_file: PathBuf,
all_keys: HashMap<String, String>, all_keys: HashMap<String, String>,
) -> Result<Self> { ) -> Result<Self> {
let cache_dir = config::cache_dir();
tokio::fs::create_dir_all(&cache_dir).await?; tokio::fs::create_dir_all(&cache_dir).await?;
let inner: HashMap<String, CacheEntry> = HashMap::new();
let cache = DbCache { let cache = DbCache {
db_dir, db_dir,
cache_dir, cache_dir,
mtime_file,
all_keys, all_keys,
inner: Arc::new(Mutex::new(inner)), inner: Arc::new(Mutex::new(HashMap::new())),
}; };
cache.load_persistent().await; cache.load_persistent().await;
Ok(cache) Ok(cache)
} }
/// Database root directory (i.e. `<wxchat_base>/db_storage`).
/// Upper layers (the attachment resolver) need `db_dir.parent()` to locate `msg/attach/...` when decrypting images.
pub fn db_dir(&self) -> &Path {
&self.db_dir
}
fn cache_file_path(&self, rel_key: &str) -> PathBuf { fn cache_file_path(&self, rel_key: &str) -> PathBuf {
let hash = format!("{:x}", md5::compute(rel_key.as_bytes())); let hash = format!("{:x}", md5::compute(rel_key.as_bytes()));
self.cache_dir.join(format!("{}.db", hash)) self.cache_dir.join(format!("{}.db", hash))
@ -61,7 +108,7 @@ impl DbCache {
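/// Load mtime records from the persisted file and reuse decrypted files that are still fresh /// Load mtime records from the persisted file and reuse decrypted files that are still fresh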
async fn load_persistent(&self) { async fn load_persistent(&self) {
let mtime_file = config::mtime_file(); let mtime_file = &self.mtime_file;
let content = match tokio::fs::read_to_string(&mtime_file).await { let content = match tokio::fs::read_to_string(&mtime_file).await {
Ok(c) => c, Ok(c) => c,
Err(_) => return, Err(_) => return,
@ -78,18 +125,34 @@ impl DbCache {
if !dec_path.exists() { if !dec_path.exists() {
continue; continue;
} }
let db_path = self.db_dir.join(rel_key.replace('\\', std::path::MAIN_SEPARATOR_STR).replace('/', std::path::MAIN_SEPARATOR_STR)); let db_path = self.db_dir.join(
rel_key
.replace('\\', std::path::MAIN_SEPARATOR_STR)
.replace('/', std::path::MAIN_SEPARATOR_STR),
);
let wal_path = wal_path_for(&db_path); let wal_path = wal_path_for(&db_path);
let db_mt = mtime_nanos(&db_path); let db_mt = mtime_nanos(&db_path);
let wal_mt = if wal_path.exists() { mtime_nanos(&wal_path) } else { 0 }; let _wal_mt = if wal_path.exists() {
mtime_nanos(&wal_path)
} else {
0
};
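if db_mt == entry.db_mt && wal_mt == entry.wal_mt { // As long as the main .db is unchanged, load the cached artifact back.
inner.insert(rel_key.clone(), CacheEntry { // If the WAL mtime changed, a later `get()` automatically takes Path 2 (incremental apply_wal on the existing cached DB),
// rather than the first request after a daemon restart falling back to a full decrypt.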
if db_mt == entry.db_mt {
inner.insert(
rel_key.clone(),
CacheEntry {
db_mtime: db_mt, db_mtime: db_mt,
wal_mtime: wal_mt, // Keep the wal_mtime seen when the cached artifact was built, so `get()` can compare it with the current WAL
// and decide between an exact hit and a WAL incremental apply.
wal_mtime: entry.wal_mt,
decrypted_path: dec_path, decrypted_path: dec_path,
}); },
);
reused += 1; reused += 1;
} }
} }
@ -100,15 +163,21 @@ impl DbCache {
/// Persist mtime records /// Persist mtime records
async fn save_persistent(&self) { async fn save_persistent(&self) {
let mtime_file = config::mtime_file(); let mtime_file = &self.mtime_file;
let inner = self.inner.lock().await; let inner = self.inner.lock().await;
let data: HashMap<String, MtimeEntry> = inner.iter().map(|(k, v)| { let data: HashMap<String, MtimeEntry> = inner
(k.clone(), MtimeEntry { .iter()
.map(|(k, v)| {
(
k.clone(),
MtimeEntry {
db_mt: v.db_mtime, db_mt: v.db_mtime,
wal_mt: v.wal_mtime, wal_mt: v.wal_mtime,
path: v.decrypted_path.to_string_lossy().into_owned(), path: v.decrypted_path.to_string_lossy().into_owned(),
},
)
}) })
}).collect(); .collect();
drop(inner); drop(inner);
if let Ok(json) = serde_json::to_string_pretty(&data) { if let Ok(json) = serde_json::to_string_pretty(&data) {
@ -118,84 +187,148 @@ impl DbCache {
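/// Get the path of the decrypted database /// Get the path of the decrypted database
/// ///
/// If the mtime is unchanged, return the cached path directly; otherwise decrypt again /// Three resolution paths:
/// 1. Neither the main `.db` nor the WAL mtime changed → return the cached path directly
/// 2. Main `.db` unchanged, WAL mtime changed → **incremental** `apply_wal` on the existing cached artifact
/// (apply_wal is idempotent: old frames redo the same page writes and new frames take effect on top; no new full_decrypt)
/// 3. Main `.db` mtime changed → run `full_decrypt` + `apply_wal` again
///
/// WeChat only appends to the WAL when writing messages (unless a checkpoint is triggered), so path 2 is the common case;
/// this path cuts "fully decrypt a ~1.8GB DB on every request (~120s)" down to "decrypt only the WAL frames (typically < 10s)".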
pub async fn get(&self, rel_key: &str) -> Result<Option<PathBuf>> { pub async fn get(&self, rel_key: &str) -> Result<Option<PathBuf>> {
Ok(self.get_with_mode(rel_key).await?.map(|r| r.path))
}
pub async fn get_with_mode(&self, rel_key: &str) -> Result<Option<CacheResolve>> {
let enc_key_hex = match self.all_keys.get(rel_key) { let enc_key_hex = match self.all_keys.get(rel_key) {
Some(k) => k.clone(), Some(k) => k.clone(),
None => return Ok(None), None => return Ok(None),
}; };
let db_path = self.db_dir.join( let db_path = self.db_dir.join(
rel_key.replace('\\', std::path::MAIN_SEPARATOR_STR) rel_key
.replace('/', std::path::MAIN_SEPARATOR_STR) .replace('\\', std::path::MAIN_SEPARATOR_STR)
.replace('/', std::path::MAIN_SEPARATOR_STR),
); );
if !db_path.exists() { if !db_path.exists() {
return Ok(None); return Ok(None);
} }
let wal_path = wal_path_for(&db_path); let wal_path = wal_path_for(&db_path);
let db_mt = mtime_nanos(&db_path); let db_mt = mtime_nanos(&db_path);
let wal_mt = if wal_path.exists() { mtime_nanos(&wal_path) } else { 0 }; let wal_mt = if wal_path.exists() {
mtime_nanos(&wal_path)
} else {
0
};
// Check the cache let cached = {
{
let inner = self.inner.lock().await; let inner = self.inner.lock().await;
if let Some(entry) = inner.get(rel_key) { inner.get(rel_key).cloned()
if entry.db_mtime == db_mt };
&& entry.wal_mtime == wal_mt
&& entry.decrypted_path.exists() let enc_key_bytes =
hex_to_32bytes(&enc_key_hex).with_context(|| format!("密钥格式错误: {}", rel_key))?;
// Path 1 / Path 2: main .db mtime unchanged and the cached artifact still exists
if let Some(entry) = cached.as_ref() {
if entry.db_mtime == db_mt && entry.decrypted_path.exists() {
if entry.wal_mtime == wal_mt {
return Ok(Some(CacheResolve {
path: entry.decrypted_path.clone(),
mode: CacheMode::CacheHit,
}));
}
// Path 2: WAL-only change → re-run apply_wal on the cached artifact
// A missing WAL must also update wal_mtime=0 (even though SQLite never spontaneously ends up with "main DB unchanged + WAL cleared")
let out_path = entry.decrypted_path.clone();
let t0 = std::time::Instant::now();
if wal_path.exists() {
let out_path2 = out_path.clone();
let wal_path2 = wal_path.clone();
let key_copy = enc_key_bytes;
tokio::task::spawn_blocking(move || {
wal::apply_wal(&wal_path2, &out_path2, &key_copy)
})
.await??;
}
eprintln!(
"[cache] WAL 增量 {} ({}ms)",
rel_key,
t0.elapsed().as_millis()
);
{ {
return Ok(Some(entry.decrypted_path.clone())); let mut inner = self.inner.lock().await;
inner.insert(
rel_key.to_string(),
CacheEntry {
db_mtime: db_mt,
wal_mtime: wal_mt,
decrypted_path: out_path.clone(),
},
);
} }
self.save_persistent().await;
return Ok(Some(CacheResolve {
path: out_path,
mode: CacheMode::WalIncremental,
}));
} }
} }
// Needs re-decryption // Path 3: main .db changed / cache miss → full decrypt
let out_path = self.cache_file_path(rel_key); let out_path = self.cache_file_path(rel_key);
let enc_key_bytes = hex_to_32bytes(&enc_key_hex)
.with_context(|| format!("密钥格式错误: {}", rel_key))?;
let t0 = std::time::Instant::now(); let t0 = std::time::Instant::now();
let db_path2 = db_path.clone(); let db_path2 = db_path.clone();
let out_path2 = out_path.clone(); let out_path2 = out_path.clone();
let key_copy = enc_key_bytes; let key_copy = enc_key_bytes;
tokio::task::spawn_blocking(move || { tokio::task::spawn_blocking(move || crypto::full_decrypt(&db_path2, &out_path2, &key_copy))
crypto::full_decrypt(&db_path2, &out_path2, &key_copy) .await??;
}).await??;
// Apply the WAL
if wal_path.exists() { if wal_path.exists() {
let out_path3 = out_path.clone(); let out_path3 = out_path.clone();
let wal_path3 = wal_path.clone(); let wal_path3 = wal_path.clone();
let key_copy2 = enc_key_bytes; let key_copy2 = enc_key_bytes;
tokio::task::spawn_blocking(move || { tokio::task::spawn_blocking(move || wal::apply_wal(&wal_path3, &out_path3, &key_copy2))
wal::apply_wal(&wal_path3, &out_path3, &key_copy2) .await??;
}).await??;
} }
let elapsed_ms = t0.elapsed().as_millis(); eprintln!(
eprintln!("[cache] 解密 {} ({}ms)", rel_key, elapsed_ms); "[cache] 全量解密 {} ({}ms)",
rel_key,
t0.elapsed().as_millis()
);
// Update the in-memory cache
{ {
let mut inner = self.inner.lock().await; let mut inner = self.inner.lock().await;
inner.insert(rel_key.to_string(), CacheEntry { inner.insert(
rel_key.to_string(),
CacheEntry {
db_mtime: db_mt, db_mtime: db_mt,
wal_mtime: wal_mt, wal_mtime: wal_mt,
decrypted_path: out_path.clone(), decrypted_path: out_path.clone(),
}); },
);
} }
self.save_persistent().await; self.save_persistent().await;
Ok(Some(out_path)) Ok(Some(CacheResolve {
path: out_path,
mode: CacheMode::FullDecrypt,
}))
} }
} }
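The three resolution paths in `get_with_mode` reduce to a small pure decision over the recorded and current mtimes. A minimal sketch of that decision follows, separated from the decryption and I/O; the `Cached` struct and the `decide` function are hypothetical names introduced here for illustration, not part of this diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheMode {
    CacheHit,
    WalIncremental,
    FullDecrypt,
}

/// Hypothetical snapshot of what the cache recorded for one rel_key.
pub struct Cached {
    pub db_mtime: u64,
    pub wal_mtime: u64,
    /// Whether the decrypted file is still present on disk.
    pub artifact_exists: bool,
}

/// Picks the path given the cached record (if any) and the current mtimes.
pub fn decide(cached: Option<&Cached>, db_mt: u64, wal_mt: u64) -> CacheMode {
    match cached {
        // Main .db unchanged and artifact still there: Path 1 or Path 2.
        Some(c) if c.db_mtime == db_mt && c.artifact_exists => {
            if c.wal_mtime == wal_mt {
                CacheMode::CacheHit
            } else {
                CacheMode::WalIncremental
            }
        }
        // Main .db changed, artifact gone, or no record at all: Path 3.
        _ => CacheMode::FullDecrypt,
    }
}
```

Keeping the decision free of side effects would make each path checkable without a real encrypted database, which is the same property the tests below approximate by inspecting the cached file's contents.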
pub(super) fn mtime_nanos(path: &Path) -> u64 { pub(super) fn mtime_nanos(path: &Path) -> u64 {
std::fs::metadata(path) std::fs::metadata(path)
.and_then(|m| m.modified()) .and_then(|m| m.modified())
.map(|t| t.duration_since(std::time::UNIX_EPOCH).unwrap_or_default().as_nanos() as u64) .map(|t| {
t.duration_since(std::time::UNIX_EPOCH)
.unwrap_or_default()
.as_nanos() as u64
})
.unwrap_or(0) .unwrap_or(0)
} }
@ -217,3 +350,307 @@ fn hex_to_32bytes(s: &str) -> Result<[u8; 32]> {
} }
Ok(out) Ok(out)
} }
#[cfg(test)]
mod tests {
use super::*;
/// 64-character hex (need not be a real SQLCipher key; it only serves to show whether full_decrypt was triggered)
const FAKE_KEY_HEX: &str = "0000000000000000000000000000000000000000000000000000000000000000";
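/// Convention for telling the paths apart:
/// - exact hit / WAL incremental → `decrypted_path` **contents unchanged**
/// - full decrypt → `crypto::full_decrypt` **rewrites the cached file to a multiple of PAGE_SZ**
/// (the fake key decrypts to 4096 bytes of garbage, which is still written; content validity is not checked)
/// So whether the cached file's size changed tells us which path was taken.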
const ORIGINAL_CACHED_BYTES: &[u8] = b"original cached contents";
fn unique_tmpdir(tag: &str) -> PathBuf {
let pid = std::process::id();
let nanos = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_nanos();
let p = std::env::temp_dir().join(format!("wx-cli-cache-test-{}-{}-{}", tag, pid, nanos));
std::fs::create_dir_all(&p).unwrap();
p
}
/// Prepare an initial state in which DbCache has already reused a cached decrypted artifact.
/// Returns (cache, db_path, decrypted_path, mtime_file, rel_key).
async fn setup_seeded_cache(tag: &str) -> (DbCache, PathBuf, PathBuf, PathBuf, String) {
let root = unique_tmpdir(tag);
let db_dir = root.join("db_storage");
let cache_dir = root.join("cache");
std::fs::create_dir_all(&db_dir).unwrap();
std::fs::create_dir_all(&cache_dir).unwrap();
let rel_key = "message_0.db".to_string();
let db_path = db_dir.join(&rel_key);
std::fs::write(&db_path, b"fake encrypted db").unwrap();
let cached_hash = format!("{:x}", md5::compute(rel_key.as_bytes()));
let decrypted_path = cache_dir.join(format!("{}.db", cached_hash));
std::fs::write(&decrypted_path, ORIGINAL_CACHED_BYTES).unwrap();
let db_mt = mtime_nanos(&db_path);
let mtime_file = cache_dir.join("_mtimes.json");
let payload = serde_json::to_string(&serde_json::json!({
&rel_key: {
"db_mt": db_mt,
"wal_mt": 0u64,
"path": decrypted_path.display().to_string(),
}
}))
.unwrap();
std::fs::write(&mtime_file, payload).unwrap();
let mut all_keys = HashMap::new();
all_keys.insert(rel_key.clone(), FAKE_KEY_HEX.to_string());
let cache = DbCache::with_dirs(db_dir, cache_dir, mtime_file.clone(), all_keys)
.await
.unwrap();
(cache, db_path, decrypted_path, mtime_file, rel_key)
}
#[tokio::test]
async fn exact_mtime_hit_skips_decrypt() {
let (cache, _db_path, decrypted_path, _mtime_file, rel_key) =
setup_seeded_cache("exact").await;
let p = cache
.get(&rel_key)
.await
.unwrap()
.expect("cache should hit");
assert_eq!(p, decrypted_path);
// Exact hit → the cached file's contents must not be modified
let body = std::fs::read(&decrypted_path).unwrap();
assert_eq!(body, ORIGINAL_CACHED_BYTES);
}
#[tokio::test]
async fn wal_only_change_uses_incremental_path() {
// Built by hand (not via setup_seeded_cache) so that the initial _mtimes.json records both db_mt and wal_mt
let root = unique_tmpdir("walonly");
let db_dir = root.join("db_storage");
let cache_dir = root.join("cache");
std::fs::create_dir_all(&db_dir).unwrap();
std::fs::create_dir_all(&cache_dir).unwrap();
let rel_key = "message_0.db".to_string();
let db_path = db_dir.join(&rel_key);
std::fs::write(&db_path, b"fake encrypted db").unwrap();
let wal_path = wal_path_for(&db_path);
std::fs::write(&wal_path, [0u8; 31]).unwrap(); // ≤ WAL_HDR_SZ=32 → apply_wal noop
let cached_hash = format!("{:x}", md5::compute(rel_key.as_bytes()));
let decrypted_path = cache_dir.join(format!("{}.db", cached_hash));
std::fs::write(&decrypted_path, ORIGINAL_CACHED_BYTES).unwrap();
let db_mt = mtime_nanos(&db_path);
let wal_mt0 = mtime_nanos(&wal_path);
let mtime_file = cache_dir.join("_mtimes.json");
let payload = serde_json::to_string(&serde_json::json!({
&rel_key: {
"db_mt": db_mt,
"wal_mt": wal_mt0,
"path": decrypted_path.display().to_string(),
}
}))
.unwrap();
std::fs::write(&mtime_file, payload).unwrap();
let mut all_keys = HashMap::new();
all_keys.insert(rel_key.clone(), FAKE_KEY_HEX.to_string());
let cache = DbCache::with_dirs(db_dir, cache_dir, mtime_file, all_keys)
.await
.unwrap();
// First call: exact hit
let p1 = cache.get(&rel_key).await.unwrap().expect("first get hits");
assert_eq!(p1, decrypted_path);
assert_eq!(
std::fs::read(&decrypted_path).unwrap(),
ORIGINAL_CACHED_BYTES
);
// Bump the WAL mtime (rewritten, still 31 bytes, so apply_wal is still a no-op)
std::thread::sleep(std::time::Duration::from_millis(20));
std::fs::write(&wal_path, [0xffu8; 31]).unwrap();
let wal_mt1 = mtime_nanos(&wal_path);
assert_ne!(wal_mt0, wal_mt1, "rewriting WAL should bump mtime");
// Second call: WAL-incremental path
// If full_decrypt were taken by mistake → the cached file would be rewritten to a size ≥ PAGE_SZ
let p2 = cache
.get(&rel_key)
.await
.unwrap()
.expect("WAL-incremental path should produce path");
assert_eq!(p2, decrypted_path);
let body = std::fs::read(&decrypted_path).unwrap();
assert_eq!(
body, ORIGINAL_CACHED_BYTES,
"WAL-incremental should NOT rewrite cached file"
);
}
#[tokio::test]
async fn db_mtime_change_triggers_full_decrypt() {
let (cache, db_path, decrypted_path, _mtime_file, rel_key) =
setup_seeded_cache("dbchange").await;
// Bump the main .db file's mtime (rewrite it with different bytes)
std::thread::sleep(std::time::Duration::from_millis(20));
std::fs::write(&db_path, b"different fake encrypted bytes").unwrap();
assert_ne!(
mtime_nanos(&db_path),
cache.inner.lock().await.get(&rel_key).unwrap().db_mtime,
"rewriting db file should bump mtime"
);
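// Takes the full_decrypt path → the fake key does not make full_decrypt fail (it does not validate content),
// but the cached file is rewritten to a multiple of PAGE_SZ. The original content is 24 bytes; after the rewrite it should be ≥ 4096 bytes.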
let p = cache
.get(&rel_key)
.await
.unwrap()
.expect("cache should produce path");
assert_eq!(p, decrypted_path);
let new_size = std::fs::metadata(&decrypted_path).unwrap().len() as usize;
assert!(
new_size >= crate::crypto::PAGE_SZ,
"expected full_decrypt to rewrite cached file to PAGE_SZ multiple, got size={}",
new_size,
);
}
#[tokio::test]
async fn get_with_mode_reports_each_path() {
let root = unique_tmpdir("getwithmode");
let db_dir = root.join("db_storage");
let cache_dir = root.join("cache");
std::fs::create_dir_all(&db_dir).unwrap();
std::fs::create_dir_all(&cache_dir).unwrap();
let rel_key = "message_0.db".to_string();
let db_path = db_dir.join(&rel_key);
std::fs::write(&db_path, b"fake encrypted db").unwrap();
let wal_path = wal_path_for(&db_path);
std::fs::write(&wal_path, [0u8; 31]).unwrap();
let cached_hash = format!("{:x}", md5::compute(rel_key.as_bytes()));
let decrypted_path = cache_dir.join(format!("{}.db", cached_hash));
std::fs::write(&decrypted_path, ORIGINAL_CACHED_BYTES).unwrap();
let db_mt = mtime_nanos(&db_path);
let wal_mt0 = mtime_nanos(&wal_path);
let mtime_file = cache_dir.join("_mtimes.json");
let payload = serde_json::to_string(&serde_json::json!({
&rel_key: {
"db_mt": db_mt,
"wal_mt": wal_mt0,
"path": decrypted_path.display().to_string(),
}
}))
.unwrap();
std::fs::write(&mtime_file, payload).unwrap();
let mut all_keys = HashMap::new();
all_keys.insert(rel_key.clone(), FAKE_KEY_HEX.to_string());
let cache = DbCache::with_dirs(db_dir, cache_dir, mtime_file, all_keys)
.await
.unwrap();
let hit = cache
.get_with_mode(&rel_key)
.await
.unwrap()
.expect("cache should hit");
assert_eq!(hit.path, decrypted_path);
assert_eq!(hit.mode, CacheMode::CacheHit);
std::thread::sleep(std::time::Duration::from_millis(20));
std::fs::write(&wal_path, [0xffu8; 31]).unwrap();
let wal = cache
.get_with_mode(&rel_key)
.await
.unwrap()
.expect("WAL-only change should stay incremental");
assert_eq!(wal.path, decrypted_path);
assert_eq!(wal.mode, CacheMode::WalIncremental);
std::thread::sleep(std::time::Duration::from_millis(20));
std::fs::write(&db_path, b"different bytes").unwrap();
let full = cache
.get_with_mode(&rel_key)
.await
.unwrap()
.expect("db mtime change should trigger full decrypt");
assert_eq!(full.path, decrypted_path);
assert_eq!(full.mode, CacheMode::FullDecrypt);
}
#[tokio::test]
async fn restart_with_wal_change_still_reuses_cached_db_then_applies_wal() {
let root = unique_tmpdir("restart-wal");
let db_dir = root.join("db_storage");
let cache_dir = root.join("cache");
std::fs::create_dir_all(&db_dir).unwrap();
std::fs::create_dir_all(&cache_dir).unwrap();
let rel_key = "message_0.db".to_string();
let db_path = db_dir.join(&rel_key);
std::fs::write(&db_path, b"fake encrypted db").unwrap();
let wal_path = wal_path_for(&db_path);
std::fs::write(&wal_path, [0u8; 31]).unwrap(); // the WAL-incremental step is still a no-op
let cached_hash = format!("{:x}", md5::compute(rel_key.as_bytes()));
let decrypted_path = cache_dir.join(format!("{}.db", cached_hash));
std::fs::write(&decrypted_path, ORIGINAL_CACHED_BYTES).unwrap();
let db_mt = mtime_nanos(&db_path);
let wal_mt0 = mtime_nanos(&wal_path);
let mtime_file = cache_dir.join("_mtimes.json");
let payload = serde_json::to_string(&serde_json::json!({
&rel_key: {
"db_mt": db_mt,
"wal_mt": wal_mt0,
"path": decrypted_path.display().to_string(),
}
}))
.unwrap();
std::fs::write(&mtime_file, payload).unwrap();
// Simulate new messages being written to the WAL before the daemon restarts
std::thread::sleep(std::time::Duration::from_millis(20));
std::fs::write(&wal_path, [0xffu8; 31]).unwrap();
let wal_mt1 = mtime_nanos(&wal_path);
assert_ne!(wal_mt0, wal_mt1);
let mut all_keys = HashMap::new();
all_keys.insert(rel_key.clone(), FAKE_KEY_HEX.to_string());
let cache = DbCache::with_dirs(db_dir, cache_dir, mtime_file, all_keys)
.await
.unwrap();
let p = cache
.get(&rel_key)
.await
.unwrap()
.expect("cache should reuse persisted DB");
assert_eq!(p, decrypted_path);
let body = std::fs::read(&decrypted_path).unwrap();
assert_eq!(
body, ORIGINAL_CACHED_BYTES,
"restart + WAL-only change should still reuse cached DB and avoid full_decrypt"
);
}
}
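
The tests above pin down three outcomes for `get_with_mode`: `CacheHit`, `WalIncremental`, and `FullDecrypt`. As a reading aid, here is a minimal, std-only sketch of that decision, assuming it depends only on comparing the recorded `db_mt` / `wal_mt` values with the current file mtimes. `Mtimes` and `classify` are hypothetical names introduced for illustration; the real `DbCache` logic is not fully visible in this diff and may differ.

```rust
/// Mirrors the three `CacheMode` variants asserted in the tests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheMode {
    CacheHit,
    WalIncremental,
    FullDecrypt,
}

/// Hypothetical pair of modification times (nanoseconds), as recorded in `_mtimes.json`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Mtimes {
    pub db: u64,
    pub wal: u64,
}

/// Hypothetical decision function (an assumption, not the crate's API).
/// `recorded` is `None` when no cache entry exists for the shard.
pub fn classify(recorded: Option<Mtimes>, current: Mtimes) -> CacheMode {
    match recorded {
        // Neither file changed: reuse the decrypted copy untouched.
        Some(r) if r.db == current.db && r.wal == current.wal => CacheMode::CacheHit,
        // Only the WAL changed: keep the decrypted copy and apply the WAL on top.
        Some(r) if r.db == current.db => CacheMode::WalIncremental,
        // Main DB changed, or nothing was cached: decrypt from scratch.
        _ => CacheMode::FullDecrypt,
    }
}
```

Under this reading, the restart test corresponds to `classify` being called with a persisted entry whose `wal` value is older than the file on disk, which lands in the `WalIncremental` arm rather than forcing a full decrypt.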

269
src/daemon/meta.rs 100644

@ -0,0 +1,269 @@
//! Freshness metadata appended to every q_* response.
//!
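//! Background: `all_keys.json` is a snapshot taken at `wx init` time. WeChat may create new
//! `message_N.db` shards at any point after the daemon starts; if the daemon trusts only the `msg_db_keys` list
//! received at init, data in the new shards is invisible to it → callers get results that look normal but are missing data ("stale").
//!
//! This module is responsible for:
//! 1. Providing the `Meta` struct, which each `q_*` function fills in and attaches to the response (top-level `meta` field).
//! 2. Providing `discover_unknown_shards(db_dir, msg_db_keys)`: scan the `message/message_*.db` files that actually
//! exist on disk and diff out the "unknown shards" for which the daemon holds no enc_key.
//! 3. Centralizing the rules that decide `MetaStatus`, so the 8 q_* functions do not each decide on their own and drift apart.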
use serde::Serialize;
use std::collections::HashMap;
use std::path::Path;
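/// Freshness metadata attached to every q_* response.
///
/// When serialized to JSON, every `Option` field is omitted when `None`, which keeps the output of the
/// most common commands short. Heavy fields (per_shard_*, shard_paths) are left empty by default and are
/// included only when the CLI layer explicitly requests them via switches such as `--debug-source`.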
#[derive(Debug, Clone, Serialize, Default)]
pub struct Meta {
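/// create_time (Unix seconds) of the newest row among the matched data.
/// Queries based on Msg_ tables, such as `q_history` / `q_search` / `q_new_messages`, should fill this in.
/// Queries based on SessionTable, such as `q_sessions` / `q_unread`, fill in the newest session-level timestamp.
#[serde(skip_serializing_if = "Option::is_none")]
pub chat_latest_timestamp: Option<i64>,
/// rel_key of the shard that holds that newest message (e.g. `message/message_3.db`).
/// Lets an agent see at a glance which shard the newest matched data came from.
#[serde(skip_serializing_if = "Option::is_none")]
pub chat_latest_db: Option<String>,
/// Value of `session.db.SessionTable.last_timestamp` for this chat (if readable).
/// This is the "latest message time" that WeChat itself writes; comparing it with `chat_latest_timestamp` above
/// reveals the case where the session reports an update that history did not read → a missed shard.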
#[serde(skip_serializing_if = "Option::is_none")]
pub session_last_timestamp: Option<i64>,
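/// Number of shards this query actually traversed (at most `names.msg_db_keys.len()`; shards that matched 0 rows are counted).
pub shards_scanned: usize,
/// Number of shards that returned at least 1 row in this query.
pub shards_hit: usize,
/// rel_keys of shards that exist on disk but for which the daemon has no enc_key.
/// Non-empty ⇒ WeChat split off new shards after `wx init` → `wx init` must be re-run.
pub unknown_shards: Vec<String>,
/// Overall status derived from the fields above; this is the main field the CLI / agents look at.
pub status: MetaStatus,
// Heavy/debug fields: left empty by default, enabled explicitly by the CLI layer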
#[serde(skip_serializing_if = "Option::is_none")]
pub per_shard_latest: Option<HashMap<String, i64>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub cache_mode_per_shard: Option<HashMap<String, String>>,
#[serde(skip_serializing_if = "Option::is_none")]
pub shard_paths: Option<HashMap<String, String>>,
}
#[derive(Debug, Clone, Copy, Serialize, PartialEq, Eq, Default)]
#[serde(rename_all = "snake_case")]
pub enum MetaStatus {
#[default]
Ok,
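/// The latest time in `session.db` is clearly ahead of this message query's results, so the data may be stale or incomplete.
PossiblyStale,
/// Strongest signal: new shards unknown to the daemon have appeared on disk; this usually requires re-running `wx init --force`.
PossiblyStaleUnknownShards,
/// The caller explicitly passed window conditions such as `since` / `until` / `offset`, so the result is inherently a partial view.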
Windowed,
}
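/// How many seconds the session may lead history before `PossiblyStale` is reported.
///
/// The 24h value is deliberately conservative: an active group or private chat rarely goes a full day without new messages,
/// so a gap beyond this window is worth flagging explicitly, so that agents do not treat the result as the current latest state.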
pub const STALE_THRESHOLD_SECS: i64 = 24 * 3600;
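/// Unified priority order for the freshness status:
/// 1. `unknown_shards` is non-empty: the daemon's overall view is already out of date, so `PossiblyStaleUnknownShards` is returned first
/// 2. `windowed=true`: the caller is looking at a partial window anyway, so stale inference does not apply
/// 3. `session_last - chat_latest > STALE_THRESHOLD_SECS`: return `PossiblyStale`
/// 4. Otherwise: `Ok`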
pub fn derive_status(
chat_latest: Option<i64>,
session_last: Option<i64>,
unknown_shards: &[String],
windowed: bool,
) -> MetaStatus {
if !unknown_shards.is_empty() {
return MetaStatus::PossiblyStaleUnknownShards;
}
if windowed {
return MetaStatus::Windowed;
}
match (chat_latest, session_last) {
(Some(c), Some(s)) if s - c > STALE_THRESHOLD_SECS => MetaStatus::PossiblyStale,
_ => MetaStatus::Ok,
}
}
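/// Scan the `message_*.db` files that actually exist under `<db_dir>/message/` and diff out the
/// unknown shards for which the daemon currently has no key.
///
/// Contract:
/// - Returned values are always `/`-separated rel_keys (e.g. `message/message_3.db`), matching `all_keys.json`
/// - Results are sorted lexicographically, which keeps tests and CLI output stable
/// - `_fts*` / `_resource*` are excluded because they are index/attachment databases, not authoritative message shards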
pub fn discover_unknown_shards(db_dir: &Path, known: &[String]) -> Vec<String> {
let known_set: std::collections::HashSet<String> =
known.iter().map(|k| k.replace('\\', "/")).collect();
let msg_dir = db_dir.join("message");
let entries = match std::fs::read_dir(&msg_dir) {
Ok(it) => it,
Err(_) => return Vec::new(),
};
let mut unknown: Vec<String> = Vec::new();
for entry in entries.flatten() {
let name = entry.file_name();
let Some(name_str) = name.to_str() else {
continue;
};
if !is_message_shard(name_str) {
continue;
}
let rel = format!("message/{}", name_str);
if !known_set.contains(&rel) {
unknown.push(rel);
}
}
unknown.sort();
unknown
}
fn is_message_shard(file_name: &str) -> bool {
if !file_name.starts_with("message_") || !file_name.ends_with(".db") {
return false;
}
if file_name.contains("_fts") || file_name.contains("_resource") {
return false;
}
let stem = &file_name["message_".len()..file_name.len() - ".db".len()];
!stem.is_empty() && stem.chars().all(|c| c.is_ascii_digit())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn is_message_shard_accepts_normal_shards() {
assert!(is_message_shard("message_0.db"));
assert!(is_message_shard("message_12.db"));
}
#[test]
fn is_message_shard_rejects_fts_and_resource() {
assert!(!is_message_shard("message_0_fts.db"));
assert!(!is_message_shard("message_fts.db"));
assert!(!is_message_shard("message_0_resource.db"));
assert!(!is_message_shard("message_resource.db"));
}
#[test]
fn is_message_shard_rejects_non_digits() {
assert!(!is_message_shard("message_a.db"));
assert!(!is_message_shard("message_.db"));
assert!(!is_message_shard("session.db"));
assert!(!is_message_shard("message_0.db.bak"));
}
#[test]
fn discover_unknown_shards_finds_disk_only_shards() {
let dir = tempdir();
let msg_dir = dir.join("message");
std::fs::create_dir_all(&msg_dir).unwrap();
for f in [
"message_0.db",
"message_1.db",
"message_2.db",
"message_0_fts.db",
] {
std::fs::write(msg_dir.join(f), b"").unwrap();
}
let known = vec![
"message/message_0.db".to_string(),
"message/message_1.db".to_string(),
];
let unknown = discover_unknown_shards(&dir, &known);
assert_eq!(unknown, vec!["message/message_2.db".to_string()]);
}
#[test]
fn discover_unknown_shards_normalizes_backslash_in_known_keys() {
let dir = tempdir();
let msg_dir = dir.join("message");
std::fs::create_dir_all(&msg_dir).unwrap();
std::fs::write(msg_dir.join("message_0.db"), b"").unwrap();
let known = vec!["message\\message_0.db".to_string()];
assert!(discover_unknown_shards(&dir, &known).is_empty());
}
#[test]
fn discover_unknown_shards_returns_empty_when_message_dir_missing() {
let dir = tempdir();
assert!(discover_unknown_shards(&dir, &[]).is_empty());
}
#[test]
fn derive_status_unknown_shards_overrides_windowed() {
let unknown = vec!["message/message_3.db".to_string()];
assert_eq!(
derive_status(Some(100), Some(100), &unknown, true),
MetaStatus::PossiblyStaleUnknownShards
);
}
#[test]
fn derive_status_windowed_when_user_paginates() {
assert_eq!(
derive_status(Some(100), Some(999_999), &[], true),
MetaStatus::Windowed,
);
}
#[test]
fn derive_status_possibly_stale_when_session_far_ahead() {
let chat = Some(1_000_000);
let session = Some(1_000_000 + STALE_THRESHOLD_SECS + 1);
assert_eq!(
derive_status(chat, session, &[], false),
MetaStatus::PossiblyStale
);
}
#[test]
fn derive_status_ok_when_within_threshold() {
let chat = Some(1_000_000);
let session = Some(1_000_000 + STALE_THRESHOLD_SECS - 1);
assert_eq!(derive_status(chat, session, &[], false), MetaStatus::Ok);
}
#[test]
fn derive_status_ok_when_either_side_unknown() {
assert_eq!(
derive_status(None, Some(999_999_999), &[], false),
MetaStatus::Ok
);
assert_eq!(derive_status(Some(1), None, &[], false), MetaStatus::Ok);
assert_eq!(derive_status(None, None, &[], false), MetaStatus::Ok);
}
fn tempdir() -> std::path::PathBuf {
let pid = std::process::id();
let nanos = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.unwrap()
.as_nanos();
let p = std::env::temp_dir().join(format!("wx-cli-meta-test-{}-{}", pid, nanos));
std::fs::create_dir_all(&p).unwrap();
p
}
}
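
On the consuming side, `MetaStatus` arrives as one of four snake_case strings (per `#[serde(rename_all = "snake_case")]`). The sketch below shows how an agent or script might branch on that field. `advice_for` and `needs_reinit` are hypothetical helpers that are not part of wx-cli, and the advice texts merely paraphrase the doc comments above.

```rust
/// Hypothetical helper: map the serialized `meta.status` value to a short hint.
/// The four known strings follow from `rename_all = "snake_case"` on `MetaStatus`.
pub fn advice_for(status: &str) -> &'static str {
    match status {
        "ok" => "no staleness signal; result can be used as-is",
        "windowed" => "partial view requested by the caller; staleness was not evaluated",
        "possibly_stale" => "session.db is more than 24h ahead; result may be incomplete",
        "possibly_stale_unknown_shards" => "unknown shards on disk; re-run `wx init --force`",
        // Unknown values are handled defensively in case new variants are added later.
        _ => "unrecognized status; treat result as unverified",
    }
}

/// Hypothetical helper: only the unknown-shards status calls for re-initialization.
pub fn needs_reinit(status: &str) -> bool {
    status == "possibly_stale_unknown_shards"
}
```

Matching on strings rather than re-declaring the enum keeps such a consumer tolerant of variants added in later releases.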


@ -1,4 +1,5 @@
pub mod cache; pub mod cache;
pub mod meta;
pub mod query; pub mod query;
pub mod server; pub mod server;
@ -8,6 +9,39 @@ use std::sync::Arc;
use crate::config; use crate::config;
fn normalized_rel_key(rel_key: &str) -> String {
rel_key.replace('\\', "/")
}
fn is_msg_db_key(rel_key: &str) -> bool {
let rel_key = normalized_rel_key(rel_key);
rel_key.starts_with("message/message_")
&& rel_key.ends_with(".db")
&& !rel_key.contains("_fts")
&& !rel_key.contains("_resource")
}
fn is_biz_msg_db_key(rel_key: &str) -> bool {
let rel_key = normalized_rel_key(rel_key);
rel_key.starts_with("message/biz_message_")
&& rel_key.ends_with(".db")
&& !rel_key.contains("_fts")
&& !rel_key.contains("_resource")
}
fn collect_db_keys(
all_keys: &HashMap<String, String>,
predicate: fn(&str) -> bool,
) -> Vec<String> {
let mut keys: Vec<String> = all_keys
.keys()
.filter(|k| predicate(k))
.cloned()
.collect();
keys.sort();
keys
}
/// Daemon entry point /// Daemon entry point
/// ///
/// Called by main() when the WX_DAEMON_MODE environment variable is set /// Called by main() when the WX_DAEMON_MODE environment variable is set
@ -48,17 +82,8 @@ async fn async_run() -> Result<()> {
let db = Arc::new(cache::DbCache::new(cfg.db_dir.clone(), all_keys.clone()).await?); let db = Arc::new(cache::DbCache::new(cfg.db_dir.clone(), all_keys.clone()).await?);
// Collect the list of message DBs // Collect the list of message DBs
let msg_db_keys: Vec<String> = all_keys let msg_db_keys = collect_db_keys(&all_keys, is_msg_db_key);
.keys() let biz_msg_db_keys = collect_db_keys(&all_keys, is_biz_msg_db_key);
.filter(|k| {
let k = k.replace('\\', "/");
k.contains("message/message_")
&& k.ends_with(".db")
&& !k.contains("_fts")
&& !k.contains("_resource")
})
.cloned()
.collect();
// Warm-up: load contacts + decrypt session.db // Warm-up: load contacts + decrypt session.db
eprintln!("[daemon] 预热..."); eprintln!("[daemon] 预热...");
@ -68,11 +93,13 @@ async fn async_run() -> Result<()> {
map: HashMap::new(), map: HashMap::new(),
md5_to_uname: HashMap::new(), md5_to_uname: HashMap::new(),
msg_db_keys: Vec::new(), msg_db_keys: Vec::new(),
biz_msg_db_keys: Vec::new(),
verify_flags: HashMap::new(), verify_flags: HashMap::new(),
} }
}); });
let mut names = names_raw; let mut names = names_raw;
names.msg_db_keys = msg_db_keys; names.msg_db_keys = msg_db_keys;
names.biz_msg_db_keys = biz_msg_db_keys;
let _ = db.get("session/session.db").await; let _ = db.get("session/session.db").await;
let _ = db.get("sns/sns.db").await; let _ = db.get("sns/sns.db").await;
@ -148,3 +175,28 @@ fn cleanup_ipc_files() {
let _ = std::fs::remove_file(config::sock_path()); let _ = std::fs::remove_file(config::sock_path());
let _ = std::fs::remove_file(config::pid_path()); let _ = std::fs::remove_file(config::pid_path());
} }
#[cfg(test)]
mod tests {
use super::{is_biz_msg_db_key, is_msg_db_key};
#[test]
fn message_db_key_filter_ignores_biz_and_auxiliary_files() {
assert!(is_msg_db_key("message/message_0.db"));
assert!(is_msg_db_key("message\\message_12.db"));
assert!(!is_msg_db_key("message/biz_message_0.db"));
assert!(!is_msg_db_key("message/message_0.db-wal"));
assert!(!is_msg_db_key("message/message_0_fts.db"));
assert!(!is_msg_db_key("message/message_0_resource.db"));
}
#[test]
fn biz_message_db_key_filter_matches_only_biz_shards() {
assert!(is_biz_msg_db_key("message/biz_message_0.db"));
assert!(is_biz_msg_db_key("message\\biz_message_3.db"));
assert!(!is_biz_msg_db_key("message/message_0.db"));
assert!(!is_biz_msg_db_key("message/biz_message_0.db-wal"));
assert!(!is_biz_msg_db_key("message/biz_message_0_fts.db"));
assert!(!is_biz_msg_db_key("message/biz_message_0_resource.db"));
}
}
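
One detail worth noting when reading these filters next to `meta::is_message_shard`: the key filters here check only prefix, suffix, and the `_fts` / `_resource` substrings, so a key such as `message/message_a.db` would pass `is_msg_db_key`, while `is_message_shard` additionally requires an all-digit stem. A possible way to align the two is sketched below. `is_numbered_shard` is a hypothetical function, not something this PR adds.

```rust
/// Hypothetical shared predicate: accepts `<prefix><digits>.db` after normalizing `\` to `/`.
/// `prefix` would be e.g. "message/message_" or "message/biz_message_".
pub fn is_numbered_shard(rel_key: &str, prefix: &str) -> bool {
    let key = rel_key.replace('\\', "/");
    let Some(rest) = key.strip_prefix(prefix) else {
        return false;
    };
    let Some(stem) = rest.strip_suffix(".db") else {
        return false;
    };
    // An all-digit stem rules out `_fts`, `_resource`, and any other auxiliary files.
    !stem.is_empty() && stem.bytes().all(|b| b.is_ascii_digit())
}
```

With a digit-only stem, the explicit `_fts` / `_resource` exclusions become redundant, and the daemon's key list and the unknown-shard scan would agree on what counts as a shard.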

File diff suppressed because it is too large


@ -2,15 +2,12 @@ use anyhow::Result;
use std::sync::Arc; use std::sync::Arc;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader}; use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use crate::ipc::{Request, Response};
use super::cache::DbCache; use super::cache::DbCache;
use super::query::Names; use super::query::Names;
use crate::ipc::{Request, Response};
/// Start the IPC server (Unix socket / Windows named pipe) /// Start the IPC server (Unix socket / Windows named pipe)
pub async fn serve( pub async fn serve(db: Arc<DbCache>, names: Arc<tokio::sync::RwLock<Arc<Names>>>) -> Result<()> {
db: Arc<DbCache>,
names: Arc<tokio::sync::RwLock<Arc<Names>>>,
) -> Result<()> {
#[cfg(unix)] #[cfg(unix)]
serve_unix(db, names).await?; serve_unix(db, names).await?;
#[cfg(windows)] #[cfg(windows)]
@ -19,10 +16,7 @@ pub async fn serve(
} }
#[cfg(unix)] #[cfg(unix)]
async fn serve_unix( async fn serve_unix(db: Arc<DbCache>, names: Arc<tokio::sync::RwLock<Arc<Names>>>) -> Result<()> {
db: Arc<DbCache>,
names: Arc<tokio::sync::RwLock<Arc<Names>>>,
) -> Result<()> {
use tokio::net::UnixListener; use tokio::net::UnixListener;
let sock_path = crate::config::sock_path(); let sock_path = crate::config::sock_path();
@ -88,9 +82,7 @@ async fn serve_windows(
db: Arc<DbCache>, db: Arc<DbCache>,
names: Arc<tokio::sync::RwLock<Arc<Names>>>, names: Arc<tokio::sync::RwLock<Arc<Names>>>,
) -> Result<()> { ) -> Result<()> {
use interprocess::local_socket::{ use interprocess::local_socket::{tokio::prelude::*, GenericNamespaced, ListenerOptions};
tokio::prelude::*, GenericNamespaced, ListenerOptions,
};
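// On Windows, interprocess's GenericNamespaced automatically prepends the `\\.\pipe\` prefix, // On Windows, interprocess's GenericNamespaced automatically prepends the `\\.\pipe\` prefix,
// so a relative name must be passed here; the client opens `\\.\pipe\wx-cli-daemon` directly and the names match // so a relative name must be passed here; the client opens `\\.\pipe\wx-cli-daemon` directly and the names match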
@ -141,13 +133,9 @@ async fn handle_connection_windows(
Ok(()) Ok(())
} }
async fn dispatch( async fn dispatch(req: Request, db: &DbCache, names: &tokio::sync::RwLock<Arc<Names>>) -> Response {
req: Request,
db: &DbCache,
names: &tokio::sync::RwLock<Arc<Names>>,
) -> Response {
use crate::ipc::Request::*;
use super::query; use super::query;
use crate::ipc::Request::*;
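// Take the guard → O(1) clone of the Arc → drop the lock immediately. No lock is held across later awaits, // Take the guard → O(1) clone of the Arc → drop the lock immediately. No lock is held across later awaits,
// so concurrent IPC requests can truly run in parallel. Names itself is immutable (by the daemon at startup // so concurrent IPC requests can truly run in parallel. Names itself is immutable (by the daemon at startup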
@ -159,20 +147,66 @@ async fn dispatch(
match req { match req {
Ping => Response::ok(serde_json::json!({ "pong": true })), Ping => Response::ok(serde_json::json!({ "pong": true })),
Sessions { limit } => { Sessions {
match query::q_sessions(db, &names_arc, limit).await { limit,
with_meta,
debug_source,
} => match query::q_sessions(db, &names_arc, limit, with_meta, debug_source).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
},
History {
chat,
limit,
offset,
since,
until,
msg_type,
with_meta,
debug_source,
} => {
match query::q_history(
db,
&names_arc,
&chat,
limit,
offset,
since,
until,
msg_type,
with_meta,
debug_source,
)
.await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
History { chat, limit, offset, since, until, msg_type } => { Search {
match query::q_history(db, &names_arc, &chat, limit, offset, since, until, msg_type).await { keyword,
Ok(v) => Response::ok(v), chats,
Err(e) => Response::err(e.to_string()), limit,
} since,
} until,
Search { keyword, chats, limit, since, until, msg_type } => { msg_type,
match query::q_search(db, &names_arc, &keyword, chats, limit, since, until, msg_type).await { with_meta,
debug_source,
} => {
match query::q_search(
db,
&names_arc,
&keyword,
chats,
limit,
since,
until,
msg_type,
with_meta,
debug_source,
)
.await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
@ -183,62 +217,145 @@ async fn dispatch(
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
Unread { limit, filter } => { Unread {
match query::q_unread(db, &names_arc, limit, filter).await { limit,
filter,
with_meta,
debug_source,
} => match query::q_unread(db, &names_arc, limit, filter, with_meta, debug_source).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
},
Members { chat } => match query::q_members(db, &names_arc, &chat).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
},
NewMessages {
state,
limit,
with_meta,
debug_source,
} => {
match query::q_new_messages(db, &names_arc, state, limit, with_meta, debug_source).await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
Members { chat } => { Favorites {
match query::q_members(db, &names_arc, &chat).await { limit,
fav_type,
query,
} => match query::q_favorites(db, limit, fav_type, query).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
},
Stats {
chat,
since,
until,
with_meta,
debug_source,
} => {
match query::q_stats(db, &names_arc, &chat, since, until, with_meta, debug_source).await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
NewMessages { state, limit } => { SnsNotifications {
match query::q_new_messages(db, &names_arc, state, limit).await { limit,
since,
until,
include_read,
} => {
match query::q_sns_notifications(db, &names_arc, limit, since, until, include_read)
.await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
Favorites { limit, fav_type, query } => { SnsFeed {
match query::q_favorites(db, limit, fav_type, query).await { limit,
since,
until,
user,
} => match query::q_sns_feed(db, &names_arc, limit, since, until, user.as_deref()).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
},
SnsSearch {
keyword,
limit,
since,
until,
user,
} => {
match query::q_sns_search(
db,
&names_arc,
&keyword,
limit,
since,
until,
user.as_deref(),
)
.await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
Stats { chat, since, until } => { ReloadConfig => Response::ok(serde_json::json!({ "reloading": true })),
match query::q_stats(db, &names_arc, &chat, since, until).await { BizArticles {
limit,
account,
since,
until,
unread,
} => {
match query::q_biz_articles(db, &names_arc, limit, account, since, until, unread).await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
SnsNotifications { limit, since, until, include_read } => { Attachments {
match query::q_sns_notifications(db, &names_arc, limit, since, until, include_read).await { chat,
kinds,
limit,
offset,
since,
until,
with_meta,
debug_source,
} => {
match query::q_attachments(
db,
&names_arc,
&chat,
kinds,
limit,
offset,
since,
until,
with_meta,
debug_source,
)
.await
{
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} }
} }
SnsFeed { limit, since, until, user } => { Extract {
match query::q_sns_feed(db, &names_arc, limit, since, until, user.as_deref()).await { attachment_id,
output,
overwrite,
} => match query::q_extract(db, &names_arc, &attachment_id, &output, overwrite).await {
Ok(v) => Response::ok(v), Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()), Err(e) => Response::err(e.to_string()),
} },
}
SnsSearch { keyword, limit, since, until, user } => {
match query::q_sns_search(db, &names_arc, &keyword, limit, since, until, user.as_deref()).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
}
}
ReloadConfig => {
Response::ok(serde_json::json!({ "reloading": true }))
}
BizArticles { limit, account, since, until, unread } => {
match query::q_biz_articles(db, &names_arc, limit, account, since, until, unread).await {
Ok(v) => Response::ok(v),
Err(e) => Response::err(e.to_string()),
}
}
} }
} }
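
Every arm of `dispatch` ends in the same `Ok(v) => Response::ok(v)` / `Err(e) => Response::err(e.to_string())` mapping. One possible way to fold that repetition is sketched below, under stated assumptions: `Response` here is a simplified stand-in whose `data` field is a `String` (the real one holds a `serde_json::Value`), and `to_response` is a hypothetical helper that does not exist in the PR.

```rust
use std::fmt::Display;

/// Simplified stand-in for `crate::ipc::Response`; the real `data` is a `serde_json::Value`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Response {
    pub ok: bool,
    pub error: Option<String>,
    pub data: Option<String>,
}

impl Response {
    pub fn ok(data: String) -> Self {
        Self { ok: true, error: None, data: Some(data) }
    }

    pub fn err(msg: impl Into<String>) -> Self {
        Self { ok: false, error: Some(msg.into()), data: None }
    }
}

/// Hypothetical helper: collapse a query result into a `Response`,
/// stringifying the error exactly as each `dispatch` arm does today.
pub fn to_response<E: Display>(res: Result<String, E>) -> Response {
    match res {
        Ok(v) => Response::ok(v),
        Err(e) => Response::err(e.to_string()),
    }
}
```

With such a helper, an arm could shrink to something like `to_response(query::q_members(db, &names_arc, &chat).await)`, which would also limit how much rustfmt reflows the match when parameters are added.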


@ -1,6 +1,6 @@
use std::collections::HashMap;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use serde_json::Value; use serde_json::Value;
use std::collections::HashMap;
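/// Request sent from the CLI to the daemon (newline-delimited JSON, compatible with the Python version) /// Request sent from the CLI to the daemon (newline-delimited JSON, compatible with the Python version)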
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
@ -10,6 +10,10 @@ pub enum Request {
Sessions { Sessions {
#[serde(default = "default_limit_20")] #[serde(default = "default_limit_20")]
limit: usize, limit: usize,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
}, },
History { History {
chat: String, chat: String,
@ -23,6 +27,10 @@ pub enum Request {
until: Option<i64>, until: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")] #[serde(skip_serializing_if = "Option::is_none")]
msg_type: Option<i64>, msg_type: Option<i64>,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
}, },
Search { Search {
keyword: String, keyword: String,
@ -36,6 +44,10 @@ pub enum Request {
until: Option<i64>, until: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")] #[serde(skip_serializing_if = "Option::is_none")]
msg_type: Option<i64>, msg_type: Option<i64>,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
}, },
Contacts { Contacts {
#[serde(skip_serializing_if = "Option::is_none")] #[serde(skip_serializing_if = "Option::is_none")]
@ -49,6 +61,10 @@ pub enum Request {
/// Filter by session type: private / group / official / folded / all (multiple values allowed) /// Filter by session type: private / group / official / folded / all (multiple values allowed)
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
filter: Option<Vec<String>>, filter: Option<Vec<String>>,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
}, },
Members { Members {
chat: String, chat: String,
@ -60,6 +76,10 @@ pub enum Request {
state: Option<HashMap<String, i64>>, state: Option<HashMap<String, i64>>,
#[serde(default = "default_limit_200")] #[serde(default = "default_limit_200")]
limit: usize, limit: usize,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
}, },
Stats { Stats {
chat: String, chat: String,
@ -67,6 +87,10 @@ pub enum Request {
since: Option<i64>, since: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")] #[serde(skip_serializing_if = "Option::is_none")]
until: Option<i64>, until: Option<i64>,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
}, },
Favorites { Favorites {
#[serde(default = "default_limit_50")] #[serde(default = "default_limit_50")]
@ -102,7 +126,7 @@ pub enum Request {
#[serde(skip_serializing_if = "Option::is_none")] #[serde(skip_serializing_if = "Option::is_none")]
user: Option<String>, user: Option<String>,
}, },
/// Query official-account article pushes (biz_message_0.db) /// Query official-account article pushes (biz_message_*.db shards)
BizArticles { BizArticles {
#[serde(default = "default_limit_50")] #[serde(default = "default_limit_50")]
limit: usize, limit: usize,
@ -131,9 +155,38 @@ pub enum Request {
}, },
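/// Reload config and keys (the daemon does not re-read them automatically after init --force) /// Reload config and keys (the daemon does not re-read them automatically after init --force)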
ReloadConfig, ReloadConfig,
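/// List the image attachments in a chat.
/// Each output row carries an `attachment_id` (an opaque base64url handle); pass it to `Extract` to retrieve the payload.
Attachments {
chat: String,
/// Kind filter: only image is currently supported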
#[serde(default, skip_serializing_if = "Option::is_none")]
kinds: Option<Vec<String>>,
#[serde(default = "default_limit_50")]
limit: usize,
#[serde(default)]
offset: usize,
#[serde(skip_serializing_if = "Option::is_none")]
since: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")]
until: Option<i64>,
#[serde(default, skip_serializing_if = "is_false")]
with_meta: bool,
#[serde(default, skip_serializing_if = "is_false")]
debug_source: bool,
},
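/// Extract (decrypt) a single attachment's payload to the given path
Extract {
/// Opaque ID returned by `Attachments`
attachment_id: String,
/// Absolute path to write to; the daemon writes to disk directly, so no binary data crosses the socket
output: String,
/// Whether to overwrite if the file already exists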
#[serde(default)]
overwrite: bool,
},
} }
/// Response from the daemon /// Response from the daemon
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Response { pub struct Response {
@ -146,11 +199,19 @@ pub struct Response {
impl Response { impl Response {
pub fn ok(data: Value) -> Self { pub fn ok(data: Value) -> Self {
Self { ok: true, error: None, data } Self {
ok: true,
error: None,
data,
}
} }
pub fn err(msg: impl Into<String>) -> Self { pub fn err(msg: impl Into<String>) -> Self {
Self { ok: false, error: Some(msg.into()), data: Value::Null } Self {
ok: false,
error: Some(msg.into()),
data: Value::Null,
}
} }
pub fn to_json_line(&self) -> anyhow::Result<String> { pub fn to_json_line(&self) -> anyhow::Result<String> {
@ -159,6 +220,15 @@ impl Response {
} }
} }
fn default_limit_20() -> usize { 20 } fn default_limit_20() -> usize {
fn default_limit_50() -> usize { 50 } 20
fn default_limit_200() -> usize { 200 } }
fn default_limit_50() -> usize {
50
}
fn default_limit_200() -> usize {
200
}
fn is_false(v: &bool) -> bool {
!*v
}


@ -4,6 +4,7 @@ mod crypto;
mod scanner; mod scanner;
mod daemon; mod daemon;
mod cli; mod cli;
mod attachment;
fn main() { fn main() {
if std::env::var("WX_DAEMON_MODE").is_ok() { if std::env::var("WX_DAEMON_MODE").is_ok() {