huangxun375-stack
diff --git a/examples/openclaw-plugin/tests/e2e/test-archive-expand.py b/examples/openclaw-plugin/tests/e2e/test-archive-expand.py
Lines changed: 27 additions & 1 deletion
@@ -132,7 +132,33 @@
   Restart the Gateway before Phase 4 to clear working memory and force the archive-expansion path.
 
 ================================================================================
-VI. Expected Results
+VI. Known Limitations
+================================================================================
+
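+  1. Whether the LLM calls ov_archive_expand:
+     Models differ in how readily they call tools. If the model infers the answer
+     directly from the archive overview summary without expanding the archive, a
+     keyword hits only when the summary happens to contain it.
+     Passing --gateway-restart-cmd clears working memory and forces the
+     archive-expansion path.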
+
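+  2. Exact keyword matching:
+     Differences in number formatting can cause a match to fail
+     (e.g. "12000" vs "12,000" vs "1.2万").
+     In actual testing, the Q4 keyword "12000" was missed because the LLM output
+     "12,000", but the overall hit rate still reached 67%, above the 50% threshold.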
+
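+  3. Test duration:
+     The full test runs 32 conversation turns plus verification and follow-up
+     questions, which takes about 10-15 minutes. For a quick check, use
+     --phase expand to run only the follow-up phase (this requires existing
+     archive data).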
+
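+  4. Conversation order dependency:
+     The 4 conversation batches must run in order (Phase 1 → 2a → 2b → 2c),
+     because the archive numbers of later batches depend on the earlier ones.
+     chat2 cannot be run on its own with chat1 skipped.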
+
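+  5. Environment requirements:
+     The Gateway must be configured with the OpenViking plugin and have the
+     ov_archive_expand tool definition enabled; otherwise the LLM cannot call
+     archive expansion.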
+
+================================================================================
+VII. Expected Results
 ================================================================================
 
   All 15/15 assertions pass: