feat: Optimize the mindmap build API to support incremental builds and updates #762
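- Add a GET /mindmap/diff endpoint that detects file changes (added/removed/modified)
- POST /mindmap/generate gains an incremental parameter to support incremental updates
- Pure-deletion cases need no AI call (recursive tree surgery); when files are added, the AI integrates them into the existing category structure
- When a file is deleted, stale references in the mindmap are cleaned up automatically (compatible with legacy data)
- The frontend mindmap tab gains an incremental update button and a change-count badge
- The KnowledgeBase model gains mindmap_file_ids and mindmap_metadata fields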
Code Review
This pull request introduces incremental building and updating of mindmaps for knowledge bases, allowing users to update existing mindmaps when files are added or removed. It adds database columns to track file associations, implements recursive tree pruning for file deletions (avoiding AI calls), and integrates AI-driven categorization for new files. The frontend is also updated with a new "Incremental Update" button and a change badge. The review feedback highlights three key areas for improvement: addressing compatibility with legacy mindmap data that lacks tracked file IDs, avoiding in-place dictionary mutations that could bypass SQLAlchemy's dirty tracking, and resolving an N+1 database query/update performance issue during batch document deletions.
    if not mindmap_data or not mindmap_file_ids:
        added_files = [
            {"file_id": fid, "filename": info.get("filename", ""), "type": info.get("type", "")}
            for fid, info in current_files.items()
        ]
        return {
            "has_mindmap": mindmap_data is not None,
            "tracked_files": list(mindmap_file_ids.keys()) if mindmap_file_ids else [],
            "current_files": list(current_files.keys()),
            "added_files": added_files,
            "removed_file_ids": [],
            "unchanged_count": 0,
            "needs_update": len(added_files) > 0,
        }
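In detect_mindmap_changes, if legacy mindmap data exists (mindmap_data is not empty) but the tracked mindmap_file_ids are missing (for example, a mindmap generated before this upgrade), the code treats every current file as an added file (added_files). An incremental update would then have the AI re-integrate all files, which produces duplicates or exceeds the token limit. When mindmap_file_ids is empty, consider using the existing _collect_leaf_filenames to gather the leaf-node filenames in the mindmap and matching them against the current files to rebuild the mindmap_file_ids mapping automatically, so that legacy data stays compatible.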
    # Legacy compatibility: if a mindmap exists but has no tracked file_ids,
    # try to rebuild the mapping from the leaf-node filenames.
    reconstructed_file_ids = {}
    if mindmap_data and not mindmap_file_ids:
        leaf_filenames = _collect_leaf_filenames(mindmap_data)
        for fid, info in current_files.items():
            fname = info.get("filename", "")
            if fname in leaf_filenames:
                reconstructed_file_ids[fid] = fname
        mindmap_file_ids = reconstructed_file_ids
    if not mindmap_data or not mindmap_file_ids:
        added_files = [
            {"file_id": fid, "filename": info.get("filename", ""), "type": info.get("type", "")}
            for fid, info in current_files.items()
        ]
        return {
            "has_mindmap": mindmap_data is not None,
            "tracked_files": list(mindmap_file_ids.keys()) if mindmap_file_ids else [],
            "current_files": list(current_files.keys()),
            "added_files": added_files,
            "removed_file_ids": [],
            "unchanged_count": 0,
            "needs_update": len(added_files) > 0,
        }

    def remove_files_from_mindmap(mindmap_data: dict[str, Any], removed_filenames: set[str]) -> dict[str, Any]:
        """Remove the leaf nodes with the given filenames from the mindmap tree; no AI call needed."""
        if not removed_filenames:
            return mindmap_data

        root_name = mindmap_data.get("content", "")
        result = _prune_mindmap_node(mindmap_data, removed_filenames, root_name)
        return result if result is not None else {"content": root_name, "children": []}
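remove_files_from_mindmap mutates the mindmap_data dictionary it receives in place. In SQLAlchemy, changing the contents of a loaded JSON/JSONB attribute (such as values inside the dictionary) without assigning a new object can leave the change undetected (dirty tracking does not fire), so the commit does not issue a database update. In-place mutation can also cause side effects in other contexts that hold a reference to the same object. Consider making a deep copy with copy.deepcopy before modifying it.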
Suggested change:

    def remove_files_from_mindmap(mindmap_data: dict[str, Any], removed_filenames: set[str]) -> dict[str, Any]:
        """Remove the leaf nodes with the given filenames from the mindmap tree; no AI call needed."""
        if not removed_filenames:
            return mindmap_data
        import copy
        mindmap_copy = copy.deepcopy(mindmap_data)
        root_name = mindmap_copy.get("content", "")
        result = _prune_mindmap_node(mindmap_copy, removed_filenames, root_name)
        return result if result is not None else {"content": root_name, "children": []}
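The copy-then-prune idea in this suggestion can be illustrated with a small standalone sketch. The PR's real `_prune_mindmap_node` is not shown in this thread, so `prune_sketch` and `remove_files_sketch` below are hypothetical stand-ins, and the node shape (`content` plus an optional `children` list, with files as childless nodes) is an assumption based on the snippets above.

```python
import copy
from typing import Any, Optional


def prune_sketch(node: dict[str, Any], removed: set[str]) -> Optional[dict[str, Any]]:
    """Drop removed leaves from node; return None if the node itself should go."""
    children = node.get("children") or []
    if not children:
        # Assumption: a childless node is a file leaf named by its "content".
        return None if node.get("content") in removed else node
    kept = [p for p in (prune_sketch(c, removed) for c in children) if p is not None]
    if not kept:
        # A category that lost all of its files is removed as well.
        return None
    node["children"] = kept  # mutates only the copy made by the caller
    return node


def remove_files_sketch(mindmap: dict[str, Any], removed: set[str]) -> dict[str, Any]:
    """Return a pruned copy of mindmap; the input tree is left untouched."""
    if not removed:
        return mindmap
    root_name = mindmap.get("content", "")
    pruned = prune_sketch(copy.deepcopy(mindmap), removed)
    # Keep an empty root rather than returning None when every file is gone.
    return pruned if pruned is not None else {"content": root_name, "children": []}
```

Because the function returns a new object, assigning the result back to the model attribute gives SQLAlchemy a fresh value to detect, which is the point of the review comment.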
    # First clean up references to this file in the mindmap
    removed_filename = file_meta_info.get("meta", {}).get("filename", "")
    if removed_filename:
        await remove_file_from_mindmap(kb_id, doc_id, removed_filename)
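- Legacy data compatibility: detect_mindmap_changes rebuilds the mapping from the leaf nodes when mindmap_file_ids is empty
- Deep copy: remove_files_from_mindmap uses copy.deepcopy so that in-place mutation cannot defeat SQLAlchemy dirty tracking
- Batch deletion optimization: batch_delete_documents collects all filenames first and cleans the mindmap once after the loop (eliminating the N+1 problem)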
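The batch-deletion item above amounts to moving the mindmap cleanup out of the loop. A minimal synchronous sketch follows; the PR's own code is async and is not shown here, so `batch_delete_sketch` and its three callbacks (`load_meta`, `delete_doc`, `cleanup_mindmap`) are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Iterable


def batch_delete_sketch(
    doc_ids: Iterable[str],
    load_meta: Callable[[str], dict[str, Any]],
    delete_doc: Callable[[str], None],
    cleanup_mindmap: Callable[[set[str]], None],
) -> set[str]:
    """Delete documents one by one, then clean the mindmap once at the end."""
    removed_filenames: set[str] = set()
    for doc_id in doc_ids:
        meta = load_meta(doc_id)
        filename = meta.get("meta", {}).get("filename", "")
        if filename:
            removed_filenames.add(filename)
        delete_doc(doc_id)
    if removed_filenames:
        # One read-modify-write of the mindmap instead of one per document.
        cleanup_mindmap(removed_filenames)
    return removed_filenames
```

With N documents this issues a single mindmap update rather than N, which is the N+1 pattern the review flagged.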
I took a look; there are three main points that need adjusting:
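1. Legacy mindmap compatibility bug: when kb.mindmap_file_ids is empty, infer the file mapping from the leaf nodes so that existing files are not misclassified as added the next time
2. Remove the mindmap cleanup calls from the delete endpoints; whether the mindmap is stale is decided by the diff endpoint, and everything is handled together when the user clicks incremental update
3. Move the changelog entry to the top of the v0.7.1 development notes and remove the description of automatic mindmap cleanup on file deletion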
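Points 1 and 2 together describe a diff that first rebuilds the tracked mapping for legacy mindmaps and then compares it with the current files. The sketch below is one way that could look; it is not the PR's detect_mindmap_changes. The names `collect_leaf_names` and `diff_sketch`, the returned keys, and the node shape (childless nodes are file leaves) are assumptions.

```python
from typing import Any, Optional


def collect_leaf_names(node: dict[str, Any]) -> set[str]:
    """Gather the "content" of every childless node in the tree."""
    children = node.get("children") or []
    if not children:
        name = node.get("content", "")
        return {name} if name else set()
    names: set[str] = set()
    for child in children:
        names |= collect_leaf_names(child)
    return names


def diff_sketch(
    mindmap: Optional[dict[str, Any]],
    tracked: Optional[dict[str, str]],
    current: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Compare tracked file ids with current ones; no AI call involved."""
    if mindmap and not tracked:
        # Legacy mindmap: infer which current files it already covers.
        leaves = collect_leaf_names(mindmap)
        tracked = {
            fid: info.get("filename", "")
            for fid, info in current.items()
            if info.get("filename", "") in leaves
        }
    tracked = tracked or {}
    added = [fid for fid in current if fid not in tracked]
    removed = [fid for fid in tracked if fid not in current]
    return {
        "added_file_ids": added,
        "removed_file_ids": removed,
        "unchanged_count": len(tracked) - len(removed),
        "needs_update": bool(added or removed),
    }
```

Note that the reconstruction can only match files that still exist, so a leaf whose file was deleted before the upgrade would not show up as removed in this sketch.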
I have fixed the three issues you raised and pushed to the feature/mindmap-incremental-update branch:
All unit tests pass, and the ruff format/lint checks pass.
…remental-update # Conflicts: # docs/develop-guides/changelog.md
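Change description
Optimizes the interface design for building mindmaps to support incremental builds and updates, reducing the cost of AI calls.
Changes
Backend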
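- GET /databases/{kb_id}/mindmap/diff endpoint: detects the differences between the mindmap and the knowledge base files (added/removed); pure computation, no AI call
- POST /databases/{kb_id}/mindmap/generate gains an incremental parameter: supports incremental update mode
- detect_mindmap_changes(): compares the files tracked by the mindmap with the current files and returns the change information
- remove_files_from_mindmap(): recursive tree surgery that removes the leaf nodes with the given filenames (pure-deletion cases need no AI)
- update_mindmap_incremental(): entry point for incremental updates; pure deletions go through tree surgery, and additions call the AI to integrate the new files
- The KnowledgeBase model gains the mindmap_file_ids field (the file mapping tracked by the mindmap) and the mindmap_metadata field (generation metadata)

Frontend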
Change type
Tests
Verification
Notes
Stale references left over in legacy mindmaps can be cleaned up with the "Regenerate" button.