Merged

Commits (51)
81156b3 📝 Update README badges (Phinease, Sep 17, 2025)
cde5304 Create codeql.yml (Phinease, Sep 5, 2025)
e432320 Update README.md (Davina-jcx, Sep 10, 2025)
53b7522 📝 Update README badges (Phinease, Sep 17, 2025)
de9cffe 📝 Update README badges (Phinease, Sep 17, 2025)
683adf2 📝 Update README badges (Phinease, Sep 17, 2025)
ccd2214 1. When creating a new Agent, the large language model defaults to th… (Zhi-a, Oct 21, 2025)
6183692 The large language model automatically summarizes the knowledge base,… (Zhi-a, Oct 22, 2025)
d95f8f5 🐛 modelscope mcp tool test run has no inputs #1398 (Oct 22, 2025)
89787b9 1. When creating a new Agent, the large language model defaults to th… (Zhi-a, Oct 21, 2025)
9d07b67 1. When creating a new Agent, the large language model defaults to th… (Zhi-a, Oct 23, 2025)
437cf62 Fixed the issue where the agent name was repeated and the save button… (Zhi-a, Oct 23, 2025)
01e9595 ✨ Flowcharts Content Rendering Support #620 (WMC001, Oct 22, 2025)
03e0b24 🐛 Bugfix: knowledgebase creation fail when upload a large file (with … (Jasonxia007, Oct 23, 2025)
68e2f1e Fix the issue of github unit test pipeline failing (Zhi-a, Oct 23, 2025)
d7fb6fd 🐛 Bugfix: knowledgebase creation fail when upload a large file (with … (Jasonxia007, Oct 23, 2025)
3f8922e 🐛 Bugfix: knowledgebase creation fail when upload a large file (with … (Jasonxia007, Oct 23, 2025)
2c74b9e Merge remote-tracking branch 'origin/xyc/kb_task_bug' into xyc/kb_tas… (Jasonxia007, Oct 23, 2025)
8c9c4ab Fix unit test cause by version of opentelemetry (WMC001, Oct 23, 2025)
86da03a Fix the issue of github unit test pipeline failing (Zhi-a, Oct 23, 2025)
b363c58 🐛 Bugfix: knowledgebase creation fail when upload a large file (with … (Jasonxia007, Oct 23, 2025)
6ced5cb Supplementary unit tests (Zhi-a, Oct 23, 2025)
e9ea482 Supplementary unit tests (Zhi-a, Oct 23, 2025)
69512d2 🐛 modelscope mcp tool test run has no inputs #1398 (Oct 23, 2025)
d7510da 🐛 knowledgebase creation fail when upload a large file (with over 100… (Phinease, Oct 23, 2025)
d71de13 ✨ Flowcharts Content Rendering Support #620 (Phinease, Oct 23, 2025)
999d9dd 🐛 modelscope mcp tool test run has no inputs #1398 (Phinease, Oct 23, 2025)
ebc63e8 🐛 Automatically defaulting to the previously configured model to summ… (Phinease, Oct 23, 2025)
b4f9a7b 🐛 Fixed where the agent did not select the default large language model (Phinease, Oct 23, 2025)
4df73b0 git commit -m "feat: integrate knowledge base summarization service a… (Mermaid97, Oct 23, 2025)
3fca43d git commit -m "feat: repair core code" (Mermaid97, Oct 23, 2025)
3b36710 repair test case (Mermaid97, Oct 23, 2025)
726400a Specify versions for OpenTelemetry dependencies (Mermaid97, Oct 23, 2025)
880a3f4 test: Add comprehensive coverage tests for document_vector_utils (Mermaid97, Oct 23, 2025)
9bfc2c0 test: Add comprehensive coverage tests for elasticsearch_core (Mermaid97, Oct 23, 2025)
aa3b7d0 test: Add comprehensive coverage tests for document_vector_utils unco… (Mermaid97, Oct 23, 2025)
f1be313 repair test case improve (Mermaid97, Oct 24, 2025)
1d2446c Improve document vector utils test coverage and clean up test files (Mermaid97, Oct 24, 2025)
438df82 git commit -m "feat: integrate knowledge base summarization service a… (Mermaid97, Oct 23, 2025)
33e808d git commit -m "feat: repair core code" (Mermaid97, Oct 23, 2025)
92fd139 git commit -m "feat: repair test requirements" (Mermaid97, Oct 23, 2025)
8f6aec6 repair test case (Mermaid97, Oct 23, 2025)
cadd91c Specify versions for OpenTelemetry dependencies (Mermaid97, Oct 23, 2025)
f5bf531 test: Add comprehensive coverage tests for document_vector_utils (Mermaid97, Oct 23, 2025)
f5ea95e test: Add comprehensive coverage tests for document_vector_utils unco… (Mermaid97, Oct 23, 2025)
422da8f Add new sql for default large language model. (Zhi-a, Oct 24, 2025)
65fc9a5 Revert "Improve document vector utils test coverage and clean up test… (Mermaid97, Oct 24, 2025)
86f4185 🐛[Bug] Add new sql for default large language model (liutao12138, Oct 24, 2025)
8a003af ✨[Request] #694 implement document-level vectorization and K-means c… (liutao12138, Oct 24, 2025)
59a44fd Modify the knowledge base front-end rendering (Mermaid97, Oct 24, 2025)
5ae6e49 [release] 🐛 Modify the knowledge base front-end rendering #1458 #1456 (liutao12138, Oct 25, 2025)
2 changes: 2 additions & 0 deletions backend/consts/model.py
@@ -211,6 +211,8 @@ class AgentInfoRequest(BaseModel):
     constraint_prompt: Optional[str] = None
     few_shots_prompt: Optional[str] = None
     enabled: Optional[bool] = None
+    business_logic_model_name: Optional[str] = None
+    business_logic_model_id: Optional[int] = None


 class AgentIDRequest(BaseModel):
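The two request fields added above are optional with None defaults, so existing payloads keep validating. A minimal, self-contained sketch mirroring the excerpt (the concrete values are hypothetical, not from the PR):

# Hedged sketch of the extended request model; values are illustrative.
from typing import Optional
from pydantic import BaseModel

class AgentInfoRequest(BaseModel):
    constraint_prompt: Optional[str] = None
    few_shots_prompt: Optional[str] = None
    enabled: Optional[bool] = None
    business_logic_model_name: Optional[str] = None
    business_logic_model_id: Optional[int] = None

# Omitting the new fields still validates, so older clients are unaffected:
req = AgentInfoRequest(enabled=True, business_logic_model_id=42)  # 42 is a made-up ID
assert req.business_logic_model_name is None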
49 changes: 35 additions & 14 deletions backend/data_process/tasks.py
@@ -201,8 +201,8 @@ def process(
         f"[{self.request.id}] PROCESS TASK: File size: {file_size_mb:.2f}MB")

     # The unified actor call, mapping 'file' source_type to 'local' destination
-    # Submit Ray work and do not block here
-    logger.debug(
+    # Submit Ray work and WAIT for processing to complete
+    logger.info(
         f"[{self.request.id}] PROCESS TASK: Submitting Ray processing for source='{source}', strategy='{chunking_strategy}', destination='{source_type}'")
     chunks_ref = actor.process_file.remote(
         source,
@@ -211,10 +211,17 @@
         task_id=task_id,
         **params
     )
-    # Persist chunks into Redis via Ray to decouple Celery
+    # Wait for Ray processing to complete (this keeps task in STARTED/"PROCESSING" state)
+    logger.info(
+        f"[{self.request.id}] PROCESS TASK: Waiting for Ray processing to complete...")
+    chunks = ray.get(chunks_ref)
+    logger.info(
+        f"[{self.request.id}] PROCESS TASK: Ray processing completed, got {len(chunks) if chunks else 0} chunks")
+
+    # Persist chunks into Redis via Ray (fire-and-forget, don't block)
     redis_key = f"dp:{task_id}:chunks"
-    actor.store_chunks_in_redis.remote(redis_key, chunks_ref)
-    logger.debug(
+    actor.store_chunks_in_redis.remote(redis_key, chunks)
+    logger.info(
         f"[{self.request.id}] PROCESS TASK: Scheduled store_chunks_in_redis for key '{redis_key}'")

     end_time = time.time()
@@ -229,7 +236,7 @@ def process(
         f"[{self.request.id}] PROCESS TASK: Processing from URL: {source}")

     # For URL source, core.py expects a non-local destination to trigger URL fetching
-    logger.debug(
+    logger.info(
         f"[{self.request.id}] PROCESS TASK: Submitting Ray processing for URL='{source}', strategy='{chunking_strategy}', destination='{source_type}'")
     chunks_ref = actor.process_file.remote(
         source,
@@ -238,11 +245,19 @@
         task_id=task_id,
         **params
     )
-    # Persist chunks into Redis via Ray to decouple Celery
+    # Wait for Ray processing to complete (this keeps task in STARTED/"PROCESSING" state)
+    logger.info(
+        f"[{self.request.id}] PROCESS TASK: Waiting for Ray processing to complete...")
+    chunks = ray.get(chunks_ref)
+    logger.info(
+        f"[{self.request.id}] PROCESS TASK: Ray processing completed, got {len(chunks) if chunks else 0} chunks")
+
+    # Persist chunks into Redis via Ray (fire-and-forget, don't block)
     redis_key = f"dp:{task_id}:chunks"
-    actor.store_chunks_in_redis.remote(redis_key, chunks_ref)
-    logger.debug(
+    actor.store_chunks_in_redis.remote(redis_key, chunks)
+    logger.info(
         f"[{self.request.id}] PROCESS TASK: Scheduled store_chunks_in_redis for key '{redis_key}'")

     end_time = time.time()
     elapsed_time = end_time - start_time
     logger.info(
@@ -253,24 +268,25 @@ def process(
         raise NotImplementedError(
             f"Source type '{source_type}' not yet supported")

-    # Update task state to SUCCESS with metadata (without materializing chunks here)
+    # Update task state to SUCCESS after Ray processing completes
+    # This transitions from STARTED (PROCESSING) to SUCCESS (WAIT_FOR_FORWARDING)
     self.update_state(
         state=states.SUCCESS,
         meta={
-            'chunks_count': None,
+            'chunks_count': len(chunks) if chunks else 0,
             'processing_time': elapsed_time,
             'source': source,
             'index_name': index_name,
             'original_filename': original_filename,
             'task_name': 'process',
             'stage': 'text_extracted',
             'file_size_mb': file_size_mb,
-            'processing_speed_mb_s': file_size_mb / elapsed_time if elapsed_time > 0 else 0
+            'processing_speed_mb_s': file_size_mb / elapsed_time if file_size_mb > 0 and elapsed_time > 0 else 0
         }
     )

     logger.info(
-        f"[{self.request.id}] PROCESS TASK: Submitted for Ray processing; result will be fetched by forward")
+        f"[{self.request.id}] PROCESS TASK: Processing complete, waiting for forward task")

     # Prepare data for the next task in the chain; pass redis_key
     returned_data = {
@@ -563,6 +579,9 @@ async def index_documents():
             "source": original_source,
             "original_filename": original_filename
         }, ensure_ascii=False))
+
+        logger.info(
+            f"[{self.request.id}] FORWARD TASK: Starting ES indexing for {len(formatted_chunks)} chunks to index '{original_index_name}'...")
         es_result = run_async(index_documents())
         logger.debug(
             f"[{self.request.id}] FORWARD TASK: API response from main_server for source '{original_source}': {es_result}")
@@ -605,6 +624,8 @@ async def index_documents():
             "original_filename": original_filename
         }, ensure_ascii=False))
         end_time = time.time()
+        logger.info(
+            f"[{self.request.id}] FORWARD TASK: Updating task state to SUCCESS after ES indexing completion")
         self.update_state(
             state=states.SUCCESS,
             meta={
@@ -620,7 +641,7 @@
         )

         logger.info(
-            f"Stored {len(chunks)} chunks to index {original_index_name} in {end_time - start_time:.2f}s")
+            f"[{self.request.id}] FORWARD TASK: Successfully stored {len(chunks)} chunks to index {original_index_name} in {end_time - start_time:.2f}s")
         return {
             'task_id': task_id,
             'source': original_source,
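The change above replaces fire-and-forget submission with a blocking wait: ray.get holds the Celery task in its STARTED ("PROCESSING") state until the actor returns the chunks, and only the Redis write stays fire-and-forget. A minimal, self-contained sketch of that pattern, with simplified stand-ins for the real actor and its process_file/store_chunks_in_redis methods:

import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class ProcessorActor:
    def process_file(self, source: str) -> list:
        # Stand-in for the real chunking logic.
        return [f"chunk-{i}-of-{source}" for i in range(3)]

    def store_chunks_in_redis(self, redis_key: str, chunks: list) -> None:
        # Stand-in for the real Redis write.
        print(f"stored {len(chunks)} chunks under {redis_key}")

actor = ProcessorActor.remote()
chunks_ref = actor.process_file.remote("report.pdf")  # submit work to Ray
chunks = ray.get(chunks_ref)                          # block until processing completes
actor.store_chunks_in_redis.remote("dp:task-1:chunks", chunks)  # fire-and-forget persist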
2 changes: 2 additions & 0 deletions backend/database/db_models.py
@@ -206,6 +206,8 @@ class AgentInfo(TableBase):
         Boolean, doc="Whether to provide the running summary to the manager agent")
     business_description = Column(
         Text, doc="Manually entered by the user to describe the entire business process")
+    business_logic_model_name = Column(String(100), doc="Model name used for business logic prompt generation")
+    business_logic_model_id = Column(Integer, doc="Model ID used for business logic prompt generation, foreign key reference to model_record_t.model_id")


 class ToolInstance(TableBase):
24 changes: 24 additions & 0 deletions backend/prompts/cluster_summary_agent.yaml
@@ -0,0 +1,24 @@
system_prompt: |-
You are a professional knowledge summarization assistant. Your task is to generate a concise summary of a document cluster based on multiple documents.

**Summary Requirements:**
1. The input contains multiple documents (each document has title and content snippets)
2. You need to extract the common themes and key topics from these documents
3. Generate a summary that represents the collective content of the cluster
4. The summary should be accurate, coherent, and written in natural language
5. Keep the summary within the specified word limit

**Guidelines:**
- Focus on identifying shared themes and topics across documents
- Highlight key concepts, domains, or subject matter
- Use clear and concise language
- Avoid listing individual document titles unless necessary
- The summary should help users understand what this group of documents covers

user_prompt: |
Please generate a concise summary of the following document cluster:

{{ cluster_content }}

Summary ({{ max_words }} words):

31 changes: 31 additions & 0 deletions backend/prompts/cluster_summary_reduce.yaml
@@ -0,0 +1,31 @@
system_prompt: |-
You are a professional cluster summarization assistant. Your task is to merge multiple document summaries into a cohesive cluster summary.

**Summary Requirements:**
1. The input contains summaries of multiple documents that belong to the same cluster
2. These documents share similar themes or topics (grouped by clustering)
3. You need to synthesize a unified summary that captures the collective content
4. The summary should highlight common themes and key information across documents
5. Keep the summary within the specified word limit

**Guidelines:**
- Identify shared themes and topics across documents
- Highlight common concepts and subject matter
- Use clear and concise language
- Avoid listing individual document titles unless necessary
- Focus on what this group of documents collectively covers
- The summary should be coherent and represent the cluster's unified content
- **Important: Do not use any separators (like ---, ***, etc.), generate plain text summary only**

user_prompt: |
Please generate a unified summary of the following document cluster based on individual document summaries:

{{ document_summaries }}

**Important Reminders:**
- Do not use any separators (like ---, ***, ===, etc.)
- Do not include document titles or filenames
- Generate plain text summary content only

Cluster Summary ({{ max_words }} words):

32 changes: 32 additions & 0 deletions backend/prompts/cluster_summary_reduce_zh.yaml
@@ -0,0 +1,32 @@
system_prompt: |-
你是一个专业的簇总结助手。你的任务是将多个文档总结合并为一个连贯的簇总结。

**总结要求:**
1. 输入包含属于同一簇的多个文档的总结
2. 这些文档共享相似的主题或话题(通过聚类分组)
3. 你需要综合成一个统一的总结,捕捉集合内容
4. 总结应突出文档间的共同主题和关键信息
5. 保持在指定的字数限制内

**指导原则:**
- 识别文档间的共同主题和话题
- 突出共同概念和主题内容
- 使用清晰简洁的语言
- 除非必要,避免列出单个文档标题
- 专注于这组文档共同涵盖的内容
- 总结应连贯且代表簇的统一内容
- 确保准确、全面,明确关键实体,不要遗漏重要信息
- **重要:不要使用任何分隔符(如---、***等),直接生成纯文本总结**

user_prompt: |
请根据以下文档总结生成统一的簇总结:

{{ document_summaries }}

**重要提醒:**
- 不要使用任何分隔符(如---、***、===等)
- 不要包含文档标题或文件名
- 直接生成纯文本总结内容

簇总结({{ max_words }}字):

28 changes: 28 additions & 0 deletions backend/prompts/document_summary_agent.yaml
@@ -0,0 +1,28 @@
system_prompt: |-
You are a professional document summarization assistant. Your task is to generate a concise summary of a document based on its key content snippets.

**Summary Requirements:**
1. The input contains key snippets from a document (typically from beginning, middle, and end sections)
2. You need to extract the main themes, topics, and key information
3. Generate a summary that represents the document's core content
4. The summary should be accurate, coherent, and concise
5. Keep the summary within the specified word limit

**Guidelines:**
- Focus on identifying main themes and key topics
- Highlight important concepts and information
- Use clear and concise language
- Avoid redundancy and unnecessary details
- The summary should help users understand what the document covers
- **Important: Do not use any separators (like ---, ***, etc.), generate plain text summary only**

user_prompt: |
Please generate a concise summary of the following document:

Document name: {{ filename }}

Content snippets:
{{ content }}

Summary ({{ max_words }} words):

29 changes: 29 additions & 0 deletions backend/prompts/document_summary_agent_zh.yaml
@@ -0,0 +1,29 @@
system_prompt: |-
你是一个专业的文档总结助手。你的任务是根据文档的关键内容片段生成简洁的总结。

**总结要求:**
1. 输入包含文档的关键片段(通常来自开头、中间和结尾部分)
2. 你需要提取主要主题、话题和关键信息
3. 生成能代表文档核心内容的总结
4. 总结应准确、连贯且简洁
5. 保持在指定的字数限制内

**指导原则:**
- 专注于识别主要主题和关键话题
- 突出重要概念和信息
- 使用清晰简洁的语言
- 避免冗余和不必要的细节
- 总结应帮助用户理解文档涵盖的内容
- 确保总结准确、全面,不要遗漏关键实体和信息
- **重要:不要使用任何分隔符(如---、***等),直接生成纯文本总结**

user_prompt: |
请为以下文档生成简洁的总结:

文档名称:{{ filename }}

内容片段:
{{ content }}

总结({{ max_words }}字):

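All five prompt files above share the same shape: a system_prompt string plus a user_prompt containing Jinja-style placeholders ({{ content }}, {{ max_words }}, and so on). A hedged sketch of how such a file could be loaded and rendered — the use of PyYAML plus Jinja2, the path, and the values are assumptions for illustration, not the project's confirmed loading code:

import yaml
from jinja2 import Template

# Path and variable values are illustrative only.
with open("backend/prompts/document_summary_agent.yaml", encoding="utf-8") as f:
    prompt = yaml.safe_load(f)

system_prompt = prompt["system_prompt"]
user_prompt = Template(prompt["user_prompt"]).render(
    filename="quarterly_report.pdf",    # hypothetical document name
    content="...key snippets here...",  # hypothetical content snippets
    max_words=100,
)
# system_prompt and user_prompt can now be passed to the summarization model.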
4 changes: 3 additions & 1 deletion backend/pyproject.toml
@@ -14,7 +14,9 @@ dependencies = [
     "pyyaml>=6.0.2",
     "redis>=5.0.0",
     "fastmcp==2.12.0",
-    "langchain>=0.3.26"
+    "langchain>=0.3.26",
+    "scikit-learn>=1.0.0",
+    "numpy>=1.24.0"
 ]

 [project.optional-dependencies]
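The new scikit-learn and numpy dependencies back the document-level vectorization and K-means clustering feature referenced in the commit list (#694). A minimal sketch of K-means over document embeddings — the dimensionality, cluster count, and data are illustrative, not the project's actual document_vector_utils API:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(20, 8))  # 20 documents, 8-dim embeddings (illustrative)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(doc_vectors)

# Group document indices by cluster so each cluster can be summarized separately.
clusters = {}
for idx, label in enumerate(labels):
    clusters.setdefault(int(label), []).append(idx)
print(clusters)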
7 changes: 7 additions & 0 deletions backend/services/agent_service.py
@@ -236,6 +236,13 @@ async def get_agent_info_impl(agent_id: int, tenant_id: str):
     else:
         agent_info["model_name"] = None

+    # Get business logic model display name from model_id
+    if agent_info.get("business_logic_model_id") is not None:
+        business_logic_model_info = get_model_by_model_id(agent_info["business_logic_model_id"])
+        agent_info["business_logic_model_name"] = business_logic_model_info.get("display_name", None) if business_logic_model_info is not None else None
+    elif "business_logic_model_name" not in agent_info:
+        agent_info["business_logic_model_name"] = None
+
     return agent_info
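The block added above resolves the display name in a fixed order: look up by the stored model ID first, fall back to any stored name, otherwise None. The same logic as a standalone sketch, with get_model_by_model_id stubbed out (the real function queries model_record_t; the ID and name here are made up):

def get_model_by_model_id(model_id):
    # Stub; the real service reads model_record_t.
    return {"display_name": "GPT-4o"} if model_id == 42 else None

def resolve_business_logic_model_name(agent_info: dict) -> dict:
    if agent_info.get("business_logic_model_id") is not None:
        info = get_model_by_model_id(agent_info["business_logic_model_id"])
        agent_info["business_logic_model_name"] = (
            info.get("display_name") if info is not None else None
        )
    elif "business_logic_model_name" not in agent_info:
        agent_info["business_logic_model_name"] = None
    return agent_info

print(resolve_business_logic_model_name({"business_logic_model_id": 42}))
# -> {'business_logic_model_id': 42, 'business_logic_model_name': 'GPT-4o'}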