ChenZiHong-Gavin (Collaborator) commented Sep 25, 2025

This PR refactors the KG (Knowledge Graph) builder by restructuring the module organization and improving the asynchronous method handling.

WIP: implement functions in kg_builder & replace functions in operators.

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors the KG (Knowledge Graph) builder by restructuring the module organization and improving the asynchronous method handling. The refactoring moves functionality from models to dedicated operator modules and introduces a decorator pattern for async-to-sync method conversion.

  • Moves text splitting and file reading logic from models to dedicated operators
  • Refactors the main GraphGen class to use a decorator for async method synchronization
  • Reorganizes imports and module structure for better separation of concerns
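
To make the decorator pattern mentioned above concrete, here is a minimal sketch of what an async-to-sync method decorator could look like. It is an assumption, not the contents of graphgen/utils/wrap.py: the body of `async_to_sync_method`, the `DemoPipeline` class, and the `Dict[str, Any]` annotation on `method_params` are all hypothetical.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Dict


def async_to_sync_method(func: Callable[..., Awaitable[Any]]) -> Callable[..., Any]:
    """Wrap an async method so callers can invoke it synchronously.

    Illustrative only: the real decorator in the PR may be implemented differently.
    """

    @functools.wraps(func)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        # asyncio.run creates a fresh event loop, runs the coroutine to
        # completion, and closes the loop. It raises RuntimeError when the
        # calling thread already has a running event loop.
        return asyncio.run(func(self, *args, **kwargs))

    return wrapper


class DemoPipeline:
    """Hypothetical stand-in for a class that exposes async methods."""

    @async_to_sync_method
    async def generate(self, method_params: Dict[str, Any]) -> str:
        await asyncio.sleep(0)  # placeholder for real asynchronous work
        return f"generated with {sorted(method_params)}"


# The decorated method is called like an ordinary synchronous method.
result = DemoPipeline().generate({"mode": "demo"})
```

One trade-off of this approach is that the wrapped method cannot be called from code that is already running inside an event loop, so a real implementation might need to detect that case.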

Reviewed Changes

Copilot reviewed 17 out of 20 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| graphgen/utils/wrap.py | Introduces async_to_sync_method decorator for method conversion |
| graphgen/operators/split/split_chunks.py | Moves chunking logic from models to operators with async support |
| graphgen/operators/read/read_files.py | Moves file reading logic from models to operators |
| graphgen/graphgen.py | Refactors main class to use decorator pattern and simplified async methods |
| graphgen/models/splitter/__init__.py | Removes chunking logic moved to operators |
| graphgen/models/reader/__init__.py | Removes file reading logic moved to operators |
| Multiple config files | Reorders configuration fields without functional changes |

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 17 out of 20 changed files in this pull request and generated 1 comment.


Comment on lines +290 to +291
@async_to_sync_method
async def generate_reasoning(self, method_params):

Copilot AI Sep 25, 2025

The generate_reasoning method signature is inconsistent with other methods in the class. The method_params parameter should have a type annotation for clarity and consistency.

Copilot uses AI. Check for mistakes.

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 16 out of 19 changed files in this pull request and generated 4 comments.


Comment on lines +23 to +32
results = await tqdm_async.gather(*tasks, desc=desc, unit=unit)

ok_results = []
for idx, res in enumerate(results):
    if isinstance(res, Exception):
        logger.exception("Task failed: %s", res)
        if progress_bar:
            progress_bar((idx + 1) / len(items), desc=desc)
        continue
    ok_results.append(res)

Copilot AI Sep 28, 2025

The tqdm_async.gather function doesn't exist in the tqdm library. Use asyncio.gather(*tasks) and wrap it with tqdm_async for progress tracking instead.

Suggested change
- results = await tqdm_async.gather(*tasks, desc=desc, unit=unit)
- ok_results = []
- for idx, res in enumerate(results):
-     if isinstance(res, Exception):
-         logger.exception("Task failed: %s", res)
-         if progress_bar:
-             progress_bar((idx + 1) / len(items), desc=desc)
-         continue
-     ok_results.append(res)
+ results = []
+ for idx, task in enumerate(tqdm_async(tasks, desc=desc, unit=unit)):
+     try:
+         res = await task
+     except Exception as e:
+         logger.exception("Task failed: %s", e)
+         res = e
+     results.append(res)
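
For comparison, the same idea (await each task, log failures, report progress) can be sketched with the standard library alone, which sidesteps the question of which helpers the installed tqdm exposes; recent releases of `tqdm.asyncio` do document a `gather` classmethod, so that part of the review comment may deserve a second look. The names `run_with_progress`, `on_progress`, and `_square` below are hypothetical and do not come from the PR.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Iterable, List, Optional

logger = logging.getLogger(__name__)


async def run_with_progress(
    coros: Iterable[Awaitable[Any]],
    on_progress: Optional[Callable[[float], None]] = None,
) -> List[Any]:
    """Run awaitables concurrently; return successful results in completion order."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    ok_results: List[Any] = []
    for done, future in enumerate(asyncio.as_completed(tasks), start=1):
        try:
            ok_results.append(await future)
        except Exception:
            # Inside an except block, logger.exception records the traceback.
            logger.exception("Task failed")
        if on_progress:
            on_progress(done / len(tasks))
    return ok_results


async def _square(x: int) -> int:
    # Hypothetical workload: one input fails on purpose.
    if x == 2:
        raise ValueError("bad input")
    await asyncio.sleep(0)
    return x * x


progress: List[float] = []
results = asyncio.run(
    run_with_progress([_square(i) for i in range(4)], progress.append)
)
```

Unlike the sequential `await task` loop in the suggestion, `asyncio.as_completed` yields results as they finish, so the progress callback advances even when an early task is slow. The cost is that results no longer follow input order.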

Comment on lines +26 to +28
for idx, res in enumerate(results):
    if isinstance(res, Exception):
        logger.exception("Task failed: %s", res)

Copilot AI Sep 28, 2025

The asyncio.gather function raises exceptions rather than returning them in the results list. Exceptions should be caught during task creation or execution, not checked in the results.
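
A small sketch may help pin down the behaviour under discussion. With its default arguments `asyncio.gather` propagates the first exception to the caller, while `return_exceptions=True` places exception objects in the result list, which is the only case where an `isinstance(res, Exception)` check on the results is meaningful. The helper names below are hypothetical.

```python
import asyncio
from typing import List, Tuple


async def _work(x: int) -> int:
    # Hypothetical workload: odd inputs fail.
    if x % 2:
        raise ValueError(f"odd input: {x}")
    return x * 10


async def collect(values: List[int]) -> Tuple[List[int], List[Exception]]:
    # return_exceptions=True returns exceptions in place, preserving input order.
    results = await asyncio.gather(
        *(_work(v) for v in values), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return ok, failed


async def collect_strict(values: List[int]) -> List[int]:
    # Default behaviour: the first exception propagates out of gather.
    return await asyncio.gather(*(_work(v) for v in values))


ok, failed = asyncio.run(collect([0, 1, 2, 3]))

try:
    asyncio.run(collect_strict([0, 1]))
    strict_raised = False
except ValueError:
    strict_raised = True
```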

Comment on lines +48 to +50
async for doc_key, doc in tqdm_async(
    new_docs.items(), desc="[1/4]Chunking documents", unit="doc"
):

Copilot AI Sep 28, 2025

The tqdm_async cannot be used with dictionary .items() as it's not an async iterable. Use regular tqdm or convert to an async iterable first.
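
The second option in the comment, converting to an async iterable first, could look roughly like the sketch below. `aiter_items` and `chunk_lengths` are hypothetical helpers, and the progress bar is left out so the example runs with the standard library only. Whether the conversion is actually needed depends on the tqdm version in use: the `tqdm.asyncio` wrapper may already accept plain iterables inside `async for`, so this is worth verifying before changing the code.

```python
import asyncio
from typing import AsyncIterator, Dict, Tuple


async def aiter_items(mapping: Dict[str, str]) -> AsyncIterator[Tuple[str, str]]:
    """Expose a dict's items as an async iterator (hypothetical helper)."""
    for key, value in mapping.items():
        yield key, value
        # Hand control back to the event loop between items.
        await asyncio.sleep(0)


async def chunk_lengths(docs: Dict[str, str]) -> Dict[str, int]:
    # Stand-in for per-document work such as chunking.
    lengths: Dict[str, int] = {}
    async for doc_key, doc in aiter_items(docs):
        lengths[doc_key] = len(doc)
    return lengths


lengths = asyncio.run(chunk_lengths({"doc-1": "hello", "doc-2": "hi"}))
```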

def extract(self, chunk: Chunk) -> None:
    pass

# 摘要

Copilot AI Sep 28, 2025

Comment contains Chinese characters. Use English for consistency with the rest of the codebase.

Suggested change
- # 摘要
+ # Condense

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 40 out of 44 changed files in this pull request and generated 3 comments.


Comment on lines +29 to +30
progress_bar: gr.Progress = None,
):

Copilot AI Sep 28, 2025

The max_concurrent parameter was removed from the function signature but is still referenced in the documentation comment. The docstring should be updated to reflect the current parameters.

kwargs = self._pre_generate(text, history)
kwargs["temperature"] = temperature

prompt_tokens = 0

Copilot AI Sep 28, 2025

The code accesses self.tokenizer but the tokenizer is passed as a parameter to the constructor and may be None. This will cause an AttributeError if no tokenizer is provided.

Suggested change
- prompt_tokens = 0
+ prompt_tokens = 0
+ if self.tokenizer is None:
+     raise ValueError("A tokenizer must be provided to use generate_answer.")
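
The guard suggested above can be illustrated in isolation. This is a sketch under assumptions: `DemoClient`, `count_prompt_tokens`, and the callable tokenizer type are hypothetical and are not taken from the PR's client class.

```python
from typing import Callable, List, Optional


class DemoClient:
    """Hypothetical client with an optional tokenizer."""

    def __init__(
        self, tokenizer: Optional[Callable[[str], List[str]]] = None
    ) -> None:
        self.tokenizer = tokenizer

    def count_prompt_tokens(self, text: str) -> int:
        # Fail early with a clear message instead of a later error on None.
        if self.tokenizer is None:
            raise ValueError("A tokenizer must be provided to count prompt tokens.")
        return len(self.tokenizer(text))


# With a tokenizer (here plain whitespace splitting), counting works.
token_count = DemoClient(str.split).count_prompt_tokens("a b c")

# Without one, the guard raises a descriptive ValueError.
try:
    DemoClient().count_prompt_tokens("a b c")
    guard_raised = False
except ValueError:
    guard_raised = True
```

An alternative design would be to validate the tokenizer once in the constructor, which moves the failure to object creation rather than first use.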

ChenZiHong-Gavin marked this pull request as ready for review September 30, 2025 03:27
ChenZiHong-Gavin merged commit 6447f2a into main Sep 30, 2025
3 checks passed