-
Notifications
You must be signed in to change notification settings - Fork 30
Refactor KG builder #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the KG (Knowledge Graph) builder by restructuring the module organization and improving the asynchronous method handling. The refactoring moves functionality from models to dedicated operator modules and introduces a decorator pattern for async-to-sync method conversion.
- Moves text splitting and file reading logic from models to dedicated operators
- Refactors the main GraphGen class to use a decorator for async method synchronization
- Reorganizes imports and module structure for better separation of concerns
Reviewed Changes
Copilot reviewed 17 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
graphgen/utils/wrap.py | Introduces async_to_sync_method decorator for method conversion |
graphgen/operators/split/split_chunks.py | Moves chunking logic from models to operators with async support |
graphgen/operators/read/read_files.py | Moves file reading logic from models to operators |
graphgen/graphgen.py | Refactors main class to use decorator pattern and simplified async methods |
graphgen/models/splitter/init.py | Removes chunking logic moved to operators |
graphgen/models/reader/init.py | Removes file reading logic moved to operators |
Multiple config files | Reorders configuration fields without functional changes |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 17 out of 20 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
@async_to_sync_method | ||
async def generate_reasoning(self, method_params): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generate_reasoning
method signature is inconsistent with other methods in the class. The method_params
parameter should have a type annotation for clarity and consistency.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 16 out of 19 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
results = await tqdm_async.gather(*tasks, desc=desc, unit=unit) | ||
|
||
ok_results = [] | ||
for idx, res in enumerate(results): | ||
if isinstance(res, Exception): | ||
logger.exception("Task failed: %s", res) | ||
if progress_bar: | ||
progress_bar((idx + 1) / len(items), desc=desc) | ||
continue | ||
ok_results.append(res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tqdm_async.gather
function doesn't exist in the tqdm library. Use asyncio.gather(*tasks)
and wrap it with tqdm_async
for progress tracking instead.
results = await tqdm_async.gather(*tasks, desc=desc, unit=unit) | |
ok_results = [] | |
for idx, res in enumerate(results): | |
if isinstance(res, Exception): | |
logger.exception("Task failed: %s", res) | |
if progress_bar: | |
progress_bar((idx + 1) / len(items), desc=desc) | |
continue | |
ok_results.append(res) | |
results = [] | |
for idx, task in enumerate(tqdm_async(tasks, desc=desc, unit=unit)): | |
try: | |
res = await task | |
except Exception as e: | |
logger.exception("Task failed: %s", e) | |
res = e | |
results.append(res) |
Copilot uses AI. Check for mistakes.
for idx, res in enumerate(results): | ||
if isinstance(res, Exception): | ||
logger.exception("Task failed: %s", res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asyncio.gather
function raises exceptions rather than returning them in the results list. Exceptions should be caught during task creation or execution, not checked in the results.
Copilot uses AI. Check for mistakes.
async for doc_key, doc in tqdm_async( | ||
new_docs.items(), desc="[1/4]Chunking documents", unit="doc" | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tqdm_async
cannot be used with dictionary .items()
as it's not an async iterable. Use regular tqdm
or convert to an async iterable first.
Copilot uses AI. Check for mistakes.
graphgen/bases/base_kg_builder.py
Outdated
def extract(self, chunk: Chunk) -> None: | ||
pass | ||
|
||
# 摘要 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment contains Chinese characters. Use English for consistency with the rest of the codebase.
# 摘要 | |
# Condense |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 40 out of 44 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
progress_bar: gr.Progress = None, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The max_concurrent
parameter was removed from the function signature but is still referenced in the documentation comment. The docstring should be updated to reflect the current parameters.
Copilot uses AI. Check for mistakes.
kwargs = self._pre_generate(text, history) | ||
kwargs["temperature"] = temperature | ||
|
||
prompt_tokens = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code accesses self.tokenizer
but the tokenizer is passed as a parameter to the constructor and may be None. This will cause an AttributeError if no tokenizer is provided.
prompt_tokens = 0 | |
prompt_tokens = 0 | |
if self.tokenizer is None: | |
raise ValueError("A tokenizer must be provided to use generate_answer.") |
Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <[email protected]>
This PR refactors the KG (Knowledge Graph) builder by restructuring the module organization and improving the asynchronous method handling.
WIP: implement functions in kg_builder & replace functions in operators.