Refactor KG builder #52

ChenZiHong-Gavin · 2025-09-25T09:22:11Z

This PR refactors the KG (Knowledge Graph) builder by restructuring the module organization and improving the asynchronous method handling.

WIP: implement functions in kg_builder & replace functions in operators.

Copilot

Pull Request Overview

This PR refactors the KG (Knowledge Graph) builder by restructuring the module organization and improving the asynchronous method handling. The refactoring moves functionality from models to dedicated operator modules and introduces a decorator pattern for async-to-sync method conversion.

Moves text splitting and file reading logic from models to dedicated operators
Refactors the main GraphGen class to use a decorator for async method synchronization
Reorganizes imports and module structure for better separation of concerns
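
The contents of graphgen/utils/wrap.py are not shown in this conversation, so the following is only a minimal sketch of what an async-to-sync method decorator could look like. It assumes no event loop is already running in the calling thread (asyncio.run raises otherwise); the `Demo` class and its `double` method are hypothetical and exist only for illustration.

```python
import asyncio
import functools


def async_to_sync_method(func):
    """Sketch: expose an async method through a blocking, synchronous call."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Assumption: called from plain synchronous code with no running loop.
        return asyncio.run(func(self, *args, **kwargs))

    return wrapper


class Demo:  # hypothetical class, not part of GraphGen
    @async_to_sync_method
    async def double(self, x: int) -> int:
        await asyncio.sleep(0)  # stand-in for real asynchronous work
        return x * 2


result = Demo().double(21)
```

With a wrapper of this shape, callers write `Demo().double(21)` without `await`, while the method body stays asynchronous.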

Reviewed Changes

Copilot reviewed 17 out of 20 changed files in this pull request and generated 2 comments.

File	Description
graphgen/utils/wrap.py	Introduces async_to_sync_method decorator for method conversion
graphgen/operators/split/split_chunks.py	Moves chunking logic from models to operators with async support
graphgen/operators/read/read_files.py	Moves file reading logic from models to operators
graphgen/graphgen.py	Refactors main class to use decorator pattern and simplified async methods
graphgen/models/splitter/__init__.py	Removes chunking logic moved to operators
graphgen/models/reader/__init__.py	Removes file reading logic moved to operators
Multiple config files	Reorders configuration fields without functional changes

graphgen/operators/split/split_chunks.py

graphgen/graphgen.py

Co-authored-by: Copilot <[email protected]>

Copilot

Pull Request Overview

Copilot reviewed 17 out of 20 changed files in this pull request and generated 1 comment.

Copilot · 2025-09-25T09:28:07Z

graphgen/graphgen.py

+    @async_to_sync_method
+    async def generate_reasoning(self, method_params):


The generate_reasoning method signature is inconsistent with other methods in the class. The method_params parameter should have a type annotation for clarity and consistency.

Copilot

Pull Request Overview

Copilot reviewed 16 out of 19 changed files in this pull request and generated 4 comments.

Copilot · 2025-09-28T08:14:10Z

graphgen/utils/run_concurrent.py

+    results = await tqdm_async.gather(*tasks, desc=desc, unit=unit)
+
+    ok_results = []
+    for idx, res in enumerate(results):
+        if isinstance(res, Exception):
+            logger.exception("Task failed: %s", res)
+            if progress_bar:
+                progress_bar((idx + 1) / len(items), desc=desc)
+            continue
+        ok_results.append(res)


The tqdm_async.gather function doesn't exist in the tqdm library. Use asyncio.gather(*tasks) and wrap it with tqdm_async for progress tracking instead.

Suggested change

-    results = await tqdm_async.gather(*tasks, desc=desc, unit=unit)
-
-    ok_results = []
-    for idx, res in enumerate(results):
-        if isinstance(res, Exception):
-            logger.exception("Task failed: %s", res)
-            if progress_bar:
-                progress_bar((idx + 1) / len(items), desc=desc)
-            continue
-        ok_results.append(res)
+    results = []
+    for idx, task in enumerate(tqdm_async(tasks, desc=desc, unit=unit)):
+        try:
+            res = await task
+        except Exception as e:
+            logger.exception("Task failed: %s", e)
+            res = e
+        results.append(res)
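
For comparison, the same idea (await each task individually, catch its failure, report progress) can be sketched with the standard library alone. This is not the PR's run_concurrent implementation; `run_with_progress`, `on_progress`, and `_job` are hypothetical names, and results arrive in completion order rather than submission order.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable, List, Optional

logger = logging.getLogger(__name__)


async def run_with_progress(
    coros: Iterable[Awaitable],
    on_progress: Optional[Callable[[float], None]] = None,  # hypothetical callback
) -> List:
    """Run awaitables concurrently, skip failures, report the fraction done."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    ok_results = []
    for done, fut in enumerate(asyncio.as_completed(tasks), start=1):
        try:
            ok_results.append(await fut)
        except Exception:
            # Inside an except block, logger.exception attaches the traceback.
            logger.exception("Task failed")
        if on_progress is not None:
            on_progress(done / len(tasks))
    return ok_results


async def _job(i: int) -> int:  # hypothetical workload
    await asyncio.sleep(0)
    if i == 2:
        raise ValueError("boom")
    return i * 10


ticks: List[float] = []
ok = asyncio.run(run_with_progress((_job(i) for i in range(4)), ticks.append))
```

Because each task is awaited inside its own try block, one failure does not discard the results of the others.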

Copilot · 2025-09-28T08:14:10Z

graphgen/utils/run_concurrent.py

+    for idx, res in enumerate(results):
+        if isinstance(res, Exception):
+            logger.exception("Task failed: %s", res)


The asyncio.gather function raises exceptions rather than returning them in the results list. Exceptions should be caught during task creation or execution, not checked in the results.
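
As a point of reference, asyncio.gather only places exceptions in its result list when called with return_exceptions=True; by default the first exception propagates to the caller. The sketch below shows the opt-in behavior with the standard library only. The `work` and `run_all` helpers are hypothetical and are not taken from graphgen/utils/run_concurrent.py.

```python
import asyncio


async def work(i: int) -> int:  # hypothetical workload
    await asyncio.sleep(0)
    if i == 2:
        raise ValueError(f"task {i} failed")
    return i * 10


async def run_all(items):
    tasks = [asyncio.create_task(work(i)) for i in items]
    # return_exceptions=True returns raised exceptions as list entries,
    # in the same order as the tasks, instead of propagating the first one.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok_results = [r for r in results if not isinstance(r, Exception)]
    errors = [r for r in results if isinstance(r, Exception)]
    return ok_results, errors


ok_results, errors = asyncio.run(run_all(range(4)))
```

An isinstance check over the results, like the one in the reviewed diff, is meaningful only under this opt-in mode.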

Copilot · 2025-09-28T08:14:10Z

graphgen/operators/split/split_chunks.py

+    async for doc_key, doc in tqdm_async(
+        new_docs.items(), desc="[1/4]Chunking documents", unit="doc"
+    ):


The tqdm_async cannot be used with dictionary .items() as it's not an async iterable. Use regular tqdm or convert to an async iterable first.
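
Setting aside how tqdm_async itself treats a plain iterable, the "convert to an async iterable first" option mentioned in the comment can be sketched with an async generator. The helper names `aiter_items` and `collect` are hypothetical, and the per-item work is a placeholder.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple


async def aiter_items(d: Dict[str, str]) -> AsyncIterator[Tuple[str, str]]:
    """Expose dict items as an async iterable, yielding control between items."""
    for key, value in d.items():
        await asyncio.sleep(0)  # let other tasks run between documents
        yield key, value


async def collect(docs: Dict[str, str]) -> List[Tuple[str, int]]:
    seen = []
    async for doc_key, doc in aiter_items(docs):
        seen.append((doc_key, len(doc)))  # placeholder for chunking work
    return seen


seen = asyncio.run(collect({"a": "xx", "b": "yyy"}))
```

A plain `for` loop over `new_docs.items()` is the simpler alternative when the loop body has nothing to await.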

Copilot · 2025-09-28T08:14:11Z

graphgen/bases/base_kg_builder.py

+    def extract(self, chunk: Chunk) -> None:
+        pass
+
+    # 摘要


Comment contains Chinese characters. Use English for consistency with the rest of the codebase.

Suggested change

-    # 摘要
+    # Condense

Copilot

Pull Request Overview

Copilot reviewed 40 out of 44 changed files in this pull request and generated 3 comments.

Copilot · 2025-09-28T11:35:15Z

graphgen/operators/build_kg/extract_kg.py

+    progress_bar: gr.Progress = None,
+):


The max_concurrent parameter was removed from the function signature but is still referenced in the documentation comment. The docstring should be updated to reflect the current parameters.

Copilot · 2025-09-28T11:35:15Z

graphgen/models/llm/openai_client.py

        kwargs = self._pre_generate(text, history)
-        kwargs["temperature"] = temperature

        prompt_tokens = 0


The code accesses self.tokenizer but the tokenizer is passed as a parameter to the constructor and may be None. This will cause an AttributeError if no tokenizer is provided.

Suggested change

-        prompt_tokens = 0
+        prompt_tokens = 0
+        if self.tokenizer is None:
+            raise ValueError("A tokenizer must be provided to use generate_answer.")
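
The guard suggested above can be illustrated in isolation. The sketch below assumes a client whose tokenizer is optional at construction time; `ClientSketch` and `count_prompt_tokens` are hypothetical names and do not reflect the actual interface in graphgen/models/llm/openai_client.py.

```python
from typing import Callable, List, Optional


class ClientSketch:  # hypothetical stand-in for an LLM client
    def __init__(self, tokenizer: Optional[Callable[[str], List[str]]] = None):
        self.tokenizer = tokenizer

    def count_prompt_tokens(self, text: str) -> int:
        # Fail early with a clear message rather than an AttributeError or
        # TypeError raised deeper in the call.
        if self.tokenizer is None:
            raise ValueError("A tokenizer must be provided to count prompt tokens.")
        return len(self.tokenizer(text))


with_tokenizer = ClientSketch(tokenizer=str.split).count_prompt_tokens("a b c")

try:
    ClientSketch().count_prompt_tokens("a b c")
    missing_tokenizer_error = None
except ValueError as exc:
    missing_tokenizer_error = str(exc)
```

Checking at the point of use keeps the constructor permissive for callers that never need token counts.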

graphgen/bases/base_llm_client.py

Co-authored-by: Copilot <[email protected]>

ChenZiHong-Gavin added 3 commits September 25, 2025 16:36

refactor: use async_to_sync_method

4188441

delete some code

a2b9459

refactor: refact split_chunks

1cafc02

ChenZiHong-Gavin requested a review from Copilot September 25, 2025 09:22

Copilot AI reviewed Sep 25, 2025

graphgen/operators/split/split_chunks.py (resolved)

graphgen/graphgen.py (outdated, resolved)

ChenZiHong-Gavin and others added 2 commits September 25, 2025 17:23

Update graphgen/graphgen.py

e18b947

Co-authored-by: Copilot <[email protected]>

fix: fix type annotation

6d3bdbd

ChenZiHong-Gavin requested a review from Copilot September 25, 2025 09:27

Copilot AI reviewed Sep 25, 2025

ChenZiHong-Gavin added 3 commits September 26, 2025 16:03

wip: add base_kg_builder

9a56e30

Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…

051dc77

…to kg_builder

refactor: abstract run_concurrent & delete semaphore

8fd34b2

ChenZiHong-Gavin requested a review from Copilot September 28, 2025 08:13

Copilot AI reviewed Sep 28, 2025

refactor: refact llm_client & tokenizer

e42bcb6

ChenZiHong-Gavin requested a review from Copilot September 28, 2025 11:34

Copilot AI reviewed Sep 28, 2025

ChenZiHong-Gavin and others added 3 commits September 28, 2025 19:38

Update graphgen/bases/base_llm_client.py

9b7ef17

Co-authored-by: Copilot <[email protected]>

Merge branch 'main' of https://github.com/open-sciencelab/GraphGen in…

128d2f8

…to kg_builder

wip: add NetworkXKGBuilder

b30f5a1

ChenZiHong-Gavin marked this pull request as ready for review September 30, 2025 03:27

ChenZiHong-Gavin merged commit 6447f2a into main Sep 30, 2025
3 checks passed
