
Hotfix/benchmarking redesign#90

Draft
javidsegura wants to merge 52 commits into main from hotfix/benchmarking_redesign

Conversation

@javidsegura
Contributor

@javidsegura javidsegura commented Jan 19, 2026

initial set of plots for laptop config for strong scaling only

See the plots at: results/FIRST RUN/plots

Contains:

  1. New data generation script (framework agnostic, integrates new agentic abstractions)
  2. Configuration file for easy modification of experiment variables
  3. Analyze module that plots for efficiency and speedup
  4. General modular refactoring of the benchmarking dir

@coderabbitai

coderabbitai bot commented Jan 19, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e8d65bb7-90af-406a-b5f1-08fac3cfaf4b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can make CodeRabbit's reviews stricter and more nitpicky by using the `assertive` profile, if that's what you prefer.

Change the reviews.profile setting to assertive to make CodeRabbit nitpick more issues in your PRs.

@gemini-code-assist
Contributor

Summary of Changes

Hello @javidsegura, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant redesign of the project's agent architecture and establishes a robust benchmarking framework. The core changes involve creating a more modular and extensible agent system by abstracting its reasoning, memory, and execution components. A key part of this is the integration of 'radical.asyncflow' as a high-performance backend for concurrent tool execution. Concurrently, a new benchmarking suite has been added to systematically measure and analyze the performance characteristics of the agent, particularly focusing on strong scaling, with initial results already generated. This refactor aims to improve the maintainability, testability, and performance analysis capabilities of the agent system.

Highlights

  • New Agent Core Architecture: Introduced a modular 'Agent' class with distinct 'Reasoner', 'ModelProvider', 'MemoryManager', and 'BaseEngine' components for improved clarity and extensibility in agent design.
  • Radical AsyncFlow Integration: Implemented 'AsyncFlowEngine' to leverage 'radical.asyncflow' for efficient, concurrent execution of agent tools, abstracting the backend logic and enabling high-performance operations.
  • Comprehensive Benchmarking Framework: Added a new 'tests/benchmark' directory with tools for running strong scaling experiments, collecting performance data, and generating speedup/efficiency plots to systematically analyze agent performance.
  • Major Code Refactoring and Cleanup: Old 'langGraph' and 'settings' related modules have been moved to an 'old/' directory, signifying a shift towards the new 'core' architecture and streamlining the codebase.
  • Dependency Management Update: Switched to 'uv' for virtual environment and package management, and updated 'pyproject.toml' to include new dependencies like 'matplotlib' and 'cloudpickle' for benchmarking and data visualization.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant redesign, moving old code to an old directory and establishing a new, more abstract core for agents, reasoners, and backend engines. A comprehensive benchmarking suite has also been added to test scaling, which is a great addition. My review focuses on improving type consistency, fixing a few bugs, and offering suggestions to enhance maintainability and configuration.

I am having trouble creating individual review comments. Click here to see my feedback.

src/flowgentic/backend_engines/base.py (8)

high

There is a type mismatch for the tools_to_use parameter between the base class and its implementation. Here in BaseEngine, it's defined as List[Tuple[str, Dict]], but in AsyncFlowEngine, it's List[ChatCompletionMessageFunctionToolCall]. This violates the Liskov Substitution Principle and can lead to type-related errors. Please unify the type hint across the base class and its implementations.
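One way to unify the hint is a single alias shared by the base class and every implementation. The sketch below is illustrative only: the method name `execute_tools` and the `ToolCall` alias are assumptions, not the PR's actual API, and the concrete type would need to match whichever representation the codebase settles on.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

# Shared alias used by the base class and every implementation
# (the concrete shape is an assumption for illustration).
ToolCall = Tuple[str, Dict[str, Any]]  # (tool_name, arguments)

class BaseEngine(ABC):
    @abstractmethod
    async def execute_tools(self, tools_to_use: List[ToolCall]) -> List[Any]: ...

class AsyncFlowEngine(BaseEngine):
    async def execute_tools(self, tools_to_use: List[ToolCall]) -> List[Any]:
        # Identical signature to the base class, so substitution is safe.
        return [name for name, _args in tools_to_use]

print(asyncio.run(AsyncFlowEngine().execute_tools([("search", {"q": "x"})])))
# ['search']
```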

src/flowgentic/core/agent.py (37)

high

A PromptInput object is being passed to state.add_user_input(), but this method expects a str. This will raise a TypeError at runtime. You should pass the user_input attribute of the prompt_input object instead.

state.add_user_input(prompt_input.user_input)

.gitignore (44-48)

medium

Generated files, such as benchmark results, plots, and temporary files from examples, should be ignored by git. The current changes remove im-working.txt from being ignored, which is created by an example. Additionally, the new benchmark results under tests/benchmark/results/ are being tracked.

I suggest adding these paths to your .gitignore to keep the repository clean from generated artifacts.
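The suggested entries might look like the following sketch, limited to the paths the review names:

```gitignore
# Generated benchmark artifacts
tests/benchmark/results/

# Temporary file created by an example
im-working.txt
```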

Makefile (42-62)

medium

Several useful developer commands such as format, lint, and tests have been removed from the Makefile. While the installation process has been updated to use uv, removing these commands might hinder development workflow and CI/CD processes. Was this removal intentional? If not, I recommend re-adding them.
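Re-adding the removed targets on top of the uv-based setup could look like this sketch; the specific tools (ruff, pytest) are illustrative assumptions, not confirmed choices of this repository:

```make
# Sketch: restore developer targets via uv-managed tooling.
format:
	uv run ruff format .

lint:
	uv run ruff check .

tests:
	uv run pytest
```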

examples/langgraph-integration/design_patterns/chatbot/toy.py (75)

medium

There is a typo in the function name deterministic_task_intetrnal. It should be deterministic_task_internal.

async def deterministic_task_internal(state: WorkflowState):

examples/langgraph_asyncflow/main.py (48)

medium

There's a typo in the model_id. It should be dummy_model instead of dummy_moodel.

model_id="dummy/dummy_model",

pyproject.toml (98)

medium

For better reproducibility, it's recommended to pin git dependencies to a specific commit hash or tag instead of a branch like main. This ensures that everyone uses the exact same version of the dependency, avoiding potential issues from future changes on the branch.

radical-asyncflow = { git = "https://github.com/radical-cybertools/radical.asyncflow.git", rev = "<COMMIT_HASH_OR_TAG>" }

src/flowgentic/__init__.py (5)

medium

The logger level is hardcoded to "debug". While this is useful for development, it's not ideal for production or other environments. It would be better to make this configurable, for example, by reading from an environment variable. This would allow changing the log level without modifying the code.

colorful_output=True, logger_level=os.getenv("LOG_LEVEL", "debug")

src/flowgentic/core/models/model_provider.py (3)

medium

This import statement is a duplicate of the one on line 1. It should be removed.

src/flowgentic/core/tool/tool.py (21)

medium

The get_schema method does a good job for simple types. However, it doesn't fully support complex types like List[T]. For a parameter annotated as List[str], it will generate {"type": "array"} but will miss the "items": {"type": "string"} part, which is important for schema validation and for the LLM to understand the expected structure. Consider enhancing this method to handle nested types.
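A minimal sketch of resolving nested container annotations into JSON Schema. The helper name and primitive mapping below are illustrative assumptions, not the PR's actual get_schema implementation:

```python
from typing import List, get_args, get_origin

# Assumed mapping from Python primitives to JSON Schema type names.
_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def annotation_to_schema(annotation) -> dict:
    if annotation in _PRIMITIVES:
        return {"type": _PRIMITIVES[annotation]}
    if get_origin(annotation) is list:
        (item_type,) = get_args(annotation) or (str,)
        # Emit the "items" entry the review notes is currently missing.
        return {"type": "array", "items": annotation_to_schema(item_type)}
    # Fallback for anything not handled above.
    return {"type": "object"}

print(annotation_to_schema(List[str]))
# {'type': 'array', 'items': {'type': 'string'}}
```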

tests/benchmark/data_generation/workload/manager.py (30)

medium

A print statement is used here for debugging. It's better practice to use the logging framework (logger.debug(...)) for such messages. This allows for consistent log formatting and the ability to control log verbosity through configuration.

logger.debug(f"NUMBER OF AGENTS {self.n_of_agents}")

tests/benchmark/data_generation/workload/langgraph_asyncflow.py (36)

medium

The workload simulation uses ProcessPoolExecutor. For tasks that are I/O-bound, like the asyncio.sleep in your tool, ThreadPoolExecutor is generally more efficient. ProcessPoolExecutor incurs higher overhead due to inter-process communication and data serialization (pickling), which might not be necessary here and could skew benchmark results. If the goal is to simulate CPU-bound work, this is appropriate, but for I/O-bound tasks, consider switching to ThreadPoolExecutor.
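A minimal sketch of the suggested alternative, fanning I/O-bound work out through a ThreadPoolExecutor from asyncio; the function names here are hypothetical and the blocking sleep stands in for the tool's I/O:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_tool(delay: float) -> float:
    # Stand-in for a blocking I/O call inside a tool.
    time.sleep(delay)
    return delay

async def run_with_threads(n: int, delay: float) -> float:
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, io_bound_tool, delay) for _ in range(n))
        )
    return time.perf_counter() - start

# Threads overlap the sleeps, so n tasks finish in roughly one delay period,
# without the pickling overhead a ProcessPoolExecutor would add.
elapsed = asyncio.run(run_with_threads(8, 0.1))
print(f"{elapsed:.2f}s")
```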

@@ -72,7 +72,7 @@ async def traffic_extractor(city: str):
 @agents_manager.execution_wrappers.asyncflow(
 	flow_type=AsyncFlowType.EXECUTION_BLOCK
 )
-async def deterministic_task_internal(state: WorkflowState):
+async def deterministic_task_intetrnal(state: WorkflowState):
Contributor


is this meant to be like this? looks like a typo.
