This implements the flux abstraction layer as used in RP by andre-merzky · Pull Request #448 · radical-cybertools/radical.utils

andre-merzky · 2025-09-24T10:57:01Z

This now covers different flux versions and also allows for multiple flux instances to work concurrently.

This code should eventually move to Rhapsody.

gemini-code-assist · 2025-09-24T10:57:26Z

Summary of Changes

Hello @andre-merzky, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant refactoring and enhancement of the Flux integration within the system. The primary goal is to provide a robust and flexible abstraction layer that can seamlessly handle various Flux versions and allow for the concurrent operation of multiple Flux instances. This change lays the groundwork for improved job management and resource utilization, particularly in environments like Rhapsody, where this code is eventually intended to be used.

Highlights

- Flux Abstraction Layer: Introduced a comprehensive abstraction layer for Flux, supporting different Flux versions (v0 and v1) and enabling concurrent Flux instances. This includes new modules for Flux service management, version-specific helpers, and dynamic module loading.
- Modular Flux Implementation: The previous monolithic `flux.py` has been refactored into a `flux` package, separating concerns into `FluxService` (for managing Flux instances), `FluxHelperV0`/`FluxHelperV1` (for version-specific job interaction), and `FluxModule` (for dynamic Flux module loading and version detection).
- Environment Handling Enhancements: The `env_write` function in `radical.utils.env` now includes an `extend` parameter, allowing more flexible handling of environment variable quoting, particularly for shell directives.
- Profiling Tool Integration: Added a new `Yappi` class to `radical.utils.profile` for Python code profiling, including automatic generation of pstats and optional gprof2dot visualization.
- Robustness Improvements: Enhanced the `_read_file_counter` function in `radical.utils.ids` for better resource management and made `EnvProcess` exit handling more robust. `TypedDict` initialization and setter logic were also refined.
- Testing Updates: The integration tests for Flux have been completely rewritten to utilize the new `FluxService` and `FluxHelper` APIs, ensuring proper functionality of the new abstraction layer.
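
The split into version-specific helpers described above can be pictured as a small dispatch table. The sketch below is only an illustration under stated assumptions: `HelperV0Sketch`, `HelperV1Sketch` and `select_helper` are hypothetical names, not the PR's `FluxHelperV0`/`FluxHelperV1` classes, and how the PR actually detects the installed Flux version is not shown on this page.

```python
class HelperV0Sketch:
    """Hypothetical stand-in for a helper that targets the v0 interface."""
    api_version = 0

    def describe(self) -> str:
        return 'helper for flux v0'


class HelperV1Sketch:
    """Hypothetical stand-in for a helper that targets the v1 interface."""
    api_version = 1

    def describe(self) -> str:
        return 'helper for flux v1'


# Map each supported version number to the class that handles it.
_HELPERS = {cls.api_version: cls for cls in (HelperV0Sketch, HelperV1Sketch)}


def select_helper(version: int):
    """Return a helper instance for `version`, or fail loudly if unsupported."""
    try:
        return _HELPERS[version]()
    except KeyError:
        raise ValueError('unsupported flux version: %r' % version) from None
```

Callers then only depend on the common helper interface, and supporting a further version means adding one class to the table.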

codecov · 2025-09-24T10:59:08Z

Codecov Report

❌ Patch coverage is 30.71298% with 379 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.23%. Comparing base (bd6d571) to head (19d86b3).
⚠️ Report is 65 commits behind head on devel.

Files with missing lines	Patch %	Lines
src/radical/utils/flux/flux_helper_v1.py	16.04%	136 Missing ⚠️
src/radical/utils/flux/flux_helper_v0.py	21.62%	87 Missing ⚠️
src/radical/utils/flux/flux_service.py	26.19%	62 Missing ⚠️
src/radical/utils/flux/flux_module.py	48.11%	55 Missing ⚠️
src/radical/utils/profile.py	25.92%	20 Missing ⚠️
src/radical/utils/env.py	33.33%	14 Missing ⚠️
src/radical/utils/typeddict.py	72.72%	3 Missing ⚠️
src/radical/utils/ids.py	87.50%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #448      +/-   ##
==========================================
- Coverage   61.79%   61.23%   -0.57%     
==========================================
  Files          62       66       +4     
  Lines        7133     7359     +226     
==========================================
+ Hits         4408     4506      +98     
- Misses       2725     2853     +128
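
The figures in the report can be cross-checked with a few lines of arithmetic. The snippet below only recomputes values from the numbers quoted above; it makes no claim about how Codecov itself rounds or truncates the percentages it displays.

```python
# Numbers copied from the coverage diff above.
base_hits, base_lines = 4408, 7133
head_hits, head_lines = 4506, 7359

base_cov = 100.0 * base_hits / base_lines
head_cov = 100.0 * head_hits / head_lines
delta = head_cov - base_cov

# Missing lines per file, copied from the "Files with missing lines" table.
missing = [136, 87, 62, 55, 20, 14, 3, 2]

print('base coverage: %.4f%%' % base_cov)
print('head coverage: %.4f%%' % head_cov)
print('delta        : %+.2f' % delta)
print('base misses  : %d' % (base_lines - base_hits))
print('head misses  : %d' % (head_lines - head_hits))
print('patch missing: %d' % sum(missing))
```

The per-file missing counts add up to the 379 lines named in the patch coverage line, and the miss counts match the diff.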

Copilot

Pull Request Overview

This PR implements a flux abstraction layer used in the Rhapsody project, introducing support for different flux versions and enabling multiple concurrent flux instances. The implementation includes a modular flux service architecture, version-specific helper classes, and various test and utility improvements.

Key Changes

- Replaced the old flux module with a new modular flux abstraction layer supporting version detection and concurrent instances
- Added comprehensive flux service and helper classes for both flux v0 and v1
- Improved test infrastructure with better process handling and error reporting

Reviewed Changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 5 comments.

File	Description
src/radical/utils/flux.py	Removed old monolithic flux implementation (642 lines deleted)
src/radical/utils/flux/__init__.py	New flux module entry point with version-aware helper selection
src/radical/utils/flux/flux_service.py	New flux service for managing flux instances
src/radical/utils/flux/flux_module.py	New flux module wrapper with version detection and spec utilities
src/radical/utils/flux/flux_helper_v0.py	New v0-specific flux helper implementation
src/radical/utils/flux/flux_helper_v1.py	New v1-specific flux helper implementation
tests/integration_tests/test_flux.py	Simplified flux integration tests using new API
src/radical/utils/typeddict.py	Performance improvements in TypedDict initialization
src/radical/utils/profile.py	Added Yappi profiling context manager
src/radical/utils/ids.py	Improved file counter handling with proper error recovery
tests/unittests/test_heartbeat.py	Added helper function for better process cleanup
src/radical/utils/env.py	Enhanced environment variable handling with extend support
tests/unittests/test_env.py	Added conditional check for env_proc
tests/unittests/test_typeddict.py	Added shebang and main execution block
src/radical/utils/__init__.py	Updated imports for new flux classes
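
To illustrate what a profiling context manager of the kind listed for `profile.py` can look like, here is a minimal sketch. It deliberately swaps in the standard library's `cProfile` and `pstats` instead of yappi so that it runs without extra dependencies; `profiled` is a hypothetical name and this is not the PR's `Yappi` class.

```python
import cProfile
import io
import pstats
from contextlib import contextmanager


@contextmanager
def profiled(sort_key='cumulative', limit=5):
    """Profile the enclosed block and store a pstats text report.

    Yields a dict; after the block ends, its 'text' entry holds the report.
    """
    prof = cProfile.Profile()
    report = {}
    prof.enable()
    try:
        yield report
    finally:
        prof.disable()
        buf = io.StringIO()
        stats = pstats.Stats(prof, stream=buf)
        stats.sort_stats(sort_key).print_stats(limit)
        report['text'] = buf.getvalue()
```

A caller would wrap the code of interest in `with profiled() as rep:` and inspect `rep['text']` afterwards.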

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

src/radical/utils/flux/flux_service.py

Copilot · 2025-09-24T10:59:12Z

src/radical/utils/ids.py

+        finally:
+            try:
+                os.close(fd)
+            except:
+                pass


The file descriptor fd may not be defined if an exception occurs before line 257. The finally block should check if fd is defined before attempting to close it.
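
One way to address this concern is to bind `fd` before anything can raise and to close it only if it was actually opened. The sketch below assumes a hypothetical `read_counter` helper; it is not the PR's `_read_file_counter`, whose full body is not shown here.

```python
import os


def read_counter(path):
    """Hypothetical helper: read an integer counter from a file."""
    fd = None  # bound up front, so the finally block can always test it
    try:
        fd = os.open(path, os.O_RDONLY)
        raw = os.read(fd, 64)
        return int(raw.decode().strip() or 0)
    finally:
        # Close only a descriptor that was really opened.
        if fd is not None:
            os.close(fd)
```

If `os.open` fails, the original exception propagates unchanged and no close is attempted.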

Copilot · 2025-09-24T10:59:12Z

tests/unittests/test_heartbeat.py

+    def _join(proc, timeout=0.1):
+        proc.join(timeout=timeout)
+        try:
+            os.waitpid(test_proc.pid, os.WNOHANG)


The function uses test_proc.pid instead of proc.pid. This will cause a NameError if test_proc is not in scope, or use the wrong process ID.

Suggested change

- os.waitpid(test_proc.pid, os.WNOHANG)
+ os.waitpid(proc.pid, os.WNOHANG)

Copilot · 2025-09-24T10:59:12Z

src/radical/utils/flux/flux_service.py

+        self._uri  = None
+        self._proc = None
+
+        self._log.info('%s: found flux uri: %s', self._uid, self.uri)


This log message says 'found flux uri' but is actually logging when the service is stopped and uri is None. The message should be 'stopped flux service' or similar.

Suggested change

- self._log.info('%s: found flux uri: %s', self._uid, self.uri)
+ self._log.info('%s: stopped flux service (uri was: %s)', self._uid, self.uri)
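
Since the quoted lines reset `self._uri` before logging, a message that reports the previous URI needs to capture it first. The sketch below shows that ordering with a hypothetical `ServiceSketch` class; it is not the PR's `FluxService`, and it assumes the `uri` property simply returns `self._uri`.

```python
import logging


class ServiceSketch:
    """Hypothetical stand-in for a service object with a stop() method."""

    def __init__(self, uid, uri):
        self._uid = uid
        self._uri = uri
        self._proc = None
        self._log = logging.getLogger('service_sketch')

    @property
    def uri(self):
        return self._uri

    def stop(self):
        old_uri = self._uri  # remember the value before it is cleared
        self._uri = None
        self._proc = None
        self._log.info('%s: stopped flux service (uri was: %s)',
                       self._uid, old_uri)
        return old_uri
```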

Copilot · 2025-09-24T10:59:12Z

src/radical/utils/typeddict.py

            return self.__dict__['_data']



The condition checks for '_data' but no longer handles the case where '_data' is not in `__dict__`. This could cause a KeyError since the initialization logic was moved.

Suggested change

- return self.__dict__['_data']
+ if '_data' in self.__dict__:
+     return self.__dict__['_data']
+ else:
+     raise AttributeError("'_data' attribute not initialized")
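
The effect of the suggested guard can be tried in isolation. The sketch below uses a hypothetical `DataHolderSketch` class and a `_get_data` method invented for illustration; it is not the PR's `TypedDict`, and it expresses the same guard with `try`/`except` rather than an explicit membership test.

```python
class DataHolderSketch:
    """Hypothetical class that keeps its payload in __dict__['_data']."""

    def __init__(self, **kwargs):
        self.__dict__['_data'] = dict(kwargs)

    def _get_data(self):
        try:
            return self.__dict__['_data']
        except KeyError:
            # Report a missing payload as an attribute problem, not a KeyError.
            raise AttributeError("'_data' attribute not initialized") from None
```

An instance created without running `__init__`, for example via `object.__new__`, then raises `AttributeError` instead of `KeyError`.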

gemini-code-assist

Code Review

This pull request introduces a significant and well-structured refactoring of the Flux integration, creating an abstraction layer that supports multiple Flux versions and concurrent instances. The changes are extensive, replacing the old flux.py with a new flux package containing version-specific helpers and a service class. My review has identified a few critical and high-severity issues in the new helper classes, primarily related to thread management, potential deadlocks, and correctness in the shutdown logic, which should be addressed to ensure the stability of this new abstraction.

src/radical/utils/flux/flux_helper_v1.py

src/radical/utils/flux/flux_helper_v0.py

andre-merzky · 2025-09-29T09:59:11Z

@mtitov : this should be ready now.

mtitov

LGTM!

src/radical/utils/typeddict.py

andre-merzky added 30 commits March 3, 2025 01:16

e734a05 snap
ac30f57 add simple yappi resource manager
d750b93 Merge branch 'feature/yappi' into fix/flux
04698dc merge from devel
6175b9f merge from devel
fbfe1e7 merge from devel
1d11588 merge from devel
7e88c30 merge from devel
d4c333a merge from devel
22ff26c merge from devel
bfac7ff merge from devel
f5378a8 merge from devel
16d0fcf merge from devel
ab92c3f merge from devel
19054ec merge from devel
0b4a88a snap
048e356 snap
d5e74e0 Merge branch 'fix/flux' of github.com:radical-cybertools/radical.utils into fix/flux
0825681 snap
3da53e7 snap
cc127cb more details, faster typed-dict
c3ca6e5 Merge branch 'feature/yappi' into fix/flux
e78cdb2 capture flux stderr
1eb90c2 snap
130dfc5 snap
f99a189 snap
b0b4523 snap
22285ca snap
de0a839 snap
4101b05 snap

andre-merzky and others added 7 commits July 7, 2025 11:19

1f72244 cleanup
9de2bb3 fix timeout
bf6e712 updated JobSpec creation
0367672 allow env variable being referred as part of the value
c507ce2 added control over exit procedures for Flux jobs/tasks (`exit-on-error`, `exit-timeout`)
8f1b6ab fix tests
8118c16 fix tests

andre-merzky requested review from AymenFJA, Copilot and mtitov September 24, 2025 10:57

andre-merzky self-assigned this Sep 24, 2025

andre-merzky added priority:high topic:api topic:execution type:enhancement external labels Sep 24, 2025

Copilot AI reviewed Sep 24, 2025

gemini-code-assist bot reviewed Sep 24, 2025

andre-merzky and others added 4 commits September 29, 2025 11:53

72c36aa Update src/radical/utils/flux/flux_helper_v0.py
        Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
cb30eae respond to comments
fabc451 Merge branch 'fix/flux' of github.com:radical-cybertools/radical.utils into fix/flux
19d86b3 respond to comments

andre-merzky enabled auto-merge September 29, 2025 09:58

mtitov approved these changes Oct 30, 2025

andre-merzky merged commit 8d49419 into devel Oct 30, 2025
9 checks passed

andre-merzky deleted the fix/flux branch October 30, 2025 17:25

mtitov reviewed Oct 30, 2025