Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: add QueueList class for access to queue config, limits, status and resource counts #6723

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Mar 21, 2025

Problem: There's currently no easy way to access Flux queue information from the Python API, and that information is currently distributed between different parts the config, various job-manager RPCs, and resource configuration.

This PR adds a new flux.queue.QueueList class which fetches information for all currently configured queues in one go. It then presents as a list of QueueInfo objects for each queue, which offer access to status, limits, and even resources counts on a per queue basis. This class is then used to replace similar code in flux queue list.

In a follow on PR, flux queue list will be extended to allow access to the resource stats.

grondo added 11 commits March 21, 2025 06:55
Problem: There's a typo in the vendored Python modules README.md,
the 'tomli' package is misspelled.

Fix the typo.
Problem: Python dataclasses would be useful, but they were not
introduced until Python 3.7 and Flux still has to support 3.6.

Add the dataclassees backport to 3.6 from:

 https://github.com/ericvsmith/dataclasses
Problem: Since the dataclasses source is added to flux.utils directly,
importing the `dataclass` decorator requires an overly long path.

Add an __init__.py to the dataclasses source directory so that
dataclass can be imported from flux.utils.dataclasses.
Problem: The dataclasses backport is not included in `make dist`
or `make install`.

Add the directory to Makefile.am so it is.
Problem: The typos checker detects typos in a vendored python module.

Ignore the module in .typos.toml.
Problem: The SchedResourceList initializer assumes it will only be
passed a dictionary of dictionaries, each of which needs to be passed
to the ResourceSet() constructor. This assumption makes it awkward
to create a SchedResourceList from another instance of the same class.

Check to see if the each value is already a ResourceSet in the `resp`
parameter to the SchedResourceList initializer, and only convert to
a ResourceSet if necessary.
Problem: There's no easy way to create a copy of a SchedResourceList
object using a constraint, e.g. to return only the reosurces in a
given queue.

Add a copy_constraint() method to the SchedResourceList class for
this purpose.
Problem: There's no user friendly method in Python to fetch the list
of configured queues along with information about them, including
name (if named queues are used), status, limits and defaults, and
associated resources.

Add a QueueList class which can fetch information about all queues
using a combination of the configuration, queue statuses, and resource
list. It presents as a list of QueueInfo objects, each of which contains
attributes describing the above data.
Problem: There's no tests of the Python QueueList class.

Add t/python/t0034-queuelist.py for this purpose.
Problem: The `--from-stdin` option in `flux queue list` is not used.

Drop it.
Problem: Much of the code in `flux queue list` duplicates functionality
in the flux.queue.QueueList class.

Update `flux queue list` to use functionality from the
flux.queue.QueueList class.
@grondo grondo force-pushed the qlist-pyclass branch 2 times, most recently from 35fdc4d to a2e3c8d Compare March 21, 2025 21:56
Copy link

codecov bot commented Mar 21, 2025

Codecov Report

Attention: Patch coverage is 98.76543% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.84%. Comparing base (a718ef5) to head (a2e3c8d).

Files with missing lines Patch % Lines
src/bindings/python/flux/queue.py 98.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6723      +/-   ##
==========================================
+ Coverage   83.80%   83.84%   +0.04%     
==========================================
  Files         534      535       +1     
  Lines       89164    89195      +31     
==========================================
+ Hits        74720    74783      +63     
+ Misses      14444    14412      -32     
Files with missing lines Coverage Δ
src/bindings/python/flux/resource/list.py 92.75% <100.00%> (+0.81%) ⬆️
src/cmd/flux-queue.py 94.89% <100.00%> (+0.29%) ⬆️
src/bindings/python/flux/queue.py 98.33% <98.33%> (ø)

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of small comments but generally looks great!

@@ -83,6 +83,12 @@ def free(self):
res.state = "free"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in commit message: resources

@@ -82,6 +84,7 @@ def free(self):
return res



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray newline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm assuming black added this, but I can try removing it and see if it complains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I get it, as @chu11 said, in the wrong commit.

config = future.get()
except Exception as e:
LOGGER.warning("Could not get flux config: " + str(e))
future = handle.rpc("config.get")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be handle.conf_get()? Since there's a method for it I think I would prefer that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! However, this particular commit was updating existing code and I'd prefer not to make two changes at once. Actually, I see there are several uses of handle.rpc("config.get") throughout the code that should all be updated, let's do that in a separate PR.

Attributes:
min (:obj:`QueueResourceCounts`): Configured resource minimums.
max (:obj:`QueueResourceCounts`): Configured resource maximums.
(If no maximum, individual resource count will be ``inf``)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

math.inf is not an integer so that violates the type hints for QueueResourceCounts. Maybe typing.Union[int, float]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm not all that familiar with Python typing hints. Is there a more generic "Number" type hint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah from PEP 484

when an argument is annotated as having type float, an argument of type int is acceptable;

I'll use float

Comment on lines +252 to +255
resources, config = map(
lambda x: x.get(),
[resource_list(handle), handle.rpc("config.get")],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resources = resource_list.get()
config = handle.conf_get()

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a comment perhaps, but we don't want to run the RPCs back-to-back. (This code initiates both RPCs before calling get on either (at least I hope that's what it does)). May need to change anyway if we update to handle.conf_get() - though now I'm realizing there's no way to make that call asynchronous, which is why the explicit RPC may be better here 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that seems reasonable. You could maybe do:

resources = resource_list(handle)
config = handle.conf_get()
resources = resources.get()

?

If ``queues`` is None, return the whole config, which may be empty
if there is no configured queues.
"""
if not queues:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently if you pass in an empty list for queues you would get the whole config returned. That doesn't make a ton of sense to me, I think I would probably go for creating an empty object or raising an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done purposely to easily support callers that are passing argparse args.queue option directly which is of type append or Flux's FilterSetUpdate, where the default (no queue specifically specified) value is an empty list or set.

I will update the documentation accordingly.

>>> batchq = qlist["batch"]
If there is a single anonymous queue, the queue name is the empty string,
so this queue may be indexed using an empty string as the key:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is something to be said for having the queue name be None in this case rather than the empty string. I don't think it would make the interface any different really, but None is more of what I'd expect in this case, and it would simplify the implementation a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the empty string is required for use in flux queue list, where we don't want to print anything at all for the anonymous queue (we don't want to display queue name "None").

Copy link
Member

@jameshcorbett jameshcorbett Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok makes sense. There could be a check for None in the flux queue list command though right? Up to you though.

Comment on lines +317 to +319
if "queues" in config:
return config["queues"]
return {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe return config.get("queues") instead?

result = default
return result

def policy_system_limit(self, key, default="inf"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these policy_ and size_ methods be prefixed with an underscore? They don't seem like they should be part of the interface.

"""
self.fh.rpc("config.load", tomllib.loads(testconf)).get()

qlist = QueueList(self.fh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a test like QueueList(self.fh, [])?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also requesting specific queues, QueueList(self.fh, ["batch"], and probably multiple queues ["batch", "debug"]

@@ -82,6 +84,7 @@ def free(self):
return res



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errant line introduced? (in python: accept ResourceSet dict in SchedResourceList initializer commit)

attributes are also documented for each corresponding class in this module::
queue.name (str) (empty string if using a single anonymous queue)
queue.is_default (str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_default -> bool?


min: QueueResourceCounts
max: QueueResourceCounts
duration: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str -> float?

"""
self.fh.rpc("config.load", tomllib.loads(testconf)).get()

qlist = QueueList(self.fh)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also requesting specific queues, QueueList(self.fh, ["batch"], and probably multiple queues ["batch", "debug"]

return t
def __getattr__(self, attr):
# Forward most attribute lookups to underlying QueueLimits instance
return getattr(self.__limits, attr)


class QueueInfo:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid confusion with the new QueueInfo class, rename to ....???

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants