
Adjust WMAgent configuration such that it keeps less work in its queue #12057

Open
amaltaro opened this issue Jul 25, 2024 · 0 comments

Impact of the new feature
WMAgent

Is your feature request related to a problem? Please describe.
The goal is to avoid keeping too large a backlog in the agent's queue (jobs to be created, jobs created, jobs pending), so that badly behaving workflows, especially those with very short jobs, don't crash the agents and/or blow up the components' duty cycle.

Describe the solution you'd like
Picking a reasonable enough number, I would say that keeping between 50k and 70k jobs pending in the condor queue (and nothing else queued in the local workqueue and/or in created status) would be a good compromise.

This can be achieved by tweaking one - or both - WorkQueueManager parameters:
https://github.com/dmwm/WMCore/blob/a63cf47/etc/WMAgentConfig.py#L147-L148

or the AgentStatusWatcher pending attributes:
https://github.com/dmwm/WMCore/blob/a63cf47/etc/WMAgentConfig.py#L355-L356

Note that for the AgentStatusWatcher attributes, the actual thresholds are weighted according to the number of agents connected to the same team name (the more agents share the same team, the smaller those pending thresholds are).
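
To make the weighting concrete, here is a minimal sketch of how a pending threshold shrinks as more agents join a team. It assumes a purely even split across agents, which is an illustration only and not necessarily the formula AgentStatusWatcher applies; the function name `per_agent_threshold` and the 60,000 team-wide figure are hypothetical.

```python
def per_agent_threshold(team_threshold: int, num_agents: int) -> int:
    """Split a team-wide pending threshold evenly across agents.

    ASSUMPTION: an even split. The real weighting in AgentStatusWatcher
    may differ; this only illustrates the trend described above.
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    return team_threshold // num_agents


# Hypothetical team-wide pending threshold, for illustration only.
TEAM_THRESHOLD = 60_000

for agents in (1, 4, 5):
    share = per_agent_threshold(TEAM_THRESHOLD, agents)
    print(f"{agents} agent(s) in the team: {share} pending jobs per agent")
```

Under this assumption, a test setup with 4 or 5 agents would leave each agent with a quarter or a fifth of the team-wide value, so any tuning has to be evaluated with the intended number of agents attached to the team.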

Describe alternatives you've considered
We will probably have to test a few different configurations in a production scenario, that is, with 4 or 5 agents connected to the same team name.

Additional context
None
