Calculate linter.config.jobs in cgroupsv2 environments #10089

DominicLavery · 2024-11-21T16:11:08Z

Type of Changes

	Type
✓	🐛 Bug fix
✓	✨ New feature

Description

In containers running on cgroup v2 systems, _query_cpu currently returns None. As a result, sched_getaffinity is used instead, which normally reports every CPU installed on the host rather than the container's share. This can result in crashes with the error:

concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

The changes here use the CPU quota from cpu.max on v2 systems. "max 100000" is the default value (no quota set), and it will continue to result in the host's CPU count being used.

cpu.weight (the replacement for cpu shares from v1) could also be used, but because its impact on CPU scheduling is relative to the rest of the cgroup hierarchy, it isn't possible to get an accurate value on all systems where pylint may be run. The quota-based method, by contrast, is reliable on any container with a CPU quota.
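For illustration, here is a minimal sketch of how the quota in cpu.max can be turned into a CPU count. It is not the code in this PR: parse_cpu_max is a hypothetical helper name, and clamping fractional quotas up to 1 is an assumption made here so that the result is always a usable job count.

```python
from typing import Optional


def parse_cpu_max(contents: str) -> Optional[int]:
    """Turn the text of a cgroup v2 cpu.max file into a CPU count.

    The file holds two fields, "<quota> <period>". A quota of "max" means
    no limit is set, so None is returned and the caller can fall back to
    the host's CPU count.
    """
    fields = contents.split()
    if len(fields) != 2:
        # Unexpected format: treat it as "no usable limit".
        return None
    quota, period = fields
    if quota == "max":
        return None
    # Assumption: round fractional quotas (e.g. half a CPU) up to 1 so the
    # result never ends up as 0 workers.
    return max(int(int(quota) / int(period)), 1)
```

With this sketch, "200000 100000" yields 2 CPUs, while the default "max 100000" yields None.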

Closes #10103

codecov · 2024-11-23T16:51:57Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.80%. Comparing base (55098c7) to head (972f45e).

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #10089   +/-   ##
=======================================
  Coverage   95.80%   95.80%           
=======================================
  Files         174      174           
  Lines       18973    18992   +19     
=======================================
+ Hits        18177    18196   +19     
  Misses        796      796

Files with missing lines	Coverage Δ
pylint/lint/run.py	`89.18% <100.00%> (+1.59%)`	⬆️

DanielNoord

Awesome! Thanks for the PR. I just have some questions about structure and placement of the tests, but the code LGTM!

DanielNoord · 2024-11-24T20:54:00Z

pylint/lint/run.py

@@ -65,6 +65,18 @@ def _query_cpu() -> int | None:
            cpu_shares = int(file.read().rstrip())
        # For AWS, gives correct value * 1024.
        avail_cpu = int(cpu_shares / 1024)
+    elif Path("/sys/fs/cgroup/cpu.max").is_file():


Shouldn't this take precedence over cpu.shares? If the new file is present it should probably be preferred over the old one?

I wouldn't expect them to co-exist, so I just went with a quick grouping around v1 vs v2, but preferring the new file probably makes sense just in case.
I've pushed a new commit which prefers the v2 files and, I think, also makes the logic a bit clearer.
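A rough sketch of the "prefer v2" ordering discussed here, under stated assumptions: query_cpu and its cgroup_root parameter are hypothetical (the real function reads fixed paths under /sys/fs/cgroup), only cpu.shares is shown for v1, and the clamp to a minimum of 1 is added for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def query_cpu(cgroup_root: Path) -> Optional[int]:
    """Return a CPU limit read from cgroup files under cgroup_root, or None."""
    v2_max = cgroup_root / "cpu.max"
    v1_shares = cgroup_root / "cpu" / "cpu.shares"
    if v2_max.is_file():
        # cgroup v2 is checked first, so it wins if both layouts are present.
        fields = v2_max.read_text(encoding="utf-8").split()
        if len(fields) == 2 and fields[0] != "max":
            return max(int(fields[0]) // int(fields[1]), 1)
        return None
    if v1_shares.is_file():
        # cgroup v1 fallback, mirroring the cpu_shares / 1024 line in the diff.
        shares = int(v1_shares.read_text(encoding="utf-8").strip())
        return max(shares // 1024, 1)
    return None


# Usage: build a fake hierarchy containing both the v1 and the v2 files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cpu").mkdir()
    (root / "cpu.max").write_text("200000 100000\n", encoding="utf-8")
    (root / "cpu" / "cpu.shares").write_text("4096\n", encoding="utf-8")
    preferred = query_cpu(root)  # the v2 quota is used, not the v1 shares
```

Taking the root directory as a parameter is only a convenience for trying the ordering out against a temporary directory.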

DanielNoord · 2024-11-24T20:55:12Z

tests/lint/test_run.py

+from pylint.testutils.utils import _test_cwd
+
+
+@pytest.mark.parametrize(


Why isn't this grouped with the other tests? I don't think we need a separate file as we don't follow the "file + test_file" structure

I had a bit of confusion around using protected members in Python tests. Figured it out and merged the tests back into the other file.

DominicLavery · 2024-11-26T13:21:53Z

Hey @DanielNoord! Thanks so much for the review. I've replied to your comments and pushed some fixes

Pierre-Sassoulas

Hey @DominicLavery, thank you for the PR! Would you mind adding a changelog entry, please? (By adding a file here: https://github.com/pylint-dev/pylint/tree/main/doc/whatsnew/fragments)

DominicLavery · 2024-12-02T11:04:34Z

Thanks @Pierre-Sassoulas! Absolutely :) I've pushed that change

DominicLavery · 2024-12-02T11:24:30Z

Ah I suspect now that the v2 checks are happening first, I need to add a mock along the lines of

        if args[0] == "/sys/fs/cgroup/cpu.max":
            return MagicMock(is_file=lambda: False)

to test_pylint_run_jobs_equal_zero_dont_crash_with_cpu_fraction
to make sure the v1 path is still tested
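As a self-contained illustration of that idea (not the actual pylint test), the sketch below patches Path.is_file so that the v2 file appears absent and only a v1 file exists. detect_cgroup_source and fake_is_file are hypothetical names introduced here.

```python
from pathlib import Path
from unittest.mock import patch

V2_FILE = "/sys/fs/cgroup/cpu.max"
V1_FILE = "/sys/fs/cgroup/cpu/cpu.shares"


def detect_cgroup_source() -> str:
    """Hypothetical stand-in for the detection order: v2 first, then v1."""
    if Path(V2_FILE).is_file():
        return "v2"
    if Path(V1_FILE).is_file():
        return "v1"
    return "none"


def fake_is_file(self: Path) -> bool:
    """Pretend only the v1 file exists, whatever the real host looks like."""
    return self.as_posix() == V1_FILE


# Replacing the method on the class affects every Path instance created
# inside the "with" block; the original method is restored afterwards.
with patch.object(Path, "is_file", fake_is_file):
    source_under_mock = detect_cgroup_source()
```

Under the patch, detection falls through to the v1 branch even on a host that really has /sys/fs/cgroup/cpu.max.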

Pierre-Sassoulas · 2024-12-02T21:41:50Z

I moved some code around to minimize the diff and make it more apparent that the existing code was not modified. This should help with the coverage job being unhappy.

Pierre-Sassoulas · 2024-12-02T21:54:13Z

(But better test coverage would be appreciated if you feel inclined to test the existing code better than it was originally 😄 )

github-actions · 2024-12-02T22:02:55Z

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 972f45e

DanielNoord · 2024-12-03T10:27:08Z

Sorry @DominicLavery, life got very busy all of a sudden and I didn't have time to finish this review. If @Pierre-Sassoulas approves the PR I'm happy as well! Thanks for your contribution! :)

DominicLavery · 2024-12-04T14:50:22Z

No worries @DanielNoord!

@Pierre-Sassoulas I've added some extra tests and created shared mock set ups to dedupe a bit. Hope this helps :)

Pierre-Sassoulas · 2024-12-04T20:49:00Z

Thank you @DominicLavery, appreciated. It seems some tests are failing now. I'm not sure why the pylint job fails; it's a strange failure. Let me know if I need to approve the pipeline again so it runs. Or we might merge the previous version so you can run pipelines automatically on the refactor.

jacobtylerwalls · 2024-12-07T02:00:31Z

I think both failures are explained by #10114.

DominicLavery · 2024-12-07T13:30:04Z

Thanks jacobtylerwalls! I've subscribed to the issues and will rebase when I see them closed

jacobtylerwalls · 2024-12-07T14:23:37Z

Pardon my pushing to your branch: I just want to test out the proposed fix in the astroid repo and see how far that gets us.

DominicLavery force-pushed the cgroupsv2-cpu-count branch from 9ff146e to ff64dd8 Compare November 21, 2024 16:19

Pierre-Sassoulas added Bug 🪲 multiprocessing backport maintenance/3.3.x labels Nov 21, 2024