Resolve intermittent `capture_output` deadlock #3601

jsiirola · 2025-05-08T20:29:08Z

Fixes # .

Summary/Motivation:

We are continuing to see intermittent deadlocks or attempts to write to closed file descriptors in the HiGHS tests on main. I think I have (finally) tracked down the root cause to the use of capture_input() in the Gurobi interface __del__ method. It looks like the GC will (randomly) interrupt a HiGHS test to run the collector (which, if it is collecting a Gurobi interface will call capture_output. If this happens during another test's use of capture_output.__enter__ or .__exit__, then the Gurobi interface's use of capture could happen when the underlying output streams are in a transitory state (and could be closed).

This PR resolves that by:

rework the Gurobi interface to use it's own private Env (which suppresses the output to stdout and removes the need to use capture_output in __del__.
Add thread locks so that one call to capture_output will not interrupts another fall to __enter__ / exit`
improve some implementations within TeeStream / capture_output
add additional tests that catch some edge cases identified during development

Changes proposed in this PR:

(see above)

Legal Acknowledgement

By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution:

I agree my contributions are submitted under the BSD license.
I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

codecov · 2025-05-08T22:07:58Z

Codecov Report

Attention: Patch coverage is 94.89796% with 5 lines in your changes missing coverage. Please review.

Project coverage is 88.76%. Comparing base (457f436) to head (41b722a).

Files with missing lines	Patch %	Lines
pyomo/contrib/solver/solvers/gurobi_persistent.py	62.50%	3 Missing ⚠️
pyomo/contrib/solver/solvers/gurobi_direct.py	95.00%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3601   +/-   ##
=======================================
  Coverage   88.76%   88.76%           
=======================================
  Files         891      891           
  Lines      102297   102320   +23     
=======================================
+ Hits        90802    90826   +24     
+ Misses      11495    11494    -1

Flag	Coverage Δ
builders	`26.64% <65.30%> (+0.02%)`	⬆️
default	`84.87% <88.77%> (?)`
expensive	`34.01% <72.44%> (?)`
linux	`86.55% <91.83%> (-1.95%)`	⬇️
linux_other	`86.55% <91.83%> (+<0.01%)`	⬆️
osx	`82.84% <88.77%> (+<0.01%)`	⬆️
win	`84.75% <91.83%> (+<0.01%)`	⬆️
win_other	`84.75% <91.83%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pyomo/common/tee.py

mrmundt · 2025-05-12T14:03:00Z

pyomo/common/tests/test_tee.py

@@ -559,6 +579,31 @@ def test_capture_output_stack_error(self):
            sys.stdout, sys.stderr = old
            logging.getLogger('pyomo.common.tee').handlers.clear()

+    def test_atomic_deadlock(self):


This is a nifty test, I like it.

mrmundt · 2025-05-12T14:06:05Z

pyomo/contrib/solver/solvers/highs.py

@@ -288,7 +288,7 @@ def set_instance(self, model):
                f'({self.available()}).'
            )

-        with TeeStream(*ostreams) as t, capture_output(t.STDOUT, capture_fd=True):
+        with capture_output(TeeStream(*ostreams), capture_fd=True):


What is the difference between doing this and just doing with TeeStream(*ostreams) as OUT? Do we have a preference? (I ask because we don't seem to have decided on a "standard" and we probably should)

capture_output wraps the streams it is passed into a TeeStream in order to get handles it can give to stdout / stderr. So the former implementation actually had 2 nested TeeStream objects. If you just pass the TeeStream object to capture_output, it won't create a new TeeStream, so the new implementation only has a single TeeStream.

Sorry, let me rephrase. In ipopt currently, we do this:

with TeeStream(*ostreams) as t: timer.start('subprocess') process = subprocess.run( cmd, timeout=timeout, env=env, universal_newlines=True, stdout=t.STDOUT, stderr=t.STDERR, check=False, ) timer.stop('subprocess') # This is the stuff we need to parse to get the iterations # and time (iters, ipopt_time_nofunc, ipopt_time_func, ipopt_total_time) = ( self._parse_ipopt_output(ostreams[0]) )

In GUROBI, we do:

with capture_output(TeeStream(*ostreams), capture_fd=False): gurobi_model = gurobipy.Model() timer.start('transfer_model') x = gurobi_model.addMVar( len(repn.columns), lb=lb, ub=ub, obj=repn.c.todense()[0] if repn.c.shape[0] else 0, vtype=vtype, )

Which one is preferable?

They are both correct for their respective use cases:

subprocess.run needs to be passed 2 TextIO objects, so it is appropriate (and necessary) to pass the raw TextIO objects from TeeStream.

capture_output needs a single stream to send all output to (we have no added support for separate our/err targets (so internally it creates a TeeStream so that it can get raw file descriptors if it needs them (for capture_fd=True). Here Passing a TeeStream directly avoids the need to create a second one that wraps the stream from the first.

Got it, thank you!

mrmundt · 2025-05-14T22:43:38Z

Per @blnicho - we could merge this today if no one has time to do another review. Ergo, risking it for the biscuit and merging.

jsiirola added 13 commits May 8, 2025 10:04

Rework gurobi to avoid use of default Env

97677ea

Rework available() implementation / caching for gurobi interfaces

a5f2f22

Fix error message (remove reference to APPSI)

fee116c

Simplify HiGHS output management (avoid nested TeeStreams)

f3312c4

Add lock so entering/leaving capture_output is atomic

6336c8d

Rework polling to bound polling interval around 1.6 seconds

6324038

gurobi available(): minor rework, add comments

8fdb7de

Test using capture_output with a closed stdout

7a51c18

Test atomic deadlock

0741b78

Make capture_output more stateless

8b5ac8c

support calling capture_output.reset() on 'closed' instances

1d72abd

TeeStream should close when exiting the outermost context

ec38f30

simplify TeeStream flush handling

41b722a

jsiirola added 2 commits May 9, 2025 00:08

NFC: fix comment

bbaa344

simplify model construction

4028582

mrmundt reviewed May 12, 2025

View reviewed changes

mrmundt approved these changes May 12, 2025

View reviewed changes

mrmundt merged commit f680e04 into Pyomo:main May 14, 2025
35 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Resolve intermittent `capture_output` deadlock #3601

Resolve intermittent `capture_output` deadlock #3601

Uh oh!

jsiirola commented May 8, 2025

Uh oh!

codecov bot commented May 8, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

mrmundt May 12, 2025

Uh oh!

mrmundt May 12, 2025

Uh oh!

jsiirola May 12, 2025

Uh oh!

mrmundt May 12, 2025

Uh oh!

jsiirola May 12, 2025

Uh oh!

mrmundt May 12, 2025

Uh oh!

mrmundt commented May 14, 2025

Uh oh!

Uh oh!

Uh oh!

Resolve intermittent capture_output deadlock #3601

Resolve intermittent capture_output deadlock #3601

Uh oh!

Conversation

jsiirola commented May 8, 2025

Fixes # .

Summary/Motivation:

Changes proposed in this PR:

Legal Acknowledgement

Uh oh!

codecov bot commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

mrmundt May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mrmundt May 12, 2025

Choose a reason for hiding this comment

Uh oh!

jsiirola May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mrmundt May 12, 2025

Choose a reason for hiding this comment

Uh oh!

jsiirola May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mrmundt May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mrmundt commented May 14, 2025

Uh oh!

Uh oh!

Uh oh!

Resolve intermittent `capture_output` deadlock #3601

Resolve intermittent `capture_output` deadlock #3601

codecov bot commented May 8, 2025 •

edited

Loading