-
Notifications
You must be signed in to change notification settings - Fork 543
Resolve intermittent capture_output
deadlock
#3601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3601 +/- ##
=======================================
Coverage 88.76% 88.76%
=======================================
Files 891 891
Lines 102297 102320 +23
=======================================
+ Hits 90802 90826 +24
+ Misses 11495 11494 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@@ -559,6 +579,31 @@ def test_capture_output_stack_error(self): | |||
sys.stdout, sys.stderr = old | |||
logging.getLogger('pyomo.common.tee').handlers.clear() | |||
|
|||
def test_atomic_deadlock(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nifty test, I like it.
@@ -288,7 +288,7 @@ def set_instance(self, model): | |||
f'({self.available()}).' | |||
) | |||
|
|||
with TeeStream(*ostreams) as t, capture_output(t.STDOUT, capture_fd=True): | |||
with capture_output(TeeStream(*ostreams), capture_fd=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between doing this and just doing with TeeStream(*ostreams) as OUT
? Do we have a preference? (I ask because we don't seem to have decided on a "standard" and we probably should)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capture_output
wraps the streams it is passed into a TeeStream
in order to get handles it can give to stdout / stderr. So the former implementation actually had 2 nested TeeStream
objects. If you just pass the TeeStream
object to capture_output
, it won't create a new TeeStream
, so the new implementation only has a single TeeStream
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, let me rephrase. In ipopt currently, we do this:
with TeeStream(*ostreams) as t:
timer.start('subprocess')
process = subprocess.run(
cmd,
timeout=timeout,
env=env,
universal_newlines=True,
stdout=t.STDOUT,
stderr=t.STDERR,
check=False,
)
timer.stop('subprocess')
# This is the stuff we need to parse to get the iterations
# and time
(iters, ipopt_time_nofunc, ipopt_time_func, ipopt_total_time) = (
self._parse_ipopt_output(ostreams[0])
)
In GUROBI, we do:
with capture_output(TeeStream(*ostreams), capture_fd=False):
gurobi_model = gurobipy.Model()
timer.start('transfer_model')
x = gurobi_model.addMVar(
len(repn.columns),
lb=lb,
ub=ub,
obj=repn.c.todense()[0] if repn.c.shape[0] else 0,
vtype=vtype,
)
Which one is preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are both correct for their respective use cases:
subprocess.run
needs to be passed 2 TextIO objects, so it is appropriate (and necessary) to pass the raw TextIO objects fromTeeStream
.capture_output
needs a single stream to send all output to (we have no added support for separate our/err targets (so internally it creates a TeeStream so that it can get raw file descriptors if it needs them (forcapture_fd=True
). Here Passing a TeeStream directly avoids the need to create a second one that wraps the stream from the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thank you!
Per @blnicho - we could merge this today if no one has time to do another review. Ergo, risking it for the biscuit and merging. |
Fixes # .
Summary/Motivation:
We are continuing to see intermittent deadlocks or attempts to write to closed file descriptors in the HiGHS tests on main. I think I have (finally) tracked down the root cause to the use of
capture_input()
in the Gurobi interface__del__
method. It looks like the GC will (randomly) interrupt a HiGHS test to run the collector (which, if it is collecting a Gurobi interface will callcapture_output
. If this happens during another test's use ofcapture_output.__enter__
or.__exit__
, then the Gurobi interface's use of capture could happen when the underlying output streams are in a transitory state (and could be closed).This PR resolves that by:
capture_output
in__del__
.capture_output
will not interrupts another fall to__enter__ /
exit`Changes proposed in this PR:
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: