Skip to content

Commit 600e9e0

Browse files
authored
Crash diff server when its process pool dies (#49)
If the server's process pool (used for actually running the diffs) fails repeatedly, assume it just can't be started and crash the whole server so a process manager can clean things up and totally start over. You can retain the old behavior (where the server keeps trying to start a new process pool if the old one is broken) by setting the `RESTART_BROKEN_DIFFER` environment variable to `true`. Fixes #42.
1 parent 982dccd commit 600e9e0

File tree

4 files changed

+86
-27
lines changed

4 files changed

+86
-27
lines changed

Diff for: .env.example

+11-22
Original file line numberDiff line numberDiff line change
@@ -1,22 +1,15 @@
1-
# All environmental variables are optional. The associated settings can be
2-
# updated at runtime. These merely provide defaults at import time.
1+
# All environment variables are optional. The associated settings can be
2+
# updated at runtime. These provide a friendly interface to changing settings
3+
# when running commands from a CLI.
34

5+
# Diff-Related variables --------------------------
46

5-
# These are used in the web_monitoring.db module.
6-
# They can point to a local deployment of web-monitoring-db (as in this
7-
# example) or to the production deployment.
8-
export WEB_MONITORING_DB_URL="http://localhost:3000"
9-
export WEB_MONITORING_DB_EMAIL="[email protected]"
10-
export WEB_MONITORING_DB_PASSWORD="PASSWORD"
11-
export WEB_MONITORING_APP_ENV="development" # or production or test
12-
13-
# These are used in test_html_diff
14-
export WEB_MONITORING_DB_STAGING_URL="https://api-staging.monitoring.envirodatagov.org"
15-
export WEB_MONITORING_DB_STAGING_EMAIL=""
16-
export WEB_MONITORING_DB_STAGING_PASSWORD=""
7+
# These CSS color values are used to set the colors in html_diff_render, differs and links_diff
8+
# export DIFFER_COLOR_INSERTION="#4dac26"
9+
# export DIFFER_COLOR_DELETION="#d01c8b"
1710

1811

19-
# Diff-Related variables --------------------------
12+
# Server-Related variables ------------------------
2013

2114
# Set the diffing server to debug mode. Returns tracebacks in error responses
2215
# and auto-reloads the server when source files change.
@@ -38,13 +31,9 @@ export DIFFER_MAX_BODY_SIZE='10485760' # 10 MB
3831
# https:// requests have invalid certificates.
3932
# export VALIDATE_TARGET_CERTIFICATES="false"
4033

41-
# These CSS color values are used to set the colors in html_diff_render, differs and links_diff
42-
# export DIFFER_COLOR_INSERTION="#4dac26"
43-
# export DIFFER_COLOR_DELETION="#d01c8b"
44-
4534
# Set how many diffs can be run in parallel.
4635
# export DIFFER_PARALLELISM=10
4736

48-
# Uncomment to enable logging. Set the level as any normal level.
49-
# https://docs.python.org/3.6/library/logging.html#logging-levels
50-
# export LOG_LEVEL=INFO
37+
# Instead of crashing when the process pool used for running diffs breaks,
38+
# keep accepting requests and try to restart the pool.
39+
# RESTART_BROKEN_DIFFER='true'

Diff for: docs/source/release-history.rst

+6
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,12 @@
22
Release History
33
===============
44

5+
In Development
6+
--------------
7+
8+
- The server uses a pool of child processes to run diffs. If the pool breaks while running a diff, it will be re-created once, and, if it fails again, the server will now crash with an exit code of ``10``. (An external process manager like Supervisor, Kubernetes, etc. can then decide how to handle the situation.) Previously, the diff would fail at this point, but server would try to re-create the process pool again the next time a diff was requested. You can opt-in to the old behavior by setting the ``RESTART_BROKEN_DIFFER`` environment variable to ``true``. (`#49 <https://github.com/edgi-govdata-archiving/web-monitoring-diff/issues/49>`_)
9+
10+
511
Version 0.1.1 (2020-11-24)
612
--------------------------
713

Diff for: web_monitoring_diff/server/server.py

+22-2
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@
3636
sentry_sdk.integrations.logging.ignore_logger('tornado.access')
3737

3838
DIFFER_PARALLELISM = int(os.environ.get('DIFFER_PARALLELISM', 10))
39+
RESTART_BROKEN_DIFFER = os.environ.get('RESTART_BROKEN_DIFFER', 'False').strip().lower() == 'true'
3940

4041
# Map tokens in the REST API to functions in modules.
4142
# The modules do not have to be part of the web_monitoring_diff package.
@@ -208,6 +209,7 @@ def initialize_diff_worker():
208209

209210
class DiffServer(tornado.web.Application):
210211
terminating = False
212+
server = None
211213

212214
def listen(self, port, address='', **kwargs):
213215
self.server = super().listen(port, address, **kwargs)
@@ -220,9 +222,11 @@ async def shutdown(self, immediate=False):
220222
Otherwise, diffs are allowed to try and finish.
221223
"""
222224
self.terminating = True
223-
self.server.stop()
225+
if self.server:
226+
self.server.stop()
224227
await self.shutdown_differs(immediate)
225-
await self.server.close_all_connections()
228+
if self.server:
229+
await self.server.close_all_connections()
226230

227231
async def shutdown_differs(self, immediate=False):
228232
"""Stop all child processes used for running diffs."""
@@ -238,6 +242,12 @@ async def shutdown_differs(self, immediate=False):
238242
else:
239243
await shutdown_executor_in_loop(differs)
240244

245+
async def quit(self, immediate=False, code=0):
246+
await self.shutdown(immediate=immediate)
247+
tornado.ioloop.IOLoop.current().stop()
248+
if code:
249+
sys.exit(code)
250+
241251
def handle_signal(self, signal_type, frame):
242252
"""Handle a signal by shutting down the application and IO loop."""
243253
loop = tornado.ioloop.IOLoop.current()
@@ -515,6 +525,16 @@ async def diff(self, func, a, b, params, tries=2):
515525
if executor == old_executor:
516526
executor = self.get_diff_executor(reset=True)
517527
else:
528+
# If we shouldn't allow the server to keep rebuilding the
529+
# differ pool for new requests, schedule a shutdown.
530+
# (*Schuduled* so that current requests have a chance to
531+
# complete with an error.)
532+
if not RESTART_BROKEN_DIFFER:
533+
logger.error('Process pool for diffing has failed too '
534+
'many times; quitting server...')
535+
tornado.ioloop.IOLoop.current().add_callback(
536+
self.application.quit,
537+
code=10)
518538
raise
519539

520540
# NOTE: this doesn't do anything async, but if we change it to do so, we

Diff for: web_monitoring_diff/tests/test_server_exc_handling.py

+47-3
Original file line numberDiff line numberDiff line change
@@ -393,6 +393,9 @@ class BrokenProcessPoolExecutor(concurrent.futures.Executor):
393393
"Fake process pool that only raises BrokenProcessPool exceptions."
394394
submit_count = 0
395395

396+
def __init__(max_workers=None, mp_context=None, initializer=None, initargs=()):
397+
return super().__init__()
398+
396399
def submit(self, fn, *args, **kwargs):
397400
self.submit_count += 1
398401
result = concurrent.futures.Future()
@@ -403,10 +406,10 @@ def submit(self, fn, *args, **kwargs):
403406

404407

405408
class ExecutionPoolTestCase(DiffingServerTestCase):
406-
def fetch_async(self, path, **kwargs):
409+
def fetch_async(self, path, raise_error=False, **kwargs):
407410
"Like AyncHTTPTestCase.fetch, but async."
408411
url = self.get_url(path)
409-
return self.http_client.fetch(url, **kwargs)
412+
return self.http_client.fetch(url, raise_error=raise_error, **kwargs)
410413

411414
def test_rebuilds_process_pool_when_broken(self):
412415
# Get a custom executor that will always fail the first time, but get
@@ -427,7 +430,8 @@ def get_executor(self, reset=False):
427430
assert response.code == 200
428431
assert did_get_executor == True
429432

430-
def test_diff_returns_error_if_process_pool_repeatedly_breaks(self):
433+
@patch.object(df.DiffServer, "quit")
434+
def test_diff_returns_error_if_process_pool_repeatedly_breaks(self, _):
431435
# Set a custom executor that will always fail.
432436
def get_executor(self, reset=False):
433437
return BrokenProcessPoolExecutor()
@@ -474,6 +478,46 @@ def get_executor(self, reset=False):
474478
# have one reset because only one request hit the bad executor.
475479
assert bad_executor.submit_count == 2
476480

481+
# NOTE: the real `quit` tears up the server, which causes porblems with
482+
# the test, so we just mock it and test that it was called, rather than
483+
# checking whether `sys.exit` was ultimately called. Not totally ideal,
484+
# but better than no testing.
485+
@patch.object(df.DiffServer, "quit")
486+
def test_server_exits_if_process_pool_repeatedly_breaks(self, mock_quit):
487+
# Set a custom executor that will always fail.
488+
def get_executor(self, reset=False):
489+
return BrokenProcessPoolExecutor()
490+
491+
with patch.object(df.DiffHandler, 'get_diff_executor', get_executor):
492+
response = self.fetch('/html_source_dmp?format=json&'
493+
f'a=file://{fixture_path("empty.txt")}&'
494+
f'b=file://{fixture_path("empty.txt")}')
495+
self.json_check(response)
496+
assert response.code == 500
497+
498+
assert mock_quit.called
499+
assert mock_quit.call_args == ({'code': 10},)
500+
501+
# NOTE: the real `quit` tears up the server, which causes porblems with
502+
# the test, so we just mock it and test that it was called, rather than
503+
# checking whether `sys.exit` was ultimately called. Not totally ideal,
504+
# but better than no testing.
505+
@patch.object(df.DiffServer, "quit")
506+
@patch('web_monitoring_diff.server.server.RESTART_BROKEN_DIFFER', True)
507+
def test_server_does_not_exit_if_env_var_set_when_process_pool_repeatedly_breaks(self, mock_quit):
508+
# Set a custom executor that will always fail.
509+
def get_executor(self, reset=False):
510+
return BrokenProcessPoolExecutor()
511+
512+
with patch.object(df.DiffHandler, 'get_diff_executor', get_executor):
513+
response = self.fetch('/html_source_dmp?format=json&'
514+
f'a=file://{fixture_path("empty.txt")}&'
515+
f'b=file://{fixture_path("empty.txt")}')
516+
self.json_check(response)
517+
assert response.code == 500
518+
519+
assert not mock_quit.called
520+
477521

478522
def mock_diffing_method(c_body):
479523
return

0 commit comments

Comments
 (0)