Investigate if we can replace gunicornmontor with uvicorn.run() #43035

Closed
kaxil opened this issue Oct 15, 2024 · 7 comments

@kaxil
Member

kaxil commented Oct 15, 2024

It is most likely that we no longer need GunicornMonitor or UvicornMonitor. @ashb's suggestion is that, for Airflow, uvicorn.run() should be enough.

Whoever takes this issue should verify this and replace the monitor if it is not needed.

The code:

  • def monitor_gunicorn(gunicorn_master_proc: psutil.Process | subprocess.Popen) -> NoReturn:
        # Register signal handlers
        signal.signal(signal.SIGINT, lambda signum, _: kill_proc(signum, gunicorn_master_proc))
        signal.signal(signal.SIGTERM, lambda signum, _: kill_proc(signum, gunicorn_master_proc))

        # These run forever until SIG{INT, TERM, KILL, ...} signal is sent
        GunicornMonitor(
            gunicorn_master_pid=gunicorn_master_proc.pid,
            num_workers_expected=num_workers,
            master_timeout=120,
            worker_refresh_interval=30,
            worker_refresh_batch_size=1,
            reload_on_plugin_change=False,
        ).start()

    def start_and_monitor_gunicorn(args):
        if args.daemon:
            subprocess.Popen(run_args, close_fds=True)

            # Reading pid of gunicorn master as it will be different that
            # the one of process spawned above.
            gunicorn_master_proc_pid = None
            while not gunicorn_master_proc_pid:
                sleep(0.1)
                gunicorn_master_proc_pid = read_pid_from_pidfile(pid_file)

            # Run Gunicorn monitor
            gunicorn_master_proc = psutil.Process(gunicorn_master_proc_pid)
            monitor_gunicorn(gunicorn_master_proc)
        else:
            with subprocess.Popen(run_args, close_fds=True) as gunicorn_master_proc:
                monitor_gunicorn(gunicorn_master_proc)
  • class GunicornMonitor(LoggingMixin):
        """
        Runs forever.

        Monitoring the child processes of @gunicorn_master_proc and restarting
        workers occasionally or when files in the plug-in directory has been modified.

        Each iteration of the loop traverses one edge of this state transition
        diagram, where each state (node) represents
        [ num_ready_workers_running / num_workers_running ]. We expect most time to
        be spent in [n / n]. `bs` is the setting webserver.worker_refresh_batch_size.

        The horizontal transition at ? happens after the new worker parses all the
        dags (so it could take a while!)

           V ────────────────────────────────────────────────────────────────────────┐
        [n / n] ──TTIN──> [ [n, n+bs) / n + bs ]  ────?───> [n + bs / n + bs] ──TTOU─┘
           ^                          ^───────────────┘
           │      ┌────────────────v
           └──────┴────── [ [0, n) / n ] <─── start

        We change the number of workers by sending TTIN and TTOU to the gunicorn
        master process, which increases and decreases the number of child workers
        respectively. Gunicorn guarantees that on TTOU workers are terminated
        gracefully and that the oldest worker is terminated.

        :param gunicorn_master_pid: PID for the main Gunicorn process
        :param num_workers_expected: Number of workers to run the Gunicorn web server
        :param master_timeout: Number of seconds the webserver waits before killing gunicorn master that
            doesn't respond
        :param worker_refresh_interval: Number of seconds to wait before refreshing a batch of workers.
        :param worker_refresh_batch_size: Number of workers to refresh at a time. When set to 0, worker
            refresh is disabled. When nonzero, airflow periodically refreshes webserver workers by
            bringing up new ones and killing old ones.
        :param reload_on_plugin_change: If set to True, Airflow will track files in plugins_folder directory.
            When it detects changes, then reload the gunicorn.
        """

        def __init__(
            self,
            gunicorn_master_pid: int,
            num_workers_expected: int,
            master_timeout: int,
            worker_refresh_interval: int,
            worker_refresh_batch_size: int,
            reload_on_plugin_change: bool,
        ):
            super().__init__()
            self.gunicorn_master_proc = psutil.Process(gunicorn_master_pid)
            self.num_workers_expected = num_workers_expected
            self.master_timeout = master_timeout
            self.worker_refresh_interval = worker_refresh_interval
            self.worker_refresh_batch_size = worker_refresh_batch_size
            self.reload_on_plugin_change = reload_on_plugin_change

            self._num_workers_running = 0
            self._num_ready_workers_running = 0
            self._last_refresh_time = time.monotonic() if worker_refresh_interval > 0 else None
            self._last_plugin_state = self._generate_plugin_state() if reload_on_plugin_change else None
            self._restart_on_next_plugin_check = False
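
To make the state diagram in the docstring easier to follow, here is a minimal sketch of the decision taken on each loop iteration, written as a pure function. It is not the Airflow implementation: `next_signal` and its parameters are hypothetical names, and the handling of a zero batch size is an assumption made for this sketch.

```python
from __future__ import annotations

import signal


def next_signal(
    num_ready: int,
    num_running: int,
    num_expected: int,
    batch_size: int,
    refresh_due: bool,
) -> tuple[signal.Signals, int] | None:
    """Return (signal, count) to send to the Gunicorn master, or None to wait.

    Hypothetical helper: one call corresponds to one edge of the diagram,
    where the state is [num_ready / num_running].
    """
    # Some workers are still starting up (the "?" edge): just wait.
    if num_ready < num_running:
        return None
    # More workers than expected: retire the oldest ones with TTOU,
    # at most one batch at a time (assumption: at least one per step).
    if num_running > num_expected:
        return signal.SIGTTOU, min(num_running - num_expected, max(batch_size, 1))
    # Fewer workers than expected (e.g. right after start): add the rest with TTIN.
    if num_running < num_expected:
        return signal.SIGTTIN, num_expected - num_running
    # Steady state [n / n]: begin a refresh batch once the interval has elapsed.
    if refresh_due and batch_size > 0:
        return signal.SIGTTIN, batch_size
    return None
```

With `num_expected=4` and `batch_size=1`, a due refresh in state [4 / 4] yields one TTIN; once the fifth worker is ready, state [5 / 5] yields one TTOU and the loop returns to [4 / 4].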
@kaxil kaxil converted this from a draft issue Oct 15, 2024
@kaxil kaxil added this to the Airflow 3.0.0 milestone Oct 15, 2024
@dosubot dosubot bot added the area:CLI label Oct 15, 2024
@vatsrahul1001 vatsrahul1001 self-assigned this Dec 13, 2024
@vatsrahul1001
Collaborator

The current GunicornMonitor provides the following capabilities:

  1. Automatic worker restarts if workers crash or hang:
    Ensures that if a worker crashes or becomes unresponsive, it is automatically restarted.

  2. Graceful worker scaling and reloads:
    This allows for addition and removal of workers and reloads workers gracefully when needed.

  3. Timeout management for unresponsive workers:
    Gunicorn monitors workers for unresponsiveness and can terminate them if they exceed a set timeout, preventing hangs.

If we switch to uvicorn.run(), we would lose these features since uvicorn.run() lacks built-in process management. Specifically:

  • If a worker dies, there's no master process to restart it.
  • There will be no automatic scaling of workers, and no handling of worker timeouts or periodic restarts.

To replicate this functionality, we would need an external process manager such as systemd or supervisord, which adds complexity and overhead.

cc: @kaxil @ashb

@ashb
Member

ashb commented Dec 17, 2024

For 2: https://docs.gunicorn.org/en/stable/signals.html

TTIN: Increment the number of processes by one
TTOU: Decrement the number of processes by one

If a worker dies, there's no master process to restart it

Doesn't Gunicorn do that itself? https://docs.gunicorn.org/en/stable/design.html#master

The master process is a simple loop that listens for various process signals and reacts accordingly. It manages the list of running workers by listening for signals like TTIN, TTOU, and CHLD. TTIN and TTOU tell the master to increase or decrease the number of running workers. CHLD indicates that a child process has terminated, in this case the master process automatically restarts the failed worker.

So it's only the case of "worker hang" that might not be there anymore. Let me think.
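
For reference, scaling through the Gunicorn master comes down to sending those two signals to its PID. A minimal sketch, assuming a Unix host: `resize_workers` is a hypothetical helper name, and the `send` parameter exists only so the function can be exercised without a running Gunicorn master.

```python
import os
import signal
from typing import Callable


def resize_workers(
    master_pid: int,
    delta: int,
    send: Callable[[int, int], None] = os.kill,
) -> None:
    """Grow (delta > 0) or shrink (delta < 0) the worker pool by |delta|.

    Each TTIN adds one worker and each TTOU removes one, so the signal
    is sent once per worker.
    """
    sig = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    for _ in range(abs(delta)):
        send(master_pid, sig)
```

Calling `resize_workers(pid, 1)` followed by `resize_workers(pid, -1)` after the new worker is ready is, in effect, one refresh cycle of the monitor. Restarting a crashed worker needs no such call, since the master reacts to CHLD on its own.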

@ashb
Member

ashb commented Dec 17, 2024

@potiuk
Member

potiuk commented Dec 17, 2024

Just one comment here: I've heard (mostly through the grapevine) that uvicorn has, for quite a long time, been able to manage multiple processes and handle sync requests directly on its own, that this is more and more recommended in production, and that there is basically no need to use gunicorn at all.

Again, it's more of an "overheard" thing, but looking at https://www.uvicorn.org/deployment/#using-a-process-manager , maybe that's what we are looking for? (Or maybe I misunderstood what we want to do; I just wanted to mention that gunicorn might not be needed at all.)

@kaxil kaxil changed the title Investigate if we need gunicornmontor & replace it with uvicorn.run() if we don't Investigate if we need gunicornmontor & replace it with uvicorn.run() Dec 17, 2024
@kaxil kaxil changed the title Investigate if we need gunicornmontor & replace it with uvicorn.run() Investigate if we can replace gunicornmontor with uvicorn.run() Dec 17, 2024
@vatsrahul1001
Collaborator

To perform the comparison, I replaced the Gunicorn code in the else block with the uvicorn.run() call below:

  uvicorn.run(
      "airflow.api_fastapi.main:app",
      host=args.hostname,
      port=args.port,
      workers=num_workers,
      timeout_keep_alive=worker_timeout,
      timeout_graceful_shutdown=worker_timeout,
      ssl_keyfile=ssl_key,
      ssl_certfile=ssl_cert,
      access_log=access_logfile,
  )
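
A slightly more defensive way to assemble the same arguments is sketched below. It is only an illustration: `build_uvicorn_kwargs` and `serve` are hypothetical helpers, and it assumes that uvicorn's `access_log` option is a boolean toggle rather than a log file path, which is worth checking against the uvicorn settings documentation.

```python
from __future__ import annotations

from typing import Any


def build_uvicorn_kwargs(
    host: str,
    port: int,
    workers: int,
    worker_timeout: int,
    ssl_key: str | None = None,
    ssl_cert: str | None = None,
    access_log: bool = True,
) -> dict[str, Any]:
    """Collect keyword arguments for uvicorn.run() (hypothetical helper)."""
    kwargs = {
        "host": host,
        "port": port,
        "workers": workers,
        "timeout_keep_alive": worker_timeout,
        "timeout_graceful_shutdown": worker_timeout,
        # Assumption: access_log switches access logging on or off.
        "access_log": access_log,
    }
    # Only pass the TLS options when both the key and the certificate are set.
    if ssl_key and ssl_cert:
        kwargs["ssl_keyfile"] = ssl_key
        kwargs["ssl_certfile"] = ssl_cert
    return kwargs


def serve(**kwargs: Any) -> None:
    """Start the API server; uvicorn is imported lazily, on first use."""
    import uvicorn

    # The app is passed as an import string, as in the call above.
    uvicorn.run("airflow.api_fastapi.main:app", **kwargs)
```

Usage would look like `serve(**build_uvicorn_kwargs("0.0.0.0", 9091, workers=4, worker_timeout=120))`, where the host and port are placeholder values.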

I used Locust for performance testing with the configuration below.

These are the stats comparing uvicorn.run() with Gunicorn + GunicornMonitor


Comparison: Uvicorn vs. Gunicorn Performance

Request Statistics

| Metric                    | Uvicorn  | Gunicorn |
|---------------------------|----------|----------|
| Total Requests            | 14,714   | 14,726   |
| Total Failures            | 0        | 13       |
| Average Response Time     | 12.05 ms | 13.46 ms |
| Min Response Time         | 7 ms     | 1 ms     |
| Max Response Time         | 195 ms   | 216 ms   |
| Average Size (bytes)      | 4,608    | 4,603.93 |
| Requests Per Second (RPS) | 49.05    | 49.09    |
| Failures Per Second       | 0        | 0.04     |

Observations

  1. Response Times:

    • Uvicorn demonstrates slightly lower average and maximum response times compared to Gunicorn.
    • Percentile analysis shows Uvicorn's response times are more consistent, with fewer extreme values at higher percentiles.
  2. Failures:

    • Uvicorn had no failures, whereas Gunicorn recorded 13 failures caused by RemoteDisconnected errors. This could indicate potential issues in connection handling under load.
  3. Performance Consistency:

    • Uvicorn offers better consistency and reliability based on the above data.
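
To put the table in proportion, a few derived figures can be computed from the reported numbers. This is a small sketch: `RunStats` is a hypothetical container, and the duration is only inferred from total requests divided by RPS, not taken from the Locust report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Subset of the request statistics reported in the table above."""

    total_requests: int
    total_failures: int
    avg_response_ms: float
    requests_per_second: float

    @property
    def failure_rate_pct(self) -> float:
        # Share of requests that failed, in percent.
        return 100.0 * self.total_failures / self.total_requests

    @property
    def approx_duration_s(self) -> float:
        # Inferred run length: requests divided by throughput.
        return self.total_requests / self.requests_per_second


uvicorn_run = RunStats(14_714, 0, 12.05, 49.05)
gunicorn_run = RunStats(14_726, 13, 13.46, 49.09)

# Relative improvement in average response time, in percent.
latency_gain_pct = (
    100.0
    * (gunicorn_run.avg_response_ms - uvicorn_run.avg_response_ms)
    / gunicorn_run.avg_response_ms
)

print(f"Gunicorn failure rate: {gunicorn_run.failure_rate_pct:.3f}%")
print(f"Average latency gain with Uvicorn: {latency_gain_pct:.1f}%")
print(f"Approximate run length: {uvicorn_run.approx_duration_s:.0f} s")
```

On these numbers the Gunicorn failure rate is roughly 0.09% of requests, the average response time with Uvicorn is about 10% lower, and both runs lasted close to 300 seconds at nearly identical throughput.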

@potiuk
Member

potiuk commented Dec 19, 2024

Nice!

@vatsrahul1001
Collaborator

@github-project-automation github-project-automation bot moved this from In Progress to Done in AIP-84 MODERN REST API Jan 25, 2025