
@ZichengMa ZichengMa commented Jun 6, 2025

This PR adds a component-level observability framework by embedding a lightweight HTTP server into each Dynamo service. It delivers:

  • Unified HTTP Interface: Exposes both metrics and health probes over a single port per component, simplifying deployment and operations.

  • Component-Level Metrics Migration: @tedzhouhk has already implemented metrics collection in the Rust frontend; this PR may migrate that implementation. For other components, which metrics should be monitored still needs discussion.

  • Health-Check Requirements: Defines appropriate probes and checks to support Kubernetes-style deployment patterns.

  • Python & Rust Bindings: Provides APIs for registering custom health functions and configuring response thresholds programmatically.

@itay itay left a comment

Great start to the doc, left some comments - I think seeing that picture of what a fully deployed graph looks like would be very helpful.


The system **MUST** include a unified HTTP endpoint infrastructure for Dynamo components to expose metrics and health check endpoints

### REQ 2 Performance Metrics Requirements for worker nodes

For the workers (i.e. the frameworks), we also need a requirement to allow us to grab the metrics that those frameworks natively expose. You note below the ones we want to capture ourselves from the frameworks (e.g. TTFT, etc.), but the frameworks have extensive metrics that should be collectable. We need to understand how this is going to work (e.g. are they going to expose them independently, are we going to bring those into our metrics, etc.).

@ZichengMa - @alec-flowers and I have been chatting about this. Right now we have an implementation that works for vLLM but does not for SGLang. We've been thinking about standardizing on Prometheus for this. Happy to iterate.

Contributor

Is the addition here to the proposal then that the worker component that wraps the framework will aggregate metrics from the framework and any other additional metrics into one endpoint? I think that makes sense in general - and we can add that here.

Agreed, the Rust frontend's metrics are deployment-level; we also need worker-level metrics. The challenge here is that worker-level metrics may come from the backend itself rather than from Dynamo scripts.
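
A rough sketch of how the worker wrapper could fold framework-native metrics together with Dynamo-level metrics behind one scrape target, assuming the framework registers its collectors in prometheus_client's default registry (the metric name and port below are illustrative):

```python
# Sketch: aggregate Dynamo-level and framework-native metrics behind one endpoint.
# Assumes the framework (e.g. vLLM) registers its metrics in prometheus_client's
# default REGISTRY; SGLang may need a different bridge, per the discussion above.
from prometheus_client import Histogram, start_http_server, REGISTRY

# A Dynamo-level worker metric we collect ourselves (name is illustrative).
TTFT_SECONDS = Histogram(
    "dynamo_worker_time_to_first_token_seconds",
    "Time to first token as observed by the Dynamo worker wrapper",
)

def record_request(ttft_seconds: float) -> None:
    TTFT_SECONDS.observe(ttft_seconds)

# Because both the framework's collectors and ours live in the same REGISTRY,
# a single scrape target exposes the union of the two metric sets.
start_http_server(9091, registry=REGISTRY)  # port is illustrative
```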

- `/liveness` - Component liveness probe
- `/readiness` - Component readiness probe

### REQ 4 Core Health Check Implementation

We need a description of what the acceptable latency range is for the health-check endpoints, otherwise we risk more and more "checks" being stuffed into them.

Author

I think the latency range should be handled on the k8s controller side? Let the controller determine what an appropriate range is?

It is configurable at the cluster level and per pod.




# Architecture

What I'd like to see is an overall diagram, let's make it Kubernetes-oriented, that shows a canonical graph (e.g. a disagg one) with the individual pods, and within them the components that are running, and the HTTP port/endpoints that are exposed.


The proposed solution consists of three main components:

1. **Unified HTTP Server Port**: Each Dynamo component will embed a single HTTP server that provides a unified interface for both metrics exposure and health check endpoints, eliminating the need for multiple ports or separate servers per component.

You say single HTTP server, but in the case of a component (e.g. the frontend) that has its own independent HTTP server, I imagine that it would be independent of that?

Specifically, the metrics/health-check HTTP server should ideally sit on at least an independent port from the core server (ideally bound only to 127.0.0.1), so as to ensure that these endpoints are not exposed externally.

So long as the fabric, assuming there is one, can access the necessary endpoints to accurately determine health and readiness state, I'm fine with this.

I do worry (because a lot of things tend to be Python-based) that additional endpoints could end up incurring additional processes and thereby significant, unnecessary overhead. I'd like to see a commitment to avoid these issues, even if that means the core implementation is not in Python.

What's an example - that because we start a second process or a second HTTP service that we're consuming more RAM/CPU? Just making sure I understand.

Contributor

@nnshah1 nnshah1 Jun 10, 2025

The proposal here - and maybe it needs to be fleshed out - is that we would be reusing the Rust-based HTTP server.

In terms of the frontend specifically - I think that depends a little on if we consider the frontend to be externally exposed or always behind an ingress ...

It might be simpler for us to define every dynamo component to have a single server with multiple endpoints (health/metrics/functions) - and then for a deployment the frontend would be behind an ingress to handle all auth/security-related things.
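
Purely for illustration, here is what one per-component server with all three endpoint groups could look like if sketched in Python (the actual proposal reuses the Rust-based HTTP server; aiohttp, the port, and the binding choice are stand-ins, not the implementation):

```python
# Illustrative sketch only - the proposal reuses the Rust-based HTTP server.
# Shows the shape of one per-component server exposing metrics and health probes.
from aiohttp import web
from prometheus_client import generate_latest, CONTENT_TYPE_LATEST

async def metrics(_request: web.Request) -> web.Response:
    # Prometheus text exposition of whatever the component has registered.
    return web.Response(body=generate_latest(),
                        headers={"Content-Type": CONTENT_TYPE_LATEST})

async def liveness(_request: web.Request) -> web.Response:
    return web.Response(text="OK")   # process is up

async def readiness(_request: web.Request) -> web.Response:
    return web.Response(text="OK")   # component can accept work

app = web.Application()
app.add_routes([
    web.get("/metrics", metrics),
    web.get("/liveness", liveness),
    web.get("/readiness", readiness),
])

# Host/port would be configurable; binding to localhost follows the suggestion
# above to keep these endpoints off the external interface, with the ingress in
# front of the frontend handling anything that should be public.
web.run_app(app, host="127.0.0.1", port=9090)
```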

> What's an example - that because we start a second process or a second HTTP service that we're consuming more RAM/CPU? Just making sure I understand.

Every process has its own memory footprint, and any interaction between processes has to be done via shared memory (more footprint). Additionally, the OS has to manage thread and hardware congestion of multiple processes in a "fair scheduler" manner instead of the more efficient "in-process" ordering that occurs otherwise.

When it's the "right answer", multi-process is great, but it's not free. When it is "just another answer", the overhead needs to be balanced with the benefits. That's all I am saying.

The reason I mentioned Python is because Python doesn't support multi-threading, and instead resorts to multi-process (w/ shared memory, etc.) to effect parallel operations.


To be done

## Metrics Architecture

How are metrics collected from within the component, e.g. if you need to move some metrics from the Python side? If I have a custom component that has custom metrics, how do I easily get those exposed?

whoisj previously requested changes Jun 9, 2025

@whoisj whoisj left a comment

Please change the PR title to something meaningful and provide a reasonable description of the PR.

@ZichengMa ZichengMa changed the title from "Zicheng/metric health check" to "Introduce Unified Metrics & Health-Check HTTP Endpoints for Dynamo Components" on Jun 9, 2025
@whoisj whoisj dismissed their stale review June 9, 2025 18:15

GitHub won't allow me to resolve the request for title and description updates.


# Summary

This proposal introduces a unified HTTP endpoint infrastructure for Dynamo components, enabling comprehensive observability and monitoring capabilities at the component level. The core design centers on embedding an HTTP server within each Dynamo component to expose standardized endpoints for both metrics collection and health monitoring. This approach migrates the existing metrics monitoring system from Hongkuan's [implementation](https://github.com/ai-dynamo/dynamo/pull/1315), while simultaneously introducing robust health check mechanisms including liveness, readiness, and custom health probes. Currently, metrics collection is implemented in the Rust frontend with Prometheus integration, but it lacks a unified approach across all components, so we need to migrate to a component-level HTTP endpoint approach.

@rmccorm4 rmccorm4 Jun 9, 2025

Just curious - have we considered using dynamo runtime concepts like an endpoint that is auto-attached (or customizable) to every worker?

Instead of an HTTP server spawned per worker, we would just have another endpoint, something like this:

@dynamo_endpoint
def health():
  return True

What are the pros/cons of HTTP server per worker vs other alternatives being considered here?

Author

Good question! @tedzhouhk @ishandhanani — from what I’ve seen, our current /metrics and /health routes live on a standalone HTTP server, not as Dynamo endpoints. Do you have any sense why we didn’t fold them into the Dynamo endpoints model instead? What were the trade-offs at the time?

Contributor

The last discussion I was part of - we wanted to add this to the same endpoint model and enable components to specify that it would be visible via HTTP - so still through the runtime.


@rmccorm4 rmccorm4 Jun 9, 2025

> our current /metrics and /health routes live on a standalone HTTP server

The top-level HTTP server sits with the frontend/ingress, forwards requests to components/workers/etc. over the dynamo runtime, and can aggregate some information from its view of the overall distributed system.

Similarly, it could query the health/status of individual workers over the dynamo runtime (e.g. exposed per worker as dynamo endpoints, queried over NATS), compared to each worker spinning up an additional HTTP server (exposed per worker as an additional HTTP server) and querying each of those HTTP servers - so I was curious about that part, if I'm understanding correctly that the proposal is proposing the latter.

Contributor

That is correct, we could additionally do that - but a few points:

  1. I think we wanted each individual component to have liveness/readiness - and not only the ingress - to be able to check health for individual workers / planner / etc. I'm not myself sold on either aggregating through a central HTTP server or distributing to each component - so as long as we can get the same information, either way is ok.

  2. There is the discussion of enabling metrics via HTTP to be able to leverage Prometheus-based scraping. If we move to exposing an HTTP metrics endpoint for every worker, then I would like to reuse the HTTP server for health/status as well.

To echo what @nnshah1 is saying, we need each running pod and container within (this is why it's critical we get the diagram) to be able to independently report out its current health, so that Kubernetes (or nearly any other orchestrator) can manage its lifecycle appropriately. It then benefits us on metrics as well, but that's separate.

Contributor

@nnshah1 nnshah1 Jun 10, 2025

To close the loop here and make it explicit: the idea was to extend the dynamo_endpoint syntax to include a transport field and allow endpoints to be exposed as either NATS targets or HTTP targets:

@dynamo_endpoint(transport='http')
def health():
  return True

In the future this could also be used to expose other endpoints, such as OpenAI-compatible endpoints, from workers directly as well.
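
As a hypothetical sketch of where that could go, the snippet below mixes NATS-targeted work endpoints with HTTP-targeted probes; the transport argument is the proposed extension from this thread, and the import path and function names are assumptions, not an existing API:

```python
# Hypothetical: transport= is the proposed extension discussed above, not an
# existing Dynamo API; the import path and names are illustrative only.
from dynamo.sdk import dynamo_endpoint  # assumed import path

@dynamo_endpoint()                      # default: exposed as a NATS target
async def generate(request):
    ...

@dynamo_endpoint(transport="http")      # served by the component's HTTP server
def health() -> bool:
    return True

@dynamo_endpoint(transport="http")      # future: e.g. an OpenAI-compatible route
async def chat_completions(request):
    ...
```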

Comment on lines +127 to +128
# also exposes http endpoint which will be queried from k8s
@liveness

How do we plan to expose health/liveness/readiness for rust workers?

Contributor

My understanding here is that this is for customization only - so for Rust workers the same would be available.


@whoisj whoisj left a comment

Good stuff. Requested a single change and provided the updated content.

Co-authored-by: Hongkuan Zhou <[email protected]>
Signed-off-by: Neelay Shah <[email protected]>