Add start_profile/stop_profile implementation #48
base: main
Conversation
force-pushed from cdf02b7 to df76a69
Pull request overview
This PR implements start_profile and stop_profile functionality to enable profiling of the inference engine. The implementation follows the existing request/response pattern used for other operations like heartbeat and metrics.
- Adds new protocol types (START_PROFILE, STOP_PROFILE) for request/response communication
- Implements profile handlers in the disagg worker to call engine profiling methods
- Adds proxy methods to send profile requests to specific workers and handle responses
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| lm_service/protocol/protocol.py | Adds START_PROFILE and STOP_PROFILE constants and ProfileRequest/ProfileResponse message structures |
| lm_service/workers/vllm/disagg_worker.py | Refactors request handling to use a decoder map and adds profile request handlers |
| lm_service/apis/vllm/proxy.py | Refactors response handling to use a decoder map, adds start_profile/stop_profile methods, and improves worker registration |
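For orientation, a minimal sketch of what the additions to lm_service/protocol/protocol.py plausibly look like. This assumes msgspec-style Structs, matching the `self.encoder.encode(...)` / `decoder.decode(...)` pattern visible in the diff; the wire constants and any fields beyond `request_id` and `proxy_addr` are assumptions, not the PR's actual code:

```python
import msgspec

class RequestType:
    # Assumed byte constants; req_type.decode() in the diff implies bytes.
    START_PROFILE = b"START_PROFILE"
    STOP_PROFILE = b"STOP_PROFILE"

class ProfileRequest(msgspec.Struct):
    request_id: str
    proxy_addr: str

class ProfileResponse(msgspec.Struct):
    request_id: str
```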
Comments suppressed due to low confidence (1)
lm_service/apis/vllm/proxy.py:590
- The check on lines 584-590 excludes RequestType.EXIT and RequestType.REGISTER from the 'request may have been aborted' warning, but not the new RequestType.START_PROFILE and RequestType.STOP_PROFILE. If profile responses arrive for requests that are no longer in the queue, they will generate spurious warnings. Add RequestType.START_PROFILE and RequestType.STOP_PROFILE to the exclusion tuple on lines 585-590 (or ResponseType.START_PROFILE and ResponseType.STOP_PROFILE, if those are the actual response types used).
```python
if resp.request_id not in self.queues:
    if resp_type not in (
        ResponseType.HEARTBEAT,
        ResponseType.METRICS,
        RequestType.EXIT,
        RequestType.REGISTER,
    ):
```
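A sketch of the suggested fix, assuming the worker echoes the same RequestType constants it was sent (swap in ResponseType.* if that is what actually arrives); the warning body is hypothetical:

```python
if resp.request_id not in self.queues:
    if resp_type not in (
        ResponseType.HEARTBEAT,
        ResponseType.METRICS,
        RequestType.EXIT,
        RequestType.REGISTER,
        RequestType.START_PROFILE,  # new: suppress spurious warnings for profile responses
        RequestType.STOP_PROFILE,
    ):
        # Hypothetical message, per the warning text quoted in the comment above.
        logger.warning(f"request {resp.request_id} may have been aborted")
```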
```diff
  if req_type in decoder_map:
      req = decoder_map[req_type]["decoder"].decode(req_data)
  else:
-     raise Exception(f"Unknown Request Type: {req_type.decode()}.")
+     raise Exception("Unknown Request Type.")
  if req_type == RequestType.ENCODE or req_type == RequestType.PREFILL:
      req.sampling_params.max_tokens = 1
  await decoder_map[req_type]["handler"](req)
```
Copilot AI · Nov 30, 2025
If req_type is not in decoder_map, the else branch raises on line 289, so execution never reaches the decoder_map[req_type] access on line 292 for an unknown type. The structural problem is that lines 290-292 still assume req_type is valid: they sit outside the membership check and rely on the raise alone for protection. Move lines 290-292 inside the if block (or return early on the error path) so they only execute when req_type is in decoder_map.
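A minimal sketch of the restructuring the comment asks for, using the names from the hunk above; a single dict .get() replaces the membership test so the lookup happens once:

```python
entry = decoder_map.get(req_type)
if entry is None:
    raise Exception("Unknown Request Type.")
req = entry["decoder"].decode(req_data)
if req_type in (RequestType.ENCODE, RequestType.PREFILL):
    req.sampling_params.max_tokens = 1
await entry["handler"](req)
```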
```python
cluster = self.instance_clusters.get(server_type, None)
if cluster is None:
    logger.error(
        f"_worker_register_handler fail, unknown server type {server_type}"
    )
    return
```
Copilot AI · Nov 30, 2025
Lines 488-494 check if server_type is in self.active_types, then immediately check if the cluster exists. However, if server_type is in self.active_types, the cluster should have been initialized in __init__ (line 189). This is defensive code that may hide initialization bugs. Consider either: (1) removing the redundant cluster None check, since active_types guarantees initialization, or (2) removing the active_types check and only checking cluster existence.
Suggested change:

```diff
- cluster = self.instance_clusters.get(server_type, None)
- if cluster is None:
-     logger.error(
-         f"_worker_register_handler fail, unknown server type {server_type}"
-     )
-     return
+ cluster = self.instance_clusters[server_type]
```
```python
):
    logger.info("Profiling stopped successfully")
else:
    logger.error(f"Failed to stop profiling: {response}")
```
Copilot AI · Nov 30, 2025
Lines 830-837 have the same issue as start_profile: errors are logged but not raised, and the method returns None in both success and failure cases. This makes it impossible for callers to determine whether profiling actually stopped. Consider raising an exception when the response is not a successful ProfileResponse, or returning a boolean to indicate success/failure.
Suggested change:

```diff
  logger.error(f"Failed to stop profiling: {response}")
+ raise RuntimeError(f"Failed to stop profiling: {response}")
```
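For the boolean-returning alternative the comment also mentions, a sketch of the method tail; the `success` attribute is a hypothetical field, since the diff does not show what ProfileResponse carries:

```python
if isinstance(response, ProfileResponse) and response.success:  # 'success' is assumed
    logger.info("Profiling stopped successfully")
    return True
logger.error(f"Failed to stop profiling: {response}")
return False
```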
```python
        )
        await self._handle_response(req, msg)

    async def _stop_profile_handler(self, req: ProfileRequest):
```
Copilot AI · Nov 30, 2025
The _stop_profile_handler method (lines 496-499) is missing a docstring while the _start_profile_handler method has one (line 466). Add a docstring to _stop_profile_handler for consistency: """Stop profiling on the engine."""
Suggested change:

```diff
  async def _stop_profile_handler(self, req: ProfileRequest):
+     """Stop profiling on the engine."""
```
```python
request = ProfileRequest(
    request_id=request_id, proxy_addr=self.proxy_addr
)
q: asyncio.Queue = asyncio.Queue()
```
Copilot AI · Nov 30, 2025
The queue created on line 791 has no timeout or size limit. If the worker never responds, the await q.get() on line 801 will hang indefinitely. Consider using asyncio.wait_for() with a timeout when getting from the queue, similar to how other async operations in the codebase might handle timeouts.
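A sketch of the suggested guard around the queue read, with a hypothetical PROFILE_TIMEOUT_S constant and cleanup of the pending queue on every exit path:

```python
PROFILE_TIMEOUT_S = 30.0  # hypothetical; tune to expected profiler start-up latency

try:
    # asyncio.wait_for cancels the pending get() and raises TimeoutError
    # if the worker never responds, instead of hanging indefinitely.
    response = await asyncio.wait_for(q.get(), timeout=PROFILE_TIMEOUT_S)
except asyncio.TimeoutError:
    logger.error(f"start_profile timed out after {PROFILE_TIMEOUT_S}s waiting for worker")
    raise
finally:
    self.queues.pop(request_id, None)  # never leak the per-request response queue
```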
```python
request = ProfileRequest(
    request_id=request_id, proxy_addr=self.proxy_addr
)
q: asyncio.Queue = asyncio.Queue()
```
Copilot AI · Nov 30, 2025
The queue created on line 819 has the same issue as in start_profile: no timeout or size limit. If the worker never responds, the await q.get() on line 830 will hang indefinitely. Consider using asyncio.wait_for() with a timeout when getting from the queue.
```python
q: asyncio.Queue = asyncio.Queue()
self.queues[request_id] = q

try:
    payload = self.encoder.encode(request)
    msg = (RequestType.START_PROFILE, payload)
    socket = await self._get_socket_and_server_types_from_addr(
        addr, server_type
    )
    await socket.send_multipart(msg, copy=False)
    response = await q.get()
```
Copilot AI · Nov 30, 2025
The new start_profile and stop_profile methods (lines 781-841) lack test coverage. Given that the repository has tests for other proxy functionality (tests/test_proxy.py), these new methods should also have tests to verify correct behavior, error handling, and response processing.
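As a starting point, a round-trip test of the new protocol message. This assumes the project encodes with msgspec's msgpack codec, which matches the encoder/decoder usage in the diff but is not confirmed by it:

```python
import msgspec
from lm_service.protocol.protocol import ProfileRequest

def test_profile_request_roundtrip():
    req = ProfileRequest(request_id="req-0", proxy_addr="tcp://127.0.0.1:5555")
    data = msgspec.msgpack.encode(req)
    decoded = msgspec.msgpack.decode(data, type=ProfileRequest)
    assert decoded.request_id == "req-0"
    assert decoded.proxy_addr == "tcp://127.0.0.1:5555"
```

Proxy-level tests for start_profile/stop_profile would additionally need a stubbed worker socket that resolves the entry in self.queues, mirroring whatever tests/test_proxy.py already does for heartbeat and metrics.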
force-pushed from df76a69 to a198f27
Signed-off-by: amy-why-3459 <[email protected]>
force-pushed from a198f27 to 2c7e2f7
Description
Type of Change
Related Issues
Changes Made
Testing
Test Coverage
INFO 11-29 11:22:45 [config/model.py:1510] Using max model len 128000
INFO 11-29 11:22:45 [proxy.py:304] Connected to worker tcp://[2071:192:168:2::181]:39000 success
INFO 11-29 11:22:45 [proxy.py:304] Connected to worker tcp://[2071:192:168:2::181]:40000 success
INFO 11-29 11:23:43 [proxy.py:806] Profiling started successfully
Request(0) generated_text: The
Request(0) generated_text: The text
Request(0) generated_text: The text in
Request(0) generated_text: The text in the
Request(0) generated_text: The text in the illustration
Request(0) generated_text: The text in the illustration reads
Request(0) generated_text: The text in the illustration reads:
Documentation
Checklist
Screenshots/Output
Additional Notes
Reviewer Checklist