
Conversation

Collaborator

@amy-why-3459 amy-why-3459 commented Nov 29, 2025

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test improvements
  • CI/CD improvements

Related Issues

Changes Made

Testing

  • Existing tests pass
  • New tests added (if applicable)
  • Manual testing performed

Test Coverage

INFO 11-29 11:22:45 [config/model.py:1510] Using max model len 128000
INFO 11-29 11:22:45 [proxy.py:304] Connected to worker tcp://[2071:192:168:2::181]:39000 success
INFO 11-29 11:22:45 [proxy.py:304] Connected to worker tcp://[2071:192:168:2::181]:40000 success
INFO 11-29 11:23:43 [proxy.py:806] Profiling started successfully
Request(0) generated_text: The
Request(0) generated_text: The text
Request(0) generated_text: The text in
Request(0) generated_text: The text in the
Request(0) generated_text: The text in the illustration
Request(0) generated_text: The text in the illustration reads
Request(0) generated_text: The text in the illustration reads:

Documentation

  • Documentation updated (if needed)
  • Code comments added/updated
  • API documentation updated (if applicable)

Checklist

  • I have read the CONTRIBUTING guidelines
  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published
  • I have signed off my commits (DCO)

Screenshots/Output

Additional Notes

Reviewer Checklist

  • Code quality and style
  • Test coverage adequate
  • Documentation updated
  • Performance considerations reviewed
  • Security implications considered
  • Breaking changes documented

Contributor

Copilot AI left a comment


Pull request overview

This PR implements start_profile and stop_profile functionality to enable profiling of the inference engine. The implementation follows the existing request/response pattern used for other operations like heartbeat and metrics.

  • Adds new protocol types (START_PROFILE, STOP_PROFILE) for request/response communication (sketched after this list)
  • Implements profile handlers in the disagg worker to call engine profiling methods
  • Adds proxy methods to send profile requests to specific workers and handle responses
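
For reference, a minimal sketch of what these protocol additions might look like, assuming msgspec-style structs consistent with the encoder/decoder usage in the diff. Only the type names and the request_id/proxy_addr fields appear in the PR; the constant values and the success field are illustrative assumptions:

import msgspec

# Sketch only: a partial RequestType; the real constants live in
# lm_service/protocol/protocol.py and their values are assumptions here.
class RequestType:
    START_PROFILE = b"START_PROFILE"
    STOP_PROFILE = b"STOP_PROFILE"

class ProfileRequest(msgspec.Struct):
    request_id: str  # correlates the worker's response with the proxy's queue
    proxy_addr: str  # address the worker sends the ProfileResponse back to

class ProfileResponse(msgspec.Struct):
    request_id: str
    success: bool = True  # assumed; the actual response fields are not shown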

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 14 comments.

Reviewed files:

  • lm_service/protocol/protocol.py: adds START_PROFILE and STOP_PROFILE constants and the ProfileRequest/ProfileResponse message structures
  • lm_service/workers/vllm/disagg_worker.py: refactors request handling to use a decoder map and adds profile request handlers
  • lm_service/apis/vllm/proxy.py: refactors response handling to use a decoder map, adds start_profile/stop_profile methods, and improves worker registration
Comments suppressed due to low confidence (1)

lm_service/apis/vllm/proxy.py:590

  • The check on lines 584-590 excludes RequestType.EXIT and RequestType.REGISTER from the 'request may have been aborted' warning, but does not include the new RequestType.START_PROFILE and RequestType.STOP_PROFILE. If profile responses arrive for requests that are no longer in the queue, they will generate spurious warnings. Add RequestType.START_PROFILE and RequestType.STOP_PROFILE to the exclusion list, or ResponseType.START_PROFILE and ResponseType.STOP_PROFILE if those are the response types actually used.
                if resp.request_id not in self.queues:
                    if resp_type not in (
                        ResponseType.HEARTBEAT,
                        ResponseType.METRICS,
                        RequestType.EXIT,
                        RequestType.REGISTER,
                    ):
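
Applied to the snippet above, the fix this comment describes would look roughly like the following; whether the profile responses arrive tagged with RequestType or ResponseType constants should be verified against the worker side:

                if resp.request_id not in self.queues:
                    if resp_type not in (
                        ResponseType.HEARTBEAT,
                        ResponseType.METRICS,
                        RequestType.EXIT,
                        RequestType.REGISTER,
                        RequestType.START_PROFILE,
                        RequestType.STOP_PROFILE,
                    ):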


Comment on lines 286 to 292
        if req_type in decoder_map:
            req = decoder_map[req_type]["decoder"].decode(req_data)
        else:
            raise Exception(f"Unknown Request Type: {req_type.decode()}.")
        if req_type == RequestType.ENCODE or req_type == RequestType.PREFILL:
            req.sampling_params.max_tokens = 1
        await decoder_map[req_type]["handler"](req)

Copilot AI Nov 30, 2025


If req_type is not in decoder_map, the raise on line 289 means lines 290-292 are unreachable in that case, but the structure is fragile: lines 290-292 access decoder_map[req_type] outside the guard that validates it. Move lines 290-292 inside the if block so the handler lookup only executes when req_type is known to be valid.
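
A sketch of the restructuring this comment suggests, using the names from the diff above (indentation reconstructed; not the author's exact code):

        if req_type in decoder_map:
            req = decoder_map[req_type]["decoder"].decode(req_data)
            # Only reachable for a known req_type, so these lookups stay safe
            # even if the raise below is later softened to a logged error.
            if req_type in (RequestType.ENCODE, RequestType.PREFILL):
                req.sampling_params.max_tokens = 1
            await decoder_map[req_type]["handler"](req)
        else:
            raise Exception(f"Unknown Request Type: {req_type.decode()}.")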

Comment on lines +489 to +494
        cluster = self.instance_clusters.get(server_type, None)
        if cluster is None:
            logger.error(
                f"_worker_register_handler fail, unknown server type {server_type}"
            )
            return

Copilot AI Nov 30, 2025


Lines 488-494 check if server_type is in self.active_types, then immediately check if the cluster exists. However, if server_type is in self.active_types, the cluster should already have been initialized in __init__ (line 189). This defensive code may hide initialization bugs. Consider either (1) removing the redundant cluster None check, since active_types guarantees initialization, or (2) removing the active_types check and only checking cluster existence.

Suggested change
-        cluster = self.instance_clusters.get(server_type, None)
-        if cluster is None:
-            logger.error(
-                f"_worker_register_handler fail, unknown server type {server_type}"
-            )
-            return
+        cluster = self.instance_clusters[server_type]

        ):
            logger.info("Profiling stopped successfully")
        else:
            logger.error(f"Failed to stop profiling: {response}")

Copilot AI Nov 30, 2025


Lines 830-837 have the same issue as start_profile: errors are logged but not raised, and the method returns None in both success and failure cases. This makes it impossible for callers to determine whether profiling actually stopped. Consider raising an exception when the response is not a successful ProfileResponse, or returning a boolean to indicate success/failure.

Suggested change
-            logger.error(f"Failed to stop profiling: {response}")
+            logger.error(f"Failed to stop profiling: {response}")
+            raise RuntimeError(f"Failed to stop profiling: {response}")

        )
        await self._handle_response(req, msg)

    async def _stop_profile_handler(self, req: ProfileRequest):

Copilot AI Nov 30, 2025


The _stop_profile_handler method (lines 496-499) is missing a docstring while the _start_profile_handler method has one (line 466). Add a docstring to _stop_profile_handler for consistency: """Stop profiling on the engine."""

Suggested change
-    async def _stop_profile_handler(self, req: ProfileRequest):
+    async def _stop_profile_handler(self, req: ProfileRequest):
+        """Stop profiling on the engine."""

        request = ProfileRequest(
            request_id=request_id, proxy_addr=self.proxy_addr
        )
        q: asyncio.Queue = asyncio.Queue()

Copilot AI Nov 30, 2025


The queue created on line 791 has no timeout or size limit. If the worker never responds, the await q.get() on line 801 will hang indefinitely. Consider wrapping the get in asyncio.wait_for() with a timeout, as sketched below.
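
A sketch of the suggested guard, meant to live inside start_profile/stop_profile; PROFILE_TIMEOUT_S is a hypothetical constant, and the finally-cleanup assumes the proxy should always drop its per-request queue entry:

import asyncio

PROFILE_TIMEOUT_S = 30.0  # hypothetical; tune to expected profiling latency

try:
    response = await asyncio.wait_for(q.get(), timeout=PROFILE_TIMEOUT_S)
except asyncio.TimeoutError:
    logger.error("Timed out waiting for profile response from worker")
    raise
finally:
    self.queues.pop(request_id, None)  # never leak the per-request queue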

        request = ProfileRequest(
            request_id=request_id, proxy_addr=self.proxy_addr
        )
        q: asyncio.Queue = asyncio.Queue()

Copilot AI Nov 30, 2025


The queue created on line 819 has the same issue as in start_profile: no timeout or size limit. If the worker never responds, the await q.get() on line 830 will hang indefinitely. Consider using asyncio.wait_for() with a timeout when getting from the queue.

Comment on lines +791 to +807
        q: asyncio.Queue = asyncio.Queue()
        self.queues[request_id] = q

        try:
            payload = self.encoder.encode(request)
            msg = (RequestType.START_PROFILE, payload)
            socket = await self._get_socket_and_server_types_from_addr(
                addr, server_type
            )
            await socket.send_multipart(msg, copy=False)
            response = await q.get()

Copilot AI Nov 30, 2025


The new start_profile and stop_profile methods (lines 781-841) lack test coverage. Given that the repository has tests for other proxy functionality (tests/test_proxy.py), these new methods should also have tests to verify correct behavior, error handling, and response processing.
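
As a starting point, a hedged sketch of such a test; the proxy fixture, the start_profile signature (addr, server_type), and the import path are assumptions inferred from the diff, not verified against the repository:

import asyncio
from unittest.mock import AsyncMock

import pytest

from lm_service.protocol.protocol import ProfileResponse

@pytest.mark.asyncio
async def test_start_profile_sends_request_and_consumes_response(proxy):
    # Replace the worker socket with a mock so no real ZMQ traffic occurs.
    socket = AsyncMock()
    proxy._get_socket_and_server_types_from_addr = AsyncMock(return_value=socket)

    task = asyncio.create_task(proxy.start_profile("tcp://worker:39000", "decode"))
    await asyncio.sleep(0)  # let start_profile register its response queue

    # Simulate the worker's reply arriving on the registered queue.
    request_id, queue = next(iter(proxy.queues.items()))
    queue.put_nowait(ProfileResponse(request_id=request_id))

    await task
    socket.send_multipart.assert_awaited_once()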
