[RFC] Refactor proxy.py to Reduce Duplication and Improve Extensibility #38

@JohnLiu97Huawei

Description

The current implementation of proxy.py contains a significant amount of duplicated logic across different instance types (E, P, D, PD), especially in request sending, worker scheduling, and routing.
Additionally, the lack of a unified abstraction for instance behaviors makes future extensions harder and increases maintenance cost.

This issue proposes a structural refactor to improve maintainability, reduce duplication, and support future instance-type expansion more easily.


Problem Details

1. Excessive Code Duplication

Several methods (e.g., worker handlers, request dispatching, instance initialization) share similar logic but are implemented separately for each instance type.
This leads to:

  • Harder debugging (a fix applied in one place is easily missed in another)

  • Higher risk of inconsistent behavior

  • More effort required when adding new instance types

2. No Encapsulation for Instance Behaviors

Instance-specific logic is spread across the file:

  • How to send requests

  • How to handle streaming vs non-streaming

  • How to select routers and manage queues

Because there is no unified abstraction layer, adding a new instance type (e.g., a future "X_INSTANCE") requires manually modifying multiple sections of proxy.py.

This architecture reduces flexibility and slows development.


Proposed Solution

1. Introduce a Unified BaseInstanceHandler Class

Define a base class that encapsulates shared functionalities:

  • Request sending (sync/async)

  • Worker loop logic

  • Queue handling

  • Router access

  • Expect-stream flag

Each instance type will inherit and override only the parts that differ.

Benefits:

  • Eliminates repetitive logic

  • Cleaner separation of concerns

  • Instance-specific customization becomes easier
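A minimal sketch of what such a base class might look like, assuming an asyncio-based worker loop fed by a per-handler queue. Only the class names BaseInstanceHandler, EncodeHandler, and DecodeHandler and the "expect-stream" idea come from this issue; the method names (send_request, worker, submit), the router attribute, and the request shape are hypothetical and do not reflect the current proxy.py.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class BaseInstanceHandler(ABC):
    """Shared behaviour for all instance types (illustrative only)."""

    # Subclasses set this to True when the instance streams its response.
    expect_stream: bool = False

    def __init__(self, router: Any = None) -> None:
        self.router = router                       # hypothetical router handle
        self.queue: asyncio.Queue = asyncio.Queue()

    @abstractmethod
    async def send_request(self, request: dict) -> Any:
        """The instance-specific part: how a single request is sent."""

    async def worker(self) -> None:
        """Shared worker loop: take queued requests and resolve their futures."""
        while True:
            request, future = await self.queue.get()
            try:
                future.set_result(await self.send_request(request))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                self.queue.task_done()

    async def submit(self, request: dict) -> Any:
        """Enqueue a request and wait until the worker has handled it."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((request, future))
        return await future


class EncodeHandler(BaseInstanceHandler):
    async def send_request(self, request: dict) -> Any:
        # Placeholder for the real encode-instance call.
        return {"handled_by": "E", "payload": request}


class DecodeHandler(BaseInstanceHandler):
    expect_stream = True

    async def send_request(self, request: dict) -> Any:
        # Placeholder for the real decode-instance call.
        return {"handled_by": "D", "payload": request}


async def demo() -> list:
    handlers = [EncodeHandler(), DecodeHandler()]
    workers = [asyncio.create_task(h.worker()) for h in handlers]
    results = [await h.submit({"id": 1}) for h in handlers]
    for task in workers:
        task.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the queue handling and worker loop exist once in the base class, and each subclass supplies only send_request and, where needed, the expect_stream flag.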

2. Replace Conditional Logic with Polymorphism

Current code uses a lot of:

    if server_type == ServerType.E_INSTANCE:
        ...
    elif server_type == ServerType.P_INSTANCE:
        ...

After refactoring:

    handler = self.handlers[server_type]
    await handler.run(...)

This aligns with the open-closed principle (OCP):

  • Adding a new instance type requires only a new handler class

  • Core proxy code no longer needs to be modified
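To make the before/after concrete, here is a small self-contained sketch of the lookup-based dispatch. The ServerType member names, the handlers mapping, and handler.run(...) follow the snippets in this issue; the Proxy class, its dispatch method, and the handler bodies are placeholders invented for illustration.

```python
import asyncio
from enum import Enum, auto


class ServerType(Enum):
    # Member names follow this issue; the real enum is defined in the project.
    E_INSTANCE = auto()
    P_INSTANCE = auto()


class EncodeHandler:
    async def run(self, request: dict) -> str:
        return f"encode:{request['id']}"      # placeholder behaviour


class PrefillHandler:
    async def run(self, request: dict) -> str:
        return f"prefill:{request['id']}"     # placeholder behaviour


class Proxy:
    """Hypothetical orchestrator that knows nothing about concrete types."""

    def __init__(self) -> None:
        self.handlers = {
            ServerType.E_INSTANCE: EncodeHandler(),
            ServerType.P_INSTANCE: PrefillHandler(),
        }

    async def dispatch(self, server_type: ServerType, request: dict) -> str:
        # A single dictionary lookup replaces the if/elif chain.
        handler = self.handlers[server_type]
        return await handler.run(request)


if __name__ == "__main__":
    print(asyncio.run(Proxy().dispatch(ServerType.P_INSTANCE, {"id": 7})))
```

Supporting another instance type in this sketch means adding one handler class and one dictionary entry; dispatch itself stays unchanged.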

3. Centralize Instance Initialization

Instead of scattered initialization based on server_type, provide:

  • A factory class (InstanceHandlerFactory)

  • A registry mapping ServerType → handler class

Example:

    handlers = {
        ServerType.E_INSTANCE: EncodeHandler,
        ServerType.P_INSTANCE: PrefillHandler,
        ServerType.D_INSTANCE: DecodeHandler,
        ServerType.PD_INSTANCE: PDHandler,
    }
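One possible shape for the factory and registry, sketched under assumptions: InstanceHandlerFactory, ServerType, and the handler class names are taken from this issue, while the register decorator, the create method, and the config keyword arguments are hypothetical design choices rather than an agreed API.

```python
from enum import Enum, auto
from typing import Callable, Dict, Type


class ServerType(Enum):
    E_INSTANCE = auto()
    P_INSTANCE = auto()
    D_INSTANCE = auto()
    PD_INSTANCE = auto()


class BaseInstanceHandler:
    def __init__(self, **config) -> None:
        self.config = config  # hypothetical per-instance settings


class InstanceHandlerFactory:
    """Maps each ServerType to its handler class and builds instances."""

    _registry: Dict[ServerType, Type[BaseInstanceHandler]] = {}

    @classmethod
    def register(cls, server_type: ServerType) -> Callable:
        """Class decorator that records a handler class for a server type."""
        def decorator(handler_cls: Type[BaseInstanceHandler]):
            cls._registry[server_type] = handler_cls
            return handler_cls
        return decorator

    @classmethod
    def create(cls, server_type: ServerType, **config) -> BaseInstanceHandler:
        try:
            handler_cls = cls._registry[server_type]
        except KeyError:
            raise ValueError(f"no handler registered for {server_type!r}") from None
        return handler_cls(**config)


@InstanceHandlerFactory.register(ServerType.E_INSTANCE)
class EncodeHandler(BaseInstanceHandler):
    pass


@InstanceHandlerFactory.register(ServerType.P_INSTANCE)
class PrefillHandler(BaseInstanceHandler):
    pass


@InstanceHandlerFactory.register(ServerType.D_INSTANCE)
class DecodeHandler(BaseInstanceHandler):
    pass


@InstanceHandlerFactory.register(ServerType.PD_INSTANCE)
class PDHandler(BaseInstanceHandler):
    pass


if __name__ == "__main__":
    handler = InstanceHandlerFactory.create(ServerType.D_INSTANCE, url="http://localhost:8000")
    print(type(handler).__name__, handler.config)
```

Registering through a decorator keeps the mapping next to each handler definition; a plain dictionary like the one above works just as well if a single central table is preferred.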

4. Optional: Split Large proxy.py into Smaller Modules

If proxy.py is still too large after the refactor, consider splitting it into:

  • handlers/base.py

  • handlers/encode.py

  • handlers/prefill.py

  • handlers/decode.py

  • handlers/pd.py

  • proxy.py → orchestrator only


Expected Outcomes

  • Reduce duplicated code by an estimated 40–70%

  • Cleaner, easier-to-maintain architecture

  • Adding new instance types becomes trivial

  • Stronger abstraction boundaries

  • More readable and testable code structure

Metadata

    Labels

    enhancement (New feature or request)
