Description
The current implementation of proxy.py contains a significant amount of duplicated logic across different instance types (E, P, D, PD), especially in request sending, worker scheduling, and routing.
Additionally, the lack of a unified abstraction for instance behaviors makes future extensions harder and increases maintenance cost.
This issue proposes a structural refactor to improve maintainability, reduce duplication, and support future instance-type expansion more easily.
Problem Details
1. Excessive Code Duplication
Several methods (e.g., worker handlers, request dispatching, instance initialization) share similar logic but are implemented separately for each instance type.
This leads to:
- Harder debugging (a fix applied in one place is easily missed in another)
- Higher risk of inconsistent behavior
- More effort required when adding new instance types
2. No Encapsulation for Instance Behaviors
Instance-specific logic is spread across the file:
- How to send requests
- How to handle streaming vs. non-streaming responses
- How to select routers and manage queues
Because there is no unified abstraction layer, adding a new instance type (e.g., a future "X_INSTANCE") requires manually modifying multiple sections of proxy.py.
This architecture reduces flexibility and slows development.
Proposed Solution
1. Introduce a Unified BaseInstanceHandler Class
Define a base class that encapsulates the shared functionality:
- Request sending (sync/async)
- Worker loop logic
- Queue handling
- Router access
- The expect-stream flag
Each instance type will inherit and override only the parts that differ.
Benefits:
- Eliminates repetitive logic
- Cleaner separation of concerns
- Instance-specific customization becomes easier
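A minimal sketch of what such a base class could look like, assuming an asyncio-based proxy. `RoundRobinRouter`, `Transport`, `build_payload`, `send`, `submit`, `worker_loop` and the `stream`/`max_tokens` payload fields are hypothetical names chosen for illustration, not taken from the existing proxy.py. The HTTP client is replaced by an injected async callable so the sketch runs without any server.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable

# Hypothetical transport: (url, payload) -> response. A real proxy would wrap its
# HTTP client here; injecting it keeps the sketch runnable without a server.
Transport = Callable[[str, dict], Awaitable[dict]]


class RoundRobinRouter:
    """Hypothetical stand-in for the proxy's router: cycles through instance URLs."""

    def __init__(self, urls: list[str]) -> None:
        self._urls = urls
        self._next = 0

    def select(self) -> str:
        url = self._urls[self._next % len(self._urls)]
        self._next += 1
        return url


class BaseInstanceHandler:
    """Shared request sending, worker loop, queue handling and router access."""

    expect_stream: bool = False  # streaming instance types override this flag

    def __init__(self, router: RoundRobinRouter, transport: Transport) -> None:
        self.router = router
        self.transport = transport
        self.queue: asyncio.Queue = asyncio.Queue()

    def build_payload(self, request: dict) -> dict:
        # Hook: an instance type overrides this only if its payload differs.
        return dict(request)

    async def send(self, request: dict) -> dict:
        url = self.router.select()
        payload = self.build_payload(request)
        payload["stream"] = self.expect_stream  # hypothetical field name
        return await self.transport(url, payload)

    async def worker_loop(self) -> None:
        # Drain (request, future) pairs from the queue until the task is cancelled.
        while True:
            request, fut = await self.queue.get()
            try:
                fut.set_result(await self.send(request))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                self.queue.task_done()

    async def submit(self, request: dict) -> dict:
        # Enqueue a request and wait for a worker to resolve its future.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((request, fut))
        return await fut


class PrefillHandler(BaseInstanceHandler):
    def build_payload(self, request: dict) -> dict:
        # Hypothetical difference: a prefill request only needs one output token.
        return {**request, "max_tokens": 1}


class DecodeHandler(BaseInstanceHandler):
    expect_stream = True  # only the flag differs; everything else is inherited


async def fake_transport(url: str, payload: dict) -> dict:
    # Echoes what would have been sent, in place of a real HTTP call.
    return {"url": url, "payload": payload}


async def main() -> None:
    prefill = PrefillHandler(RoundRobinRouter(["http://p0", "http://p1"]), fake_transport)
    worker = asyncio.create_task(prefill.worker_loop())
    first = await prefill.submit({"prompt": "hi"})
    second = await prefill.submit({"prompt": "there"})
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    print(first["url"], second["url"], first["payload"]["max_tokens"])


asyncio.run(main())
```

Here `PrefillHandler` overrides one hook and `DecodeHandler` only flips the flag; routing, queueing and the worker loop live in one place.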
2. Replace Conditional Logic with Polymorphism
The current code relies heavily on branches such as:
    if server_type == ServerType.E_INSTANCE:
        ...
    elif server_type == ServerType.P_INSTANCE:
        ...
After refactoring:
    handler = self.handlers[server_type]
    await handler.run(...)
This follows the open-closed principle (OCP):
- Adding a new instance type only requires adding a new handler class
- The core proxy code no longer needs to be modified
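A sketch of the dispatch side, assuming every handler exposes the same async `run()` method named above. `InstanceHandler`, `Proxy` and `dispatch` are hypothetical names, and only the E and P handlers are shown; the point is that the proxy looks up a handler instead of branching on `server_type`.

```python
import asyncio
from enum import Enum, auto


class ServerType(Enum):
    E_INSTANCE = auto()
    P_INSTANCE = auto()
    D_INSTANCE = auto()
    PD_INSTANCE = auto()


class InstanceHandler:
    """Hypothetical common interface: every handler exposes the same async run()."""

    async def run(self, request: dict) -> str:
        raise NotImplementedError


class EncodeHandler(InstanceHandler):
    async def run(self, request: dict) -> str:
        return f"encoded:{request['prompt']}"


class PrefillHandler(InstanceHandler):
    async def run(self, request: dict) -> str:
        return f"prefilled:{request['prompt']}"


# D and PD handlers would follow the same pattern; omitted for brevity.


class Proxy:
    def __init__(self, handlers: dict[ServerType, InstanceHandler]) -> None:
        # The proxy holds only the mapping; it never branches on server_type.
        self.handlers = handlers

    async def dispatch(self, server_type: ServerType, request: dict) -> str:
        try:
            handler = self.handlers[server_type]
        except KeyError:
            raise ValueError(f"no handler registered for {server_type}") from None
        return await handler.run(request)


proxy = Proxy({
    ServerType.E_INSTANCE: EncodeHandler(),
    ServerType.P_INSTANCE: PrefillHandler(),
})
print(asyncio.run(proxy.dispatch(ServerType.P_INSTANCE, {"prompt": "hi"})))
```

Converting the `KeyError` into a `ValueError` gives a clearer message when a server type has no handler configured.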
3. Centralize Instance Initialization
Instead of scattered initialization based on server_type, provide:
- A factory class (InstanceHandlerFactory)
- A registry mapping ServerType → handler class
Example:
    handlers = {
        ServerType.E_INSTANCE: EncodeHandler,
        ServerType.P_INSTANCE: PrefillHandler,
        ServerType.D_INSTANCE: DecodeHandler,
        ServerType.PD_INSTANCE: PDHandler,
    }
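One possible shape for `InstanceHandlerFactory`, assuming handlers are constructed from a list of instance URLs. The `register` decorator, the `create` method, the handler constructor signature and the URLs are hypothetical; the class names match the registry above.

```python
from enum import Enum


class ServerType(Enum):
    E_INSTANCE = "E"
    P_INSTANCE = "P"
    D_INSTANCE = "D"
    PD_INSTANCE = "PD"


class BaseInstanceHandler:
    # Hypothetical constructor: each handler is built from its instance URLs.
    def __init__(self, urls: list[str]) -> None:
        self.urls = urls


class InstanceHandlerFactory:
    """Registry mapping ServerType -> handler class, plus a create() factory method."""

    _registry: dict[ServerType, type[BaseInstanceHandler]] = {}

    @classmethod
    def register(cls, server_type: ServerType):
        # Class decorator: registering a new handler class needs no change to the factory.
        def decorator(handler_cls: type[BaseInstanceHandler]) -> type[BaseInstanceHandler]:
            if server_type in cls._registry:
                raise ValueError(f"{server_type} is already registered")
            cls._registry[server_type] = handler_cls
            return handler_cls

        return decorator

    @classmethod
    def create(cls, server_type: ServerType, urls: list[str]) -> BaseInstanceHandler:
        try:
            handler_cls = cls._registry[server_type]
        except KeyError:
            raise ValueError(f"no handler registered for {server_type}") from None
        return handler_cls(urls)


@InstanceHandlerFactory.register(ServerType.E_INSTANCE)
class EncodeHandler(BaseInstanceHandler): ...


@InstanceHandlerFactory.register(ServerType.P_INSTANCE)
class PrefillHandler(BaseInstanceHandler): ...


@InstanceHandlerFactory.register(ServerType.D_INSTANCE)
class DecodeHandler(BaseInstanceHandler): ...


@InstanceHandlerFactory.register(ServerType.PD_INSTANCE)
class PDHandler(BaseInstanceHandler): ...


# Placeholder deployment config: which instance types exist and where they live.
config = {
    ServerType.E_INSTANCE: ["http://e0:8000"],
    ServerType.P_INSTANCE: ["http://p0:8000", "http://p1:8000"],
    ServerType.D_INSTANCE: ["http://d0:8000"],
}
handlers = {t: InstanceHandlerFactory.create(t, urls) for t, urls in config.items()}
print({t.value: type(h).__name__ for t, h in handlers.items()})
```

Registering through a decorator keeps each registry entry next to its handler class; an explicit dict like the one above works equally well if import-time side effects are undesirable.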
4. Optional: Split Large proxy.py into Smaller Modules
If the file is too large, consider splitting it into:
- handlers/base.py
- handlers/encode.py
- handlers/prefill.py
- handlers/decode.py
- handlers/pd.py
- proxy.py → orchestrator only
Expected Outcomes
- Reduce code duplication (roughly estimated at 40–70%)
- A cleaner, easier-to-maintain architecture
- Adding a new instance type requires only a new handler class and a registry entry
- Stronger abstraction boundaries
- More readable and testable code structure