Reduce time it takes to import SGLang #12510

raayandhar · 2025-11-02T04:04:28Z

Motivation

Importing SGLang takes a long time (#10492). I have been looking into what dominates the import time and whether there are simple ways to reduce it. From the original issue, we want to reduce:

time python -c "from sglang.srt.managers.scheduler import Scheduler"

which is what I have been focusing my efforts on. However, I think there are things we can do to reduce time for other imports. This is more of a V1 to get community feedback from experts.

Modifications

There are some heavy imports. For example, importing the quantization methods at module level is expensive. By moving some imports to function level (the only place they are used), we can reduce module import time. However, I can see how this can easily become an antipattern. In fact, it can hurt performance if the import sits in a function that is called often. I tried to do this only in functions that we expect to run once or a small number of times, but I can understand the argument against this kind of code. I also don't think all the changes to hf_transformers_utils.py help, so I will be taking a deeper look, since the changes are a bit invasive.
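To illustrate the function-level import pattern described above, here is a minimal sketch. It is not taken from the PR: `parse_fraction` is a hypothetical function, and the standard-library `fractions` module only stands in for a heavy dependency such as the quantization methods.

```python
def parse_fraction(text: str):
    """Parse a string such as "3/4" into an exact fraction."""
    # Deferred import: `fractions` is only a stand-in for a heavy dependency.
    # Its import cost is paid on the first call to this function rather than
    # when the enclosing module is imported.
    # Later calls still execute the import statement, but it resolves to a
    # lookup in sys.modules, which is cheap though not free.
    from fractions import Fraction

    return Fraction(text)


# The first call triggers the import; the second reuses the cached module.
total = parse_fraction("3/4") + parse_fraction("1/4")
print(total)
```

The trade-off is the one noted above: this suits functions that run once or a handful of times, and is a poor fit for hot paths where even the repeated lookup adds up.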

Accuracy Tests

These changes should not affect model outputs.

Benchmarking and Profiling

Running the following (calc_avg.py):

for i in {1..100}; do (time python -B -c "from sglang.srt.managers.scheduler import Scheduler") 2>&1 | grep "^real"; done | python calc_avg.py
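calc_avg.py is linked from the PR but its contents are not reproduced here. As a rough sketch of what such a script could look like (the function names, the regular expression, and the use of the sample standard deviation are all assumptions, not the author's actual code), it might parse the "real" lines that bash's `time` prints and summarize them:

```python
import re
import statistics
from typing import Iterable, List

# Matches bash `time` output such as "real\t0m8.308s" (assumed input format).
_REAL_RE = re.compile(r"^real\s+(?:(\d+)m)?(\d+(?:\.\d+)?)s\s*$")


def parse_times(lines: Iterable[str]) -> List[float]:
    """Convert "real XmY.YYYs" lines to seconds, skipping anything else."""
    times = []
    for line in lines:
        match = _REAL_RE.match(line.strip())
        if match:
            minutes = int(match.group(1) or 0)
            times.append(minutes * 60 + float(match.group(2)))
    return times


def summarize(times: List[float]) -> str:
    """Return a short text report of the collected wall-clock times."""
    if not times:
        return "No timing lines found"
    # Sample standard deviation needs at least two data points.
    std_dev = statistics.stdev(times) if len(times) > 1 else 0.0
    rows = [
        f"Number of runs: {len(times)}",
        f"Mean:     {statistics.mean(times):.3f}s",
        f"Median:   {statistics.median(times):.3f}s",
        f"Std Dev:  {std_dev:.3f}s",
        f"Min:      {min(times):.3f}s",
        f"Max:      {max(times):.3f}s",
    ]
    return "\n".join(rows)
```

To use it at the end of the pipeline above, the script would finish with `import sys` and `print(summarize(parse_times(sys.stdin)))`.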

With these changes:

===========Timing Statistics============                                                                            
Number of runs: 100                                                                                                 
Mean:     8.308s                                                                                                    
Median:   8.236s                                                                                                    
Std Dev:  0.297s                                                                                                    
Min:      7.999s                                                                                                    
Max:      9.329s                                                                                                    
========================================

Compared to top-of-main:

===========Timing Statistics============                                                                            
Number of runs: 100                                                                                                 
Mean:     9.836s                                                                                                    
Median:   9.790s                                                                                                    
Std Dev:  0.332s                                                                                                    
Min:      8.655s                                                                                                    
Max:      11.801s                                                                                                   
========================================

So we have a ~1.5 second improvement in mean import time (9.836s down to 8.308s). Not the best, so I am going to keep working on it. I mostly targeted the time spent creating the ModelConfig object. The difference so far is largely from removing the quantization import:

Without these changes import_sglang_tom.log

With these changes import_sglang_new.log
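The PR does not say how these logs were generated. One common way to get a per-module breakdown is CPython's `-X importtime` flag, which writes one line per imported module to stderr. Assuming logs in that format, a small helper (hypothetical, not part of the PR) can rank modules by cumulative import time:

```python
from typing import Iterable, List, Tuple

# (module name, self time in microseconds, cumulative time in microseconds)
Record = Tuple[str, int, int]

_PREFIX = "import time:"


def parse_importtime(lines: Iterable[str]) -> List[Record]:
    """Parse lines shaped like 'import time:  150 |  150 |   zipimport'."""
    records = []
    for line in lines:
        if not line.startswith(_PREFIX):
            continue  # unrelated stderr output
        fields = line[len(_PREFIX):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # header row: "self [us] | cumulative | imported package"
        records.append((name, int(self_us), int(cumulative_us)))
    return records


def slowest(records: List[Record], n: int = 10) -> List[Record]:
    """Return the n records with the largest cumulative import time."""
    return sorted(records, key=lambda record: record[2], reverse=True)[:n]
```

A log for this helper could be captured with `python -X importtime -c "from sglang.srt.managers.scheduler import Scheduler" 2> import.log` and then passed in via `parse_importtime(open("import.log"))`.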

Machine:

AMD EPYC 7343 16-Core Processor
L40S GPU

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests. (N/A?)
Update documentation according to Write documentations. (N/A?)
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed. -- will try to do this, although I am GPU poor.

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

raayandhar · 2025-11-02T04:04:45Z

I will continue working on this; there are more improvements to be made.

raayandhar · 2025-11-02T04:38:55Z

python/sglang/srt/utils/hf_transformers_utils.py

-for name, cls in _CONFIG_REGISTRY.items():
-    with contextlib.suppress(ValueError):
-        AutoConfig.register(name, cls)
+def _register_custom_configs():


I think that a lot of the changes in this file do not really improve the times, and they are overly invasive. However, the transformers-related code accounts for a large share of the import time. I think it may be impossible to remove, since we initialize many transformers-related objects in the scheduler.

raayandhar added 2 commits November 1, 2025 19:40

efforts done towards reducing import time

6ebf331

Signed-off-by: Raayan Dhar [email protected] <[email protected]>

remove comment

aff6929

Signed-off-by: Raayan Dhar [email protected] <[email protected]>
