forked from mlc-ai/mlc-llm
Open
Labels
bug: Something isn't working
Description
The default run of serve/tests/test_engine.py first adds 4 requests and then starts the engine.
I expected this to form a single prefill batch containing all 4 requests.
However, the behavior is non-deterministic: sometimes it forms two prefill batches of 2 requests each, and sometimes two prefill batches of 1 and 3 requests.
The debug log does not provide any useful information.
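
To make the expectation concrete, here is a minimal toy model of the batching I have in mind. It is only a sketch: `ToyEngine`, `add_request`, `step`, and `max_batch_size` are hypothetical names made up for illustration and are not the mlc-llm API. It shows that when all 4 requests are queued before the first step, a single prefill batch of 4 results, and that splits like 2/2 or 1/3 would appear if a step happened to run between the adds. That interleaving is an assumption used for illustration, not a confirmed diagnosis, and the order of the 1/3 split is also assumed.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for the serving engine (not the mlc-llm API)."""

    max_batch_size: int = 8  # assumed limit, large enough to hold 4 requests
    waiting: deque = field(default_factory=deque)
    prefill_batches: list = field(default_factory=list)

    def add_request(self, request_id: str) -> None:
        # Queue the request; nothing is prefilled until step() runs.
        self.waiting.append(request_id)

    def step(self) -> None:
        # Prefill everything currently waiting, up to the batch-size limit.
        batch = []
        while self.waiting and len(batch) < self.max_batch_size:
            batch.append(self.waiting.popleft())
        if batch:
            self.prefill_batches.append(batch)


def run_expected() -> list:
    """All 4 requests are added before the engine takes its first step."""
    engine = ToyEngine()
    for i in range(4):
        engine.add_request(f"req-{i}")
    engine.step()
    return engine.prefill_batches


def run_interleaved(split: int) -> list:
    """Assumed scenario: a step runs after only `split` requests were added."""
    engine = ToyEngine()
    for i in range(split):
        engine.add_request(f"req-{i}")
    engine.step()
    for i in range(split, 4):
        engine.add_request(f"req-{i}")
    engine.step()
    return engine.prefill_batches


if __name__ == "__main__":
    print([len(b) for b in run_expected()])      # batch sizes, expected case
    print([len(b) for b in run_interleaved(2)])  # batch sizes, 2/2 split
    print([len(b) for b in run_interleaved(1)])  # batch sizes, 1/3 split
```

If the real engine behaves like `run_expected`, the test should always see one prefill batch. Logging the number of waiting requests at each step would help tell whether something like `run_interleaved` is happening.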