The vLLM engine should run in its own thread that does only prefill work. The checking logic should run in a separate CPU thread.
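A minimal sketch of this split, assuming a simple queue hand-off between the two threads. Nothing here calls vLLM: `fake_prefill`, `check`, `engine_worker`, `checker_worker`, and `run_pipeline` are hypothetical names used for illustration, and `fake_prefill` is only a stand-in for whatever the real engine's prefill step would be.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker loop to exit


def fake_prefill(prompt: str) -> dict:
    """Hypothetical stand-in for the engine's prefill step (not a vLLM call)."""
    return {"prompt": prompt, "num_tokens": len(prompt.split())}


def check(result: dict) -> bool:
    """Hypothetical checking logic: accept any result with at least one token."""
    return result["num_tokens"] > 0


def engine_worker(requests: queue.Queue, results: queue.Queue, prefill=fake_prefill) -> None:
    """Engine thread: does prefill work only and hands results downstream."""
    while True:
        prompt = requests.get()
        if prompt is _STOP:
            results.put(_STOP)  # propagate shutdown to the checker thread
            return
        results.put(prefill(prompt))


def checker_worker(results: queue.Queue, verdicts: list) -> None:
    """Checker thread: runs the checking logic, never touches the engine."""
    while True:
        result = results.get()
        if result is _STOP:
            return
        verdicts.append((result["prompt"], check(result)))


def run_pipeline(prompts: list) -> list:
    """Start both threads, feed the prompts, and return the checker's verdicts."""
    requests: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    verdicts: list = []

    engine = threading.Thread(target=engine_worker, args=(requests, results), name="engine")
    checker = threading.Thread(target=checker_worker, args=(results, verdicts), name="checker")
    engine.start()
    checker.start()

    for prompt in prompts:
        requests.put(prompt)
    requests.put(_STOP)

    engine.join()
    checker.join()
    return verdicts
```

Calling `run_pipeline(["hello world", ""])` passes each prompt through the engine thread and then the checker thread in order. Note that CPython threads share the GIL, so CPU-heavy checking may still contend with the engine thread's Python-side work; if that turns out to matter, the checker could be moved to a separate process instead.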