feat: vLLM backend #2010

Draft: gau-nernst wants to merge 93 commits into dev from thien/python_engine

Commits (93)
60b13bb
wip: download uv
gau-nernst Feb 14, 2025
3ddce8c
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 14, 2025
f9817c8
fix: has_value -> has_error
gau-nernst Feb 14, 2025
2dbc296
move uv stuff to python_engine. use uv to start process
gau-nernst Feb 18, 2025
eec24bd
redirect stdout/stderr
gau-nernst Feb 18, 2025
26fdbd3
simplify code
gau-nernst Feb 18, 2025
3ba7994
rename python engine interface
gau-nernst Feb 19, 2025
5e7125f
use PythonEngineI
gau-nernst Feb 19, 2025
c5da0ee
more checks to match all EngineV variants
gau-nernst Feb 19, 2025
3c097fb
improve Python load model
gau-nernst Feb 19, 2025
84db8b0
consolidate process-related functions
gau-nernst Feb 19, 2025
8ee815c
update PythonModelConfig. add UnloadModel
gau-nernst Feb 19, 2025
29f5344
implement PythonEngine::GetModels
gau-nernst Feb 19, 2025
75ce355
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 19, 2025
7949dcc
implement getModelStatus. add some notes
gau-nernst Feb 19, 2025
e2f0323
add router for python
gau-nernst Feb 19, 2025
607d2cb
call PythonEngine destructor
gau-nernst Feb 19, 2025
f58b773
remove unused method
gau-nernst Feb 19, 2025
bf23c9f
remove unnecessary headers
gau-nernst Feb 19, 2025
d7818d5
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 19, 2025
8ebee7c
remove unused stuff
gau-nernst Feb 20, 2025
8f36adc
download uv directly from github release
gau-nernst Feb 20, 2025
5ebfbb7
check for entrypoint
gau-nernst Feb 20, 2025
5d310d1
only record model size for llama.cpp
gau-nernst Feb 20, 2025
c4c622c
don't include headers
gau-nernst Feb 20, 2025
fc0369c
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 21, 2025
6b59878
don't use std::optional to support < c++17
gau-nernst Feb 21, 2025
250a2ac
fix stringstream usage
gau-nernst Feb 21, 2025
bb38a56
define pid_t for windows
gau-nernst Feb 21, 2025
723c5db
explicit call .string() on filesystem::path to support windows
gau-nernst Feb 21, 2025
26ec20a
include extra_args in entrypoint
gau-nernst Feb 21, 2025
376deeb
add python engine install test
gau-nernst Feb 21, 2025
a9ed820
add start time
gau-nernst Feb 21, 2025
db82134
add back python engine to default supported engine so that cortex eng…
gau-nernst Feb 21, 2025
64f5451
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 25, 2025
79464a2
format
gau-nernst Feb 25, 2025
1768826
run uv sync after model download
gau-nernst Feb 26, 2025
7627eac
download CUDA for python engine
gau-nernst Feb 26, 2025
06503c0
add .exe for windows
gau-nernst Feb 26, 2025
176f878
destroy file action in posix
gau-nernst Feb 26, 2025
48f50f5
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 27, 2025
f7bddc2
revert name change to avoid conflict
gau-nernst Feb 27, 2025
728e7eb
check for NULL before destroy file action
gau-nernst Feb 27, 2025
560b9fe
fix windows
gau-nernst Feb 27, 2025
f481c2f
fix windows
gau-nernst Feb 27, 2025
48e2015
fix windows subprocess
gau-nernst Feb 27, 2025
f02fc93
update test
gau-nernst Feb 27, 2025
c104d77
Merge branch 'dev' into thien/python_engine
gau-nernst Feb 27, 2025
9918672
more robust checks and cleanup
gau-nernst Feb 28, 2025
99a0035
support engines uninstall
gau-nernst Mar 3, 2025
b96fd69
follow reverse proxy example
gau-nernst Mar 3, 2025
62da415
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 3, 2025
e2e2ccc
update uv to 0.6.3
gau-nernst Mar 3, 2025
57c30d3
support engines list
gau-nernst Mar 3, 2025
408d66b
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 4, 2025
5dbb5c5
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 12, 2025
4a4cd1d
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 13, 2025
49df6af
remove checks against supportedEngines
gau-nernst Mar 17, 2025
f1dcdde
remove supportedEngines check for more commands
gau-nernst Mar 17, 2025
1a7576b
Merge branch 'dev' into thien/cli_engines_install
gau-nernst Mar 17, 2025
64124b3
Merge branch 'thien/cli_engines_install' into thien/python_engine
gau-nernst Mar 17, 2025
f030615
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 17, 2025
13652ca
init vllm engine
gau-nernst Mar 17, 2025
4d13014
fix issues with progress streaming
gau-nernst Mar 17, 2025
5c451d8
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 18, 2025
591d461
support download HF model
gau-nernst Mar 18, 2025
c3d41bf
use / for HF model
gau-nernst Mar 18, 2025
dc42ddd
fix thread-unsafe
gau-nernst Mar 18, 2025
13d9e3f
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 18, 2025
70151e2
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 19, 2025
73fe3e5
remove methods
gau-nernst Mar 19, 2025
7bf287d
remove old remnants
gau-nernst Mar 19, 2025
2a2b607
support models list. add --relocatable for venv
gau-nernst Mar 19, 2025
fffc686
preparation works for start model
gau-nernst Mar 19, 2025
cea8020
add sync download util. add vLLM version config. some boilerplate cod…
gau-nernst Mar 19, 2025
86d4c01
list engines
gau-nernst Mar 19, 2025
ec8b36d
load and unload model
gau-nernst Mar 19, 2025
9226110
retrieve cortex port from yaml file
gau-nernst Mar 19, 2025
eeccd3a
add env vars support. log stdout and stderr
gau-nernst Mar 20, 2025
6fe7ae8
add GetModelStatus and GetModels
gau-nernst Mar 20, 2025
074a04a
fix typo
gau-nernst Mar 20, 2025
cd55d64
Merge branch 'dev' into thien/python_engine
gau-nernst Mar 21, 2025
368a4f3
add non-stream chat completions
gau-nernst Mar 21, 2025
c0e0fca
Merge pull request #2186 from menloresearch/s/chore/sync-dev
vansangpfiev Mar 27, 2025
e141891
Merge branch 'main' into thien/python_engine
gau-nernst Apr 1, 2025
807b201
add uninstall cmd
gau-nernst Apr 1, 2025
d38eca8
support streaming
gau-nernst Apr 1, 2025
7e002cd
fix cortex run
gau-nernst Apr 1, 2025
1ebbbdb
wait for vLLM server to be up
gau-nernst Apr 2, 2025
b5d8315
use health check for some stuff
gau-nernst Apr 2, 2025
5feda51
add some notes. support embeddings. support some extra vLLM args
gau-nernst Apr 2, 2025
5eea345
remove old tests. some chores
gau-nernst Apr 2, 2025
2bde26a
remove unused function
gau-nernst Apr 2, 2025
2 changes: 2 additions & 0 deletions engine/CMakeLists.txt
@@ -177,6 +177,8 @@ add_executable(${TARGET_NAME} main.cc
${CMAKE_CURRENT_SOURCE_DIR}/utils/file_logger.cc

${CMAKE_CURRENT_SOURCE_DIR}/extensions/template_renderer.cc
${CMAKE_CURRENT_SOURCE_DIR}/extensions/python-engines/python_utils.cc
${CMAKE_CURRENT_SOURCE_DIR}/extensions/python-engines/vllm_engine.cc

${CMAKE_CURRENT_SOURCE_DIR}/utils/dylib_path_manager.cc
${CMAKE_CURRENT_SOURCE_DIR}/utils/process/utils.cc
2 changes: 2 additions & 0 deletions engine/cli/CMakeLists.txt
@@ -74,6 +74,8 @@ add_executable(${TARGET_NAME} main.cc
${CMAKE_CURRENT_SOURCE_DIR}/../services/database_service.cc
${CMAKE_CURRENT_SOURCE_DIR}/../extensions/remote-engine/remote_engine.cc

${CMAKE_CURRENT_SOURCE_DIR}/../extensions/python-engines/python_utils.cc
${CMAKE_CURRENT_SOURCE_DIR}/../extensions/python-engines/vllm_engine.cc
${CMAKE_CURRENT_SOURCE_DIR}/../extensions/template_renderer.cc

${CMAKE_CURRENT_SOURCE_DIR}/utils/easywsclient.cc
6 changes: 5 additions & 1 deletion engine/cli/commands/chat_completion_cmd.cc
@@ -137,7 +137,11 @@ void ChatCompletionCmd::Exec(const std::string& host, int port,
new_data["content"] = user_input;
histories_.push_back(std::move(new_data));

Json::Value json_data = mc.ToJson();
// vLLM doesn't support params used in model config
Json::Value json_data;
if (mc.engine != kVllmEngine) {
json_data = mc.ToJson();
}
json_data["engine"] = mc.engine;

Json::Value msgs_array(Json::arrayValue);
41 changes: 27 additions & 14 deletions engine/cli/commands/engine_install_cmd.cc
@@ -7,6 +7,13 @@
#include "utils/string_utils.h"

namespace commands {

// NOTE: should have a single source of truth between CLI and server
static bool NeedCudaDownload(const std::string& engine) {
return !system_info_utils::GetDriverAndCudaVersion().second.empty() &&
engine == kLlamaRepo;
}

bool EngineInstallCmd::Exec(const std::string& engine,
const std::string& version,
const std::string& src) {
@@ -35,15 +42,18 @@ bool EngineInstallCmd::Exec(const std::string& engine,
 if (show_menu_) {
   DownloadProgress dp;
   dp.Connect(host_, port_);
+  bool need_cuda_download = NeedCudaDownload(engine);
   // engine can be small, so need to start ws first
-  auto dp_res = std::async(std::launch::deferred, [&dp] {
-    bool need_cuda_download =
-        !system_info_utils::GetDriverAndCudaVersion().second.empty();
-    if (need_cuda_download) {
+  auto dp_res = std::async(std::launch::deferred, [&dp, need_cuda_download, engine] {
+    // if (need_cuda_download) {
+    //   return dp.Handle({DownloadType::Engine, DownloadType::CudaToolkit});
+    // } else {
+    //   return dp.Handle({DownloadType::Engine});
+    // }
+    if (engine == kLlamaRepo)
       return dp.Handle({DownloadType::Engine, DownloadType::CudaToolkit});
-    } else {
-      return dp.Handle({DownloadType::Engine});
-    }
+    else
+      return dp.Handle({});
   });

auto releases_url = url_parser::Url{
@@ -151,15 +161,18 @@ bool EngineInstallCmd::Exec(const std::string& engine,
 // default
 DownloadProgress dp;
 dp.Connect(host_, port_);
+bool need_cuda_download = NeedCudaDownload(engine);
 // engine can be small, so need to start ws first
-auto dp_res = std::async(std::launch::deferred, [&dp] {
-  bool need_cuda_download =
-      !system_info_utils::GetDriverAndCudaVersion().second.empty();
-  if (need_cuda_download) {
+auto dp_res = std::async(std::launch::deferred, [&dp, need_cuda_download, engine] {
+  // if (need_cuda_download) {
+  //   return dp.Handle({DownloadType::Engine, DownloadType::CudaToolkit});
+  // } else {
+  //   return dp.Handle({DownloadType::Engine});
+  // }
+  if (engine == kLlamaRepo)
    return dp.Handle({DownloadType::Engine, DownloadType::CudaToolkit});
-  } else {
-    return dp.Handle({DownloadType::Engine});
-  }
+  else
+    return dp.Handle({});
 });

auto install_url = url_parser::Url{
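Both call sites above now duplicate the download-type selection (and carry the old branches as comments). A minimal consolidation sketch, reusing only names from this diff (DownloadType, kLlamaRepo, NeedCudaDownload); the helper itself and its use of std::vector are hypothetical, not part of the PR:

#include <string>
#include <vector>

// Sketch: pick download types once so the menu path and the default path
// cannot drift apart. Assumes dp.Handle() accepts the same list type.
static std::vector<DownloadType> ChooseDownloadTypes(const std::string& engine) {
  if (engine != kLlamaRepo)
    return {};  // python-based engines (e.g. vLLM) are installed via uv
  if (NeedCudaDownload(engine))
    return {DownloadType::Engine, DownloadType::CudaToolkit};
  return {DownloadType::Engine};
}

Each lambda body would then reduce to `return dp.Handle(ChooseDownloadTypes(engine));`.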
8 changes: 6 additions & 2 deletions engine/cli/commands/model_pull_cmd.cc
@@ -67,8 +67,12 @@ std::optional<std::string> ModelPullCmd::Exec(const std::string& host, int port,
auto download_url = res.value()["downloadUrl"].asString();

if (downloaded.empty() && avails.empty()) {
model_id = id;
model = download_url;
if (res.value()["modelSource"].asString() == "huggingface") {
model = id;
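// HF models are addressed by repo id (e.g. "org/model"), not a download URL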
} else {
model_id = id;
model = download_url;
}
} else {
if (is_cortexso) {
auto selection = cli_selection_utils::PrintModelSelection(
17 changes: 12 additions & 5 deletions engine/cli/commands/run_cmd.cc
@@ -84,11 +84,18 @@ void RunCmd::Exec(bool run_detach,
CLI_LOG("Error: " + model_entry.error());
return;
}
yaml_handler.ModelConfigFromFile(
fmu::ToAbsoluteCortexDataPath(
fs::path(model_entry.value().path_to_model_yaml))
.string());
auto mc = yaml_handler.GetModelConfig();

config::ModelConfig mc;
if (model_entry.value().engine == kVllmEngine) {
// vLLM engine doesn't have model config
mc.engine = kVllmEngine;
} else {
yaml_handler.ModelConfigFromFile(
fmu::ToAbsoluteCortexDataPath(
fs::path(model_entry.value().path_to_model_yaml))
.string());
mc = yaml_handler.GetModelConfig();
}

// Check if engine existed. If not, download it
{
54 changes: 16 additions & 38 deletions engine/controllers/models.cc
@@ -28,7 +28,7 @@ void Models::PullModel(const HttpRequestPtr& req,
return;
}

auto model_handle = (*(req->getJsonObject())).get("model", "").asString();
auto model_handle = req->getJsonObject()->get("model", "").asString();
if (model_handle.empty()) {
Json::Value ret;
ret["result"] = "Bad Request";
@@ -39,52 +39,19 @@
}

std::optional<std::string> desired_model_id = std::nullopt;
auto id = (*(req->getJsonObject())).get("id", "").asString();
auto id = req->getJsonObject()->get("id", "").asString();
if (!id.empty()) {
desired_model_id = id;
}

std::optional<std::string> desired_model_name = std::nullopt;
auto name_value = (*(req->getJsonObject())).get("name", "").asString();

auto name_value = req->getJsonObject()->get("name", "").asString();
if (!name_value.empty()) {
desired_model_name = name_value;
}

auto handle_model_input =
[&, model_handle]() -> cpp::result<DownloadTask, std::string> {
CTL_INF("Handle model input, model handle: " + model_handle);
if (string_utils::StartsWith(model_handle, "https")) {
return model_service_->HandleDownloadUrlAsync(
model_handle, desired_model_id, desired_model_name);
} else if (model_handle.find(":") != std::string::npos) {
auto model_and_branch = string_utils::SplitBy(model_handle, ":");
if (model_and_branch.size() == 3) {
auto mh = url_parser::Url{
/* .protocol = */ "https",
/* .host = */ kHuggingFaceHost,
/* .pathParams = */
{
model_and_branch[0],
model_and_branch[1],
"resolve",
"main",
model_and_branch[2],
},
/* queries= */ {},
}
.ToFullPath();
return model_service_->HandleDownloadUrlAsync(mh, desired_model_id,
desired_model_name);
}
return model_service_->DownloadModelFromCortexsoAsync(
model_and_branch[0], model_and_branch[1], desired_model_id);
}

return cpp::fail("Invalid model handle or not supported!");
};

auto result = handle_model_input();
auto result = model_service_->PullModel(model_handle, desired_model_id,
desired_model_name);
if (result.has_error()) {
Json::Value ret;
ret["message"] = result.error();
@@ -213,6 +180,17 @@ void Models::ListModel(
data.append(std::move(obj));
continue;
}

if (model_entry.engine == kVllmEngine) {
Json::Value obj;
obj["id"] = model_entry.model;
obj["model"] = model_entry.model;
obj["engine"] = model_entry.engine;
obj["status"] = "downloaded";
data.append(std::move(obj));
continue;
}

yaml_handler.ModelConfigFromFile(
fmu::ToAbsoluteCortexDataPath(
fs::path(model_entry.path_to_model_yaml))
2 changes: 1 addition & 1 deletion engine/controllers/server.cc
@@ -138,7 +138,7 @@ void server::ProcessStreamRes(std::function<void(const HttpResponsePtr&)> cb,
auto err_or_done = std::make_shared<std::atomic_bool>(false);
auto chunked_content_provider = [this, q, err_or_done, engine_type, model_id](
char* buf,
std::size_t buf_size) -> std::size_t {
std::size_t buf_size) -> std::size_t {
if (buf == nullptr) {
LOG_TRACE << "Buf is null";
if (!(*err_or_done)) {
6 changes: 3 additions & 3 deletions engine/e2e-test/api/engines/test_api_engine.py
@@ -20,12 +20,12 @@ def setup_and_teardown(self):

# Teardown
stop_server()

# engines get
def test_engines_get_llamacpp_should_be_successful(self):
response = requests.get("http://localhost:3928/engines/llama-cpp")
assert response.status_code == 200

# engines install
def test_engines_install_llamacpp_specific_version_and_variant(self):
data = {"version": "v0.1.40-b4354", "variant": "linux-amd64-avx"}
@@ -40,7 +40,7 @@ def test_engines_install_llamacpp_specific_version_and_null_variant(self):
"http://localhost:3928/v1/engines/llama-cpp/install", json=data
)
assert response.status_code == 200

# engines uninstall
@pytest.mark.asyncio
async def test_engines_install_uninstall_llamacpp_should_be_successful(self):
119 changes: 119 additions & 0 deletions engine/extensions/python-engines/python_utils.cc
@@ -0,0 +1,119 @@
#include "python_utils.h"
#include <filesystem>

#include "utils/archive_utils.h"
#include "utils/curl_utils.h"
#include "utils/file_manager_utils.h"
#include "utils/set_permission_utils.h"
#include "utils/system_info_utils.h"

namespace python_utils {

std::filesystem::path GetPythonEnginesPath() {
return file_manager_utils::GetCortexDataPath() / "python_engines";
}
std::filesystem::path GetEnvsPath() {
return GetPythonEnginesPath() / "envs";
}
std::filesystem::path GetUvPath() {
auto system_info = system_info_utils::GetSystemInfo();
const auto bin_name = system_info->os == kWindowsOs ? "uv.exe" : "uv";
return GetPythonEnginesPath() / "bin" / bin_name;
}
bool UvCleanCache() {
auto cmd = UvBuildCommand("cache");
cmd.push_back("clean");
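  // resulting argv: {<uv path>, "--cache-dir", <cache dir>, "cache", "clean"}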
auto result = cortex::process::SpawnProcess(cmd);
if (result.has_error()) {
CTL_INF(result.error());
return false;
}
return cortex::process::WaitProcess(result.value());
}

bool UvIsInstalled() {
return std::filesystem::exists(GetUvPath());
}
cpp::result<void, std::string> UvInstall() {
const auto py_bin_path = GetPythonEnginesPath() / "bin";
std::filesystem::create_directories(py_bin_path);

// NOTE: do we need a mechanism to update uv, or just pin uv version with cortex release?
const std::string uv_version = "0.6.11";

// build download url based on system info
std::stringstream fname_stream;
fname_stream << "uv-";

auto system_info = system_info_utils::GetSystemInfo();
if (system_info->arch == "amd64")
fname_stream << "x86_64";
else if (system_info->arch == "arm64")
fname_stream << "aarch64";

// NOTE: there is also a musl linux version
if (system_info->os == kMacOs)
fname_stream << "-apple-darwin.tar.gz";
else if (system_info->os == kWindowsOs)
fname_stream << "-pc-windows-msvc.zip";
else if (system_info->os == kLinuxOs)
fname_stream << "-unknown-linux-gnu.tar.gz";

const std::string fname = fname_stream.str();
const std::string base_url =
"https://github.com/astral-sh/uv/releases/download/";

std::stringstream url_stream;
url_stream << base_url << uv_version << "/" << fname;
const std::string url = url_stream.str();
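  // e.g. on x86_64 Linux:
  // https://github.com/astral-sh/uv/releases/download/0.6.11/uv-x86_64-unknown-linux-gnu.tar.gz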
CTL_INF("Download uv from " << url);

const auto save_path = py_bin_path / fname;
auto res = curl_utils::SimpleDownload(url, save_path.string());
if (res.has_error())
return res;

archive_utils::ExtractArchive(save_path, py_bin_path.string(), true);
set_permission_utils::SetExecutePermissionsRecursive(py_bin_path);
std::filesystem::remove(save_path);

// install Python3.10 from Astral. this will be preferred over system
// Python when possible.
// NOTE: currently this will install to a user-wide directory. we can
// install to a specific location using `--install-dir`, but later
// invocation of `uv run` needs to have `UV_PYTHON_INSTALL_DIR` set to use
// this Python installation.
// we can add this once we allow passing custom env var to SpawnProcess().
// https://docs.astral.sh/uv/reference/cli/#uv-python-install
std::vector<std::string> command = UvBuildCommand("python");
command.push_back("install");
command.push_back("3.10");

auto result = cortex::process::SpawnProcess(command);
if (result.has_error())
return cpp::fail(result.error());

if (!cortex::process::WaitProcess(result.value())) {
const auto msg = "Process spawned but failed to wait";
CTL_ERR(msg);
return cpp::fail(msg);
}

return {};
}

std::vector<std::string> UvBuildCommand(const std::string& action,
const std::string& directory) {
// use our own cache dir so that when users delete cortexcpp/, everything is deleted.
const auto cache_dir = GetPythonEnginesPath() / "cache" / "uv";
std::vector<std::string> command = {GetUvPath().string(), "--cache-dir",
cache_dir.string()};
if (!directory.empty()) {
command.push_back("--directory");
command.push_back(directory);
}
command.push_back(action);
return command;
}
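
// Example: UvBuildCommand("sync", "/path/to/envs/my-model") yields
// {<uv path>, "--cache-dir", <cache dir>, "--directory",
//  "/path/to/envs/my-model", "sync"}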

} // namespace python_utils
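
The NOTE inside UvInstall() flags a follow-up: pinning uv's Python installs inside cortex's data directory once SpawnProcess() can pass environment variables. A minimal sketch of that follow-up, assuming a SpawnProcess(command, env) overload that does not exist yet (`--install-dir` and `UV_PYTHON_INSTALL_DIR` are the uv options the NOTE itself references):

#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not part of this PR: pin uv's Python installs inside
// python_engines/ so deleting cortexcpp/ removes them too.
cpp::result<void, std::string> UvInstallPythonPinned() {
  const auto py_dir = (GetPythonEnginesPath() / "python").string();

  auto command = UvBuildCommand("python");
  command.push_back("install");
  command.push_back("3.10");
  command.push_back("--install-dir");  // per the NOTE in UvInstall()
  command.push_back(py_dir);

  // Later `uv run` invocations would need this env var to find the install.
  std::unordered_map<std::string, std::string> env{
      {"UV_PYTHON_INSTALL_DIR", py_dir}};

  // Assumed overload; today's SpawnProcess() takes no env parameter.
  // auto proc = cortex::process::SpawnProcess(command, env);
  return {};
}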