feat(api,ui,sdk): Make CPU limits configurable (#586)
# Description

At present, users cannot configure the CPU limits of the pods in which Merlin models and transformers are deployed. The limits are instead determined automatically at the platform level (Merlin API server). Depending on how the API server has been configured, one of the following happens:

- the CPU limit of a model is set to its CPU request value multiplied by a [scaling factor](https://github.com/caraml-dev/merlin/blob/f1ebe099ea168988b365ee72ce08543b127826e1/api/config/config.go#L364) (e.g. 2 CPU * 1.5) **or,**
  - Note that this is the existing way memory limits are automatically set by the Merlin API server
- the CPU limit is left unset
  - Note that because KServe does not currently allow CPU limits to be completely unset, the Merlin API server sets an [arbitrary value](https://github.com/caraml-dev/merlin/blob/f1ebe099ea168988b365ee72ce08543b127826e1/api/config/config.go#L363) (ideally a very large one) as the CPU limit instead

This PR introduces a new workflow that lets users override the platform-level CPU limits described above for a model. The workflow is available via the UI, the SDK and, by extension, direct calls to the API server's endpoint.

UI:

![Screenshot 2024-05-24 at 2 13 46 PM](https://github.com/caraml-dev/merlin/assets/36802364/a2b59c1e-df2d-4070-92ff-b4f375256da1)
![Screenshot 2024-05-24 at 2 23 42 PM](https://github.com/caraml-dev/merlin/assets/36802364/616f0d03-0d36-4b82-8dd5-051d098a78c2)

SDK:

```python
merlin.deploy(
    version_1,
    resource_request=merlin.ResourceRequest(
        min_replica=0,
        max_replica=0,
        cpu_request="0.5",
        cpu_limit="2",
        memory_request="1Gi",
    ),
)
```

In addition, this PR adds a new configuration, `DefaultEnvVarsWithoutCPULimits`, which is a list of env vars that are automatically added to all Merlin models and transformers when CPU limits are not set. This allows the Merlin API server's operators to set platform-wide env vars that can potentially improve the performance of these deployments, e.g. env vars involving concurrency.

# Modifications

- `api/cluster/resource/templater.go`
  - Refactoring of templater methods to set default env vars when CPU limits are not explicitly set and the CPU limit scaling factor is set to 0
- `api/config/config.go`
  - Addition of the new field `DefaultEnvVarsWithoutCPULimits`
- `api/config/config_test.go`
  - Addition of a new unit test to test the parsing of configs from .yaml files
- `docs/user/templates/model_deployment/01_deploying_a_model_version.md`
  - Addition of docs to demonstrate how the platform-level CPU limits can be overridden
- `python/sdk/merlin/resource_request.py`
  - Addition of a new CPU limit field to the resource request class
- `ui/src/pages/version/components/forms/components/CPULimitsFormGroup.js`
  - Addition of a new form group to allow CPU limits to be specified on the UI

# Tests

- [x] Deploying existing models (and transformers) with and without CPU limits set

# Checklist

- [x] Added PR label
- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [x] Updated documentation
- [x] Update Swagger spec if the PR introduces API changes
- [x] Regenerated Golang and Python client if the PR introduces API changes

# Release Notes

<!--
Does this PR introduce a user-facing change?
If no, just write "NONE" in the release-note block below.
If yes, a release note is required. Enter your extended release note in the block below.
If the PR requires additional action from users switching to the new release, include the string "action required".

For more information about release notes, see kubernetes' guide here:
http://git.k8s.io/community/contributors/guide/release-notes.md
-->

```release-note
NONE
```
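# Illustration

The precedence described in the Description (user override first, then the scaling factor, then the "unset" fallback with default env vars) can be sketched as follows. This is only an illustration in Python, not the Go code in `templater.go`: `PlatformConfig`, `parse_cpu`, `resolve_cpu_limit`, the fallback value `"8"` and the `WORKERS` env var are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PlatformConfig:
    """Hypothetical stand-in for the platform-level settings of the API server."""
    cpu_limit_scaling_factor: float = 0.0  # 0 means "leave the CPU limit unset"
    fallback_cpu_limit: str = "8"  # assumed large value used because KServe needs some limit
    default_env_vars_without_cpu_limits: Dict[str, str] = field(default_factory=dict)


def parse_cpu(quantity: str) -> float:
    """Parse a Kubernetes-style CPU quantity such as "500m" or "0.5" into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def resolve_cpu_limit(
    cpu_request: str,
    user_cpu_limit: Optional[str],
    config: PlatformConfig,
) -> Tuple[str, Dict[str, str]]:
    """Return the CPU limit to apply and any default env vars to inject."""
    # 1. A user-specified limit overrides the platform-level behaviour.
    if user_cpu_limit:
        return user_cpu_limit, {}
    # 2. Otherwise scale the request when a scaling factor is configured.
    if config.cpu_limit_scaling_factor > 0:
        cores = parse_cpu(cpu_request) * config.cpu_limit_scaling_factor
        return f"{round(cores * 1000)}m", {}
    # 3. Otherwise the limit is effectively unset: use the large fallback value
    #    and add the platform-wide default env vars.
    return config.fallback_cpu_limit, dict(config.default_env_vars_without_cpu_limits)
```

Under these assumptions, `resolve_cpu_limit("2", None, PlatformConfig(cpu_limit_scaling_factor=1.5))` yields a limit of `"3000m"`, matching the `2 CPU * 1.5` example above, while passing `cpu_limit="2"` as in the SDK snippet bypasses both platform-level branches.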