Merge pull request #30 from kerthcet/feat/image-support
Release v0.0.1
InftyAI-Agent authored Jul 23, 2024
2 parents 3110c05 + 865fdfc commit 878203b
Showing 4 changed files with 107 additions and 18 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -25,3 +25,4 @@ Dockerfile.cross
*.swo
*~
.DS_Store
artifacts
10 changes: 10 additions & 0 deletions Makefile
@@ -267,3 +267,13 @@ $(CONTROLLER_GEN): $(LOCALBIN)
envtest: $(ENVTEST) ## Download envtest-setup locally if necessary.
$(ENVTEST): $(LOCALBIN)
test -s $(LOCALBIN)/setup-envtest || GOBIN=$(LOCALBIN) go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest

##@ Release

.PHONY: artifacts
artifacts: kustomize
	cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
	if [ -d artifacts ]; then rm -rf artifacts; fi
	mkdir -p artifacts
	$(KUSTOMIZE) build config/default -o artifacts/manifests.yaml
	@$(call clean-manifests)
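As a usage sketch, the new target takes the controller image via `IMG` and writes the combined install manifest to `artifacts/manifests.yaml`; the image reference below is illustrative, not a published tag:

```cmd
# IMG is a placeholder; point it at the controller image the release should pin.
IMG=inftyai/llmaz:v0.0.1 make artifacts
ls artifacts/manifests.yaml
```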
67 changes: 49 additions & 18 deletions README.md
@@ -7,25 +7,31 @@
[GoReport Widget]: https://goreportcard.com/badge/github.com/inftyai/llmaz
[GoReport Status]: https://goreportcard.com/report/github.com/inftyai/llmaz

-llmaz, pronounced as `/lima:z/`, aims to provide a production-ready inference platform for large language models on Kubernetes. It tightly integrates with state-of-the-art inference backends, such as [vLLM](https://github.com/vllm-project/vllm).
+**llmaz** (pronounced `/lima:z/`) aims to provide a **Production-Ready** inference platform for large language models on **Kubernetes**. It closely integrates with state-of-the-art inference backends like [vLLM](https://github.com/vllm-project/vllm) to bring cutting-edge research to the cloud.

## Concept

![image](./docs/assets/overview.png)

## Feature Overview

-- **Easy to use**: People can deploy a production-ready LLM service with minimal configurations.
-- **High performance**: llmaz integrates with vLLM by default for high performance inference. Other backend supports are on the way.
-- **Autoscaling efficiency**: llmaz works smoothly with autoscaling components like [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) and [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
-- **Accelerator fungibility**: llmaz supports serving LLMs with different accelerators for the sake of cost and performance.
-- **SOTA inference technologies**: llmaz support the latest SOTA technologies like [speculative decoding](https://arxiv.org/abs/2211.17192) and [Splitwise](https://arxiv.org/abs/2311.18677).
+- **User Friendly**: People can quickly deploy an LLM service with minimal configurations.
+- **High Performance**: llmaz integrates with vLLM by default for high-performance inference. Support for other backends is on the way.
+- **Scaling Efficiency**: llmaz works smoothly with autoscaling components like [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) to support elastic scenarios.
+- **Accelerator Fungibility**: llmaz supports serving the same LLMs with various accelerators to optimize cost and performance.
+- **SOTA Inference**: llmaz supports the latest cutting-edge research like [Speculative Decoding](https://arxiv.org/abs/2211.17192) and [Splitwise](https://arxiv.org/abs/2311.18677).

## Quick Start

-Once `Model`s (e.g. opt-125m) published, you can quick deploy a `Playground` for serving.
+### Installation

-### Model
+Read the [Installation](./docs/installation.md) guide for instructions.

+### Deploy

+Once `Model`s (e.g. facebook/opt-125m) are published, you can quickly deploy a `Playground` to serve the model.

+#### Model

```yaml
apiVersion: llmaz.io/v1alpha1
@@ -37,12 +43,12 @@ spec:
  dataSource:
    modelID: facebook/opt-125m
  inferenceFlavors:
-  - name: t4
+  - name: t4 # GPU type
    requests:
      nvidia.com/gpu: 1
```
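As a sketch (the file name is hypothetical), the manifest above can be saved locally and applied like any other custom resource:

```cmd
# model.yaml is a local copy of the Model manifest above.
kubectl apply -f model.yaml
# Listing assumes the CRD follows the usual plural.group naming.
kubectl get models.llmaz.io
```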
-### Inference Playground
+#### Inference Playground
```yaml
apiVersion: inference.llmaz.io/v1alpha1
@@ -55,16 +61,41 @@ spec:
    modelName: opt-125m
```
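Likewise for the `Playground` (file name again hypothetical); the serving pod that appears is the `opt-125m-0` pod used in the next step:

```cmd
# playground.yaml is a local copy of the Playground manifest above.
kubectl apply -f playground.yaml
# Watch until the serving pod (opt-125m-0) becomes Ready.
kubectl get pods -w
```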
-Refer to more **[Examples](/docs/examples/README.md)** for references.
-### Test
+#### Expose the service
```cmd
kubectl port-forward pod/opt-125m-0 8080:8080
```

#### See registered models

```cmd
curl http://localhost:8080/v1/models
```
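If `jq` is installed, the model IDs can be pulled out of the OpenAI-compatible list response, which keeps its entries under `.data`:

```cmd
# Assumes jq is available locally.
curl -s http://localhost:8080/v1/models | jq '.data[].id'
```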

#### Send a query

```cmd
curl http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "facebook/opt-125m",
"prompt": "San Francisco is a",
"max_tokens": 10,
"temperature": 0
}'
```
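To keep only the generated text, the same request can be piped through `jq`; `.choices[0].text` is the standard field in the OpenAI-compatible completions response:

```cmd
# Same query as above, printing just the completion text (assumes jq).
curl -s http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "San Francisco is a", "max_tokens": 10, "temperature": 0}' \
  | jq -r '.choices[0].text'
```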

+Refer to **[examples](/docs/examples/README.md)** to learn more.

## Roadmap

- Metrics support
- Autoscaling support
-- Gateway support
-- Serverless support
-- CLI tool
-- Model training, fine tuning in the long-term.
+- Gateway support for traffic routing
+- Serverless support for cloud-agnostic users
+- CLI tool support
+- Model training and fine-tuning in the long term

## Contributions

@@ -76,4 +107,4 @@ Refer to more **[Examples](/docs/examples/README.md)** for references.

<a href="https://github.com/InftyAI/llmaz/graphs/contributors">
<img src="https://contrib.rocks/image?repo=InftyAI/llmaz" />
</a>
47 changes: 47 additions & 0 deletions docs/installation.md
@@ -0,0 +1,47 @@
# Installation Guide

## Prerequisites

* Kubernetes version >= 1.27

## Install a released version

### Install

```cmd
# leaderworkerset runs in lws-system
LWS_VERSION=v0.3.0
kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/$LWS_VERSION/manifests.yaml
# llmaz runs in llmaz-system
LLMAZ_VERSION=v0.0.1
kubectl apply --server-side -f https://github.com/inftyai/llmaz/releases/download/$LLMAZ_VERSION/manifests.yaml
```
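To confirm both controllers came up, the namespaces named in the comments above can be checked; the `kubectl wait` form is a convenience, not something the manifests require:

```cmd
kubectl get pods -n lws-system
kubectl get pods -n llmaz-system
# Optionally block until the llmaz deployments report Available.
kubectl wait --for=condition=Available deployment --all -n llmaz-system --timeout=300s
```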

### Uninstall

```cmd
LWS_VERSION=v0.3.0
kubectl delete -f https://github.com/kubernetes-sigs/lws/releases/download/$LWS_VERSION/manifests.yaml
LLMAZ_VERSION=v0.0.1
kubectl delete -f https://github.com/inftyai/llmaz/releases/download/$LLMAZ_VERSION/manifests.yaml
```
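After deletion, a quick check that the namespaces are winding down (names taken from the install comments above):

```cmd
# Both should eventually report NotFound once cleanup finishes.
kubectl get ns lws-system llmaz-system
```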

## Install from source

### Install

```cmd
LWS_VERSION=v0.3.0
kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/$LWS_VERSION/manifests.yaml
git clone https://github.com/inftyai/llmaz.git && cd llmaz
IMG=<IMAGE_REPO>:<GIT_TAG> make image-push deploy
```
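A concrete invocation might look like the following, where the registry and tag are placeholders for wherever the built image is pushed:

```cmd
# Hypothetical registry/tag; requires push access.
IMG=ghcr.io/example/llmaz:dev make image-push deploy
```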

### Uninstall

```cmd
make undeploy
```
