Skip to content

This project provides a Kubernetes Operator for managing the lifecycle of the inference-gateway and its related components. It simplifies deployment, configuration, and scaling of the gateway within Kubernetes clusters, enabling seamless integration of machine learning inference workflows.

License

Notifications You must be signed in to change notification settings

inference-gateway/operator

Repository files navigation

Operator

A Kubernetes operator for automating the management and operation of applications on Kubernetes.

Description

This Kubernetes operator extends the Kubernetes API to create, configure and manage Inference Gateway instances within a Kubernetes cluster. It's specifically designed to automate the deployment and management of AI inference gateways that connect with Model Context Protocol (MCP) servers, which may live on remote infrastructure. In its first stage, the operator focuses on controlling in-cluster resources such as deployments, services, and configuration for the gateway components.

The operator handles various aspects of gateway management including authentication configuration, provider service integration, and network endpoint exposure. Future development will extend its capabilities to create and manage the lifecycle of remote MCP servers that the gateway connects to, providing a unified control plane for both the gateway infrastructure and its associated AI workloads.

The operator pattern allows you to codify operational knowledge and workflows that would typically require human operator intervention. This implementation follows best practices for cloud-native applications, including high availability, scalability, and seamless upgrades.

Getting Started

Prerequisites

  • go version v1.23.0+
  • docker version 17.03+.
  • kubectl version v1.11.3+.
  • Access to a Kubernetes v1.11.3+ cluster.

To Deploy on the cluster

Build and push your image to the location specified by IMG:

task docker-build docker-push IMG=<some-registry>/operator:tag

NOTE: This image ought to be published in the personal registry you specified. And it is required to have access to pull the image from the working environment. Make sure you have the proper permission to the registry if the above commands don’t work.

Install the CRDs into the cluster:

task install

Deploy the Manager to the cluster with the image specified by IMG:

task deploy IMG=<some-registry>/operator:tag

NOTE: If you encounter RBAC errors, you may need to grant yourself cluster-admin privileges or be logged in as admin.

Create instances of your solution You can apply the samples (examples) from the config/sample:

kubectl apply -k config/samples/

NOTE: Ensure that the samples has default values to test it out.

To Uninstall

Delete the instances (CRs) from the cluster:

kubectl delete -k config/samples/

Delete the APIs(CRDs) from the cluster:

task uninstall

UnDeploy the controller from the cluster:

task undeploy

Project Distribution

Following the options to release and provide this solution to the users.

By providing a bundle with all YAML files

  1. Build the installer for the image built and published in the registry:
task build-installer IMG=<some-registry>/operator:tag

NOTE: The makefile target mentioned above generates an 'install.yaml' file in the dist directory. This file contains all the resources built with Kustomize, which are necessary to install this project without its dependencies.

  1. Using the installer

Users can just run 'kubectl apply -f ' to install the project, i.e.:

kubectl apply -f https://raw.githubusercontent.com/<org>/operator/<tag or branch>/dist/install.yaml

By providing a Helm Chart

  1. Build the chart using the optional helm plugin
kubebuilder edit --plugins=helm/v1-alpha
  1. See that a chart was generated under 'dist/chart', and users can obtain this solution from there.

NOTE: If you change the project, you need to update the Helm Chart using the same command above to sync the latest changes. Furthermore, if you create webhooks, you need to use the above command with the '--force' flag and manually ensure that any custom configuration previously added to 'dist/chart/values.yaml' or 'dist/chart/manager/manager.yaml' is manually re-applied afterwards.

Contributing

We welcome contributions to the Inference Gateway Operator project!

For detailed information on how to contribute to this project, please see CONTRIBUTING.md. The document covers:

  • Development environment setup
  • Development workflow
  • Code style and guidelines
  • Adding new features
  • Testing requirements
  • Pull request process
  • And more

Your contributions help make the Inference Gateway Operator better for everyone. Thank you for your interest in the project!

NOTE: Run task help for more information on all potential task targets

More information can be found via the Kubebuilder Documentation

License

MIT License

Copyright (c) 2025 Inference Gateway

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

This project provides a Kubernetes Operator for managing the lifecycle of the inference-gateway and its related components. It simplifies deployment, configuration, and scaling of the gateway within Kubernetes clusters, enabling seamless integration of machine learning inference workflows.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published