Add multi tenancy documentation (#31)
majst01 authored Feb 3, 2022
1 parent 1568e55 commit a38da72
Showing 3 changed files with 127 additions and 0 deletions.
123 changes: 123 additions & 0 deletions MULTITENANCY.md
@@ -0,0 +1,123 @@
# Multi tenancy with lightbits storage

Multi tenancy is a crucial aspect of providing central storage. It has two sides. First, the impact of one tenant on another must be restricted so that a tenant's data can neither be seen, destroyed nor modified by another tenant. Second, read and write operations at a high rate by one tenant must not degrade the storage performance of other tenants.

Lightbits storage uses NVMEoTCP (NVMe over TCP) as its transport protocol, as defined in the [NVMEoF](https://nvmexpress.org/developers/nvme-of-specification/) specification. The storage traffic is routed over the same network as normal TCP/IP traffic. The basic setup of the components is shown here:

![Diagram](nvme-over-tcp.jpg)

The current implementation prevents malicious access to data. Preventing performance impact between tenants is the subject of later lightos releases.

## Gardener and metal-stack

Multi tenancy in metal-stack and gardener is based on projects. In metal-stack, projects additionally belong to a tenant entity that groups projects. A single kubernetes cluster is created in the scope of a project, and one project can have multiple kubernetes clusters. Every kubernetes cluster gets physically separated firewall and worker nodes in a dedicated routing domain called a VRF. Every kubernetes cluster is thus completely separated from a physical and a network perspective; nothing is shared.

Lightbits storage also has the notion of a project. Once a cluster is created, a new project is created in the lightos storage API; this project matches the project from the gardener/metal-stack perspective. For every cluster an authentication token in the JWT format is created. This token allows creating, updating, listing and deleting volumes in the lightos cluster within the given project, i.e. the matching lightos project. Every kubernetes cluster gets its own individual JWT token, even if it is in the same project as another cluster. The token is valid for 8 days. One day before the token expires, a new token is issued if the cluster still exists.
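
The renewal rule described above can be sketched in a few lines of Python. This is only an illustration of the timing logic (8 day validity, renewal 1 day before expiry, only while the cluster exists), not the duros-controller's actual code; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the text above; the names are hypothetical.
TOKEN_LIFETIME = timedelta(days=8)
RENEW_BEFORE = timedelta(days=1)


def issue_expiry(issued_at: datetime) -> datetime:
    """Return the point in time at which a token issued at `issued_at` expires."""
    return issued_at + TOKEN_LIFETIME


def needs_renewal(expires_at: datetime, now: datetime, cluster_exists: bool = True) -> bool:
    """A new token is due once we are within RENEW_BEFORE of expiry and the cluster still exists."""
    if not cluster_exists:
        return False
    return now >= expires_at - RENEW_BEFORE


issued = datetime(2022, 1, 31, 8, 40, tzinfo=timezone.utc)
expires = issue_expiry(issued)

print(needs_renewal(expires, issued + timedelta(days=6)))  # False: more than one day left
print(needs_renewal(expires, issued + timedelta(days=7)))  # True: one day before expiry
```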

The duros-controller is responsible for creating these tokens. It is deployed in the seed's shoot namespace (details on the gardener architecture can be found [here](https://github.com/gardener/gardener/blob/master/docs/concepts/architecture.md)). This namespace is fully managed by the provider and invisible to the cluster user. Once the token has been created, it is stored in a secret in the actual user cluster, alongside the deployment of the lightbits CSI driver and the storage classes. The CSI driver is then responsible for creating, updating and deleting volumes based on the manifests deployed in the cluster.

```bash
k get sc

NAME               PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
partition-gold     csi.lightbitslabs.com   Delete          Immediate           true                   7d4h
partition-silver   csi.lightbitslabs.com   Delete          Immediate           true                   7d4h
```

The storage class `partition-gold` is configured with 3-fold replication (`replica-count: "3"`) and contains the pointers to the secret holding the credentials:

```bash
k get sc partition-gold -o yaml

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2022-01-24T08:40:03Z"
  name: partition-gold
  resourceVersion: "234"
  uid: 3b40edfa-ff72-4426-904c-4205b061e311
parameters:
  compression: enabled
  csi.storage.k8s.io/controller-expand-secret-name: lb-csi-creds
  csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
  csi.storage.k8s.io/controller-publish-secret-name: lb-csi-creds
  csi.storage.k8s.io/controller-publish-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: lb-csi-creds
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  csi.storage.k8s.io/node-stage-secret-name: lb-csi-creds
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  csi.storage.k8s.io/provisioner-secret-name: lb-csi-creds
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  mgmt-endpoint: 10.131.44.1:443,10.131.44.2:443,10.131.44.3:443
  mgmt-scheme: grpcs
  project-name: 0f89286d-0429-4209-a8a9-8612befbff97
  replica-count: "3"
provisioner: csi.lightbitslabs.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

This is the secret the storage class points to:

```bash
k get secret lb-csi-creds -o yaml

apiVersion: v1
data:
  jwt: ZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklqQm1PRGt5T0Raa0xUQTBNamt0TkRJd09TMWhPR0U1TFRnMk1USmlaV1ppWm1ZNU56cHliMjkwSWl3aWRIbHdJam9pU2xkVUluMC5leUpsZUhBaU9qRTJORFF6TURrMk1qa3NJbXAwYVNJNkltUmpNVGc0TWpVd0xUZGtORGt0TkRjMk1DMWlZVGs0TFdGbE1ESmpNR0l5WmpNeVpTSXNJbWxoZENJNk1UWTBNell4T0RReU9Td2lhWE56SWpvaVpIVnliM010WTI5dWRISnZiR3hsY2lJc0luTjFZaUk2SW5Ob2IyOTBMUzF3WkRjMmJYSXRMV2x1ZEhSbGMzUXdJaXdpY205c1pYTWlPbHNpTUdZNE9USTRObVF0TURReU9TMDBNakE1TFdFNFlUa3RPRFl4TW1KbFptSm1aamszT21Ga2JXbHVJbDE5LmIxaWo0aHV0R2lmUll1YlZMb1J5WlBKRHZ0ZWpodDZqdW1KdW1xbEMyOWpwRUxWa0JfdG4tZU9VbERPb09HUEZTN2FhRDBGOXRKSGVrOGVYQ0xqZ1R2RkdCMzI5aE5zTzlra0M5OXNQZWJvaWE1RmRLUmlUbjNBTC1KcXZZZ3pKZTNaZmZNWFdFVHhsZmxSXzFTNERpQlZFNERSc3hNczNpbWt2Nl83cU5BUEhXd2ZCdU5OUDVyMmxOdGRqdVl2VXlqN3hNWTZhODdSU1RkMUZINGlaMUx3OEZwLW9haTdyN1M1SlhnMkhBcUU2VTJ1UTMzMWhwMlREY3M2ZTJjcDdpbjJNSnhHWkRYUE5SSGFnRUJyWFdfYVVJdjJpYTU3emVGeWoyMFdhVVlUVS1rRFhxYXBUaDJfQXlPV19jQ1hhbTJWdVY0N1IwOWxqdVNocGRaXzJFZw==
kind: Secret
metadata:
  creationTimestamp: "2022-01-24T08:40:02Z"
  name: lb-csi-creds
  namespace: kube-system
  resourceVersion: "1837602"
  uid: 406c3583-d4cd-4a0b-8615-3a6f2b9b7577
```
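
To see what such a token carries, the `jwt` value of the secret can be decoded. Kubernetes stores secret data base64 encoded, and a JWT itself consists of three base64url segments (header, payload, signature) separated by dots. The following Python sketch reads the payload without verifying the signature, so it is only suitable for inspection. The sample token is made up for this example, and the helper names are hypothetical; `iss`, `iat` and `exp` are the standard JWT claim names.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as unpadded base64url, as used inside a JWT."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_claims(secret_value: str) -> dict:
    """Return the JWT payload from a Secret `data` value. The signature is NOT verified."""
    token = base64.b64decode(secret_value).decode("ascii")
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Build a made-up, unsigned sample token so the example is self-contained.
header = b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
payload = b64url_encode(
    json.dumps({"iss": "example-issuer", "iat": 1000, "exp": 1000 + 8 * 24 * 3600}).encode()
)
sample_token = f"{header}.{payload}.not-a-real-signature"
secret_value = base64.b64encode(sample_token.encode()).decode("ascii")

claims = decode_claims(secret_value)
print(claims["iss"])
print(claims["exp"] - claims["iat"])  # validity in seconds
```

For the sample token the validity is 691200 seconds, which corresponds to the 8 days mentioned above.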

## Lightbits and NVMEoTCP

When a volume is created and mounted, e.g. through a PVC and its PV, the CSI driver first creates the volume via the lightos API using its token, and sets the list of hosts that are allowed to talk to this volume to the names of the worker nodes.

```bash
k get pvc,pv,node
NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
persistentvolumeclaim/data-sampla-app-0   Bound    pvc-c4b7822b-b3c8-414a-a1fa-9350d30a4f5c   1Gi        RWO            partition-silver   25s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS       REASON   AGE
persistentvolume/pvc-c4b7822b-b3c8-414a-a1fa-9350d30a4f5c   1Gi        RWO            Delete           Bound    sampla-app/data-sampla-app-0   partition-silver            25s

NAME                                               STATUS   ROLES   AGE     VERSION
node/shoot--pd76mr--inttest0-group-0-845b8-49r7x   Ready    node    6d21h   v1.21.9
node/shoot--pd76mr--inttest0-group-0-845b8-ng7xh   Ready    node    7d1h    v1.21.9
```

The pod which mounts this volume is running on the node `shoot--pd76mr--inttest0-group-0-845b8-49r7x`.

```bash
k get pod -o wide
NAME           READY   STATUS    RESTARTS   AGE     IP              NODE                                          NOMINATED NODE   READINESS GATES
sampla-app-0   1/1     Running   0          5m34s   10.244.15.152   shoot--pd76mr--inttest0-group-0-845b8-49r7x   <none>           <none>
```

Once the lightos cluster has set the ACL of this volume, only a node whose host NQN matches the given ACL is able to mount that volume.

```bash
lbcli list volumes
Name                                       UUID                                   State       Protection State   NSID   Size      Replicas   Compression   ACL                                                                                            Rebuild Progress
pvc-c4b7822b-b3c8-414a-a1fa-9350d30a4f5c   7828aa17-2316-442d-883e-d000436d41f2   Available   FullyProtected     631    1.0 GiB   2          true          values:"nqn.2019-09.com.lightbitslabs:host:shoot--pd76mr--inttest0-group-0-845b8-49r7x.node"   None
```

The NVMEoTCP module in the Linux kernel, on the worker node side as well as on the lightos side, takes care of setting the host NQN so that it matches these ACL expectations.

This can be inspected on the worker node side by looking at the host NQN (NVMe qualified name), i.e. the name under which the node identifies itself to the storage. This NQN matches the ACL on the lightos server side.

```bash
cat /sys/devices/virtual/nvme-fabrics/ctl/nvme1/hostnqn
nqn.2019-09.com.lightbitslabs:host:shoot--pd76mr--inttest0-group-0-845b8-49r7x.node
```
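
The matching between node name, host NQN and volume ACL can be sketched as follows. The NQN layout (prefix `nqn.2019-09.com.lightbitslabs:host:` plus node name plus `.node`) is derived only from the example output above and is assumed to hold in general; the function names are hypothetical and the real check happens inside the lightos cluster, not in Python.

```python
# Prefix and suffix as seen in the example output above (assumed to be a general pattern).
HOST_NQN_PREFIX = "nqn.2019-09.com.lightbitslabs:host:"
HOST_NQN_SUFFIX = ".node"


def host_nqn(node_name: str) -> str:
    """Build the host NQN a worker node presents, following the pattern in the example."""
    return f"{HOST_NQN_PREFIX}{node_name}{HOST_NQN_SUFFIX}"


def may_connect(presented_nqn: str, volume_acl: list[str]) -> bool:
    """A host may attach the volume only if its NQN is listed in the volume's ACL."""
    return presented_nqn in volume_acl


# ACL of the volume from the `lbcli list volumes` output above.
acl = ["nqn.2019-09.com.lightbitslabs:host:shoot--pd76mr--inttest0-group-0-845b8-49r7x.node"]

print(may_connect(host_nqn("shoot--pd76mr--inttest0-group-0-845b8-49r7x"), acl))  # True
print(may_connect(host_nqn("shoot--pd76mr--inttest0-group-0-845b8-ng7xh"), acl))  # False
```

In this sketch the second worker node of the same cluster is rejected because the pod using the volume does not run there, and a node of any other cluster would be rejected for the same reason.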

## Further improvements

An upcoming lightos release will address the performance aspects of multi tenancy by making it possible to cap the maximum throughput per volume. This ensures that no single tenant is able to saturate the whole lightos cluster and impact other tenants using it.
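
How lightos will implement this cap is not described here. As a rough illustration of the idea, a per-volume throughput cap can be modelled with a token bucket, a common rate limiting technique. The class below is a hypothetical sketch, not the lightos mechanism; time is passed in explicitly to keep the example deterministic.

```python
class ThroughputCap:
    """Token bucket: at most `rate` bytes per second, with bursts of up to `burst` bytes."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = now

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and consume tokens if `nbytes` fits into the current budget."""
        # Refill according to the time passed since the last call, up to the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# One independent cap per volume, so one volume cannot use up another volume's budget.
caps = {
    "volume-a": ThroughputCap(rate=100, burst=100),
    "volume-b": ThroughputCap(rate=100, burst=100),
}

print(caps["volume-a"].allow(100, now=0.0))  # True: uses the whole budget of volume-a
print(caps["volume-a"].allow(1, now=0.0))    # False: volume-a is capped
print(caps["volume-b"].allow(100, now=0.0))  # True: volume-b is unaffected
print(caps["volume-a"].allow(50, now=0.5))   # True: half a second refilled 50 bytes
```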
4 changes: 4 additions & 0 deletions README.md
@@ -87,6 +87,10 @@ A Duros project will be deleted if the metal-api project is deleted. A check if
Accounting of volumes is done with the kube-counter running in every shoot in the seed. Accounting of volumes currently not in use in any of the clusters
are listed from the cloud-api and reported to the accounting-api.

## Tenant separation

How tenant separation works is described in more detail [here](MULTITENANCY.md).

## TODO

- check if Gardener deletes PVC's after cluster deletion.
Binary file added nvme-over-tcp.jpg