chore(design): adding pv migration proposal
Signed-off-by: Pawan <[email protected]>
---
title: Volume Migration for ZFS-LocalPV
authors:
  - "@pawanpraka1"
owners:
  - "@kmova"
creation-date: 2021-05-21
last-updated: 2021-05-21
---

# Volume Migration for ZFS-LocalPV

## Table of Contents

* [Table of Contents](#table-of-contents)
* [Summary](#summary)
* [Problem](#problem)
* [Current Solution](#current-solution)
* [Proposal](#proposal)
    * [Keys Per ZPOOL](#keys-per-zpool)
    * [Migrator](#migrator)
    * [Workflow](#workflow)
* [Implementation Plan](#implementation-plan)

## Summary

This is a design proposal for a feature that migrates a volume from one node to another for the ZFS-LocalPV CSI driver. This document describes how the PersistentVolumes, and the applications using them, can be moved to a new node. The design expects that, as part of replacing a node, the administrator will move the disks to the new node and import the ZFS pool there. The goal of this design is that the volumes and the pods using them are moved to the new node. The design also assumes that administrators do not have a large number of ZFS pools configured on a single node.

## Problem

The problem with LocalPV is that node affinity is set on the PersistentVolume object, which makes the Kubernetes scheduler place the pods on that node, since the data exists only there. The ZFS-LocalPV driver uses the node name to set this affinity, and that causes a problem when a node is replaced: the node name changes, so the Kubernetes scheduler cannot schedule the pods to the new node even after the disks have been moved there.

## Current Solution

The current solution depends on the `openebs.io/nodeid` node label. The user has to set the same label on the replacement node, in which case the Kubernetes scheduler will automatically schedule the pods to the new node. More about this solution can be read [here](https://github.com/openebs/zfs-localpv/blob/master/docs/faq.md#8-how-to-migrate-pvs-to-the-new-node-in-case-old-node-is-not-accessible).

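For illustration, assuming the old node carried `openebs.io/nodeid=node1` and the replacement node is named `new-node-1` (both values are examples, not part of this proposal), the step looks roughly like this:

```
# copy the nodeid label that the old node carried onto the replacement node
# (node name and label value are illustrative)
kubectl label node new-node-1 openebs.io/nodeid=node1
```
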
The problem with the above approach is that we cannot move the volumes to an arbitrary existing node, because a node can carry only one value for a given label key. For any existing node in the cluster, the `openebs.io/nodeid` label will already be set, so we cannot change it to a new value without breaking scheduling for the pods already running on that node.

## Proposal

### Keys Per ZPOOL

We are proposing a label key dedicated to each ZFS pool. The ZFS-LocalPV driver will use this key to label the nodes where the pool is present. This allows a ZFS pool to move from any node to any other node, because the key is tied to the pool rather than to the node. We propose an `openebs.io/<poolname>=true` label on the node where the pool is present. Assuming administrators do not have a large number of pools on a node, this does not add many labels to a node.

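For example, once the driver has applied the per-pool labels, the node currently hosting a pool can be found with a label selector (the pool name `pool1` is illustrative):

```
# list the nodes that currently carry the pool1 label
kubectl get nodes -l openebs.io/pool1=true
```
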
### Migrator

For ZFS-LocalPV, the ZFS pool name should be the same across all the nodes, so the pool has to be renamed if it is moved to an existing node that already has a pool of that name. We need a Migrator workflow to update the pool name in the ZFSVolume objects: it finds all the volumes present in a ZFS pool on that node and updates each ZFSVolume object with the correct PoolName.

**Note:** We cannot edit the PV `volumeAttributes` with the new pool name, as it is an immutable field.

The migrator will look for all the volumes whose names start with "pvc-", find the corresponding ZFSVolume object for each, and update it. If a user has imported existing volumes, the volume name might not start with "pvc-"; in that case the PoolName needs to be updated manually, since the volume was created manually by the admin.

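A minimal sketch of what the migrator amounts to, assuming the moved pool was imported under the example name `pool1-new`, the ZFSVolume objects live in the `openebs` namespace, and the spec field is `poolName` (all of these are assumptions, not fixed by this proposal):

```
# sketch only: for every dataset in the renamed pool whose name starts
# with "pvc-", patch the matching ZFSVolume object with the new pool name
NEW_POOL=pool1-new
for vol in $(zfs list -H -o name -r "$NEW_POOL" | awk -F/ '/\/pvc-/{print $NF}'); do
  kubectl -n openebs patch zfsvolume "$vol" --type merge \
    -p "{\"spec\":{\"poolName\":\"$NEW_POOL\"}}"
done
```
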
### Workflow

- The user will set up all the nodes and create the ZFS pool on each of those nodes.
- The ZFS-LocalPV CSI driver will look for all the pools on the node and will set the `openebs.io/<poolname>=true` label for every ZFS pool present on that node. For example, if node-1 has two pools (say pool1 and pool2), the labels will look like this:
```
$ kubectl get node node-1 --show-labels
NAME     STATUS   ROLES    AGE    VERSION   LABELS
node-1   Ready    worker   351d   v1.17.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=true,openebs.io/nodeid=node1,openebs.io/nodename=node-1,openebs.io/pool1=true,openebs.io/pool2=true
```
- If we are moving pool1 from node1 to node2, there are two cases:

#### 1. If node2 is a fresh node

- We can simply import the pool and restart the ZFS-LocalPV driver to make it aware of that pool and set the corresponding node topology (see the sketch after this list).
- The ZFS-LocalPV driver will look for `openebs.io/pool1=true` and remove the label from the nodes where the pool is not present.
- The ZFS-LocalPV driver will update the new node with the `openebs.io/pool1=true` label.
- The Kubernetes scheduler will see the new label and should schedule the pods to this new node.

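A minimal sketch of the admin steps for this case (the pool name `pool1`, the driver DaemonSet name `openebs-zfs-node`, and the namespace `openebs` are assumptions; adjust to your installation):

```
# on node2: import the pool that arrived with the moved disks
zpool import pool1

# restart the ZFS-LocalPV node driver so it rescans the pools and
# applies the openebs.io/pool1=true label to node2
# (DaemonSet name and namespace may differ in your installation)
kubectl -n openebs rollout restart daemonset openebs-zfs-node
```
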
#### 2. If node2 is an existing node and a pool of the same name is present there

- Here we need to import the pool with a different name and restart the ZFS-LocalPV driver to make it aware of that pool and set the corresponding node topology (see the sketch after this list).
- The ZFS-LocalPV driver will look for `openebs.io/pool1=true` and remove the label from the nodes where the pool is not present.
- The ZFS-LocalPV driver will update the new node with the `openebs.io/pool1=true` label.
- The migrator will look for ZFSVolume resources and update the pool name for all the volumes.
- The Kubernetes scheduler will see the new label and should schedule the pods to this new node.

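A minimal sketch for this case, assuming the pool is imported under the example name `pool1-new` (DaemonSet name and namespace as above are assumptions); the migrator sketch in the previous section would then update the ZFSVolume objects:

```
# on node2: a pool named "pool1" already exists, so import the moved
# pool under a new name (the new name is illustrative)
zpool import pool1 pool1-new

# restart the ZFS-LocalPV node driver so it picks up the imported pool
kubectl -n openebs rollout restart daemonset openebs-zfs-node
```
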
## Implementation Plan

### Phase 1
1. Implement replacement with a fresh node.

### Phase 2
1. Implement replacement with an existing node (implement the Migrator).