Reduce excessive number of running datamovers relative to loadConcurrency values #8344

msfrucht · 2024-10-23T23:44:35Z

Describe the problem/challenge you have
During backup with EnableCSI and snapshot-move-data enabled there is a large number of datamovers started after the initial resource backup finishes.

Some of our applications have upwards of 100 PVCs resulting in nearly a hundred datamover pods.

The existing design model creates the datamovers but only lets them perform the backup up to the loadConcurrency configuration in parallel. While the waiting datamover pods use little in actual resources, the resource limits has to be set to what is required to backing up the largest in resource usage and skews scheduling of other workloads.

The loadConcurrency cannot be increased beyond a certain point or kopia begins to run out memory from the nodes.

This example debug logs illustrate a small example. When loadConcurrency is constrained to 1 as well and the loadAffinity to a single node with 10 PVCs. The result is 10 datamovers only one of which executes.

bundle-2024-10-23-15-47-30.tar.gz

Describe the solution you'd like
Do not deploy datamover pods until load concurrency policy allows datamovers to run.

Anything else you would like to add:

Proposed Solution:

New state: DatamoverDeploying

Instead of deploying the datamover during Prepared. Move the concurrency check and the datamover pod creation into this state.

To move from Accepted -> Prepared:
Largely remains unchanged but without the datamover pod creation.

Copy the VolumeSnapshot, VolumeSnapshotContents, and create the PVC. On restore, the changes are just create the PVC from the application PVC spec. Do not create the datamover container.

State change:
Prepared -> DatamoverDeploying:

The concurrency check occurs on this state. On success, create the datamover Pod.

DatamoverDeploying -> Prepared
The DataUpload should move into the Prepared state when datamover pod has phase Running as is.

Environment:

Velero version (use velero version):

Client:
Version: main
Git commit: c53ab20-dirty
Server:
Version: main

Only changes are to Makefile and modify Dockerfile.ubi from github.com/openshift/velero to make a velero container compatible with OpenShift. Dockerfile from Velero will not work with OpenShift due permissions denied creating folders. I am aware of my less than stellar reputation. I do not have access to generic Kubernetes at all nor allowed to deploy generic Kubernetes. I have no choice but to use an alternate Dockerfile to get Velero in a container without Red Hat OADP additions and internal modifications.

A copy of the changes to Makefile and added Dockerfile.ubi are available at my branch at https://github.com/msfrucht/openshift-velero/tree/velero_in_openshift

Kubernetes version (use kubectl version):

Client Version: 4.15.12
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.8
Kubernetes Version: v1.29.7+4510e9c

Kubernetes installer & version: Red Hat OpenShift
Cloud provider or hardware configuration: on-premise Red Hat Hypershift virtualized cluster
OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS 416.94.202408132101-0"

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

👍 for "The project would be better with this feature added"
👎 for "This feature will not enhance the project in a meaningful way"

The text was updated successfully, but these errors were encountered:

Lyndon-Li · 2024-10-24T02:52:44Z

This is indeed a problem.
Let me further think about this. The goal is to let go the resource that should be occupied by cpu/memory limit, so we may use a loose control and avoid the heavy changes to the existing workflow.

Lyndon-Li · 2024-10-24T05:37:58Z

Some more inputs:
By default, backupPod/restorePod is using BestEfforts resource policy, that is, no request or limit is set. Therefore, the thing that impact Kubernetes scheduler is the number of backupPods/restorePods only.
Though the impact is also obvious when the number is large, it won't shut pods out as same as the resource request/limit impact.

Therefore, the current issue could be alleviated somehow if users follow the recommendation --- unless necessary, the resource configuration should not be set.

On the other hand, it is not a bad idea to have some overflow for the number of backupPod against loadConcurrency --- the snapshot expose also takes considerable time, it helps to improve the pipeline if we have several backupPods doing the snapshot expose while some others within the loadConcurrency running VGDP.

Therefore, a well-designed strategy which considers node-selection, node-concurrency may make the ultimate solution.

msfrucht · 2024-10-24T17:14:45Z

Yes, I noticed. Which is why I didn't say it blocked scheduling, it only skewed it.

The existing state model allows datamovers to remain in state Pending without issue when pod limits and node limits are hit. I've rarely seen anyone go through and set priority options. And developers tend not to want to set a default due to user priority will vary.

Anything with priority went ahead over datamover Pods in the large applications. Few people bother to set Pod priorities or use PriorityClass (k8s 1.24 required). Tends to be the bottom of the pile for options to implement. And no one wants to go through and set priority for them all, especially in shared environments where admin may not even know what the applications are.

Setting a node-agent-config podResources is absolutely required for something resembling sane scheduling right now to get Burstable or Guaranteed QoS. Once kopia starts the memory usage of the datamover will go from a few MBs to sometimes GBs depending on workload and changes since last backups. Using BestEffort with that behavior is just a recipe for evictions.

Currently testing with user-created evictions with kubectl evict-pod. Clearing out a hundred or so datamover pods is not desirable from failed state progression.

msfrucht · 2024-11-01T20:43:16Z

@Lyndon-Li I've thought of another thing that might help with scheduling besides PriorityClass and priority options.

Currently the node-agent-config podResources have to be set sufficiently high to account for the largest workload in the backup or restore and get Burstable or Guaranteed QoS.

Being able to set resource profiles for sets of volume would make scheduling less disruptive in Burstable and Guaranteed QoS by setting resource profiles more accurately.

This was already an issue with the node-agent backups in Velero 1.14 but you can't exactly swap out the resource value in the middle of a running Pod. Datamover Pods don't have that inherent restriction.

Lyndon-Li · 2024-11-04T02:33:02Z

Currently the node-agent-config podResources have to be set sufficiently high

Actually, the default behavior is BestEfforts, that is no request/limit is set. And this is also the recommended behavior. The thing is, once users allow a node to run data mover pods, it means they accept the impact of the data mover pods, including cpu, memory, network bandwidth, etc., after all, data mover activities are source consuming inevitable.

Being able to set resource profiles for sets of volume would make scheduling less disruptive

Could you share an example of how would the profiles are set? Are you talking about the case when the resource policy is not BestEfforts?

Lyndon-Li self-assigned this Oct 24, 2024

Lyndon-Li added the area/datamover label Oct 24, 2024

msfrucht changed the title ~~Reduce excessive number of running datamovers to loadConcurrency values~~ Reduce excessive number of running datamovers relative to loadConcurrency values Oct 24, 2024

reasonerjt added kind/requirement Needs Design backlog labels Oct 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce excessive number of running datamovers relative to loadConcurrency values #8344

Reduce excessive number of running datamovers relative to loadConcurrency values #8344

msfrucht commented Oct 23, 2024 •

edited

Loading

Lyndon-Li commented Oct 24, 2024

Lyndon-Li commented Oct 24, 2024

msfrucht commented Oct 24, 2024

msfrucht commented Nov 1, 2024

Lyndon-Li commented Nov 4, 2024

Reduce excessive number of running datamovers relative to loadConcurrency values #8344

Reduce excessive number of running datamovers relative to loadConcurrency values #8344

Comments

msfrucht commented Oct 23, 2024 • edited Loading

Lyndon-Li commented Oct 24, 2024

Lyndon-Li commented Oct 24, 2024

msfrucht commented Oct 24, 2024

msfrucht commented Nov 1, 2024

Lyndon-Li commented Nov 4, 2024

msfrucht commented Oct 23, 2024 •

edited

Loading