

@Allda
Collaborator

@Allda Allda commented Oct 17, 2025

A new backup controller orchestrates the backup process for workspace PVCs. A new configuration option in DevWorkspaceOperatorConfig enables a recurring CronJob that is responsible for the backup mechanism. The job executes the following steps:

  • Find workspaces
  • Determine whether a workspace has recently been stopped
  • Detect the workspace PVC
  • Run a Job in the same namespace that performs the backup

The last step is not yet fully implemented, because it requires running Buildah inside the container; it will be delivered as a separate feature.

Issue: eclipse-che/che#23570

What does this PR do?

What issues does this PR fix or reference?

Is it tested? How?

The feature has been tested locally and with integration tests. Add the following configuration to enable the feature:

config:
  workspace:
    backupCronJob:
      enable: true
      registry: kind-registry:5000/backup
      schedule: '* * * * *'

After the config is added, stop any workspace and wait until a backup job is created.

$ kubectl get jobs
NAME                        STATUS     COMPLETIONS   DURATION   AGE
devworkspace-backup-2l679   Running    0/1           138m       138m
devworkspace-backup-2xvgl   Running    0/1           139m       139m
devworkspace-backup-45vxb   Running    0/1           145m       145m

The job creates a backup and pushes the image to the registry:

+ set -e
+ exec /workspace-recovery.sh --backup
+ set -e
+ for i in "$@"
+ case $i in
+ backup
+ BACKUP_IMAGE=kind-registry:5000/backup/backup-default-common-pvc-test:latest
++ buildah from scratch
+ NEW_IMAGE=working-container
+ buildah copy working-container /workspace/workspacedfd9f53065ea452c//projects /
f099c09f924cf051a01d78cd34ca87a4c161d7c217df5ac627e90e66926fbe9f
+ buildah config --label DEVWORKSPACE=common-pvc-test working-container
+ buildah config --label NAMESPACE=default working-container
+ buildah commit working-container kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
+ buildah umount working-container
+ buildah push --tls-verify=false kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
stream closed: EOF for default/devworkspace-backup-zjzk5-82psq (backup-workspace)

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@openshift-ci

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Allda
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

A new backup controller orchestrates the backup process for workspace PVCs.
A new configuration option in DevWorkspaceOperatorConfig enables a
recurring CronJob that is responsible for the backup mechanism. The job
executes the following steps:
- Find workspaces
- Determine whether a workspace has recently been stopped
- Detect the workspace PVC
- Run a Job in the same namespace that performs the backup

The last step is not yet fully implemented, because it requires running
Buildah inside the container; it will be delivered as a separate
feature.

Issue: eclipse-che/che#23570

Signed-off-by: Ales Raszka <[email protected]>
@rohanKanojia
Member

@Allda : Really appreciate you taking the time to contribute this in such a short time. 🎉

Could you please also fill out the “Is it tested? How?” section in the PR template? It’ll help reviewers and future contributors verify the change more easily.

Thanks again for your effort! 🙌

@rohanKanojia
Member

I tested this PR and it seems to work.

  1. Created a DevWorkspaceOperatorConfig with this BackupCronJobConfig (backup every 3 minutes):
config:
  workspace:
    backupCronJob:
      enable: true
      schedule: "*/3 * * * *"
  2. Created a DevWorkspace and waited for it to start running
  3. Stopped the workspace
  4. The controller detected the stopped workspace and started creating backup jobs:
NAME               STATUS    COMPLETIONS   DURATION   AGE
backup-job-8tnsp   Running   0/1                      0s
backup-job-8tnsp   Running   0/1           0s         0s
backup-job-8tnsp   Running   0/1           16s        16s
backup-job-8tnsp   Running   0/1           17s        17s
backup-job-8tnsp   Running   0/1           18s        18s
backup-job-8tnsp   Complete   1/1           18s        18s
backup-job-kc8rm   Running    0/1                      0s
backup-job-kc8rm   Running    0/1           0s         0s
backup-job-kc8rm   Running    0/1           6s         6s
backup-job-kc8rm   Running    0/1           7s         7s
backup-job-kc8rm   Running    0/1           8s         8s
backup-job-kc8rm   Complete   1/1           8s         8s

@Allda force-pushed the 23570 branch 2 times, most recently from d93f34e to 0bc74b1 (October 29, 2025 10:22)
Allda added 2 commits October 29, 2025 11:24
A backup of the workspace is done using Buildah by storing the content of
the workspace PVC in a container image. The image is then pushed to a
registry and can be used to recover data.

A prototype script was updated, stored under the project-backup
directory, and is built alongside the controller.

The backup job calls the script, which executes the following steps:
- mount a volume with workspace data
- build a container image using Buildah
- push the image to the registry configured by the operator admin

Signed-off-by: Ales Raszka <[email protected]>
A new sub-object was added to the operator config that reflects the
current status of the backup controller and stores the last time a backup
was executed. This value is used to determine whether a workspace needs a
backup or whether one has already been executed.
Signed-off-by: Ales Raszka <[email protected]>
The backup job uses the PVC name from the default value, or from the
config if the user configured a custom name.

Signed-off-by: Ales Raszka <[email protected]>
@Allda
Collaborator Author

Allda commented Oct 29, 2025

/retest

export DWO_BUNDLE_IMG ?= quay.io/devfile/devworkspace-operator-bundle:next
export DWO_INDEX_IMG ?= quay.io/devfile/devworkspace-operator-index:next
export PROJECT_CLONE_IMG ?= quay.io/devfile/project-clone:next
export PROJECT_BACKUP_IMG ?= quay.io/devfile/project-clone:next
Contributor

Suggested change
export PROJECT_BACKUP_IMG ?= quay.io/devfile/project-clone:next
export PROJECT_BACKUP_IMG ?= quay.io/devfile/project-backup:next

Contributor

It is probably set incorrectly in some other places as well.

push: true
platforms: linux/amd64, linux/arm64, linux/ppc64le, linux/s390x
tags: |
  quay.io/devfile/project-backup:next
Contributor

The quay.io/devfile/project-backup repository does not exist yet.

// A registry where backup images are stored. Images are stored
// in {registry}/backup-${DEVWORKSPACE_NAMESPACE}-${DEVWORKSPACE_NAME}
// +kubebuilder:validation:Optional
Registry string `json:"registry,omitempty"`
Contributor

What if the registry is not public and requires authentication and/or a certificate to access it?
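The naming rule in the `Registry` field comment, combined with the image name in the backup trace (`kind-registry:5000/backup/backup-default-common-pvc-test:latest`), can be sketched as a small helper. `backupImageRef` is hypothetical; the `:latest` tag is taken from the trace, and trimming a trailing slash from the registry is an assumption. Registry authentication, which the question above raises, is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// backupImageRef builds {registry}/backup-{namespace}-{workspace}:latest.
// The pattern follows the Registry field comment and the trace; the ":latest"
// tag comes from the trace, and trimming a trailing "/" is an assumption.
func backupImageRef(registry, namespace, workspace string) string {
	return fmt.Sprintf("%s/backup-%s-%s:latest", strings.TrimSuffix(registry, "/"), namespace, workspace)
}

func main() {
	fmt.Println(backupImageRef("kind-registry:5000/backup", "default", "common-pvc-test"))
}
```

Because the namespace and workspace name are embedded in the repository name, two workspaces with the same name in different namespaces map to different images.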

return err
}

job := &batchv1.Job{
Contributor

I believe we need a dedicated service account (SA) for this job, granted only the permissions it requires.
@dkwon17?

SecurityContext: &corev1.SecurityContext{
RunAsUser: ptrInt64(0),
Capabilities: &corev1.Capabilities{
Add: []corev1.Capability{"SYS_ADMIN", "SYS_CHROOT"},
Contributor

Are there any particular reasons to have those capabilities?

backUpConfig := dwOperatorConfig.Config.Workspace.BackupCronJob

// Find a PVC with the name "claim-devworkspace" or based on the name from the operator config
pvcName := "claim-devworkspace"
Collaborator

The PVC name will not always be claim-devworkspace.

There are two main types of storage strategies for DevWorkspaces: common (or per-user) and per-workspace.

Here are some more details about the storage strategies: https://eclipse.dev/che/docs/stable/administration-guide/configuring-the-storage-strategy/

For common, the default PVC name is claim-devworkspace; for per-workspace, the PVC name is storage-<devworkspaceid>.

I suggest using, for example, GetProvisioner to help determine the storage strategy.

To determine the PVC name, the existing code currently does the following:

// common (per-user) storage strategy
usingAlternatePVC, pvcName, err := checkForAlternatePVC(workspace.Namespace, clusterAPI)
if err != nil {
	return err
}
if pvcName == "" {
	pvcName = workspace.Config.Workspace.PVCName
}

// per-workspace storage strategy
perWorkspacePVC, err := syncPerWorkspacePVC(workspace, clusterAPI)
if err != nil {
	return err
}
pvcName := perWorkspacePVC.Name
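Following the reviewer's description, the backup job's PVC choice can be sketched as a switch on the storage strategy. This is a hedged illustration, not DevWorkspace Operator code: the strategy string constants and `backupPVCName` are hypothetical, and the sketch skips the alternate-PVC lookup (`checkForAlternatePVC`) that the existing common-strategy code performs.

```go
package main

import "fmt"

// Hypothetical strategy identifiers; the names "common" and "per-workspace"
// come from the review comment, not from verified operator constants.
const (
	StrategyCommon       = "common"
	StrategyPerWorkspace = "per-workspace"
)

// backupPVCName picks the PVC the backup job should mount:
//   - common (per-user): the configured name if set, else "claim-devworkspace"
//   - per-workspace: "storage-<devworkspaceid>"
// The alternate-PVC check done by the real common-strategy code is omitted.
func backupPVCName(strategy, configuredName, workspaceID string) (string, error) {
	switch strategy {
	case StrategyCommon:
		if configuredName != "" {
			return configuredName, nil
		}
		return "claim-devworkspace", nil
	case StrategyPerWorkspace:
		return "storage-" + workspaceID, nil
	default:
		return "", fmt.Errorf("unsupported storage strategy %q", strategy)
	}
}

func main() {
	for _, s := range []string{StrategyCommon, StrategyPerWorkspace} {
		name, err := backupPVCName(s, "", "workspacedfd9f53065ea452c")
		if err != nil {
			panic(err)
		}
		fmt.Println(s, "->", name)
	}
}
```

Returning an error for an unknown strategy keeps the job from silently mounting the wrong claim.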
