clickhouse-backup

Tool for easy ClickHouse backup and restore with cloud storages support

Features

Easy creating and restoring backups of all or specific tables
Efficient storing of multiple backups on the file system
Uploading and downloading with streaming compression
Works with AWS, GCS, Azure, Tencent COS, FTP, SFTP
Support of Atomic Database Engine
Support of multi disks installations
Support of incremental backups on remote storages

TODO:

Smart restore for replicated tables

Limitations

ClickHouse above 1.1.54390 is supported
Only MergeTree family tables engines

Installation

Download the latest binary from the releases page and decompress with:

tar -zxvf clickhouse-backup.tar.gz

Use the official tiny Docker image and run it on host where installed clickhouse-server:

docker run -u $(id -u clickhouse) --rm -it --network host -v "/var/lib/clickhouse:/var/lib/clickhouse" \
   -e CLICKHOUSE_PASSWORD="password" \
   -e S3_BUCKET="clickhouse-backup" \
   -e S3_ACCESS_KEY="access_key" \
   -e S3_SECRET_KEY="secret" \
   alexakulov/clickhouse-backup --help

Build from the sources:

GO111MODULE=on go get github.com/AlexAkulov/clickhouse-backup/cmd/clickhouse-backup

Usage

NAME:
   clickhouse-backup - Tool for easy backup of ClickHouse with cloud support

USAGE:
   clickhouse-backup <command> [-t, --tables=<db>.<table>] <backup_name>

VERSION:
   1.0.0

DESCRIPTION:
   Run as 'root' or 'clickhouse' user

COMMANDS:
   tables          Print list of tables
   create          Create new backup
   create_remote   Create and upload
   upload          Upload backup to remote storage
   list            Print list of backups
   download        Download backup from remote storage
   restore         Create schema and restore data from backup
   restore_remote  Download and restore
   delete          Delete specific backup
   default-config  Print default config
   server          Run API server
   help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config FILE, -c FILE  Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --help, -h              show help
   --version, -v           print the version

Default Config

Config file location can be defined by $CLICKHOUSE_BACKUP_CONFIG

All options can be overwritten via environment variables

general:
  remote_storage: none           # REMOTE_STORAGE
  max_file_size: 107374182400    # MAX_FILE_SIZE
  disable_progress_bar: false    # DISABLE_PROGRESS_BAR
  backups_to_keep_local: 0       # BACKUPS_TO_KEEP_LOCAL
  backups_to_keep_remote: 0      # BACKUPS_TO_KEEP_REMOTE
  log_level: info                # LOG_LEVEL
  allow_empty_backups: false     # ALLOW_EMPTY_BACKUPS
  download_concurrency: 1        # DOWNLOAD_CONCURRENCY, max 255
  upload_concurrency: 1          # UPLOAD_CONCURRENCY, max 255
clickhouse:
  username: default                # CLICKHOUSE_USERNAME
  password: ""                     # CLICKHOUSE_PASSWORD
  host: localhost                  # CLICKHOUSE_HOST
  port: 9000                       # CLICKHOUSE_PORT
  disk_mapping: {}                 # CLICKHOUSE_DISK_MAPPING
  skip_tables:                     # CLICKHOUSE_SKIP_TABLES
    - system.*
  timeout: 5m                      # CLICKHOUSE_TIMEOUT
  freeze_by_part: false            # CLICKHOUSE_FREEZE_BY_PART
  secure: false                    # CLICKHOUSE_SECURE
  skip_verify: false               # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true     # CLICKHOUSE_SYNC_REPLICATED_TABLES
  log_sql_queries: true            # CLICKHOUSE_LOG_SQL_QUERIES
  debug: false                     # CLICKHOUSE_DEBUG
  config_dir:      "/etc/clickhouse-server"              # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server" # CLICKHOUSE_RESTART_COMMAND
azblob:
  endpoint_suffix: "core.windows.net" # AZBLOB_ENDPOINT_SUFFIX
  account_name: ""             # AZBLOB_ACCOUNT_NAME
  account_key: ""              # AZBLOB_ACCOUNT_KEY
  sas: ""                      # AZBLOB_SAS
  use_managed_identity: false  # AZBLOB_USE_MANAGED_IDENTITY
  container: ""                # AZBLOB_CONTAINER
  path: ""                     # AZBLOB_PATH
  compression_level: 1         # AZBLOB_COMPRESSION_LEVEL
  compression_format: tar      # AZBLOB_COMPRESSION_FORMAT
  sse_key: ""                  # AZBLOB_SSE_KEY
s3:
  access_key: ""                   # S3_ACCESS_KEY
  secret_key: ""                   # S3_SECRET_KEY
  bucket: ""                       # S3_BUCKET
  endpoint: ""                     # S3_ENDPOINT
  region: us-east-1                # S3_REGION
  acl: private                     # S3_ACL
  assume_role_arn: ""              # S3_ASSUME_ROLE_ARN
  force_path_style: false          # S3_FORCE_PATH_STYLE
  path: ""                         # S3_PATH
  disable_ssl: false               # S3_DISABLE_SSL
  compression_level: 1             # S3_COMPRESSION_LEVEL
  # supports 'tar', 'gzip', 'zstd', 'brotli'
  compression_format: tar          # S3_COMPRESSION_FORMAT
  # empty (default), AES256, or aws:kms
  sse: AES256                      # S3_SSE
  disable_cert_verification: false # S3_DISABLE_CERT_VERIFICATION
  storage_class: STANDARD          # S3_STORAGE_CLASS
  concurrency: 1                   # S3_CONCURRENCY
  # if less or eq 0 then calculated as max_file_size / 10000 
  part_size: 0                     # S3_PART_SIZE
  debug: false                     # S3_DEBUG
gcs:
  credentials_file: ""         # GCS_CREDENTIALS_FILE
  credentials_json: ""         # GCS_CREDENTIALS_JSON
  bucket: ""                   # GCS_BUCKET
  path: ""                     # GCS_PATH
  compression_level: 1         # GCS_COMPRESSION_LEVEL
  compression_format: tar      # GCS_COMPRESSION_FORMAT
  debug: false                 # GCS_DEBUG
cos:
  url: ""                      # COS_URL
  timeout: 2m                  # COS_TIMEOUT
  secret_id: ""                # COS_SECRET_ID
  secret_key: ""               # COS_SECRET_KEY
  path: ""                     # COS_PATH
  compression_format: tar      # COS_COMPRESSION_FORMAT
  compression_level: 1         # COS_COMPRESSION_LEVEL
api:
  listen: "localhost:7171"     # API_LISTEN
  enable_metrics: true         # API_ENABLE_METRICS
  enable_pprof: false          # API_ENABLE_PPROF
  username: ""                 # API_USERNAME
  password: ""                 # API_PASSWORD
  secure: false                # API_SECURE
  certificate_file: ""         # API_CERTIFICATE_FILE
  private_key_file: ""         # API_PRIVATE_KEY_FILE
  create_integration_tables: false # API_CREATE_INTEGRATION_TABLES
ftp:
  address: ""                  # FTP_ADDRESS
  timeout: 2m                  # FTP_TIMEOUT
  username: ""                 # FTP_USERNAME
  password: ""                 # FTP_PASSWORD
  tls: false                   # FTP_TLS
  path: ""                     # FTP_PATH
  compression_format: tar      # FTP_COMPRESSION_FORMAT
  compression_level: 1         # FTP_COMPRESSION_LEVEL
  debug: false                 # FTP_DEBUG
sftp:
  address: ""                  # SFTP_ADDRESS
  username: ""                 # SFTP_USERNAME
  password: ""                 # SFTP_PASSWORD
  key: ""                      # SFTP_KEY
  path: ""                     # SFTP_PATH
  concurrency: 1               # SFTP_CONCURRENCY     
  compression_format: tar      # SFTP_COMPRESSION_FORMAT
  compression_level: 1         # SFTP_COMPRESSION_LEVEL

ATTENTION!

Never change files permissions in /var/lib/clickhouse/backup. This path contains hard links. Permissions on all hard links to the same data on disk are always identical. That means that if you change the permissions/owner/attributes on a hard link in backup path, permissions on files with which ClickHouse works will be changed too. That might lead to data corruption.

API

Use the clickhouse-backup server command to run as a REST API server. In general, the API attempts to mirror the CLI commands.

GET /backup/tables

Print list of tables: curl -s localhost:7171/backup/tables | jq .

POST /backup/create

Create new backup: curl -s localhost:7171/backup/create -X POST | jq .

Optional query argument table works the same as the --table value CLI argument.
Optional query argument name works the same as specifying a backup name with the CLI.
Optional query argument schema works the same the --schema CLI argument (backup schema only).
Optional query argument rbac works the same the --rbac CLI argument (backup RBAC).
Optional query argument configs works the same the --configs CLI argument (backup configs).
Full example: curl -s 'localhost:7171/backup/create?table=default.billing&name=billing_test' -X POST

Note: this operation is async, so the API will return once the operation has been started.

POST /backup/upload

Upload backup to remote storage: curl -s localhost:7171/backup/upload/<BACKUP_NAME> -X POST | jq .

Optional query argument diff-from works the same as the --diff-from CLI argument.

Note: this operation is async, so the API will return once the operation has been started.

GET /backup/list/{where}

Print list of backups: curl -s localhost:7171/backup/list | jq . Print list only local backups: curl -s localhost:7171/backup/list/local | jq . Print list only remote backups: curl -s localhost:7171/backup/list/remote | jq .

Note: The Size field is not populated for local backups.

POST /backup/download

Download backup from remote storage: curl -s localhost:7171/backup/download/<BACKUP_NAME> -X POST | jq .

Note: this operation is async, so the API will return once the operation has been started.

POST /backup/restore

Create schema and restore data from backup: curl -s localhost:7171/backup/restore/<BACKUP_NAME> -X POST | jq .

Optional query argument table works the same as the --table value CLI argument.
Optional query argument schema works the same the --schema CLI argument (restore schema only).
Optional query argument data works the same the --data CLI argument (restore data only).
Optional query argument rbac works the same the --rbac CLI argument (restore RBAC).
Optional query argument configs works the same the --configs CLI argument (restore configs).

POST /backup/delete

Delete specific remote backup: curl -s localhost:7171/backup/delete/remote/<BACKUP_NAME> -X POST | jq .

Delete specific local backup: curl -s localhost:7171/backup/delete/local/<BACKUP_NAME> -X POST | jq .

GET /backup/status

Display list of current running async operation: curl -s localhost:7171/backup/status | jq .

POST /backup/actions

Execute multiple backup actions: curl -X POST -d '{"command":"create test_backup"}' -s localhost:7171/backup/actions

GET /backup/actions

Display list of current async operations: curl -s localhost:7171/backup/actions | jq .

Optional query argument filter could filter actions on server side.
Optional query argument last could filter show only last XX actions.

Storages

S3

In order to make backups to S3, the following permissions shall be set:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "clickhouse-backup-s3-access-to-files",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::BUCKET_NAME/*"
        },
        {
            "Sid": "clickhouse-backup-s3-access-to-bucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::BUCKET_NAME"
        }
    ]
}

Examples

Simple cron script for daily backup and uploading

#!/bin/bash
BACKUP_NAME=my_backup_$(date -u +%Y-%m-%dT%H-%M-%S)
clickhouse-backup create $BACKUP_NAME >> /var/log/clickhouse-backup.log
if [[ $? != 0 ]]; then
  echo "clickhouse-backup create $BACKUP_NAME FAILED and return $? exit code"
fi

clickhouse-backup upload $BACKUP_NAME >> /var/log/clickhouse-backup.log
if [[ $? != 0 ]]; then
  echo "clickhouse-backup upload $BACKUP_NAME FAILED and return $? exit code"
fi

Name		Name	Last commit message	Last commit date
Latest commit History 472 Commits
.github/workflows		.github/workflows
cmd/clickhouse-backup		cmd/clickhouse-backup
config		config
internal		internal
pkg		pkg
test/integration		test/integration
utils		utils
.dockerignore		.dockerignore
.gitignore		.gitignore
.golangci.yml		.golangci.yml
ChangeLog.md		ChangeLog.md
Dockerfile		Dockerfile
Examples.md		Examples.md
LICENSE		LICENSE
Makefile		Makefile
ReadMe.md		ReadMe.md
Vagrantfile		Vagrantfile
entrypoint.sh		entrypoint.sh
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

clickhouse-backup

Features

Limitations

Installation

Usage

Default Config

ATTENTION!

API

Storages

S3

Examples

Simple cron script for daily backup and uploading

More use cases of clickhouse-backup

About

Uh oh!

Releases

Packages

Languages

License

hackstrap/clickhouse-backup

Folders and files

Latest commit

History

Repository files navigation

clickhouse-backup

Features

Limitations

Installation

Usage

Default Config

ATTENTION!

API

Storages

S3

Examples

Simple cron script for daily backup and uploading

More use cases of clickhouse-backup

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages