Skip to content

Tool for easy ClickHouse backup and restore with cloud storages support

License

Notifications You must be signed in to change notification settings

hackstrap/clickhouse-backup

 
 

Repository files navigation

clickhouse-backup

Build GoDoc Telegram Docker Image

Tool for easy ClickHouse backup and restore with cloud storages support

Features

  • Easy creating and restoring backups of all or specific tables
  • Efficient storing of multiple backups on the file system
  • Uploading and downloading with streaming compression
  • Works with AWS, GCS, Azure, Tencent COS, FTP, SFTP
  • Support of Atomic Database Engine
  • Support of multi disks installations
  • Support of incremental backups on remote storages

TODO:

  • Smart restore for replicated tables

Limitations

  • ClickHouse above 1.1.54390 is supported
  • Only MergeTree family tables engines

Installation

  • Download the latest binary from the releases page and decompress with:
tar -zxvf clickhouse-backup.tar.gz
  • Use the official tiny Docker image and run it on host where installed clickhouse-server:
docker run -u $(id -u clickhouse) --rm -it --network host -v "/var/lib/clickhouse:/var/lib/clickhouse" \
   -e CLICKHOUSE_PASSWORD="password" \
   -e S3_BUCKET="clickhouse-backup" \
   -e S3_ACCESS_KEY="access_key" \
   -e S3_SECRET_KEY="secret" \
   alexakulov/clickhouse-backup --help
  • Build from the sources:
GO111MODULE=on go get github.com/AlexAkulov/clickhouse-backup/cmd/clickhouse-backup

Usage

NAME:
   clickhouse-backup - Tool for easy backup of ClickHouse with cloud support

USAGE:
   clickhouse-backup <command> [-t, --tables=<db>.<table>] <backup_name>

VERSION:
   1.0.0

DESCRIPTION:
   Run as 'root' or 'clickhouse' user

COMMANDS:
   tables          Print list of tables
   create          Create new backup
   create_remote   Create and upload
   upload          Upload backup to remote storage
   list            Print list of backups
   download        Download backup from remote storage
   restore         Create schema and restore data from backup
   restore_remote  Download and restore
   delete          Delete specific backup
   default-config  Print default config
   server          Run API server
   help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config FILE, -c FILE  Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --help, -h              show help
   --version, -v           print the version

Default Config

Config file location can be defined by $CLICKHOUSE_BACKUP_CONFIG

All options can be overwritten via environment variables

general:
  remote_storage: none           # REMOTE_STORAGE
  max_file_size: 107374182400    # MAX_FILE_SIZE
  disable_progress_bar: false    # DISABLE_PROGRESS_BAR
  backups_to_keep_local: 0       # BACKUPS_TO_KEEP_LOCAL
  backups_to_keep_remote: 0      # BACKUPS_TO_KEEP_REMOTE
  log_level: info                # LOG_LEVEL
  allow_empty_backups: false     # ALLOW_EMPTY_BACKUPS
  download_concurrency: 1        # DOWNLOAD_CONCURRENCY, max 255
  upload_concurrency: 1          # UPLOAD_CONCURRENCY, max 255
clickhouse:
  username: default                # CLICKHOUSE_USERNAME
  password: ""                     # CLICKHOUSE_PASSWORD
  host: localhost                  # CLICKHOUSE_HOST
  port: 9000                       # CLICKHOUSE_PORT
  disk_mapping: {}                 # CLICKHOUSE_DISK_MAPPING
  skip_tables:                     # CLICKHOUSE_SKIP_TABLES
    - system.*
  timeout: 5m                      # CLICKHOUSE_TIMEOUT
  freeze_by_part: false            # CLICKHOUSE_FREEZE_BY_PART
  secure: false                    # CLICKHOUSE_SECURE
  skip_verify: false               # CLICKHOUSE_SKIP_VERIFY
  sync_replicated_tables: true     # CLICKHOUSE_SYNC_REPLICATED_TABLES
  log_sql_queries: true            # CLICKHOUSE_LOG_SQL_QUERIES
  debug: false                     # CLICKHOUSE_DEBUG
  config_dir:      "/etc/clickhouse-server"              # CLICKHOUSE_CONFIG_DIR
  restart_command: "systemctl restart clickhouse-server" # CLICKHOUSE_RESTART_COMMAND
azblob:
  endpoint_suffix: "core.windows.net" # AZBLOB_ENDPOINT_SUFFIX
  account_name: ""             # AZBLOB_ACCOUNT_NAME
  account_key: ""              # AZBLOB_ACCOUNT_KEY
  sas: ""                      # AZBLOB_SAS
  use_managed_identity: false  # AZBLOB_USE_MANAGED_IDENTITY
  container: ""                # AZBLOB_CONTAINER
  path: ""                     # AZBLOB_PATH
  compression_level: 1         # AZBLOB_COMPRESSION_LEVEL
  compression_format: tar      # AZBLOB_COMPRESSION_FORMAT
  sse_key: ""                  # AZBLOB_SSE_KEY
s3:
  access_key: ""                   # S3_ACCESS_KEY
  secret_key: ""                   # S3_SECRET_KEY
  bucket: ""                       # S3_BUCKET
  endpoint: ""                     # S3_ENDPOINT
  region: us-east-1                # S3_REGION
  acl: private                     # S3_ACL
  assume_role_arn: ""              # S3_ASSUME_ROLE_ARN
  force_path_style: false          # S3_FORCE_PATH_STYLE
  path: ""                         # S3_PATH
  disable_ssl: false               # S3_DISABLE_SSL
  compression_level: 1             # S3_COMPRESSION_LEVEL
  # supports 'tar', 'gzip', 'zstd', 'brotli'
  compression_format: tar          # S3_COMPRESSION_FORMAT
  # empty (default), AES256, or aws:kms
  sse: AES256                      # S3_SSE
  disable_cert_verification: false # S3_DISABLE_CERT_VERIFICATION
  storage_class: STANDARD          # S3_STORAGE_CLASS
  concurrency: 1                   # S3_CONCURRENCY
  # if less or eq 0 then calculated as max_file_size / 10000 
  part_size: 0                     # S3_PART_SIZE
  debug: false                     # S3_DEBUG
gcs:
  credentials_file: ""         # GCS_CREDENTIALS_FILE
  credentials_json: ""         # GCS_CREDENTIALS_JSON
  bucket: ""                   # GCS_BUCKET
  path: ""                     # GCS_PATH
  compression_level: 1         # GCS_COMPRESSION_LEVEL
  compression_format: tar      # GCS_COMPRESSION_FORMAT
  debug: false                 # GCS_DEBUG
cos:
  url: ""                      # COS_URL
  timeout: 2m                  # COS_TIMEOUT
  secret_id: ""                # COS_SECRET_ID
  secret_key: ""               # COS_SECRET_KEY
  path: ""                     # COS_PATH
  compression_format: tar      # COS_COMPRESSION_FORMAT
  compression_level: 1         # COS_COMPRESSION_LEVEL
api:
  listen: "localhost:7171"     # API_LISTEN
  enable_metrics: true         # API_ENABLE_METRICS
  enable_pprof: false          # API_ENABLE_PPROF
  username: ""                 # API_USERNAME
  password: ""                 # API_PASSWORD
  secure: false                # API_SECURE
  certificate_file: ""         # API_CERTIFICATE_FILE
  private_key_file: ""         # API_PRIVATE_KEY_FILE
  create_integration_tables: false # API_CREATE_INTEGRATION_TABLES
ftp:
  address: ""                  # FTP_ADDRESS
  timeout: 2m                  # FTP_TIMEOUT
  username: ""                 # FTP_USERNAME
  password: ""                 # FTP_PASSWORD
  tls: false                   # FTP_TLS
  path: ""                     # FTP_PATH
  compression_format: tar      # FTP_COMPRESSION_FORMAT
  compression_level: 1         # FTP_COMPRESSION_LEVEL
  debug: false                 # FTP_DEBUG
sftp:
  address: ""                  # SFTP_ADDRESS
  username: ""                 # SFTP_USERNAME
  password: ""                 # SFTP_PASSWORD
  key: ""                      # SFTP_KEY
  path: ""                     # SFTP_PATH
  concurrency: 1               # SFTP_CONCURRENCY     
  compression_format: tar      # SFTP_COMPRESSION_FORMAT
  compression_level: 1         # SFTP_COMPRESSION_LEVEL

ATTENTION!

Never change files permissions in /var/lib/clickhouse/backup. This path contains hard links. Permissions on all hard links to the same data on disk are always identical. That means that if you change the permissions/owner/attributes on a hard link in backup path, permissions on files with which ClickHouse works will be changed too. That might lead to data corruption.

API

Use the clickhouse-backup server command to run as a REST API server. In general, the API attempts to mirror the CLI commands.

GET /backup/tables

Print list of tables: curl -s localhost:7171/backup/tables | jq .

POST /backup/create

Create new backup: curl -s localhost:7171/backup/create -X POST | jq .

  • Optional query argument table works the same as the --table value CLI argument.
  • Optional query argument name works the same as specifying a backup name with the CLI.
  • Optional query argument schema works the same the --schema CLI argument (backup schema only).
  • Optional query argument rbac works the same the --rbac CLI argument (backup RBAC).
  • Optional query argument configs works the same the --configs CLI argument (backup configs).
  • Full example: curl -s 'localhost:7171/backup/create?table=default.billing&name=billing_test' -X POST

Note: this operation is async, so the API will return once the operation has been started.

POST /backup/upload

Upload backup to remote storage: curl -s localhost:7171/backup/upload/<BACKUP_NAME> -X POST | jq .

  • Optional query argument diff-from works the same as the --diff-from CLI argument.

Note: this operation is async, so the API will return once the operation has been started.

GET /backup/list/{where}

Print list of backups: curl -s localhost:7171/backup/list | jq . Print list only local backups: curl -s localhost:7171/backup/list/local | jq . Print list only remote backups: curl -s localhost:7171/backup/list/remote | jq .

Note: The Size field is not populated for local backups.

POST /backup/download

Download backup from remote storage: curl -s localhost:7171/backup/download/<BACKUP_NAME> -X POST | jq .

Note: this operation is async, so the API will return once the operation has been started.

POST /backup/restore

Create schema and restore data from backup: curl -s localhost:7171/backup/restore/<BACKUP_NAME> -X POST | jq .

  • Optional query argument table works the same as the --table value CLI argument.
  • Optional query argument schema works the same the --schema CLI argument (restore schema only).
  • Optional query argument data works the same the --data CLI argument (restore data only).
  • Optional query argument rbac works the same the --rbac CLI argument (restore RBAC).
  • Optional query argument configs works the same the --configs CLI argument (restore configs).

POST /backup/delete

Delete specific remote backup: curl -s localhost:7171/backup/delete/remote/<BACKUP_NAME> -X POST | jq .

Delete specific local backup: curl -s localhost:7171/backup/delete/local/<BACKUP_NAME> -X POST | jq .

GET /backup/status

Display list of current running async operation: curl -s localhost:7171/backup/status | jq .

POST /backup/actions

Execute multiple backup actions: curl -X POST -d '{"command":"create test_backup"}' -s localhost:7171/backup/actions

GET /backup/actions

Display list of current async operations: curl -s localhost:7171/backup/actions | jq .

  • Optional query argument filter could filter actions on server side.
  • Optional query argument last could filter show only last XX actions.

Storages

S3

In order to make backups to S3, the following permissions shall be set:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "clickhouse-backup-s3-access-to-files",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::BUCKET_NAME/*"
        },
        {
            "Sid": "clickhouse-backup-s3-access-to-bucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::BUCKET_NAME"
        }
    ]
}

Examples

Simple cron script for daily backup and uploading

#!/bin/bash
BACKUP_NAME=my_backup_$(date -u +%Y-%m-%dT%H-%M-%S)
clickhouse-backup create $BACKUP_NAME >> /var/log/clickhouse-backup.log
if [[ $? != 0 ]]; then
  echo "clickhouse-backup create $BACKUP_NAME FAILED and return $? exit code"
fi

clickhouse-backup upload $BACKUP_NAME >> /var/log/clickhouse-backup.log
if [[ $? != 0 ]]; then
  echo "clickhouse-backup upload $BACKUP_NAME FAILED and return $? exit code"
fi

More use cases of clickhouse-backup

About

Tool for easy ClickHouse backup and restore with cloud storages support

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Go 98.5%
  • Other 1.5%