Set of PERL scripts to back up files to the cloud on a Linux (Un*x) machine. Release 1.01, Jun 26 2024.
- Scan from a set location (aka base directory) with `find` to make a list of files and directories.
- Decide which files need to be backed up.
- Sort the resulting list by size, time, ...
- Group files from that list into (compressed) tar sets to fit within a given size and not to exceed a preset number of files (see the sketch after this list).
- The resulting sets (aka archives) are uploaded to a 'vault'.
- Files larger than the given size are split in parts, each smaller than that size.
- The resulting parts are uploaded.
- Info on what was uploaded is also uploaded, to have a self-consistent set.
- The target location is referred to as a vault, and the tar/split/info sets uploaded as archives.
- A list of archives and an inventory are also uploaded, and copies are kept locally.
- Scratch space is needed to create the tar/split/info sets before they are uploaded and to save logs.
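
As a rough illustration of the tar/split mechanics, here is a minimal sketch assuming GNU `tar` and coreutils `split`; the archive and part names are made up, and doBackup's actual naming and options may differ:

```sh
# Illustrative only: one compressed tar set built from a list of files,
# and one oversized file cut into parts no larger than the size limit (1G here).
tar czf set-000.tgz --files-from=set-000.list
split -b 1G /data/huge.img huge.img.part-
```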
- `doBackup` is a PERL script that runs all the steps, using routines defined in other `*.pl` files.
- `doCheckBackup` is a stand-alone PERL script to re-check the backup logs.
- `doShowCost` estimates the storage cost from the resulting inventory.
- `doRestore` is a PERL script to restore backed-up files.
- Only the directories under the base location are processed (backed up).
- One can specify which directories, under the base location, need to be processed.
- Can upload to AWS or AZURE using their supplied CLI code:
  - supports AWS s3_standard, s3_glacier, s3_freezer and glacier,
  - supports AZ hot, cool and archive.
- Can also upload using `rclone` (only tested using Google Drive).
- The CLIs `aws`, `az-cli` and/or `rclone` need to be configured (see the sketch after this list).
- Can write what should be uploaded to a local disk (using the `ldisk:` 'cloud'):
  - hence can back up to a different local disk, USB stick, etc.
- The scanning can be run in parallel for each of the directories located under the base location:
  - scanning NetApp NFS storage can be sped up using `xcp`:
    - to use it, `xcp` must be working (installed, licensed and activated); `xcp` can only be run as `root`.
- The creation and uploading of the tar and split sets can also be run in parallel.
- Supports level 0 or incremental backups.
- Configuration/customization is done via a configuration file, def.: `~/.dobackuprc`
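
The sketch below shows the usual one-time setup of the CLIs; doBackup does not configure them for you. The remote name `gdrive` is only an example matching the default `rclone:gdrive:/Backup` shown in the help output.

```sh
# Configure whichever CLI(s) match the --use-cloud value you plan to use.
aws configure        # credentials/region for the aws:* clouds
az login             # credentials for the az:* clouds
rclone config        # define a remote, e.g. 'gdrive', for rclone:gdrive:/Backup
rclone lsd gdrive:   # quick sanity check that the remote answers
```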
./doBackup --help
usage: doBackup [options]
where options are
--use-cloud CLOUD cloud service to use, def.: rclone:gdrive:/Backup
where CLOUD = aws:glacier | aws:s3_glacier | aws:s3_freezer | aws:s3_standard
az:archive | az:cool | az:hot
^rclone: | ^ldisk:
--compress TYPE type of compression for archives, def.: gzip
where TYPE = none | gzip | lz4 | bzip2 | compress | lzma
--n-threads N use N threads, def.: 16
--scan-with VALUE what to use to find/scan files, def.: find
where VALUE = xcp | find
--sort-by TYPE sort the list of file by, def.: size
where TYPE = size | name | time | none
--extra-sleep N add an extra sleep before some ls (N is number of seconds, GPFS bug work around)
--level L backup level L, def.: 0
where L can be 0 to 99, or
@filename to use that file as timestamp, or
-1 to -99 for 1 to 99 days ago, or
%YYMMDD-hhmm for that date and time
--label LABEL use LABEL in vault name {bkup|frzr}-LABEL-xxx-TAG_DATE
def.: test
--tag TAG_DATE set the TAG_DATE in vault name {bkup|frzr}-label-xxx-TAG_DATE
--max-size size uncompressed archive max size [kMGT], def.: 1G
--max-count size max count in single archive [kMGT], def.: 250k
--scratch VALUE scratch directory, def.: /pool/menfin2/backup/debug
--base-dir VALUE base directory, def.: /pool/sylvain00/tmp
--use-dry-run TAG_DATE use the dry run TAG_DATE, fmt.: yymmdd-hhmm-lx
--use-vault VAULT use that vault to add archives via --limit-to
--limit-to VALUES list of subdirs to limit to
or via a filename (@filename)
--include-empty-dirs include empty directories as an archive
--no-upload do not upload the archives
--no-remove do not remove the archives
--keep-tar-lists do not delete the tar sets lists
--rclone-metadata-set add --metadata-set to rclone
--tar-cf-opts VALUES pass these options to "tar cf", def.: --ignore-failed-read --sparse
--tag-host VALUE value of tag/metadata for host, def.: hostname()
--tag-author VALUE value of tag/metadata for author, def.: username@hostdomain()
-rc | --config-file FILENAME configuration filename, def.: /home/sylvain/.dobackuprc
-n | --dry-run find the files and make the tar/split sets lists
do not upload nor build the tar/split sets
resulting lists can be used with --use-dry-run
-v | --verbose verbose mode, can be repeated to increase it
-p | --parse-only parse the args and check them only
-h | --help show this help (ignore any remaining arguments)
Ver. 1.01/3 (Jun 26 2024)
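
For instance, a full (level 0) backup followed later by an incremental backup of files changed in the last 7 days could look like this; the paths, label and thread count are illustrative:

```sh
./doBackup --base-dir /data --label mydata --level 0  --n-threads 8
./doBackup --base-dir /data --label mydata --level -7 --n-threads 8
```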
./doRestore --help
usage: doRestore [options]
where options are
--use-cloud CLOUD cloud service to use, def.: rclone:gdrive:/Backup
--use-vault VAULT vault name {bkup|frzr}-label-xxx-yyyymmdd-hhmm-lx
--scratch VALUE scratch directory, def.: /pool/backup
--out-dir OUTDIR directory where to restore def.: /home/restore
--use-dir USEDIR set directory for showing or restoring files
can be a RE when showing files,
but must be exact when restoring files
--no-remove do not remove the archives
--no-chown do not chown restored split files (when root)
--no-chgrp do not chgrp restored split files
--use-perl-re use PERL style RE for file specifications
--rcconfig-file FILENAME configuration filename, def.: /home/username/.dobackuprc
-n | --dry-run dry run: find the files and list the archives
-v | --verbose verbose
-p | --parse-only parse the args and check them only
-h | --help show this help (ignore any remaining arguments)
and one of the following actions:
--show-dirs show directories
--show-files show files saved under USEDIR
--restore-files FILES restore the list of files (quoted) in USEDIR
--clean-scratch remove archives downloaded to scratch
Ver. 0.99/7 (Apr 4 2023)
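
For example, to browse a specific vault before restoring anything (the cloud and vault names below are illustrative):

```sh
./doRestore --use-cloud rclone:gdrive:/Backup \
            --use-vault bkup-test-001-20240626-1200-l0 \
            --show-dirs
```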
Relies on the following external commands:
- `cp` (1)
- `df`, `find`, `tar`, `split`
- `xcp` (+)
- `gzip`/`gunzip`, `lz4`, `compress`/`uncompress`, `bzip2`/`bunzip2`, `lzma` (2)
- `cat`, `chown`, `chgrp`, `ln`, `ls`, `touch` (3)
- (1) used only when backing up to local disk,
- (2) used for the compression/uncompression with `tar`,
- (3) used only by `doRestore`,
- (+) used for scanning NetApp NFS mounted filesystems (see NetApp XCP).
- Required PERL modules: `Cwd`, `Time::Local`, `File::Path`, `Net::Domain`.
- Required CLIs: `aws`, `az-cli` and/or `rclone`, namely the AWS or Azure CLI, or the more generic `rclone` interface. They need to be properly configured and have access to the respective cloud storage resource.
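
The presence of the PERL modules can be verified with a one-liner such as:

```sh
# Exits with an error if any of the four modules cannot be loaded.
perl -MCwd -MTime::Local -MFile::Path -MNet::Domain -e 'print "modules OK\n"'
```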
- Assuming that:
  - `$verb` is set to either `''` or `--verbose`,
  - `$nthreads` to a number greater than or equal to 0,
  - `$list` is a list of directories (using `--limit-to "$list"` is optional),
  - `$tag` is an optional string identifying the vault,
  - `$vault` is the vault's full name.
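
The variables above might be set as follows before running the examples; all values are illustrative:

```sh
verb='--verbose'                        # or verb=''
nthreads=8
list='projects home'                    # sub-directories under --base-dir
tag='240626-1200-l0'                    # fmt.: yymmdd-hhmm-lx (see --use-dry-run)
vault='bkup-test-001-240626-1200-l0'    # fmt.: {bkup|frzr}-label-xxx-TAG_DATE
```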
- Backup of /data
./doBackup $verb \
--base-dir /data \
--n-threads $nthreads
- Partial backup
./doBackup $verb \
--base-dir /data \
--n-threads $nthreads --limit-to "$list"
- Adding to an existing vault
./doBackup $verb \
--base-dir /data \
--use-vault $vault \
--n-threads $nthreads \
--limit-to "$list"
- Backup in two steps, specifying the vault's tag
./doBackup $verb \
--base-dir /data \
--n-threads $nthreads \
--dry-run --tag $tag \
--limit-to "$list"
followed by
./doBackup $verb \
--base-dir /data \
--n-threads $nthreads \
--use-dry-run $tag \
--limit-to "$list"
- Check content of what was backed up by listing the top level directories
./doRestore --show-dirs
- Check content of what was backed up by listing the files for a given directory
./doRestore --use-dir home/username/junk --show-files
- Check what needs to be downloaded to restore what was backed up for a given directory and set of files
./doRestore --dry-run --use-dir home/username --restore-files 'junk/*'
- Restore what was backed up for a given top directory and a set of files
./doRestore --use-dir home/username --restore-files 'junk/*'
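
Restoring into a specific directory and cleaning up the downloaded archives afterwards could look like this; the vault name and paths are illustrative:

```sh
./doRestore --use-vault bkup-test-001-20240626-1200-l0 \
            --use-dir home/username --restore-files 'junk/*' \
            --out-dir /restore/here
./doRestore --use-vault bkup-test-001-20240626-1200-l0 --clean-scratch
```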
- The scripts need to be invoked with a path for PERL to find the 'includes' (hence `./`).
- Note that the default value of `--base-dir` (and a lot more) can be set in the configuration file.
- Vault names are lower case and limited in length to conform to AWS and AZURE rules.
- The sizes of 'archives' should match the limits imposed by AWS, AZURE or the `rclone` cloud used.
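
If the chosen cloud imposes a smaller per-object limit than the defaults, the archive sizing can be lowered explicitly, for example:

```sh
# Illustrative values: cap uncompressed archives at 512M and 100k files each.
./doBackup --base-dir /data --max-size 512M --max-count 100k
```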