pip install websnap
Click here to view a websnap overview diagram.
from websnap import websnap
# Execute websnap using default arguments
websnap()
# Execute websnap passing arguments
websnap(file_logs=True, s3_uploader=True, backup_s3_count=7, early_exit=True)
To access CLI documentation in terminal execute:
websnap_cli --help
Click to unfold function parameters / CLI options
Parameter | Type | Default |
---|---|---|
config | str | "config.ini" |
log_level | str | "INFO" |
file_logs | bool | False |
s3_uploader | bool | False |
backup_s3_count | int \| None | None |
timeout | int | 32 |
early_exit | bool | False |
repeat_minutes | int \| None | None |
section_config | str \| None | None |
Option | Shortcut | Default |
---|---|---|
--config | -c | config.ini |
--log_level | -l | INFO |
--file_logs | -f | False |
--s3_uploader | -s | False |
--backup_s3_count | -b | None |
--timeout | -t | 32 |
--early_exit | -e | False |
--repeat_minutes | -r | None |
--section_config | -n | None |
Function parameter / CLI option | Description |
---|---|
config (str) | |
log_level (str) | |
file_logs (bool) | |
s3_uploader (bool) | |
backup_s3_count (int \| None) | |
timeout (int) | |
early_exit (bool) | |
repeat_minutes (int \| None) | |
section_config (str \| None) | |
Click to unfold S3 bucket usage
Uses the AWS SDK for Python (Boto3) to add and back up API files as objects in an S3 bucket.
# The s3_uploader argument must be passed as True to copy files as objects to an S3 bucket
# Copies objects to an S3 bucket using default argument values
websnap(s3_uploader=True)
# Copies objects to an S3 bucket, repeats every 1440 minutes (24 hours),
# and at maximum 4 backup objects are allowed for each config section
websnap(s3_uploader=True, repeat_minutes=1440, backup_s3_count=4)
- The following CLI option must be used to enable websnap to upload files as objects in an S3 bucket:
  --s3_uploader
- Copies objects to an S3 bucket using default argument values:
  websnap_cli --s3_uploader
- Copies objects to an S3 bucket, repeats every 1440 minutes (24 hours), and at maximum 4 backup objects are allowed for each config section:
  websnap_cli --s3_uploader --repeat_minutes 1440 --backup_s3_count 4
- The following environment variables are required: ENDPOINT_URL, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
- A valid .ini or .json configuration file is required.
- Websnap expects the config to be config.ini in the same directory that the websnap package is executed from.
  - However, this can be changed using the config function argument (or CLI --config option).
- All keys in tables below are mandatory.
Format | Example Configuration File |
---|---|
.ini | src/websnap/config_templates/s3_config_template.ini |
.json | src/websnap/config_templates/s3_config_template.json |
Supports setting environment variables in a .env file.
Example .env file:
ENDPOINT_URL=https://dreamycloud.com
AWS_ACCESS_KEY_ID=1234567abcdefg
AWS_SECRET_ACCESS_KEY=hijklmn1234567
Environment Variable | Description |
---|---|
ENDPOINT_URL | URL to use for the constructed S3 client |
AWS_ACCESS_KEY_ID | AWS access key ID |
AWS_SECRET_ACCESS_KEY | AWS secret access key |
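As a rough illustration of the requirement above, the following sketch checks that the three environment variables are present before an S3 run is attempted. The helper name `missing_env_vars` is hypothetical and is not part of websnap; it only uses the variable names listed in the table.

```python
import os

# Variable names taken from the table above.
REQUIRED_ENV_VARS = ("ENDPOINT_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration only, not a websnap function.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Check the current process environment and report anything missing.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All S3 environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching real credentials.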
- Each file retrieved from an API requires its own config section!
- The section name can be anything, but a name that relates to the copied file is suggested.
Example S3 config section configuration with key prefix:
[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.xml
Example S3 config section configuration without key prefix:
[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json
Key | Value Description |
---|---|
url | API URL endpoint that file will be retrieved from |
bucket | Bucket that file (as an object) will be written in |
key | Object key name with extension, can optionally include prefix |
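To show how the mandatory keys fit together, here is a minimal sketch that reads the two example sections with the standard library `configparser` and rejects any section that lacks a key. It is not websnap's own parsing code; `read_s3_sections` and `S3_KEYS` are hypothetical names used for illustration.

```python
import configparser

# Mandatory keys from the table above.
S3_KEYS = ("url", "bucket", "key")

EXAMPLE_CONFIG = """
[resource]
url=https://www.example.com/api/resource
bucket=exampledata
key=subdirectory_resource/resource.xml

[project]
url=https://www.example.com/api/project
bucket=exampledata
key=project.json
"""


def read_s3_sections(text):
    """Parse .ini text and return {section name: {url, bucket, key}}.

    Hypothetical helper for illustration only, not a websnap function.
    """
    # Interpolation is disabled so '%' characters in URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    sections = {}
    for name in parser.sections():
        section = parser[name]
        missing = [k for k in S3_KEYS if k not in section]
        if missing:
            raise ValueError(f"Section [{name}] is missing keys: {missing}")
        sections[name] = {k: section[k] for k in S3_KEYS}
    return sections


for name, values in read_s3_sections(EXAMPLE_CONFIG).items():
    # The key prefix, if any, is everything before the last '/'.
    prefix, _, object_name = values["key"].rpartition("/")
    print(name, values["bucket"], prefix or "(no prefix)", object_name)
```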
Click to unfold local machine usage
# Write files retrieved from an API to local machine using default argument values
websnap()
# Write files retrieved from an API locally, repeating every 60 minutes (1 hour),
# with file logs enabled
websnap(file_logs=True, repeat_minutes=60)
- Write copied files to local machine using default argument values:
  websnap_cli
- Write copied files locally, repeating every 60 minutes (1 hour), with file logs enabled:
  websnap_cli --file_logs --repeat_minutes 60
- A valid .ini or .json configuration file is required for both function and CLI usage.
- Websnap expects the config to be config.ini in the same directory that the websnap package is executed from.
  - However, this can be changed using the config function argument (or CLI --config option).
- Each file that will be retrieved from an API requires its own section.
- If the optional directory key/value pair is omitted then the file will be written in the directory that the program is executed from.
Format | Example Configuration File |
---|---|
.ini | src/websnap/config_templates/config_template.ini |
.json | src/websnap/config_templates/config_template.json |
Example local machine configuration section:
[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata
Key | Value Description |
---|---|
url | API URL endpoint that file will be retrieved from |
file_name | File name with extension |
directory (optional) | Local directory name that file will be written in |
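The effect of the optional directory key can be sketched as follows. This is an illustration, not websnap's implementation: `target_path` is a hypothetical helper, and it assumes that a relative directory value is resolved against the directory the program is executed from.

```python
import configparser
from pathlib import Path

EXAMPLE_CONFIG = """
[project]
url=https://www.example.com/api/project
file_name=project.json
directory=projectdata
"""


def target_path(section, base_dir=None):
    """Return the path a section's file would be written to.

    Hypothetical helper for illustration only, not a websnap function.
    Assumption: 'directory' is resolved relative to the working directory.
    """
    base = Path.cwd() if base_dir is None else Path(base_dir)
    directory = section.get("directory")
    if directory:
        return base / directory / section["file_name"]
    # No directory configured: write next to where the program runs.
    return base / section["file_name"]


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(EXAMPLE_CONFIG)
print(target_path(parser["project"]))
```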
Click to unfold logs
Websnap supports optional rotating file logs.
- The following CLI option must be used to enable websnap to support rotating file logs: --file_logs
  - In function usage the following argument must be passed to support rotating file logs: file_logs=True
- If log keys are not specified in the configuration [DEFAULT] section then default values in the table below will be used.
- log_when expects a value used by logging module TimedRotatingFileHandler.
  - Click here for more information about how to use TimedRotatingFileHandler.
- The default values result in the file logs being rotated once every day and no removal of backup log files.
Example log configuration:
[DEFAULT]
log_when=midnight
log_interval=1
log_backup_count=7
Key | Default | Value Description |
---|---|---|
log_when | D | Specifies type of interval |
log_interval | 1 | Duration of interval (must be positive integer) |
log_backup_count | 0 | If nonzero then at most <log_backup_count> files will be kept and the oldest log file is deleted (must be non-negative integer) |
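The three log keys correspond naturally to the `when`, `interval`, and `backupCount` arguments of the standard library `TimedRotatingFileHandler`. The sketch below shows one way that mapping could look. It is an assumption about how the keys are applied, not websnap's actual code, and `build_log_handler`, `LOG_DEFAULTS`, and the log file name are hypothetical.

```python
import logging
import tempfile
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path

# Defaults from the table above.
LOG_DEFAULTS = {"log_when": "D", "log_interval": 1, "log_backup_count": 0}


def build_log_handler(log_path, settings=None):
    """Create a rotating file handler from the log keys.

    Hypothetical helper for illustration only, not a websnap function.
    """
    merged = {**LOG_DEFAULTS, **(settings or {})}
    interval = int(merged["log_interval"])
    backup_count = int(merged["log_backup_count"])
    if interval < 1:
        raise ValueError("log_interval must be a positive integer")
    if backup_count < 0:
        raise ValueError("log_backup_count must be a non-negative integer")
    return TimedRotatingFileHandler(
        log_path,
        when=merged["log_when"],
        interval=interval,
        backupCount=backup_count,
        delay=True,  # do not create the file until the first record is written
    )


# Values as they would appear in the example [DEFAULT] section (all strings).
example = {"log_when": "midnight", "log_interval": "1", "log_backup_count": "7"}
handler = build_log_handler(Path(tempfile.gettempdir()) / "websnap_example.log", example)
logger = logging.getLogger("websnap_example")
logger.addHandler(handler)
```

With the example values the log rolls over at midnight and at most 7 rotated files are kept.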
Click to unfold minimum download size
Websnap supports optionally specifying the minimum download size (in kilobytes) a file must be to copy it from the configured API URL endpoint.
- By default the minimum download size is 0 kb.
- Unless specified in the configuration this means that a file of any size can be downloaded by websnap.
- Configured minimum download size must be a non-negative integer.
- If the content from the API URL endpoint is smaller than the configured size:
  - An error will be logged and the program continues to the next config section.
  - If the CLI option --early_exit (or function argument early_exit=True) is enabled then the program will terminate early.
Example minimum download size configuration:
[DEFAULT]
min_size_kb=1
Key | Default | Value Description |
---|---|---|
min_size_kb | 0 | Minimum download size in kilobytes (must be non-negative integer) |
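The size check described above can be sketched in a few lines. This is an illustration only: `meets_min_size` is a hypothetical helper, and it assumes one kilobyte means 1024 bytes, which the documentation does not state.

```python
# Assumption: 1 kilobyte = 1024 bytes (websnap's exact definition is not documented here).
BYTES_PER_KB = 1024


def meets_min_size(content: bytes, min_size_kb: int = 0) -> bool:
    """Return True if the downloaded content is at least min_size_kb kilobytes.

    Hypothetical helper for illustration only, not a websnap function.
    """
    if min_size_kb < 0:
        raise ValueError("min_size_kb must be a non-negative integer")
    return len(content) >= min_size_kb * BYTES_PER_KB


# With the default of 0, content of any size passes, including empty content.
print(meets_min_size(b""))
# With min_size_kb=1, a 500-byte response would be rejected.
print(meets_min_size(b"x" * 500, min_size_kb=1))
```

A caller would log an error and move on to the next config section when the check fails, or stop entirely if early exit is enabled.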
Rebecca Kurup Buchholz
This project was developed to facilitate EnviDat resiliency and support continuous operation during server maintenance.
EnviDat is the environmental data portal of the Swiss Federal Institute for Forest, Snow and Landscape Research WSL.