Releases: fair-research/bdbag
PyPi release 1.7.3
Release Notes
Minor feature addition and bugfix
- Fix erroneous encoding of the `%` char in the URL field of `fetch.txt`, which could break already properly encoded URLs. This was due to a misinterpretation of the spec, which states that `%` (along with `CR` and `LF`) should only be URL encoded for the `filename` field and that whitespace (`\t`) should only be encoded in the URL field.
  - NOTE: As a best practice, applications should always pre-encode URLs that are added to `fetch.txt` and not rely on `bdbag` to do so, since only whitespace will be encoded.
- Added a new option `strict` to the `make_bag` API function, along with a corresponding CLI argument. If `strict` is enabled, `make_bag` will automatically validate a newly created or updated bag for structural validity and fail if the resultant bag is invalid. This can be used to ensure that a bag is not persisted without payload file manifests. Additionally, if a created output bag is not structurally valid, the bag will subsequently be reverted back to a normal directory. An updated bag will not be reverted. In either case, a `BagValidationError` exception will be thrown.
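The field-specific encoding rule from the first item above can be illustrated with a short sketch. This is not `bdbag`'s implementation: the helper names `encode_filename_field` and `encode_url_field` are hypothetical, and treating "whitespace" in the URL field as space and tab is an assumption made for the example.

```python
def encode_filename_field(filename: str) -> str:
    """Percent-encode only %, CR, and LF, as described for the filename field."""
    # Encode '%' first so the escapes added afterwards are not double-encoded.
    return (
        filename.replace("%", "%25")
        .replace("\r", "%0D")
        .replace("\n", "%0A")
    )


def encode_url_field(url: str) -> str:
    """Percent-encode only whitespace (space and tab assumed) in the URL field."""
    # A '%' already present in the URL is left alone, so pre-encoded URLs survive.
    return url.replace(" ", "%20").replace("\t", "%09")


pre_encoded = "https://example.org/data/a%20b.txt"
print(encode_url_field(pre_encoded))                         # unchanged
print(encode_url_field("https://example.org/data/a b.txt"))  # space becomes %20
print(encode_filename_field("data/100%\n.txt"))              # % and LF are escaped
```

Because the URL helper never touches `%`, a caller that pre-encodes its URLs gets them back unchanged, which is the behavior the note above recommends relying on.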
PyPi release 1.7.2
Release Notes
Minor feature addition and bugfixes
- Introducing support for bag idempotency, or reproducible bags. A reproducible bag is a bag that has content-equivalence (in both payload and metadata, including manifests) to another bag created at a different time with the same content, structure, bagging tool, and profile (if used). When this bag creation and bag archive mode is enabled, two separately created bags (or bag archive files) with content-equivalence will hash equally, whether the hash is calculated on the bytes of the resultant archive file or calculated on the equivalently ordered set of individual file hashes of the bag's contents. See the API Guide for additional information.
- PR: #59 Only require the external package `importlib_metadata` for Python < 3.8. This module is already included as `importlib.metadata` in Python versions 3.8 and above.
- Fix issue with HTTP fetch handler and auth header bearer-token stripping on redirects not getting restored to the cached `requests` session after redirect.
- Remove dependency on deprecated `distutils` and `distutils.util.strtobool` function.
- The `is_bag` API function will no longer attempt to instantiate a `Bag` object on non-directories.
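The idea of hashing "the equivalently ordered set of individual file hashes", mentioned in the first item above, can be sketched with the standard library alone. This is an illustration of the concept, not `bdbag`'s algorithm: the function names, the use of SHA-256, and the `hash  path` line format are all assumptions.

```python
import hashlib
import os
import tempfile


def ordered_content_digest(root: str) -> str:
    """Hash the sorted (relative path, file hash) pairs found under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as handle:
                entries.append((rel, hashlib.sha256(handle.read()).hexdigest()))
    digest = hashlib.sha256()
    # Sorting removes any dependence on filesystem listing order.
    for rel, file_hash in sorted(entries):
        digest.update(f"{file_hash}  {rel}\n".encode("utf-8"))
    return digest.hexdigest()


def write_tree(root: str, files: dict) -> None:
    """Create files under root in the iteration order of the mapping."""
    for rel, data in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(data)


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    write_tree(first, {"data/a.txt": b"alpha", "data/b.txt": b"beta"})
    # Same content, created separately and in a different order.
    write_tree(second, {"data/b.txt": b"beta", "data/a.txt": b"alpha"})
    same = ordered_content_digest(first) == ordered_content_digest(second)
    print(same)
```

Byte-identical archive files additionally depend on stable archive metadata such as timestamps and entry order, which this sketch does not attempt to cover.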
PyPi release 1.7.1
Release Notes
Bugfix Release
Fix issue with `packaging.parse` throwing `InvalidVersion` in the `upgrade_config()` function when trying to parse the informational version string `VERSION` set by `bdbag` when it is running in a "frozen" (e.g., with `cx_Freeze`) environment. In such cases, `VERSION` is set to something like `1.7.1-frozen`, which is not `PEP-440` compliant. This was not an issue in previous releases due to the fact that the implementation used `pkg_resources.parse_version`, which was not as strict.
The code in `upgrade_config()` has been changed to parse the `PEP-440` compliant version returned by `distribution("bdbag").version` from `importlib_metadata`, rather than use the global string `VERSION`, which can still be (and is) used elsewhere for purely informational and descriptive purposes.
Note that this bug only affects `bdbag` when it is running in a `frozen` environment. Otherwise, release `1.7.0` is equivalent in functionality.
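The separation described above, between a version read from package metadata and a decorated string kept for display, might look like the following sketch. It uses the standard-library `importlib.metadata` (Python 3.8+) rather than the `importlib_metadata` backport named in the notes; the helper `installed_version`, its fallback value, and the `DISPLAY_VERSION` name are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str, fallback: str = "0") -> str:
    """Return the version recorded in the installed distribution's metadata."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The fallback is an assumption made so the sketch runs anywhere.
        return fallback


# Informational string that may carry a non-PEP-440 suffix; for display only,
# never handed to a strict version parser.
DISPLAY_VERSION = f"{installed_version('bdbag')}-frozen"

print(installed_version("bdbag"))
print(DISPLAY_VERSION)
```

Only the value returned by `installed_version` would be passed to a version parser; the suffixed string stays out of any comparison logic.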
PyPi release 1.7.0
Release Notes
- PR: #54: Add support for passing a local profile path for profile validation. Thanks to Bernhard Hampel-Waffenthal for the contribution.
- #40: Replace deprecated use of `pkg_resources` with `importlib-metadata` and `packaging`.
- Fix issue with HTTP fetch transport where bearer-token auth gets stripped from the session on a legitimate redirect but not restored for any potential new request on that same URL-bound session.
- Unpin `tzlocal` unless Python<3.
- Support for Python 3.5 and 3.6 has been dropped. Python 3.7 compatibility is deprecated but still officially supported in this release.
PyPi release 1.6.4
Release Notes
Added Google Cloud Storage fetch handler for handling `gs://` URLs in `fetch.txt`. Note that this is a soft dependency and you must install the gcloud CLI on the system where you will be running `bdbag` in order for this handler to function.
Enabling "requester pays":
This handler supports the requester pays usage pattern by allowing the billable `project_id` to be specified in the `auth_params` object for a corresponding `keychain.json` entry for a matching `gs://` URI pattern.
For example, to configure (and allow) requester pays for a GS bucket, you would add a `keychain.json` entry similar to the following:
{
  "uri": "gs://gcs-bdbag-integration-testing/",
  "auth_type": "gcs-credentials",
  "auth_params": {
    "project_id": "bdbag-204999",
    "allow_requester_pays": true
  }
}
You can also explicitly disallow requester pays at the client-side in the following ways:
- Set `allow_requester_pays` to `false`.
- Omit the `allow_requester_pays` field.
- Omit the `project_id` field.
- Omit the `auth_params` object entirely.
Note that if you do any of the above, data retrieval requests to buckets which have requester pays enabled will fail.
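The four conditions listed above amount to a single rule: requester pays is allowed only when `auth_params` is present, names a `project_id`, and sets `allow_requester_pays` to `true`. The sketch below expresses that rule over a parsed `keychain.json` entry. It is not the handler's actual code, and the function name `requester_pays_allowed` is hypothetical.

```python
import json


def requester_pays_allowed(entry: dict) -> bool:
    """Return True only when the entry names a project and opts in explicitly."""
    auth_params = entry.get("auth_params") or {}
    has_project = bool(auth_params.get("project_id"))
    opted_in = auth_params.get("allow_requester_pays") is True
    return has_project and opted_in


allowed = json.loads("""
{
  "uri": "gs://gcs-bdbag-integration-testing/",
  "auth_type": "gcs-credentials",
  "auth_params": {
    "project_id": "bdbag-204999",
    "allow_requester_pays": true
  }
}
""")
no_opt_in = {
    "uri": "gs://bdbag-dev/",
    "auth_type": "gcs-credentials",
    "auth_params": {"project_id": "bdbag-204400"},
}

print(requester_pays_allowed(allowed))    # explicit opt-in with a project
print(requester_pays_allowed(no_opt_in))  # field omitted, so disallowed
```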
The use case for this configuration option is to ensure that you don't pay for requests when requester pays
is disabled on the bucket. Per the following GCS documentation:
Important: Buckets that have Requester Pays disabled still accept requests that include a billing project,
and charges are applied to the billing project supplied in the request.
Consider any billing implications prior to including a billing project in all of your requests.
IMPORTANT NOTE:
At the time of this writing, when using `gcloud-CLI` from `Google Cloud SDK 416.0.0` and previous, it is possible to still be billed for bucket usage even if you've disallowed requester pays for a given bucket in `keychain.json`. This is because the `gcloud init` process requires that you specify a default `project_id`, and this project id is subsequently stored in the `application_default_credentials.json` file used by the GCS APIs (which the `bdbag` fetch handler uses) as `quota_project_id`. If this value is present, it will be passed on all GCS API calls as a fallback, even when a project is explicitly not passed to the API call. This can be worked around by removing the `quota_project_id` from `application_default_credentials.json`.
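The workaround above is a small JSON edit, sketched here with the standard library. The helper name `remove_quota_project` is hypothetical, and the path is passed in explicitly because the location of `application_default_credentials.json` depends on the platform and gcloud configuration. The demonstration runs against a throwaway file, not real credentials.

```python
import json
import os
import tempfile


def remove_quota_project(credentials_path: str) -> bool:
    """Delete quota_project_id from the JSON file; return True if it was present."""
    with open(credentials_path, "r", encoding="utf-8") as handle:
        credentials = json.load(handle)
    if "quota_project_id" not in credentials:
        return False
    del credentials["quota_project_id"]
    with open(credentials_path, "w", encoding="utf-8") as handle:
        json.dump(credentials, handle, indent=2)
    return True


# Demonstration on a temporary file with made-up contents.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "application_default_credentials.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({"type": "authorized_user", "quota_project_id": "bdbag-204999"}, handle)
    removed = remove_quota_project(demo_path)
    with open(demo_path, encoding="utf-8") as handle:
        remaining = json.load(handle)
    print(removed, remaining)
```

Keeping a backup copy of the real file before editing it is a sensible precaution, since other tools may read it.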
Using service account credentials:
It is also possible to specify a `service_account_credentials_file`, which is a file path referencing a service account credentials JSON file provided by Google Cloud Storage. For example:
{
  "uri": "gs://bdbag-dev/",
  "auth_type": "gcs-credentials",
  "auth_params": {
    "project_id": "bdbag-204400",
    "service_account_credentials_file": "/home/bdbag/bdbag-204400-41babdd46e24.json"
  }
}
PyPi release 1.6.3
Release Notes
Bugfix release and dependency update.
- Fix bug in `bdbag_api.validate()` where underlying `BagError` exceptions were not being propagated correctly.
- Add an environment marker to `setup.py` for the `python-requests` dependency. This marker specifies that no greater than `requests-2.25.1` be used with `Python3.5` environments, due to underlying incompatibilities with the `requests` dependency chain and `Python3.5` after `requests-2.26.0`. Reported in issue #47.
Note that `bdbag` support for `Python3.5` is planned to be dropped in the `1.7.0` release.
PyPi release 1.6.2
Release Notes
- Set "User-Agent" header for HTTP fetch handler (via `python-requests`) to `"bdbag/{version} (requests/{version})"`.
- Added `sha1` support for `bdbag_utils` function `create-rfm-from-url-list`. See PR #46.
- Fix issues with unicode handling in `fetch.txt`, RO `metadata.json`, `keychain.json`, and `remote-file-manifest` JSON files.
- Fix issues with over-escaping (urlencoding) of filenames and urls in `fetch.txt` and RO `metadata.json`. Per the spec, only CR, LF, whitespace, and literal percent should be encoded.
PyPi release 1.6.1
Release Notes
BDBag release 1.6.0
Release Notes
Minor feature release with bugfixes and dependency updates.
- Implement #37: Support external fetch transports via plug-in architecture.
- Added `--output-path` CLI (and corresponding API) argument for specifying output path for extracted archives.
- Added a `bypass_ssl_cert_verification` configuration option for the `https` fetch handler so that SSL certificate verification could be disabled either globally (not recommended) or on a whitelisted set of URL paths used in simple substring matches against a bag's `fetch.txt` URLs.
- Update the `--validate-profile` CLI argument so that it can take an optional keyword argument, `bag-only`, which can be used to bypass the otherwise automatic profile serialization validation, and therefore is suitable to use on extracted bag directories.
- Fixed issue with `archive_bag` API function not including empty directories when creating `zip` format archives.
- Modified `extract_bag` API function to accurately include the bag root directory path of the extracted bag archive in the return value. Previously, this value could have wound up being different from the file archive base name; for example if the archive file was renamed or was created in such a way that the base file name never matched the archived bag directory root.
- Refactored `bagit-profile` support. This module is no longer "vendored" internally and is now a proper external dependency intended to be pulled from PyPi. The `Profile` class is patched internally, as needed. This dependency is currently pinned to `1.3.1`.
- Updated `bdbag-profile.json` and `bdbag-ro-profile.json` to leverage newer features of `bagit-profile` version `1.3`. Loosened "Manifests-Required" to only require `md5` for both profiles.
- Pinned `bagit-python` dependency version to `1.8.1`.
- Added Python 3.8 and 3.9 support to `setup.py` metadata and travis builds.
- Dropped Python 3.4 support.
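The "simple substring matches" used by the `bypass_ssl_cert_verification` option in the list above can be pictured with a minimal sketch. The function name, the shape of the pattern list, and the example hosts are all hypothetical; how `bdbag` actually stores and evaluates these entries is not shown in these notes.

```python
def should_bypass_verification(url: str, bypass_patterns: list) -> bool:
    """Return True if any non-empty configured pattern occurs within the URL."""
    # Empty patterns are skipped; an empty string is a substring of every URL.
    return any(bool(pattern) and pattern in url for pattern in bypass_patterns)


# Hypothetical allow-list entries; real values would come from configuration.
patterns = ["https://internal.example.org/", "selfsigned.example.net"]

print(should_bypass_verification("https://internal.example.org/bags/a.zip", patterns))
print(should_bypass_verification("https://public.example.com/a.zip", patterns))
```

Substring matching is deliberately loose, so short patterns can match more URLs than intended; listing full scheme-and-host prefixes keeps the exemption narrow.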
BDBag release 1.5.6
Release Notes
Bugfix release with minor feature addition.
- Fix #34: New file hashes for existing manifest entries generated from remote-file-manifests don't get updated in bags.
- Fix #36: Directory paths with slash at the end during "archive_bag" results in a malformed archive name.
- Added `update_keychain` API function in `auth/keychain.py` for programmatic add/update/delete of keychain entries.
- Added Python 3.7 support to `setup.py` metadata and Travis builds.
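One common way a trailing slash leads to a malformed archive name, as in #36 above, is that `os.path.basename` returns an empty string for such paths. Whether this is the actual cause or fix in `bdbag` is an assumption; the helper `archive_base_name` below is hypothetical and only illustrates the general pitfall.

```python
import os


def archive_base_name(bag_path: str) -> str:
    """Return the final path component, ignoring any trailing separators."""
    # normpath drops trailing separators before basename is taken.
    return os.path.basename(os.path.normpath(bag_path))


print(repr(os.path.basename("/tmp/my-bag/")))  # empty string: the pitfall
print(archive_base_name("/tmp/my-bag/"))
print(archive_base_name("/tmp/my-bag"))
```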