Merge pull request #4 from sethgoldin/windows
Windows
sethgoldin authored Mar 10, 2020
2 parents 3337a23 + e9181b1 commit 486700f
Showing 4 changed files with 149 additions and 11 deletions.
26 changes: 16 additions & 10 deletions README.md
@@ -1,32 +1,38 @@
# s3-xxHash-transfer
Included in this repository are a couple of shell scripts that glue together the [MHL tool](https://github.com/pomfort/mhl-tool) and the [AWS CLI](https://docs.aws.amazon.com/cli/index.html). This allows for a workflow that can transfer enormous amounts of data through an [S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html) with extremely fast checksum verification. These scripts ensure bulletproof data integrity, verifying every single bit, with blazingly fast speed afforded by 64-bit xxHash and AWS S3.
Included in this repository are shell scripts that glue together the [MHL tool](https://github.com/pomfort/mhl-tool) and the [AWS CLI](https://docs.aws.amazon.com/cli/index.html). This allows for a workflow that can transfer enormous amounts of data through an [S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html) with extremely fast checksum verification. These scripts ensure bulletproof data integrity, verifying every single bit, with blazingly fast speed afforded by 64-bit xxHash and AWS S3.

## Workflow

1. On the "source" computer, run `$ sh send.sh`. `sudo` privilege is _not_ required. `send.sh` will prompt for the source destination directory, "seal" the contents of the directory with 64-bit xxHash checksums, prompt for the name of a new S3 bucket, automatically make that bucket, and then ingest the entire directory into the bucket.
2. On the "destination" computer, run `$ sh receive.sh`. `sudo` privilege is _not_ required. `receive.sh` will prompt for the name of the S3 bucket that had been created by `send.sh`, prompt for the local directory path into where the data will be downloaded, and then will automatically download all data from the S3 bucket and verify the 64-bit xxHash checksums for every single file.
1. On the "source" computer:
1. Depending on your OS:
1. On macOS or Linux, run `$ sh send.sh`. `sudo` privilege is _not_ required.
1. On Windows, run `PS> send.ps1`.
1. The "send" script will prompt for the source destination directory, "seal" the contents of the directory with 64-bit xxHash checksums, prompt for the name of a new S3 bucket, automatically make that bucket, and then ingest the entire directory into the bucket.
1. On the "destination" computer:
1. Depending on your OS:
1. On macOS or Linux, run `$ sh receive.sh`. `sudo` privilege is _not_ required.
1. On Windows, run `PS> receive.ps1`.
1. The "receive" script will prompt for the name of the S3 bucket that had been created by the "send" script, prompt for the local directory path into where the data will be downloaded, and then will automatically download all data from the S3 bucket and verify the 64-bit xxHash checksums for every single file.

The MHL file generated on the sending side and verified on the receiving side functions as a kind of manifest for the data, which ensures end-to-end data integrity. These scripts use the extremely fast [64-bit xxHash hashing algorithm](https://github.com/Cyan4973/xxHash).
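
For illustration, here is a minimal sketch of the commands that the scripts wrap on each side; the bucket name and the `.mhl` filename are placeholders, and the actual scripts prompt for these values:

```sh
# Sending side, run from inside the source directory:
mhl seal -t xxhash64 *              # seal every file with a 64-bit xxHash checksum into an .mhl manifest
aws s3 mb s3://example-bucket       # create the bucket (placeholder name)
aws s3 sync . s3://example-bucket   # upload the directory, manifest included

# Receiving side, run from inside the destination directory:
aws s3 sync s3://example-bucket .   # download everything from the bucket
mhl verify -f example.mhl           # re-hash every file and compare against the manifest
```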

## System requirements
- The [MHL tool](https://github.com/pomfort/mhl-tool) should be installed into your `$PATH`. On CentOS 7.7 and Fedora 31, after compiling from source so that `mhl` will call the properly installed versions of the OpenSSL libraries, it is [recommended](https://unix.stackexchange.com/questions/8656/usr-bin-vs-usr-local-bin-on-linux/8658#8658) to manually move the `mhl` binary into `/usr/local/bin`, since the program will not be managed by the distribution's package manager.
- The [MHL tool](https://github.com/pomfort/mhl-tool) should be installed into your `$PATH`. On CentOS 7.7 and Fedora 31, after compiling from source so that `mhl` will call the properly installed versions of the OpenSSL libraries, it is [recommended](https://unix.stackexchange.com/questions/8656/usr-bin-vs-usr-local-bin-on-linux/8658#8658) that you manually move the `mhl` binary into `/usr/local/bin`, since the program will not be managed by the distribution's package manager.
- The [`.pkg` installer from Pomfort](http://download.pomfort.com/mhl-tool.zip) will install a precompiled binary for macOS into `/usr/local/bin`, which is included by default in macOS's `$PATH`.
- On Windows, download and extract [the precompiled binary from Pomfort](http://download.pomfort.com/mhl-tool.zip), and then copy or move `mhl.exe` into `C:\Windows\System32\`, which is included by default in the Windows `Path` system environment variable.
- The [AWS CLI](https://aws.amazon.com/cli/) should be installed and configured on both endpoints (a sample configuration is sketched below), with:
- The sending IAM user having at least full S3 write access on the AWS account
- The receiving IAM user having at least full S3 read access on the AWS account
- Both endpoints connected to the same [region](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions)
- The command output format set to [text](https://docs.aws.amazon.com/cli/latest/userguide/cli-usage-output.html#text-output)
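
A rough sketch of how both endpoints might be configured follows; the access keys are placeholders for your own IAM users' credentials, and the region shown is only an example:

```sh
# Example only: substitute your own IAM credentials and preferred region
aws configure set aws_access_key_id <your-access-key-id>
aws configure set aws_secret_access_key <your-secret-access-key>
aws configure set default.region us-east-1   # both endpoints must use the same region
aws configure set default.output text        # the scripts expect text output
```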

## Compatible platforms
Release 0.0.2 has tested on Linux and macOS endpoints, specifically on:
## Tested platforms
Release 0.0.3 has been tested on Linux, macOS, and Windows, specifically on:
- Fedora 31
- CentOS 7.7
- macOS Catalina 10.15.3
- Windows 10 1909

There aren't many dependencies, so these scripts should work on other major Linux distributions as well, though no other distributions have been tested.

Both the MHL tool and the AWS CLI are available across Linux, macOS, and Windows, so the same `bash` scripts work identically on Linux and macOS.

Though `zsh` is now the default shell on macOS Catalina, the scripts run in `bash`, as specified by the first line of each script: `#!/bin/bash`. For now, Catalina still ships `bash`. Whether future releases of macOS will contain `bash` is an open question. The scripts may need to be modified in the future to run natively in `zsh`, but at least for now, on Catalina, `bash` works.

The [Windows command shells](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/windows-commands), both the Command shell and PowerShell, are quite a bit different than `bash`, so porting to Windows will take a bit more effort, and will come in a future release.
62 changes: 62 additions & 0 deletions receive.ps1
@@ -0,0 +1,62 @@
# Let's check to make sure that mhl is properly installed
if (Get-Command mhl -ErrorAction Stop)
{}

# Let's check to make sure that aws is properly installed
if (Get-Command aws -ErrorAction Stop)
{}

# Let's have the user specify from which bucket they'll be downloading

$s3BucketName = Read-Host "What is the name of the AWS S3 bucket from which you'll be downloading the data? The name of a bucket takes the form <s3://bucket-name> with only lowercase letters and hyphens, but uses NO uppercase letters nor spaces"

# Let's have the user specify exactly into which directory on the local system they want the data to go

Function Get-Folder($initialDirectory)
{
    # Load the Windows Forms assembly so we can show a graphical folder picker
    Add-Type -AssemblyName System.Windows.Forms

    $foldername = New-Object System.Windows.Forms.FolderBrowserDialog
    $foldername.Description = "Select a folder"
    $foldername.RootFolder = "MyComputer"

    if ($foldername.ShowDialog() -eq "OK")
    {
        $folder = $foldername.SelectedPath
    }
    return $folder
}

echo "Into which local directory on your system are you downloading the data?"

$destinationLocalDirectory = Get-Folder

# Now $destinationLocalDirectory will work as the variable for the destination folder on the local system into which the data will go

# Let's now sync from the S3 bucket to the local system https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

aws s3 sync $s3BucketName $destinationLocalDirectory;

# Let's check to make sure that a .mhl file exists in the destination.

cd $destinationLocalDirectory

# If there are no MHL files in this directory, we'll throw an error.

if (((Get-ChildItem -Path $destinationLocalDirectory -Filter *.mhl | Measure-Object | Select-Object -ExpandProperty Count) -eq 0))
{echo "ERROR: The local directory does not seem to have an MHL file with which to verify the contents. The data integrity of the contents of this directory cannot be verified."; exit 1}

# If there is more than one MHL file in the directory, we'll throw an error, because we don't know which MHL file to check.

elseif (((Get-ChildItem -Path $destinationLocalDirectory -Filter *.mhl | Measure-Object | Select-Object -ExpandProperty Count) -gt 1))
{ echo "ERROR: There is more than one MHL file in the directory, so this script does not know which MHL to use to verify the contents of the directory. The data integrity of the contents of this directory cannot be verified." ; exit 1}

# If there's exactly one MHL file, let's grab the name of it and store that into a variable, and then verify the MHL file we found. Once the download has finished and the MHL file has been verified, let's let the user know that the data has been downloaded and verified.

else
{ $mhlFileName = gci *.mhl; mhl verify -f $mhlFileName;
  if ($LASTEXITCODE -ne 0)
  { exit 1 }
  else
  { echo "The data from the AWS S3 bucket named <$s3BucketName> has been downloaded into $destinationLocalDirectory and has been verified." }}
70 changes: 70 additions & 0 deletions send.ps1
@@ -0,0 +1,70 @@
# Let's check to make sure that mhl is properly installed
if (Get-Command mhl -ErrorAction Stop)
{}

# Let's check to make sure that aws is properly installed
if (Get-Command aws -ErrorAction Stop)
{}

# Let's have the user specify which source folder should be uploaded into S3

Function Get-Folder($initialDirectory)
{
    # Load the Windows Forms assembly so we can show a graphical folder picker
    Add-Type -AssemblyName System.Windows.Forms

    $foldername = New-Object System.Windows.Forms.FolderBrowserDialog
    $foldername.Description = "Select a folder"
    $foldername.RootFolder = "MyComputer"

    if ($foldername.ShowDialog() -eq "OK")
    {
        $folder = $foldername.SelectedPath
    }
    return $folder
}

echo "From which directory on your local system are you uploading into AWS S3?"

$sourceLocalDirectory = Get-Folder

# Now $sourceLocalDirectory will work as the variable for the source folder from the local system that will be ingested into S3

# Let's prompt the user for the name of the S3 bucket

$s3BucketName = Read-Host "What should the name of the AWS S3 bucket be? The name of the bucket should take the form <s3://bucket-name> with only lowercase letters and hyphens, but should use NO uppercase letters nor spaces"

# Now $s3BucketName will work as the variable for the name of the S3 bucket

# Let's make the S3 bucket first, because this is probably the most likely source of an error. According to Amazon, "An Amazon S3 bucket name is globally unique, and the namespace is shared by all AWS accounts. This means that after a bucket is created, the name of that bucket cannot be used by another AWS account in any AWS Region until the bucket is deleted. You should not depend on specific bucket naming conventions for availability or security verification purposes."

aws s3 mb $s3BucketName;

if ($LASTEXITCODE -ne 0)
{ exit 1 }

# Let's cd into the source directory, and then execute mhl seal for the whole directory, with the xxHash algorithm, which is nice and fast
# N.B. mhl must be run from inside the source directory, so best practice is to cd in to the directory right within the shell script itself: https://stackoverflow.com/a/10566581/

# N.B. $LASTEXITCODE is only set by native commands like aws and mhl, so use -ErrorAction Stop to halt the script if the directory can't be entered
cd $sourceLocalDirectory -ErrorAction Stop;

mhl seal -t xxhash64 *;

if ($LASTEXITCODE -ne 0)
{ exit 1 }

# We're using the 64-bit xxHash algorithm specifically, because it's fast and reliable https://github.com/Cyan4973/xxHash

# Now that we've sealed the contents of the folder, let's sync the data from the local folder into the bucket https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

aws s3 sync "$sourceLocalDirectory" $s3BucketName;

if ($LASTEXITCODE -ne 0)
{ exit 1 }

# Once the upload has finished, let's let the user know that the data has been sealed and ingested.

echo "The data from <$sourceLocalDirectory> has been sealed with xxHash checksums and has been ingested into the AWS S3 bucket named <$s3BucketName>."
2 changes: 1 addition & 1 deletion send.sh
@@ -33,7 +33,7 @@ cd "$sourceLocalDirectory"

mhl seal -t xxhash64 * &&

# We're using the 64-bit xxHash algorith specifically, because it's fast and reliable https://github.com/Cyan4973/xxHash
# We're using the 64-bit xxHash algorithm specifically, because it's fast and reliable https://github.com/Cyan4973/xxHash

# Now that we've sealed the contents of the folder, let's sync the data from the local folder into the bucket https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

