This gem exists to hide some of the interaction with underlying storage mechanisms (e.g. a filesystem or S3) so that client code can operate on the content in a uniform manner.
The basic concept is a 'root'. This specifies a location from which content can be addressed by a 'key'. For example, the root might be a directory on a filesystem and the key the relative path from that directory. Or the root might be a bucket and a (possibly empty) prefix in S3, with the medusa_storage key supplying the rest of the Amazon key, i.e. the content is addressed as bucket:prefix/key.
Once we have the root/key pair, we can do various things like find out if it really exists in storage, get information about it, or get an IO or true file from it. We will continue to build these capabilities as we need them.
Additionally, we can have a set of roots so that an application can talk to different storages or different points in a storage by setting up appropriate roots.
Note that we organize our S3 content using a directory-ish structure, i.e. with slashes in the keys indicating how we want to think of things as directories and files.
Install the gem in the usual way via the Gemfile. You'll need to use the git: argument to point to the GitHub repository.
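For example, a Gemfile entry might look like the following (the repository URL here is a placeholder; substitute the actual one):

```ruby
# Gemfile - the URL below is a placeholder for the actual repository
gem 'medusa_storage', git: 'https://github.com/example-org/medusa_storage.git'
```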
Typically the gem will be used by instantiating and storing a MedusaStorage::RootSet. To configure, pass an array where each entry is a hash with symbol keys configuring a single root. Every root's hash has a name: key, which should be unique within the RootSet, and a type: key that expresses what type of storage the root is. A root can be extracted from the RootSet using the 'at' method with its name.
A 'filesystem' type root has a path: key that gives the path on the filesystem used as the base of that root's storage. Key arguments to the methods are then relative paths starting from there.
An 's3' type root has keys bucket:, aws_access_key_id:, aws_secret_access_key:, and region: that take the appropriate Amazon S3 configuration information. There is also an optional prefix: key that allows us to specify a prefix inside the bucket that is prepended to all key arguments to the various methods; by default this is blank. If aws_access_key_id and aws_secret_access_key are not both provided then they are not used when instantiating a client, and the client's permissions are controlled by the IAM role instead. This is useful if, for example, you want to grant bucket access directly to an EC2 instance rather than via access keys.
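A minimal configuration sketch, assuming the array is passed directly to the constructor and that the require path matches the gem name (check the code for the exact entry point):

```ruby
require 'medusa_storage/root_set'  # assumed require path

roots_config = [
  {name: 'local', type: 'filesystem', path: '/data/medusa'},
  {name: 'main',  type: 's3', bucket: 'my-bucket', prefix: 'content/',
   region: 'us-east-2', aws_access_key_id: 'AKIA...', aws_secret_access_key: '...'}
]

root_set = MedusaStorage::RootSet.new(roots_config)
local_root = root_set.at('local')
s3_root    = root_set.at('main')
```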
There is a set of methods common to the various types of roots, as well as some specific to a particular one. For example, you can use 'size' with any root, but 'presigned_get_url' only really makes sense with an S3 root.
I'm not going to document all of these at this time - check the code. I will be adding more as the need for them arises. Some of them work with a particular object, e.g. to get its size. Others do things like get all of the file keys or directory keys 'in' a directory, or the entire set of keys 'under' a directory (these are useful, for example, in generating downloader manifests). Others deliver the content at a key as either an IO or a string path to a file, to be yielded to a block (the former is useful for generating fixity checksums, the latter for FITS, which doesn't operate nicely on just a stream). See the individual method documentation.
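To give a flavor of usage, here is a rough sketch; only 'size', 'presigned_get_url', and 'with_input_file' are named in this README, so treat the argument lists and block behavior as assumptions and check the code:

```ruby
s3_root.size('images/item1.tif')               # available on any root type
s3_root.presigned_get_url('images/item1.tif')  # only meaningful for an S3 root

# Yield the content at a key as a real file on disk (e.g. for FITS),
# rather than as a stream.
s3_root.with_input_file('images/item1.tif') do |path|
  run_fits(path)  # hypothetical caller-supplied helper
end
```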
Notably, with_input_file for an S3 root needs to copy the file from S3 onto a filesystem first. The location you want to use for that may not be the system tmpdir, or may even depend on the size of the object.
You can use MedusaStorage::Config to set a global value. You can also set the tmp_dir_picker of a root to a MedusaStorage::TmpDirPicker for more control. (This can be any object that responds to #pick(size) by returning a temp dir path; the provided class does this from a simple spec provided to #new.)
By default a root uses its picker if present, then the MedusaStorage::Config value if present, then Dir.tmpdir. The with_input_file method also allows you to pass in the tmp_dir you want to use, which takes precedence over any of those.
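A sketch of the three levels described above; the exact setter names, the TmpDirPicker spec format, and the tmp_dir keyword are assumptions, so check the class documentation:

```ruby
# Global default (assumed accessor name on MedusaStorage::Config)
MedusaStorage::Config.tmpdir = '/var/scratch'

# Per-root picker: choose a temp dir based on object size
# (the spec format passed to #new is an assumption)
s3_root.tmp_dir_picker = MedusaStorage::TmpDirPicker.new(picker_spec)

# Per-call override, taking precedence over the picker and the global value
s3_root.with_input_file('images/item1.tif', tmp_dir: '/big/scratch') do |path|
  # work with the local copy here
end
```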
## TODO
Just jotting down some things that need attention that I'm skipping over to get the broad outlines in place.
- Metadata beyond mtime and md5_sum. Send as headers to S3 or set on filesystem as appropriate and possible. I'm not sure if there is anything else we're really keeping track of here.
- Possibly a metadata updater for S3. This is done by copying the object over itself with new metadata and setting the metadata_directive to 'REPLACE' (see the S3 docs, and the sketch after this list). Note that this may only work for objects < 5GB; above that it looks like you can still make a copy, but have to use the multi-part uploader.
- JRuby is currently (9.2.4.0) incompatible with the S3 copy_io_to_large method, so it won't correctly handle files over 5GB. Note that the underlying SDK upload_stream method is documented to have difficulties with JRuby; it does not always fail, but when it does it is unpredictable and seemingly frequent.
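For reference, a hedged sketch of the metadata-replacement approach mentioned above, using the plain aws-sdk-s3 client directly rather than anything in this gem (bucket, key, and metadata values are illustrative):

```ruby
require 'aws-sdk-s3'

client = Aws::S3::Client.new(region: 'us-east-2')

# Copy the object over itself, replacing its metadata.
# Only works for objects under 5GB; larger objects need a multipart copy.
client.copy_object(
  bucket: 'my-bucket',
  key: 'images/item1.tif',
  copy_source: 'my-bucket/images/item1.tif',
  metadata: {'md5_sum' => 'abc123...', 'mtime' => '1546300800'},
  metadata_directive: 'REPLACE'
)
```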