
Large datasets #10

Open · dfdx opened this issue Nov 19, 2017 · 24 comments

@dfdx commented Nov 19, 2017

How do we deal with datasets that are too large to be read into an Array? Something like 5-50Gb, for example. Are there any tools for this, or earlier discussion?

I thought about:

  • Iterators instead of arrays. Pros: simple. Cons: some tools (e.g. from MLDataUtils) may require random access to elements of a dataset.
  • A new array type with lazy data loading. Maybe a memory-mapped array, maybe something more custom. Pros: exposes the AbstractArray interface, so existing tools will work. Cons: some tools and algorithms may expect data to be in memory, and for disk-based arrays their performance will drop drastically.
  • A completely custom interface. PyTorch's datasets/dataloaders may be a good example. Pros: flexible, easy to provide fast access. Cons: most functions from MLDataUtils will break.
@Evizero (Member) commented Nov 19, 2017

From a design standpoint MLDataUtils, or more specifically the backend MLDataPattern, is equipped for iterators: http://mldatapatternjl.readthedocs.io/en/latest/introduction/design.html#what-about-streaming-data. It's just that we don't have any example implementations for something like that yet.
Other than RandomObs etc., that is, but those are more of a "data container decorator" that transforms data containers into iterators.

Alternatively it would also be possible to just implement custom data container types for these large datasets that delay data access until it is actually needed. The caveat is that this assumes that at least the indices fit in memory. The upside is that they work seamlessly with partitioning etc. http://mldatapatternjl.readthedocs.io/en/latest/documentation/container.html
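For illustration, a minimal sketch of such a lazy data container for a folder of images (the type, its field, and the use of FileIO.load are my own assumptions following the linked docs, not an existing implementation):

using FileIO
import LearnBase: nobs, getobs

# Only the file paths (the "indices") live in memory; pixel data is
# read from disk when an observation is actually requested.
struct LazyImages
    paths::Vector{String}
end

nobs(data::LazyImages) = length(data.paths)
getobs(data::LazyImages, i::Integer) = load(data.paths[i])
getobs(data::LazyImages, idx::AbstractVector) = [getobs(data, i) for i in idx]

With just these definitions, things like datasubset, splitobs, and batchview should work without ever materializing the full dataset.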

My goal with MLDataPattern was to implement the package in such a way that it can deal with exactly these use cases cleanly, in a "Julian" manner. It can even deal with big labeled datasets where the labels themselves would be cheap to store and access (and thus available for resampling) http://mldatapatternjl.readthedocs.io/en/latest/documentation/targets.html

Maybe the MLDataPattern documentation would be a good start for a discussion? I realize it's quite verbose and a lot to read, but I tried to be very specific with my definitions there.

@dfdx (Author) commented Nov 20, 2017

Excellent, thanks! I need some time to go through the docs, so I'll get back later.

@Evizero (Member) commented Nov 20, 2017

Maybe @Sheemon7 has some opinions to share on this topic, given that he actually worked with large datasets in combination with MLDataPattern.

@simonmandlik commented:

My only experience with large datasets and MLDataPattern was one use case where the feature array couldn't fit into memory but a simple vector of indices could (the dataset comprised around 10^10 examples). I must say that MLDataPattern was smartly designed to cope with such situations.

Are your labels multi-dimensional? If not, I am really curious which type of dataset we're talking about (a dataset with so many observations that even the same number of plain numbers doesn't fit into memory). In that case functionality similar to PyTorch's dataloaders is necessary (it isn't supported now, but shouldn't be that hard to implement). Otherwise, have you considered a similar strategy as with the features? You can "index" examples by integers and generate their real "values" on demand. There is a targets method that takes an arbitrary function and can thus transform the index of an example into its label (similar to how getobs works for features). But this is just an idea; if you could provide more specific information, maybe it would help.
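To make that concrete, a hedged sketch of the on-demand label idea (label_for_index is a hypothetical function, e.g. one that parses the label out of a file name):

using MLDataPattern

# Observations are plain integer indices; only this range lives in memory.
idx = 1:10^6

# targets accepts a function, so labels can be derived on demand,
# which also makes label-aware resampling possible.
y   = targets(i -> label_for_index(i), idx)
bal = undersample(i -> label_for_index(i), idx)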

@dfdx (Author) commented Nov 20, 2017

Oh, I didn't mean labels of that size, only the feature array. I can certainly recall a couple of datasets of dozens of terabytes with labels of several thousand gigabytes, but for those I would use Hadoop or at least some external database, because even reading that data from a single disk would be terribly slow.

Here I mean reasonably large data, labeled or unlabeled. I mostly work with text and images. The dataset that brought me here is Food-101, which contains 101,000 images totaling 5Gb. I can work with it as is with my 16Gb of RAM, but there might be people who can't. I also have a couple of face datasets of 10-24Gb which can only be loaded lazily.

(I'm in the middle of reading the docs, so no useful comments yet)

@oxinabox (Member) commented Nov 21, 2017

Some musings:

The goal of MLDatasets (at its heart) is to get the data into a form that can be processed by MLDataUtils.
Since if MLDataUtils can get at the data, that (proof by construction) shows that a Julia program can get at the data.

We can think of data as coming in 3 sizes; the cutoffs depend on the hardware commonly in use in ML.

  • Small Data: it fits in your RAM (right now this is <8Gb on disk size, corresponding to <32Gb in memory)
  • Medium Data: it fits on one computer (so like 8 hard disks for 8TB)
  • Big Data: It does not fit on one computer.

From that:

  • Small data is easy, and covered by MLDatasets
  • Big data is out of scope: it needs a whole bunch of networking magic, loading it is a concern for a different package, and even working with it on a single machine may not make sense anyway.
  • Medium data is thus the question.

Julia provides memory-mapped arrays in Base (in 0.7, as the Mmap stdlib).
I've not tried them.
I think, though, that they make a lot of sense as the best way to load data that naturally maps well onto arrays.
Though they can only map single files, and the file needs to be formatted as the right kind of binary data.
Any data that is small enough to fit in a single file is probably small enough to fit in RAM, so it is small data.
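For reference, a minimal sketch of such a mapping (the file name and dimensions are made up; the file must contain raw Float32 values in column-major order):

using Mmap

# Map a hypothetical binary file as 128 features by 10_000_000 observations.
X = open("features.bin") do io
    Mmap.mmap(io, Matrix{Float32}, (128, 10_000_000))
end

# X behaves like an ordinary array, but pages are loaded from disk on
# demand, so only the columns you actually touch ever hit memory.
x = X[:, 42]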

I think there might be scope for a MediumData.jl (or OnDiskDatasets.jl),
which handles the shenanigans of providing an array view onto multi-file datasets that live on disk.
It shouldn't actually be too insane.
Something like

  • Base Memory Mapped Arrays,
  • plus MappedArrays.jl to handle the data not looking right,
  • plus CatViews.jl to bang all the different files of data into one array.

Seems viable, and seems like it deserves its own package separate from MLDataUtils,
one that MLDataUtils REQUIREs, with MLDataUtils using its domain information to say what kind of mappings need to be applied to the data to get it into a good form for use.
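A rough sketch of how those three pieces might compose (file names are illustrative, and I have not verified this end to end):

using Mmap, MappedArrays, CatViews

shards = ["shard1.bin", "shard2.bin", "shard3.bin"]
mmaps  = [Mmap.mmap(f) for f in shards]           # each a lazy Vector{UInt8}
raw    = CatView(mmaps...)                        # one lazy vector over all shards
data   = mappedarray(b -> b / Float32(255), raw)  # lazy UInt8 -> Float32 rescaling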

Now handling this data with an iterator is easy.
All you need is an Iterator of FileIDs, then use coroutines to express how data comes out of those.
CorpusLoaders.jl is packed full of that.
I think this type of solution is completely in-scope for MLDatasets.
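In Julia that coroutine pattern is just a Channel. A minimal sketch (the directory and buffer size are illustrative):

using FileIO

# Stream decoded images through a buffered Channel (a coroutine);
# only about `buffer` decoded images are held in memory at once.
function eachimage(paths; buffer = 16)
    Channel{Any}(buffer) do ch
        for p in paths
            put!(ch, load(p))   # decode one file at a time
        end
    end
end

for img in eachimage(joinpath.("images", readdir("images")))
    # consume one image at a time
end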

(CorpusLoaders.jl is basically my own shot at MLDatasets for natural language datasets (though there is some overlap); it needs updating at the moment since it was not made with MLDataUtils in mind, and I might want to reconsider some design decisions based on that.)

Since MLDataUtils can process things that are iteration-based or things that are index-based, it is fine to provide either.
Providing an iteration-based source means people with more RAM can collect it and then have it index-based.
Providing an index-based source leads to a more feature-ful MLDataUtils experience right now, right?
But as said, it is work to do, and I don't think there are any really good current solutions for it.
I could be wrong though.

@dfdx (Author) commented Nov 22, 2017

@oxinabox Thanks for the detailed comment!

Though they can only map single files, and the file needs to be formatted as the right kind of binary data.

I immediately see 2 limitations here:

  • the file system should support large files, so FAT32 is out of scope (yes, I still use it, e.g. on a USB stick, since it's the only one compatible with all the systems I meet)
  • most likely you aren't going to delete the original data (e.g. a bunch of images), so basically you have to double the used space on disk - one copy for the original images and one for the array (for compressed image formats like JPG the array file will be even larger)

What I like about MLDataPattern is that data containers (which give you access to most if not all of the cool features) require you to define only 2 functions, nobs() and getobs(), and both can be efficiently implemented for on-disk data (at least for images). I will try to implement this for my dataset and see how it works.

@oxinabox (Member) commented:

Yes and yes, I agree.

the file system should support large files, so FAT32 is out of scope (yes, I still use it, e.g. on a USB stick, since it's the only one compatible with all the systems I meet)

Multi-file data is a must.
Data is almost always broken up into files <4GB, because all kinds of things break if it isn't,
and it is faster to download in pieces.

One supports multiple files in iteration trivially.
A theoretical OnDiskDataSets.jl package would support multi-file mmapped arrays via something like CatViews.jl.

most likely you aren't going to delete the original data (e.g. a bunch of images), so basically you have to double the used space on disk - one copy for the original images and one for the array (for compressed image formats like JPG the array file will be even larger)

Yes.
For some formats maybe something lazy along the lines of MappedArrays.jl could be done;
a kind of MappedReducedArrays.jl may be sufficient.
It would look at a file as a mmapped array, apply a map and a reduce function (something like a kernel, ideally one that only needs local information, though this might not be possible for all file types), and expose the result as an array "view" of different dimensions onto the file on disk.
(I don't know that all useful reductions are invertible though, so it might be read-only.)

This is basically easy in the iteration case, of course.

@dfdx (Author) commented Nov 25, 2017

I made a pull request for the above-mentioned dataset, but the results are different from what I expected. Basically, I created 2 new data types, Food101Data and Food101Targets, and defined the methods:

getobs(::Food101Data, ::Integer)::Array{UInt8,3}
getobs(::Food101Data, ::Vector)::Array{UInt8,4}
nobs(::Food101Data)::Int

# and the same for Food101Targets

Then I called batchview(data) and looked at the first batch. I expected it to be what getobs(::Food101Data, ::Vector) returns, i.e. an Array{UInt8,4} which I can then pass to my training functions. But instead I get:

DataSubset(::Food101.Food101Data, ::UnitRange{Int64}, ObsDim.Undefined())
 100 observations

Is this the intended result, or am I doing something wrong?

@Evizero (Member) commented Nov 25, 2017

Is this the intended result, or am I doing something wrong?

That is on purpose. batchview is still lazy and returns subsets. You can get the data by calling getobs on such a subset.

Here is why: if you want to iterate over all batches, you can use eachbatch instead, which can make use of a pre-allocated buffer that it reuses for every batch. For the pre-allocation to work, your Food101Data has to implement getobs!, though.
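For reference, a hedged sketch of what that could look like for Food101Data (the buffer layout, the use of getobs to decode a single image, and the data/labels container variables are my assumptions):

import LearnBase: getobs!

# Fill a pre-allocated buffer in place instead of allocating a new array.
function getobs!(buffer::Array{UInt8,4}, data::Food101Data, idx::AbstractVector)
    for (k, i) in enumerate(idx)
        buffer[:, :, :, k] = getobs(data, i)   # decode image i into slot k
    end
    return buffer
end

# eachbatch can then reuse a single buffer for every batch:
for (x, y) in eachbatch((data, labels), size = 100)
    # x is the same Array{UInt8,4} on each iteration, refilled in place
end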

@dfdx (Author) commented Nov 25, 2017

Got it, I'll add getobs! and try eachbatch instead of batchview.

Another question about testing. For smaller datasets we can download all the data to check if everything works fine, but a couple of 5-10Gb datasets probably won't make Travis happy. Any other options?

@Evizero (Member) commented Nov 25, 2017

oh, that's a good question. I don't know. I'll think about it though. I am very open to ideas

@dfdx (Author) commented Nov 25, 2017

Basically, the code for reading a particular dataset is unlikely to change often, so we could write testing code, run it locally, and comment it out before merging. This way it would be easy to retest later, but it wouldn't affect Travis or any users not interested in this specific dataset. The only risk factor I see is when 3rd-party libraries (like Images.jl in my case) break and we can't see it because the tests aren't running automatically.

@oxinabox (Member) commented Nov 25, 2017

My plan (and what is partially implemented in CorpusLoaders) is:

  • Unit tests / Tests on subdatasets that I create by hand, running with Travis every time
  • An environment variable controlling if full downloads and tests run

Longer term I plan to set up a separate CI server (either self-hosted, or using another service like CircleCI) to run integration tests with the full datasets (by setting that environment variable).
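For what it's worth, the environment-variable gate could be as small as this in test/runtests.jl (the variable and file names are just examples):

# Cheap unit tests always run; full multi-GB downloads are opt-in.
if get(ENV, "DATASETS_FULL_TESTS", "false") == "true"
    include("full_download_tests.jl")   # hypothetical integration tests
end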

@dfdx (Author) commented Nov 26, 2017

Using environment variables sounds like a perfect solution!

Regarding subdatasets and custom CI, I guess it's much more involved and may not work for all the users. Or maybe I just don't have experience in doing such things.

@Evizero (Member) commented Nov 26, 2017

I think / hope that separation of concerns will help us out here in the long run. For example, once we switch the download logic to use DataDeps.jl as the backend, we know that as long as the URL is not dead, at least the downloading part should work (if not, then it's likely a new issue in DataDeps and not here). Also, given that DataDeps uses checksums, any change to a dataset will trigger visible errors for users, which are likely to be reported quickly.

Concerning other upstream changes (e.g. Images.jl, as you mentioned): if we design the helper functions separately (see the MNIST submodule for an example of how I want to structure datasets in the future), then these helper functions should be easily testable with toy data of the right structure (like some Array{UInt8,3}).

All in all I think we need not be too concerned with CI right now (unless it interests you). I'd rather focus on increasing the utility of this package by 1.) adding more diverse datasets and 2.) improving the design/interface to nicely deal with issues that come up in 1.

@Evizero (Member) commented Nov 26, 2017

To elaborate on my reference to the MNIST submodule. I think the cleanest approach to provide datasets to users is to simply return the data in the most sensible native form.

As an example, MNIST.traintensor() returns a 28x28x60000 Array{Float64,3} in the native memory layout (which means width*height*N, which is not how Images needs it). This means the feature order is the same as in all other examples/papers that use MNIST. In order to get a proper Julia image one can use MNIST.convert2image(array), which returns an Array{Gray{Float64}} with the "Julian" memory layout needed to display an image properly in a notebook etc.
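For concreteness, usage looks something like this:

using MLDatasets

X   = MNIST.traintensor()               # 28x28x60000 Array{Float64,3}, native layout
img = MNIST.convert2image(X[:, :, 1])   # Array{Gray{Float64}}, ready for display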

As a side note: I went back and forth between returning an Array{Float64} or a native Array{UInt8}, but I decided that UInt8 is pretty much never what a user wants. Plus the dataset is tiny. I am less certain which direction to go with such a big dataset, but somehow I also don't expect an Array{UInt8} to be of much use.

@dfdx (Author) commented Nov 29, 2017

Sorry for the infrequent replies - I've been overwhelmed with other projects I'm in charge of.

I agree that UInt8 is rarely useful in practice, but it's unclear what to use instead:

  • converting each UInt8 to Float64 would result in an 8x increase in size, and in the case of Food-101 that means 40Gb instead of only 5Gb
  • MNIST has only one channel, RGB images have 3; it's much less common to use Float64 for them
  • often in image recognition mini-batches are immediately copied to GPU memory as arrays of Float32, not Float64 (at least for CUDA, not sure about OpenCL)

Regarding the native layout: in the case of MNIST there are a lot of papers using it, but for a less well-known dataset like Food-101 there are almost none, so aligning with others doesn't make much sense.

I agree about CI - I've got several much more important issues with this example dataset. In particular, not all images are good (see JuliaIO/ImageMagick.jl#106), and even if they were perfect, loading just one from disk takes ~5ms, so reading a mini-batch of 100 images takes nearly as long as an average single convnet iteration. Maybe memory-mapped arrays and/or caching will help to reduce it.

@Evizero (Member) commented Dec 28, 2017

Now that I am working on integrating DataDeps (and with it updating CIFAR-10 and CIFAR-100), I am revisiting the Float64 vs UInt8 topic.

I am currently leaning to the following solution, which I will also adopt for MNIST:

  • by default CIFAR10.traintensor(::Int) will return an Array{N0f8,3} of dimensions (width,height,channels), i.e. the way they are stored. I think N0f8 is a better default than UInt8, and it causes no additional overhead. Similarly, CIFAR10.traintensor([::Vector]) will return an Array{N0f8,4} of dimensions (width,height,channels,nobs).

  • All functions accept an eltype as the first parameter, i.e. CIFAR10.traintensor(UInt8, ...) or CIFAR10.traintensor(Float32, ...). The upside of this is that by default no additional work needs to be done, and the optional conversion can be done efficiently. Also nice is that specifying a floating-point type will return the values in 0.0-1.0, while specifying an integer type will return the values in 0-255.
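In use, the proposal would look something like this (a sketch of the proposed API, not something implemented yet):

X  = CIFAR10.traintensor()              # Array{N0f8,4}, (width,height,channels,nobs), no conversion work
Xf = CIFAR10.traintensor(Float32)       # Array{Float32,4}, values in 0.0-1.0
Xi = CIFAR10.traintensor(UInt8, 1:100)  # Array{UInt8,4}, values in 0-255, first 100 observations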

Concerning dimensions: I think using the native dimension layout is a good default. I'll provide some util functions to change that (like CIFAR10.convert2image(...) or some keyword parameter). For Food-101 I don't think it matters, because the data doesn't seem to be in a special binary format like CIFAR or MNIST, but simple JPGs, right? In that case it's fair enough to just use whatever FileIO gives you.

@dfdx (Author) commented Dec 28, 2017

Does it mean a new dataset will need to implement traintensor in addition to traindata?

For Food-101 I don't think it matters, because the data doesn't seem to be in a special binary format like CIFAR or MNIST, but simple JPGs, right?

Exactly, but it's not really good. I found that reading images from disk, decoding them into a matrix and resizing them (in Food-101 images may have different sizes) is about 3x slower than running a single pass of a 3-layer ConvNet on the same data. So a large portion of the time during learning is spent not on actual training but on getting the data, which is unacceptable. This is the main reason for not finishing this PR, although CIFAR and MNIST shouldn't have these problems.

Also, in PyTorch I found an excellent ImageFolder dataset class. Using it, you specify a path to a folder with images like this:

root/
    label1/
        img1.jpg
        img2.jpg
    label2/
        img1.jpg
        img2.jpg
        img3.jpg

and a transformation function to apply to each image. I think it's a much better design, and specific datasets may simply add automatic downloading on top. The downside is that, given a custom transformation function, it's much harder to preserve type stability - something your design is good at.

@Evizero (Member) commented Dec 29, 2017

Does it mean a new dataset will need to implement traintensor in addition to traindata?

Right now, yes, but I am not completely sold on it. It is convenient if you just want to display some image etc. Also, it makes sense for MNIST, where the labels are in a separate file. The story isn't that clear once we look at CIFAR, where the labels and images are stored alternating in the same file(s). There I simply implemented traindata only and made traintensor(...) = traindata(...)[1].

@Evizero (Member) commented Dec 29, 2017

What do you think about the idea that after download we repack the data into a HDF5 file? HDF5 supports compression as well as reading individual "datasets" (in our case individual images within that file)

I do like the idea of providing some convenient reader for image folders, but within the scope of this package we also have the option of repackaging the data after it's downloaded. The downside would be a binary dependency on HDF5, and I'll admit I am not even sure if it buys performance or memory.
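A hedged sketch of that repacking step (file and dataset names are made up; collect-ing the raw channel view is one way to get plain UInt8 pixel data):

using HDF5, FileIO, ImageCore

# Repack each decoded image into its own HDF5 dataset so single images
# can be read back later without touching the rest of the file.
h5open("food101.h5", "w") do file
    for (i, path) in enumerate(image_paths)   # image_paths is assumed
        img = load(path)
        file["img/$i"] = collect(rawview(channelview(img)))   # raw UInt8 pixels
    end
end

img42 = h5read("food101.h5", "img/42")   # read one image in isolation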

I think it's a much better design, and specific datasets ...

Better in relation to what? It sounds to me like the difference is a.) the folder structure on the disk, and b.) the option to specify a custom transformation function. I think both ideas are orthogonal to most performance issues that you found in your current PR.

@dfdx (Author) commented Dec 29, 2017

What do you think about the idea that after download we repack the data into a HDF5 file? HDF5 supports compression as well as reading individual "datasets" (in our case individual images within that file)

Yes, I thought about it, but the downsides are:

  • it increases the space used on disk, since you most likely don't want to delete the original images; also, I don't think HDF5 compression is better than JPG for images, so the resulting file may be pretty large
  • in the Food-101 dataset, for example, images may have different sizes; to put them into HDF5 and increase performance we would need to resize them to a common size, but which one? The default for Food-101 seems to be 512x512, which is larger than needed for most applications

I think preprocessing images and using a better storage format is the right direction, but the devil is in the details. Even if we decide to delete the original images and figure out a good default size, the resulting HDF5 file will be > 4Gb for many datasets, so we would need to split it into parts, which brings another set of questions.

Better in relation to what? [...] I think both ideas are orthogonal to most performance issues that you found in your current PR.

This part isn't about performance but the convenience of adding new datasets: with a custom solution in Julia I spent several days plugging in Food-101; with PyTorch's ImageFolder it took only a few minutes. We could go the same way and provide a set of functions like:

getobs(dataset::ImageFolder, ...)
nobs(dataset::ImageFolder, ...)
traintensor(dataset::ImageFolder, ...)

and concrete datasets may then reuse it simply by inheriting (or including) ImageFolder:

struct Food101Dataset <: ImageFolder
     root::AbstractString
end
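For illustration, a hedged sketch of how that generic piece could work (all names are mine; it assumes each concrete subtype has a root field, and caching of the file list is omitted for brevity):

using FileIO
import LearnBase: nobs, getobs

abstract type ImageFolder end   # concrete datasets subtype this

# One (path, label) pair per image, with the label taken from the name
# of the sub-folder, as in PyTorch's ImageFolder. (In practice this list
# should be cached instead of rebuilt on every call.)
imagepairs(root) = [(joinpath(root, label, file), label)
                    for label in readdir(root)
                    for file in readdir(joinpath(root, label))]

nobs(d::ImageFolder) = length(imagepairs(d.root))

function getobs(d::ImageFolder, i::Integer)
    path, label = imagepairs(d.root)[i]
    (load(path), label)
end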

@Evizero (Member) commented Dec 30, 2017

Yes, I thought about it, but the downsides are

right. all very good points. I will think on this a little

This part isn't about performance but the convenience of adding new datasets: with a custom solution in Julia I spent several days plugging in Food-101; with PyTorch's ImageFolder it took only a few minutes.

I agree with you that a folder image source is a good generic idea. I had something like that a long time ago actually (https://github.com/Evizero/AugmentorDeprecated.jl/blob/master/src/dirimagesource.jl), but after my use case was done I didn't really work with filesystem-based datasets directly anymore (occasional offline resizing and repacking was just more convenient).

The question is where it should live nowadays. MLDatasets seems like a good candidate if it really needs to reason about colors and such. If it doesn't actually have to understand more than FileIO.load then it could also reside in MLDataPattern or MLDataUtils.
