SambhawDrag/CorpusLoaders.jl
CorpusLoaders

A collection of loaders for a variety of corpora used in NLP.

Common Structure

For a corpus of some type Corpus, there is a constructor Corpus(path), where the path argument is the path to the files describing the corpus. If path is not provided, it defaults to a predefined data dependency, which is downloaded the first time you call Corpus(). When the data dependency resolves, it shows full bibliographic details for the corpus, among other information. For more on this, including configuration details, see DataDeps.jl.
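As a minimal sketch of this constructor convention, the block below uses a hypothetical ToyCorpus type that is not part of CorpusLoaders.jl. To keep it runnable offline, the default path comes from a hypothetical helper, default_toy_path, which just returns a fixed directory. In the package itself the default is a DataDeps.jl data dependency, presumably resolved with DataDeps.jl's datadep"Name" string macro, which downloads the data the first time it is resolved.

```julia
# Hypothetical corpus type illustrating the Corpus(path) convention.
struct ToyCorpus
    filepath::String
end

# Hypothetical stand-in for resolving the default data dependency.
# With DataDeps.jl this would typically be datadep"ToyCorpus", which
# downloads the data on first use; here we return a fixed directory
# so the example runs without any network access.
default_toy_path() = joinpath(tempdir(), "ToyCorpus")

# Zero-argument constructor: fall back to the default location.
ToyCorpus() = ToyCorpus(default_toy_path())

explicit = ToyCorpus("/path/to/my/copy")  # files you already have locally
fallback = ToyCorpus()                     # default location
println(explicit.filepath)
println(fallback.filepath)
```

Defining the zero-argument method as a separate outer constructor leaves the struct's field constructor available for explicit paths.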

Each corpus has a function load(::Corpus), which returns an iterator over the data. This iterator is often lazy (e.g. a Channel), since many corpora are too large to fit comfortably in memory. It is typically nested, an iterator of iterators of iterators and so on, and is designed to be manipulated with MultiResolutionIterators.jl. The corpus type also acts as an indexer for using named levels with MultiResolutionIterators.jl, so that, for example, lvls(Corpus, :para) works.
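The following is a minimal, self-contained sketch of this structure using only Base Julia. ToyTextCorpus, level_index and flatten_paragraphs are hypothetical names for illustration, not CorpusLoaders.jl or MultiResolutionIterators.jl API. The loader returns a Channel that yields documents lazily; each document nests paragraphs, sentences and words; level_index mimics the idea of looking up a level by name; and the paragraph level is removed by hand where you would normally reach for MultiResolutionIterators.jl.

```julia
# Hypothetical nesting: document -> paragraphs -> sentences -> words.
const Sentence = Vector{String}
const Paragraph = Vector{Sentence}
const Document = Vector{Paragraph}

struct ToyTextCorpus
    docs::Vector{Document}
end

# Lazy loader mirroring the load(::Corpus) convention: an unbuffered
# Channel produces one document at a time as the consumer asks for it.
function load(corpus::ToyTextCorpus)
    return Channel{Document}() do ch
        for doc in corpus.docs
            put!(ch, doc)  # blocks until the consumer takes this document
        end
    end  # the Channel is closed automatically when this function returns
end

# Hypothetical name-to-depth mapping, in the spirit of lvls(Corpus, :para).
level_index(::Type{ToyTextCorpus}, name::Symbol) =
    Dict(:doc => 1, :para => 2, :sent => 3, :word => 4)[name]

# Remove the paragraph level by hand: a document becomes a flat list of
# sentences. MultiResolutionIterators.jl offers general tools for this.
flatten_paragraphs(doc::Document) = collect(Iterators.flatten(doc))

corpus = ToyTextCorpus([
    [[["Hello", "world", "."]],
     [["A", "second", "paragraph", "."], ["Another", "sentence", "."]]],
    [[["Short", "document", "."]]]
])

for doc in load(corpus)
    println("sentences in document: ", length(flatten_paragraphs(doc)))
end
```

Because the Channel is unbuffered, each put! waits for the consumer, so documents are only produced as they are iterated.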

Corpora

Follow the links below for full docs on the usage of the corpora.
