`keep_in_memory=False` makes `dataset.get_vw_document()` almost unusable

With `keep_in_memory=False`, the method is too slow to be usable in practice.

Do we really need `dask.dataframe`? Maybe it would be better to store each document on disk as a separate file (rather than all of them in one big .csv)?
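To make the per-file idea concrete, here is a minimal sketch of what it could look like. It is not the library's current implementation: `save_documents`, `load_vw_document`, and the hashed file naming are hypothetical names and choices made only for illustration, and the sketch assumes each document is available as a Vowpal Wabbit text string keyed by its id.

```python
import hashlib
import tempfile
from pathlib import Path


def _doc_path(root: Path, doc_id: str) -> Path:
    # Hypothetical naming scheme: hash the id so that any document id
    # maps to a safe file name.
    name = hashlib.sha1(doc_id.encode("utf-8")).hexdigest()
    return root / f"{name}.txt"


def save_documents(root: Path, documents: dict) -> None:
    # Write every document to its own file under `root`.
    root.mkdir(parents=True, exist_ok=True)
    for doc_id, vw_text in documents.items():
        _doc_path(root, doc_id).write_text(vw_text, encoding="utf-8")


def load_vw_document(root: Path, doc_id: str) -> str:
    # Read one document directly, without scanning the other documents.
    return _doc_path(root, doc_id).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp) / "documents"
        save_documents(storage, {"doc_1": "doc_1 |@text topic model"})
        print(load_vw_document(storage, "doc_1"))
```

With this layout, fetching a single document is one file read, so the cost does not depend on the total size of the dataset. The trade-off is a large number of small files on disk.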

References:
* A local workaround for the problem: [TopicBank-Experiment-BankCreation.ipynb](https://github.com/machine-intelligence-laboratory/OptimalNumberOfTopics/blob/feature/pscience-experiment-pipelines/demos/TopicBank-Experiment-BankCreation.ipynb), section *Lower Time Consumption in Case of Big Datasets*
