Description
The method is too slow!
Do we really need dask.dataframe? Maybe it would be better to store documents on disk as single files (and not as one big .csv)?
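To make the single-file idea concrete, here is a minimal sketch using only the standard library. It is not the project's actual code: the function names `split_csv_into_files` and `read_document`, the column names `id` and `text`, and the `.txt` layout are all hypothetical, and it assumes every document id is a valid file name.

```python
import csv
import tempfile
from pathlib import Path


def split_csv_into_files(csv_path, out_dir, id_column="id", text_column="text"):
    """Write each row of the CSV to its own file, out_dir/<id>.txt.

    Column names are assumptions; ids are assumed to be safe file names.
    Returns the number of documents written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Rows are streamed one at a time, so the whole CSV is never in memory.
        for row in csv.DictReader(f):
            doc_path = out_dir / f"{row[id_column]}.txt"
            doc_path.write_text(row[text_column], encoding="utf-8")
            count += 1
    return count


def read_document(out_dir, doc_id):
    """Fetch one document by id without scanning the other documents."""
    return (Path(out_dir) / f"{doc_id}.txt").read_text(encoding="utf-8")


def demo():
    """Build a tiny CSV, split it, and read one document back."""
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "dataset.csv"
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "text"])
            writer.writerow(["doc_1", "first document"])
            writer.writerow(["doc_2", "second document"])
        docs_dir = Path(tmp) / "docs"
        written = split_csv_into_files(csv_path, docs_dir)
        return written, read_document(docs_dir, "doc_2")
```

With this layout, looking up one document is a single file read rather than a pass over a large table. Whether that is actually faster than dask.dataframe for this workload would still need to be measured.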
References:
- A local attempt to fix the problem: TopicBank-Experiment-BankCreation.ipynb, section "Lower Time Consumption in Case of Big Datasets"