This repository was archived by the owner on Jul 22, 2024. It is now read-only.
Using CountingGridsPy Library
Jay Windsor edited this page Jun 24, 2019
CountingGridsPy is the machine learning library that supports BrowseCloud. It ships with Python scripts that transform raw text data into the files that the BrowseCloud application consumes. At a high level, these scripts tokenize the text, clean the tokens, vectorize them, run the learning algorithms, and produce the model files.
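The tokenize, clean, and vectorize stages can be pictured with a minimal, self-contained sketch. It is not CountingGridsPy's code. The regular-expression tokenizer, the small stop-word list, and the minimum token length are illustrative assumptions. The sketch only shows how raw text becomes bag-of-words count vectors, which is the kind of input a counting grid is trained on.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; CountingGridsPy's
# actual cleaning rules may differ.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text):
    """Split raw text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clean(tokens, stop_words=STOP_WORDS, min_length=3):
    """Drop stop words and tokens shorter than min_length characters."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_length]


def vectorize(documents):
    """Turn a list of token lists into bag-of-words count vectors.

    Returns the sorted vocabulary and one count vector per document.
    """
    vocabulary = sorted({t for doc in documents for t in doc})
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word, count in Counter(doc).items():
            row[index[word]] = count
        vectors.append(row)
    return vocabulary, vectors


docs = ["The grid maps documents to topics.", "Topics in the grid overlap."]
vocab, counts = vectorize([clean(tokenize(d)) for d in docs])
print(vocab)   # ['documents', 'grid', 'maps', 'overlap', 'topics']
print(counts)  # [[1, 1, 1, 0, 1], [0, 1, 0, 1, 1]]
```

Each row of `counts` lines up with `vocab`, so the second document's row records one occurrence each of "grid", "overlap", and "topics".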
Clone the code from here to get started:
```
git clone https://github.com/microsoft/browsecloud.git
```
Requires:
- Python 3.6.7 or later (available from www.python.org).
- Your shell is in the EngineToBrowseCloudPipeline directory.
- Dependencies installed as shown below in the dependencies section.
- engine_type is "numpyEngine".
- inputfile_type is "simpleInput" or "metadataInput".
| inputfile_type | Description | Example |
|---|---|---|
| "simpleInput" | path_to_output_folder contains a tab-separated TXT file named inputfile_name with the fields {title, content (i.e., abstract), link} and no header row. If you do not have a title, leave that field empty so the line begins with a tab. If you do not have a link, leave the last field empty after the tab. | https://github.com/microsoft/browsecloud/blob/master/CountingGridsPy/ExampleInputs/MuellerReport.txt |
| "metadataInput" | path_to_output_folder contains a CSV file named inputfile_name with a header row and the fields {alias, title, abstract, responseId, surveyId, link, image}. This type lets you correlate topics with metadata beyond sentiment analysis. | |
- inputfile_name must be the name of a CSV or TXT file that matches inputfile_type, as shown in the table.
- The PYTHONPATH environment variable is set so that Python can find the CountingGridsPy module.
- extent_size_of_grid_hyperparameter and window_size_of_grid_hyperparameter are integers, with the former greater than the latter. The square of extent_size_of_grid_hyperparameter divided by the square of window_size_of_grid_hyperparameter is the number of topics the model can hold (i.e., the model's capacity).
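The capacity rule is easy to check before a run. For the example command at the end of this page (extent 24, window 5), it gives 24^2 / 5^2 = 576 / 25 = 23.04, so room for roughly 23 topics. The helper below is a hypothetical convenience function, not part of CountingGridsPy. It applies the rule exactly as stated and enforces the integer and "extent greater than window" requirements.

```python
def grid_capacity(extent_size, window_size):
    """Return extent_size**2 / window_size**2, the model capacity in topics.

    Hypothetical helper: raises TypeError for non-integer sizes and
    ValueError unless 0 < window_size < extent_size.
    """
    if not (isinstance(extent_size, int) and isinstance(window_size, int)):
        raise TypeError("both sizes must be integers")
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if extent_size <= window_size:
        raise ValueError("extent_size must be greater than window_size")
    return extent_size ** 2 / window_size ** 2


print(grid_capacity(24, 5))  # 576 / 25 = 23.04
```

Because capacity grows with the square of the ratio, a modest increase in the extent (or a smaller window) raises the number of topics quickly.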
API:
```
python dumpCountingGrids.py path_to_output_folder extent_size_of_grid_hyperparameter window_size_of_grid_hyperparameter engine_type inputfile_type inputfile_name
```
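All six arguments are positional, so a typo can surface only after a long run. The sketch below mirrors the documented argument order with `argparse` so you can validate a command line up front. It is a hypothetical reimplementation, not the parser inside dumpCountingGrids.py. The allowed values and the TXT/CSV extension check come from the requirements on this page.

```python
import argparse

ENGINE_TYPES = ["numpyEngine"]
INPUTFILE_TYPES = ["simpleInput", "metadataInput"]


def build_parser():
    """Hypothetical parser mirroring dumpCountingGrids.py's positional arguments."""
    parser = argparse.ArgumentParser(prog="dumpCountingGrids.py")
    parser.add_argument("path_to_output_folder")
    parser.add_argument("extent_size_of_grid_hyperparameter", type=int)
    parser.add_argument("window_size_of_grid_hyperparameter", type=int)
    parser.add_argument("engine_type", choices=ENGINE_TYPES)
    parser.add_argument("inputfile_type", choices=INPUTFILE_TYPES)
    parser.add_argument("inputfile_name")
    return parser


def parse_and_validate(argv):
    """Parse argv and apply the documented constraints; exits with status 2 on error."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.extent_size_of_grid_hyperparameter <= args.window_size_of_grid_hyperparameter:
        parser.error("extent size must be greater than window size")
    # simpleInput expects a TXT file, metadataInput a CSV file.
    expected_ext = ".txt" if args.inputfile_type == "simpleInput" else ".csv"
    if not args.inputfile_name.lower().endswith(expected_ext):
        parser.error(f"{args.inputfile_type} expects a {expected_ext} file")
    return args


args = parse_and_validate(
    ["data_folder", "24", "5", "numpyEngine", "metadataInput", "channeldump.csv"]
)
print(args.extent_size_of_grid_hyperparameter, args.inputfile_type)  # 24 metadataInput
```

`parser.error` prints a usage message and raises `SystemExit`, which matches how a real command-line script would reject bad input.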
Example:
```
python dumpCountingGrids.py data_folder 24 5 numpyEngine metadataInput channeldump.csv
```
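The example above expects data_folder/channeldump.csv to exist in the metadataInput schema. The sketch below writes input files in both documented layouts using only the standard library. The function names, the UTF-8 encoding, and the sample row (including the example.com placeholder link) are illustrative assumptions, not part of CountingGridsPy. The column order and the header and tab rules follow the inputfile_type table on this page.

```python
import csv
import os
import tempfile

# Column order taken from the metadataInput description on this page.
METADATA_COLUMNS = ["alias", "title", "abstract", "responseId", "surveyId", "link", "image"]


def write_metadata_input(folder, filename, rows):
    """Write rows (dicts keyed by METADATA_COLUMNS) as a CSV with a header row.

    Missing columns are written as empty fields.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=METADATA_COLUMNS, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return path


def write_simple_input(folder, filename, records):
    """Write (title, content, link) tuples as tab-separated lines with no header row.

    A missing title or link becomes an empty field, so the tabs stay in place.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "w", encoding="utf-8") as f:
        for title, content, link in records:
            fields = [title or "", content, link or ""]
            # Collapse internal whitespace; stray tabs or newlines would break the layout.
            fields = [" ".join(field.split()) for field in fields]
            f.write("\t".join(fields) + "\n")
    return path


# A temporary folder stands in for data_folder in this demonstration.
folder = tempfile.mkdtemp()
csv_path = write_metadata_input(
    folder,
    "channeldump.csv",
    [{"alias": "user1", "title": "Sample", "abstract": "Some response text.",
      "link": "https://example.com"}],
)
txt_path = write_simple_input(folder, "input.txt", [(None, "An abstract without a title.", None)])
```

Writing the CSV with `csv.DictWriter` handles quoting of commas and quotes inside abstracts, which is easy to get wrong when building lines by hand.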