This repository was archived by the owner on Jul 22, 2024. It is now read-only.

Using CountingGridsPy Library

Jay Windsor edited this page Jun 24, 2019 · 2 revisions

CountingGridsPy

CountingGridsPy is the machine learning library that supports BrowseCloud. It ships with Python scripts that transform raw text data into the files that the BrowseCloud application consumes. At a high level, these scripts tokenize the text, clean and vectorize the tokens, run the learning algorithms, and produce model files.

Clone the repository to get started:

 git clone https://github.com/microsoft/browsecloud.git

Input

Requires:

  • Python 3.6.7+ (You can get it from www.python.org)
  • Your shell is in the EngineToBrowseCloudPipeline directory.
  • Dependencies installed as shown below in the dependencies section.
  • engine_type is "numpyEngine".
  • inputfile_type is "simpleInput" or "metadataInput".
| inputfile_type | Description | Example |
| --- | --- | --- |
| "simpleInput" | path_to_output_folder contains a TXT file called inputfile_name with the following tab-separated schema: {title, content (i.e., abstract), link}. It does not have a header row. If you do not have a title, leave the field blank and follow it with a tab. If you do not have a link, leave the field blank after the final tab. | https://github.com/microsoft/browsecloud/blob/master/CountingGridsPy/ExampleInputs/MuellerReport.txt |
| "metadataInput" | path_to_output_folder contains a CSV file called inputfile_name with the following schema: {alias, title, abstract, responseId, surveyId, link, image}. It has a header row. This type allows the user to correlate topics with metadata beyond sentiment analysis. | |
  • inputfile_name references a CSV or a TXT file, matching inputfile_type as shown in the table.
  • The PYTHONPATH environment variable is set up so that Python can find the CountingGridsPy module.
  • extent_size_of_grid_hyperparameter and window_size_of_grid_hyperparameter are integers, with the former greater than the latter. The number of topics the model can carry (i.e., the model's capacity) is the square of extent_size_of_grid_hyperparameter divided by the square of window_size_of_grid_hyperparameter. For example, an extent of 24 and a window of 5 give 576 / 25 ≈ 23 topics.
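The input requirements above can be sketched in a few lines of Python. This is only an illustration, not part of CountingGridsPy: the helper names format_simple_row and grid_capacity are hypothetical, and the sketch assumes that each "simpleInput" document occupies one line and that tabs or newlines inside a field should be collapsed to spaces.

```python
def format_simple_row(title, content, link):
    """Build one "simpleInput" line: title, content, link, separated by tabs.

    Hypothetical helper. A missing title or link becomes an empty field,
    so the tab separators are always present.
    """
    fields = [title or "", content or "", link or ""]
    # Assumption: embedded tabs/newlines would break the three-column layout,
    # so every run of whitespace inside a field is collapsed to one space.
    cleaned = [" ".join(field.split()) for field in fields]
    return "\t".join(cleaned)


def grid_capacity(extent_size, window_size):
    """Return extent_size squared divided by window_size squared.

    Hypothetical helper mirroring the capacity rule stated above.
    """
    if extent_size <= window_size:
        raise ValueError("extent size must be greater than window size")
    return extent_size ** 2 / window_size ** 2


if __name__ == "__main__":
    # A document with no title and no link still has both tab separators.
    print(repr(format_simple_row("", "Some abstract text", None)))
    # Capacity for the hyperparameters used in the example command (24 and 5).
    print(grid_capacity(24, 5))
```

Lines produced this way could be written to the TXT file with one call to format_simple_row per document. How the library itself parses edge cases such as quoted fields is not covered here.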



API:

 python dumpCountingGrids.py path_to_output_folder extent_size_of_grid_hyperparameter window_size_of_grid_hyperparameter engine_type inputfile_type inputfile_name

Example:

 python dumpCountingGrids.py data_folder 24 5 numpyEngine metadataInput channeldump.csv
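When the script is driven from another Python program, it may help to check the arguments before launching it. The following is a minimal sketch under stated assumptions: build_command is a hypothetical helper (not part of CountingGridsPy), it only assembles the argument list in the order shown above, and it assumes the file extension should match inputfile_type (.txt for "simpleInput", .csv for "metadataInput").

```python
import sys

# Assumed mapping from inputfile_type to the expected file extension.
EXPECTED_SUFFIX = {"simpleInput": ".txt", "metadataInput": ".csv"}


def build_command(output_folder, extent_size, window_size,
                  inputfile_type, inputfile_name):
    """Validate the arguments and return the argv list for dumpCountingGrids.py.

    Hypothetical helper: it does not run anything, it only builds the command.
    """
    if inputfile_type not in EXPECTED_SUFFIX:
        raise ValueError("inputfile_type must be simpleInput or metadataInput")
    if not inputfile_name.lower().endswith(EXPECTED_SUFFIX[inputfile_type]):
        raise ValueError("inputfile_name extension does not match inputfile_type")
    if int(extent_size) <= int(window_size):
        raise ValueError("extent size must be greater than window size")
    return [
        sys.executable,            # the Python interpreter currently running
        "dumpCountingGrids.py",
        str(output_folder),
        str(extent_size),
        str(window_size),
        "numpyEngine",             # the only engine_type listed in the requirements
        inputfile_type,
        inputfile_name,
    ]


if __name__ == "__main__":
    cmd = build_command("data_folder", 24, 5, "metadataInput", "channeldump.csv")
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the EngineToBrowseCloudPipeline directory. Passing a list rather than a single string avoids shell quoting problems with folder names that contain spaces.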