Required Python modules:
- `matplotlib`
  - For GUI components.
  - Instructions: `pip3 install matplotlib`
- `osrm-py`
  - For utilising OSRM for map routing.
  - Instructions: `pip3 install osrm-py`
- `SRTM.py`
  - For utilising SRTM for altitude data.
  - Instructions: `pip3 install SRTM.py`
- `numpy`
  - For use in the machine learning components.
  - Instructions: `pip3 install numpy`
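All four can also be installed with a single command:

```
pip3 install matplotlib osrm-py SRTM.py numpy
```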
Also needed:
- TensorFlow
  - A machine learning library, for use in the machine learning components.
  - Instructions:
    - The main guide for installing TensorFlow can be found here.
    - A TensorFlow instance must be running when running any code from the `ml` module.
- OSRM
  - A routing engine, used by the `walk_dist` ranker and any others that use every heuristic (this includes all the machine learning rankers).
  - Instructions:
    - Main installation instructions depend on platform, and can be found here.
    - Once installed, instructions for running the backend can be found here. However, they are in essence as follows:
      - Download a `.osm.pbf` file corresponding to the area you need from Geofabrik.
      - In the folder where you have placed your `.osm.pbf` file, run:

        ```
        osrm-extract your-file.osm.pbf -p path/to/osrm-backend/profiles/car.lua
        osrm-contract your-file.osrm
        ```

      - You will now have a `.osrm` file, ready to use.
    - To run OSRM, once set up, in the folder with your `.osrm` file:

      ```
      osrm-routed your-file.osrm
      ```

    - This will start an HTTP server from which the API can be accessed. By default, the server is at `localhost:5000`, and this is what the user-model code expects (a quick way to test this is sketched after this list).
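Once `osrm-routed` is up, a quick way to check the server from Python is a plain HTTP request. This is only a sketch: it uses the `requests` library (not among the modules listed above), and the two coordinates are placeholders that should fall inside the region covered by your extract.

```python
import requests

# OSRM's HTTP API takes lon,lat pairs: /route/v1/{profile}/{lon},{lat};{lon},{lat}
url = 'http://localhost:5000/route/v1/driving/-0.117,51.511;-0.126,51.513'
resp = requests.get(url, params={'overview': 'false'})
resp.raise_for_status()

# A successful response contains at least one route with its total
# distance (metres) and duration (seconds).
route = resp.json()['routes'][0]
print('distance (m):', route['distance'])
print('duration (s):', route['duration'])
```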
To load a story from its JSON representation:

```python
myStory = importer.story_from_JSON('My Story File')
```

The function returns a `Story` object for use elsewhere.
User logs can also be loaded in from their JSON representation. To simply load every path in the JSON file for a particular story:

```python
paths_per_reading = importer.path_pages_from_JSON('My Log File', myStory)
```

To try and filter out demo-mode readings from the logs:

```python
paths_per_reading = importer.filtered_paths_from_JSON('My Log File', myStory)
```

This will look for developer user IDs, check whether the sequence of paths taken is possible within the constraints of the story, and estimate how fast the user would have been moving, to determine whether a reading may have been in demo mode. Such readings are filtered out.
These functions will return a dictionary of the format:

```python
paths_per_reading = {
    'Reading1': [ Page1, Page2, Page3, ..., PageN ],
    ...
    'Reading2': [ Page1, Page2, Page3, ..., PageN ]
}
```
Where each reading is a `Reading` object, and each page a `Page` object.
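As a quick sanity check, the returned dictionary can be walked like any other dict. A minimal sketch (no particular attributes on the `Reading` and `Page` objects are assumed here):

```python
for reading, pages in paths_per_reading.items():
    # Each key is a Reading object; each value is the ordered list of
    # Page objects visited during that reading.
    print(reading, '-', len(pages), 'pages visited')
```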
It is recommended (but not required) to make a `Cache`, which stores the heuristics calculated for each page. This prevents the need to recalculate heuristics when doing many readings. To make a `Cache`:

```python
myCache = cache.Cache()
```
To simulate a reading of the story:

```python
sim_store = traverser.traverse(myStory, rankerFun, deciderFun, cache=myCache)
```
`rankerFun` is a function that takes a set of candidate pages and produces a score of how 'appealing' each page is. There are several ranker functions in the `ranker` module, for instance `rand` (random), `walk_dist` (walking distance) or `logreg` (logistic regression). Note: before using any machine learning-based rankers (`linreg`, `logreg` or `nn`), their models must be initialised. This is discussed here.
`deciderFun` is a function that takes the output of `rankerFun` and uses it to choose the next page. There are two decider functions provided in the `decider` module: `rand` (which uses a random number to choose, where pages with a higher score are more likely to be chosen) and `best` (which simply chooses the page with the highest score).
The `Cache` is optional, but recommended. There are a few other, optional parameters: you can choose the maximum number of steps before the simulation gives up with `max_steps` (default 50), and you can print the progress of the simulation with `prnt`.
The function returns a list of `Record`s. Each contains a `Page` (where the user was at this point) and a dictionary of 'options' (the scores of the pages they could go to next).
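Putting the pieces above together, a custom ranker/decider pair might look like the following. This is a minimal sketch: the exact signatures and the container used for scores are assumptions based on the descriptions above, not a confirmed API; consult the `ranker` and `decider` modules for real examples.

```python
def my_ranker(pages):
    # Score every candidate page equally (a trivial 'uniform' ranker).
    return {page: 1.0 for page in pages}

def my_decider(scores):
    # Choose the page with the highest score, like the 'best' decider.
    return max(scores, key=scores.get)

custom_store = traverser.traverse(myStory, my_ranker, my_decider, cache=myCache)
```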
To simulate `n` readings:

```python
stores = traverser.traverse(myStory, n=50, cache=myCache)
```

This will perform 50 readings, for example. You can also choose your own ranker and decider here, but they both default to `rand`; this is to assist in trying to find 'unreachable' pages.
To generate a reading from user logs:

```python
log_store = traverser.traverse_log(myStory, paths_per_reading)
```

This function will do a simulated reading. However, at each point, the program will go where most people went next from the current page (out of all visible pages). The generated reading can therefore be used as a 'representative' reading of how the users acted in the logs.
The `allow_quitting` option may be set to `True` to enable the traversal to quit the story early, if this was the most popular action. It defaults to `False`.
The returned store's `Record`s have an extra 'page' in their `options`: `0`. This represents the number of people who quit the story from this page.
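For example, a hedged sketch of reading off the quit counts from such a traversal (the `page` and `options` attribute names on `Record` are assumptions based on the description above):

```python
log_store = traverser.traverse_log(myStory, paths_per_reading, allow_quitting=True)

for record in log_store:
    # The 0 'page' in options holds the number of users who quit here.
    quit_count = record.options.get(0, 0)
    print(record.page, '-', quit_count, 'users quit at this page')
```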
To show a GUI with info on the story, logged readings and simulations:

```python
gui.show_all(myStory, paths_per_reading, stores, sim_store, log_store, rankerFun, cache)
```

All of these arguments except the story are optional. If some arguments are left out, the corresponding GUI elements will not be shown.
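For instance, assuming the optional arguments can simply be omitted, a minimal call showing only the story-level elements would be:

```python
gui.show_all(myStory)
```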
To show a GUI displaying the three key metrics:

```python
gui.show_main_three(myStory, paths_per_reading, stores, sim_store, log_store, rankerFun, cache)
```
There are many functions to export different forms of information to a `.csv` format:

- `exporter.paths_per_reading_to_csv(paths_per_reading, 'my export file')` for a `paths_per_reading` dictionary.
- `exporter.paths_to_csv(stores, 'my export file')` for a set of stores as output by `traverse` or `traverse_log`.
- `exporter.stores_to_CSV(stores, myStory, 'my export file')` to also retain info on `options` from each page.
- `exporter.cache_to_CSV(myCache, 'my export file')` for a cache; this can be re-imported later, to further avoid having to calculate heuristics.
- `exporter.regression_to_CSV(myRegressionParams, 'my export file')` for regression parameters (linear or logistic), as output by their respective machine learning functions.
- `exporter.nn_to_CSV(myNN, 'my export file')` for a neural network as output by `ml.nn`.
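A typical end-of-run flow, using only the functions listed above, might export both the simulated stores and the cache, so the heuristics need not be recalculated next time (the file names here are placeholders):

```python
exporter.stores_to_CSV(stores, myStory, 'my-stores')
exporter.cache_to_CSV(myCache, 'my-cache')
```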
There are also some functions that output to JSON, where this format makes more sense than CSV:

- `cache_to_JSON(myCache, 'my export file')` for a cache. This format can also be re-imported later.
- `stores_to_JSON(stores, 'my export file')` for a list of stores as output by `traverse` or `traverse_log`.
For further details, consult the wiki.