Ranker
The ranker module contains a number of functions, each of which takes a list of visible pages (along with information about the user and story) and outputs a dictionary of scores describing how 'desirable' each page is. This dictionary is then passed to a decider function, which chooses which page to go to.
The scores given to pages must add up to 1, so that they can represent the probabilities of choosing each page (if given to dc.rand).
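As a rough illustration of the sum-to-1 requirement, the sketch below scales arbitrary non-negative scores into probabilities and then samples a page from them. The names normalise_scores and pick_page are hypothetical helpers written for this example. They are not part of the module, and pick_page only stands in for what a random decider might do.

```python
import random

def normalise_scores(raw_scores):
    """Scale non-negative raw scores so that they sum to 1 (hypothetical helper)."""
    total = sum(raw_scores.values())
    if total == 0:
        # Every raw score is zero: fall back to a uniform distribution.
        return {page: 1 / len(raw_scores) for page in raw_scores}
    return {page: score / total for page, score in raw_scores.items()}

def pick_page(scores):
    """Sample one page, treating each score as a probability (hypothetical decider)."""
    pages = list(scores)
    weights = [scores[page] for page in pages]
    return random.choices(pages, weights=weights, k=1)[0]

scores = normalise_scores({"page_a": 3.0, "page_b": 1.0})
chosen = pick_page(scores)
```

Here page_a ends up with probability 0.75 and page_b with 0.25.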
These functions are designed to be passed to traverser.traverse, though you could define your own ranker, provided it takes the same parameters.
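A custom ranker might look something like the sketch below. It keeps the (user, story, pages, cache=None) parameters used by the rankers in this module. Everything else is an assumption made for illustration: the Page class, its name attribute, and the choice to key the returned dictionary by page object are all hypothetical, and the real page type may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for a story page; the real page type may differ.
    name: str

def shortest_name(user, story, pages, cache=None):
    """Hypothetical custom ranker: prefers pages with shorter names."""
    if not pages:
        return {}
    # Shorter names get larger raw scores.
    raw = {page: 1.0 / (1 + len(page.name)) for page in pages}
    total = sum(raw.values())
    # Divide by the total so that the scores add up to 1.
    return {page: score / total for page, score in raw.items()}

pages = [Page("Pier"), Page("Lighthouse")]
scores = shortest_name(user=None, story=None, pages=pages)
```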
When performing a simulated reading (or many) with traverser.traverse, simply set the ranker argument to your ranker of choice from this module.
The machine learning based rankers (linreg, logreg & nn) require an extra step: first, you must call the corresponding function in the ml module to train the model.
There is a manual_heuristics variable in the ranker module, filled with manually tuned heuristic weights for use by the linear regression ranker linreg. However, you can change these values or create your own. The format is:
manual_heuristics = {
    'w': [
        1.0,  # weight of walk dist heuristic
        2.0,  # weight of visits heuristic
        4.4,  # weight of altitude heuristic
        3.5,  # weight of points of interest heuristic
        7.0,  # weight of mention heuristic
        5.2,  # weight of walk dist ranking
        5.6,  # weight of visits ranking
        1.1,  # weight of altitude ranking
        8.2,  # weight of points of interest ranking
        3.6,  # weight of mention ranking
    ],
    'b': 0.3  # bias
}
The format above is for a linear regression. For a logistic regression, turn each value into a length 2 array, in which the first element is the weight of the heuristic towards not choosing the page and the second element is the weight of the heuristic towards choosing the page.
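A logistic regression version of this dictionary might therefore look like the sketch below. All numbers are made up for illustration, and the variable name manual_heuristics_logreg is hypothetical. Treating the bias as a length 2 array as well is an assumption based on "each value", so check it against the module before relying on it.

```python
# Hypothetical values, for illustration only.
# Each pair is [weight towards NOT choosing the page, weight towards choosing it].
manual_heuristics_logreg = {
    'w': [
        [0.5, 1.0],  # walk dist heuristic
        [0.4, 2.0],  # visits heuristic
        [1.2, 4.4],  # altitude heuristic
        [0.9, 3.5],  # points of interest heuristic
        [0.3, 7.0],  # mention heuristic
        [1.1, 5.2],  # walk dist ranking
        [0.8, 5.6],  # visits ranking
        [0.6, 1.1],  # altitude ranking
        [0.2, 8.2],  # points of interest ranking
        [0.7, 3.6],  # mention ranking
    ],
    'b': [0.1, 0.3],  # bias (assumed to also become a length 2 array)
}
```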
First, call normalise_inputs(paths_per_reading, cache=None, exclude_poi=False).
Then, set your manual_heuristics to be equal to...
- ranker.linreg_model for a linear regression.
- ranker.logreg_model for a logistic regression.
- ranker.net_model for a neural network.
Then just use your chosen ranker as normal.
rand(user, story, pages, cache=None)
Gives an equal score to every page.
dist(user, story, pages, cache=None)
Gives a higher score to pages that are closer to the user, measured in a straight line.
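One way to picture a straight-line distance ranker is sketched below, using the haversine formula for distance along the Earth's surface. This is not necessarily how dist is implemented. The functions haversine_m and closeness_scores, the (lat, lon) tuples, and the 1 / (1 + distance) scoring rule are all assumptions made for this example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closeness_scores(user_pos, page_positions):
    """Score pages so that closer ones score higher and all scores sum to 1.

    page_positions maps a page name to a (lat, lon) tuple (hypothetical layout).
    """
    # Inverse distance; the +1 avoids division by zero when the user is on a page.
    raw = {name: 1.0 / (1.0 + haversine_m(*user_pos, *pos))
           for name, pos in page_positions.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

scores = closeness_scores(
    (50.9000, -1.4000),
    {"near": (50.9005, -1.4000), "far": (50.9500, -1.4000)},
)
```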
walk_dist(user, story, pages, cache=None)
Gives a higher score to pages that are closer to the user by road, with routes calculated by the OSRM routing engine.
Note: In order to use this ranker, an OSRM HTTP server must be running at localhost:5000. The osrm-py Python module must also be installed.
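To check that the OSRM server is reachable before running this ranker, a route can be requested directly over OSRM's HTTP API, as sketched below. This deliberately uses only the standard library rather than osrm-py, so it is not how the ranker itself talks to OSRM. The helper names and the "foot" profile in the URL are assumptions made for this example.

```python
import json
import urllib.request

def osrm_route_url(origin, destination, host="http://localhost:5000", profile="foot"):
    """Build an OSRM route request URL. OSRM expects lon,lat order, not lat,lon."""
    (lat1, lon1), (lat2, lon2) = origin, destination
    return f"{host}/route/v1/{profile}/{lon1},{lat1};{lon2},{lat2}?overview=false"

def route_distance_m(response_body):
    """Extract the route distance in metres from an OSRM JSON response, or None."""
    data = json.loads(response_body)
    if data.get("code") != "Ok" or not data.get("routes"):
        return None
    return data["routes"][0]["distance"]

def routed_distance_m(origin, destination):
    """Query the local OSRM server (it must be running for this call to succeed)."""
    with urllib.request.urlopen(osrm_route_url(origin, destination), timeout=5) as resp:
        return route_distance_m(resp.read())
```

For example, routed_distance_m((50.9, -1.4), (50.91, -1.41)) would return the routed distance between the two points, provided the server is up and has map data covering them.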
visits(user, story, pages, cache=None)
Gives a higher score to pages that have been visited fewer times so far in the current reading.
alt(user, story, pages, cache=None)
Gives a higher score to pages with a lower altitude.
Note: In order to use this ranker, the 'SRTM.py' module must be installed.
poi(user, story, pages, cache=None)
Gives a higher score to pages surrounded by more points of interest.
mentioned(user, story, pages, cache=None)
Gives a higher score to pages whose titles are mentioned more often in the title & text of the current page.
logreg(user, story, pages, cache=None)
Uses a logistic regression model to give scores to pages, based on almost every heuristic and ranker.
Note: In order to use this ranker, you must first run ml.logreg to train a logistic regression.
linreg(user, story, pages, cache=None)
Uses a linear regression model to give scores to pages, based on almost every heuristic and ranker.
Note: In order to use this ranker, you must first run ml.linreg to train a linear regression.
nn(user, story, pages, cache=None)
Uses a neural network model to give scores to pages, based on almost every heuristic and ranker.
Note: In order to use this ranker, you must first run ml.nn to train a neural network.
normalise_inputs(paths_per_reading, cache=None, exclude_poi=False)
Sets up input normalisation for when using manual_heuristics with a regression ranker. If you want to use manual_heuristics in your ranking function, this function must be called first.
rank_by(heuristic, inverse=False, no_loop=False)
Outputs a function that ranks pages according to the output of heuristic.
If you define a new heuristic function for pages, this can be an easy way to make a ranker out of it.
The heuristic function must have the parameters page, user, story and cache.
If inverse is True, pages with a higher output from heuristic will be given a better ranking (such as with poi; more points of interest is better). If it is False, pages with a lower output from heuristic will be given a better ranking (such as with dist; less distance to the user is better).
Setting no_loop to True applies an extra filter to the ranking function, which eliminates pages already visited.
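The behaviour described above can be pictured with the sketch below. It is a hypothetical re-implementation for illustration, not the module's code. The name rank_by_sketch, the user.visited attribute, and the way rank positions are converted into scores (best page gets weight n, next n - 1, and so on, divided by the total) are all assumptions.

```python
from types import SimpleNamespace

def rank_by_sketch(heuristic, inverse=False, no_loop=False):
    """Build a ranker from a heuristic(page, user, story, cache) function."""
    def ranker(user, story, pages, cache=None):
        candidates = list(pages)
        if no_loop:
            # Drop pages already visited (user.visited is a hypothetical attribute).
            visited = set(getattr(user, "visited", ()))
            candidates = [page for page in candidates if page not in visited]
        # inverse=False: lower heuristic output is better, so sort ascending.
        # inverse=True: higher heuristic output is better, so sort descending.
        ordered = sorted(candidates,
                         key=lambda page: heuristic(page, user, story, cache),
                         reverse=inverse)
        n = len(ordered)
        total = n * (n + 1) / 2
        # Best page gets weight n, the next n - 1, ...; dividing makes them sum to 1.
        return {page: (n - i) / total for i, page in enumerate(ordered)}
    return ranker

def name_length(page, user, story, cache):
    # Toy heuristic: pages are plain strings here, scored by their length.
    return len(page)

user = SimpleNamespace(visited=["a"])
pages = ["a", "bb", "ccc"]
lower_is_better = rank_by_sketch(name_length)(user, None, pages)
higher_is_better = rank_by_sketch(name_length, inverse=True)(user, None, pages)
unvisited_only = rank_by_sketch(name_length, no_loop=True)(user, None, pages)
```

In this sketch a filtered-out page simply has no entry in the result, and an empty dictionary is returned if every page has been visited. The real function may handle those cases differently.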
_net(x, w, b)
Is the actual neural net used by the nn ranker.
x is the array of inputs, as output by ml.make_input.
w is the weight array of the network (signifying the weight of each neuron).
b is the bias array of the network.
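To make the roles of x, w and b concrete, here is a minimal forward pass for a generic fully connected network. The architecture of the real _net is not described here, so the layer layout (one list of per-neuron weight lists and one list of biases per layer), the sigmoid activation, and the name net_sketch are all assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def net_sketch(x, w, b):
    """Forward pass through a fully connected network (hypothetical layout).

    x: list of input values.
    w: one entry per layer; each entry holds one weight list per neuron.
    b: one entry per layer; each entry holds one bias per neuron.
    """
    activations = list(x)
    for layer_w, layer_b in zip(w, b):
        # Each neuron: weighted sum of the previous layer's outputs, plus its bias.
        activations = [
            sigmoid(sum(wi * ai for wi, ai in zip(neuron_w, activations)) + neuron_b)
            for neuron_w, neuron_b in zip(layer_w, layer_b)
        ]
    return activations

# Two inputs, a hidden layer of two neurons, and one output neuron.
output = net_sketch(
    [1.0, 2.0],
    [[[0.5, -0.25], [0.1, 0.2]], [[1.0, -1.0]]],
    [[0.0, 0.1], [0.0]],
)
```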