Information Retrieval (IR) system for a collection of documents (Twitter messages)
Install Anaconda and activate it with conda activate.
Install the requirements
pip install -r requirements.txt
Fetch the Pre-trained Vector Model
sh init.sh
python src/preprocessing.py
python src/inverted_index.py
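The contents of src/inverted_index.py are not shown here. As a rough illustration of what an inverted index over preprocessed tweets can look like, the sketch below maps each term to the documents that contain it, together with the term frequency. The function name build_inverted_index and the sample tweets are hypothetical, and the project's actual data structure may differ.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}.

    `docs` is a dict of doc_id -> list of tokens (assumed to be
    already lowercased, tokenized, and stripped of stop words).
    """
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return index


# Hypothetical preprocessed tweets, for illustration only.
docs = {
    "t1": ["bbc", "world", "service", "cuts"],
    "t2": ["world", "cup", "final"],
}
index = build_inverted_index(docs)
```

Looking up a query term then only touches the documents in its postings, for example index["world"] lists both t1 and t2.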
usage: main.py [-h] [-Q QUERY] [-F] [-M {default,BERT}] [-QE {level-1,level-2,none}]
optional arguments:
-h, --help show this help message and exit
-Q QUERY, --query QUERY
string to query
-F, --file Run queries defined in ../data/topics_MB1-49.txt
-M {default,BERT}, --method {default,BERT}
Neural Method for Retrieval
-QE {level-1,level-2,none}, --queryexpansion {level-1,level-2,none}
Level of query expansion
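For readers who want to see how a help message of this shape is typically produced, here is a minimal argparse sketch reconstructed from the usage text above. It is not taken from src/main.py; in particular, the default values ("default" for -M, "none" for -QE) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser matching the usage text (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-Q", "--query", help="string to query")
    parser.add_argument(
        "-F", "--file", action="store_true",
        help="Run queries defined in ../data/topics_MB1-49.txt",
    )
    parser.add_argument(
        "-M", "--method", choices=["default", "BERT"], default="default",
        help="Neural Method for Retrieval",
    )
    parser.add_argument(
        "-QE", "--queryexpansion", choices=["level-1", "level-2", "none"],
        default="none", help="Level of query expansion",
    )
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["-M", "BERT", "-Q", "bbc world service"])
```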
Note: default corresponds to the tf-idf weighted cosine similarity retrieval method.
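As an illustration of tf-idf weighted cosine similarity, the sketch below ranks a few toy documents against a query. The weighting used here (raw term frequency times log2(N/df)) is one common variant and is an assumption; the project may use a different formula. The function names and sample tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, idf):
    """Weight each known term by term frequency * inverse document frequency."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical preprocessed tweets, for illustration only.
docs = {
    "t1": ["bbc", "world", "service", "cuts"],
    "t2": ["world", "cup", "final"],
    "t3": ["bbc", "news", "staff", "cuts"],
}

# Document frequency and idf over the toy collection.
n_docs = len(docs)
df = Counter(t for tokens in docs.values() for t in set(tokens))
idf = {t: math.log2(n_docs / d) for t, d in df.items()}

# Score every document against the query and sort by descending similarity.
query_vec = tfidf_vector(["bbc", "cuts"], idf)
ranking = sorted(
    ((doc_id, cosine(query_vec, tfidf_vector(tokens, idf)))
     for doc_id, tokens in docs.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

In this toy collection t1 ranks first because it matches both query terms and has fewer high-weight terms of its own than t3, while t2 shares no terms with the query and scores zero.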
To run a query given on the command line and print the results to the console:
python src/main.py -M {BERT|default} -Q "<query>"
To run the default batch defined in data/topics_MB1-49.txt and produce Results.txt:
python src/main.py -M {BERT|default} -F
map all 0.2075
P_10 all 0.2408
map all 0.0274
P_10 all 0.0327
map all 0.0356
P_10 all 0.0388
map all 0.2070
P_10 all 0.2408
map all 0.2070
P_10 all 0.2408
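The map and P_10 labels above are the measure names that trec_eval reports for mean average precision and precision at rank 10. A minimal sketch of how these two measures are defined follows. The document ids and relevance judgments are hypothetical, and trec_eval itself handles edge cases (such as queries without judgments) that this sketch does not.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant document appears,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Average the per-query average precision over all queries in `runs`."""
    return sum(
        average_precision(ranked, qrels.get(query_id, set()))
        for query_id, ranked in runs.items()
    ) / len(runs)


# Hypothetical run and judgments for a single query.
runs = {"MB1": ["d1", "d2", "d3", "d4"]}
qrels = {"MB1": {"d1", "d3", "d9"}}

map_score = mean_average_precision(runs, qrels)
p10 = precision_at_k(runs["MB1"], qrels["MB1"], k=10)
```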