Github1401-18_25

This project is a retrieval system built on GitHub READMEs. All the data was crawled from the GitHub API. Additionally, the classification and clustering parts use code from the most-starred repos.

How to run the project

First, install the requirements by running the command below.

pip install -r requirements.txt

After that, download the fastText model from this link and unzip the downloaded file. Put the three unzipped files under the "Third" folder inside the MIR directory.

Next, download the transformer model from this link and unzip the downloaded file. Put the unzipped file under the "Fourth" folder inside the MIR directory.

Finally, run the Django project. Change into the project directory:

cd MIR

Run the Django server:

python manage.py runserver

The app can be reached by opening http://127.0.0.1:8000/search-engine in the browser.

Structure of the Project

Three assignments and a final project are merged in this project. The folders "Third", "Fourth", "Fifth", and "project_elastic" inside MIR contain the code and notebooks for each of them. The report for each assignment is inside its notebook. In the third assignment, four different retrieval methods are implemented, all of which include query expansion (QE). In the fourth assignment, clustering and classification are implemented. In the fifth assignment, link prediction is implemented. This assignment did not produce any specific result, so it is not used in the UI. In the final project, search with Elasticsearch is implemented.

Elasticsearch

Elasticsearch is a distributed, NoSQL, full-text search database. Because it is NoSQL, it does not require data with a fixed structure and does not use a standard structured query language for searching; queries are written as JSON and sent over its REST API.

To set up Elasticsearch locally, download it and run the executable for your system. Make sure that you have Java installed on your machine. Once the local environment is set up, you can verify that it is working by opening http://localhost:9200 in your browser or requesting it with cURL. It should give you a JSON response like this:

{
  "name" : "1b415f1159de",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "xl2IFgFrSNOWCgov26hF5Q",
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
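The same check can be scripted. The following is a minimal sketch using only the Python standard library; the function names `summarize_es_info` and `check_elasticsearch` are made up for this illustration and are not part of the project. It assumes the node listens on http://localhost:9200 without authentication, as in the response above.

```python
import json
from urllib.request import urlopen


def summarize_es_info(raw_json):
    """Pick the fields worth checking out of the root-endpoint response."""
    info = json.loads(raw_json)
    return {
        "cluster_name": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }


def check_elasticsearch(url="http://localhost:9200", timeout=5):
    """Fetch the root endpoint and summarize it.

    Raises urllib.error.URLError if nothing is listening at `url`.
    """
    with urlopen(url, timeout=timeout) as response:
        return summarize_es_info(response.read().decode("utf-8"))
```

Calling `check_elasticsearch()` while the node is running returns a small dictionary with the cluster name and version numbers; if the call raises `URLError`, the node is most likely not running or is listening on a different port.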

Elasticsearch provides REST APIs to manage data, but to use it efficiently from Python there is an official client library called elasticsearch.
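As a rough sketch of what a query through that library could look like: the index name `github_readmes` and the field name `readme` below are hypothetical placeholders, not names taken from this project, and the `body=` argument is the style used by the 7.x client, which matches the 7.9.2 server shown above. The client is imported inside the function so that the query builder can be used without the library installed.

```python
def build_match_query(text, field="readme", size=10):
    """Build a full-text match query body (field name is a placeholder)."""
    return {"query": {"match": {field: text}}, "size": size}


def search_readmes(text, index="github_readmes", host="http://localhost:9200"):
    """Run the match query and return (document id, score) pairs.

    Assumes `pip install elasticsearch` and a node running at `host`.
    """
    from elasticsearch import Elasticsearch  # official Python client

    es = Elasticsearch(host)
    response = es.search(index=index, body=build_match_query(text))
    return [(hit["_id"], hit["_score"]) for hit in response["hits"]["hits"]]
```

Keeping the query body in its own function makes it easy to inspect or adjust the JSON that is sent to Elasticsearch before involving a live node.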
