This project is a retrieval system built on GitHub READMEs. All the data is crawled from the GitHub API. Additionally, the classification and clustering parts use code from the most-starred repositories.
First, install the requirements by running the command below.
pip install -r requirements.txt
After that, download the fastText model from this link and unzip the downloaded file. Put the three unzipped files under the "Third" folder inside the MIR directory. Next, download the transformer model from this link and unzip it, then put the unzipped file under the "Fourth" folder inside the MIR directory. Finally, run the Django project. Change the directory with
cd MIR
Then run the Django server:
python manage.py runserver
The app can be reached by opening http://127.0.0.1:8000/search-engine in your browser.
Three assignments are merged into this project. The folders "Third", "Fourth", "Fifth", and "project_elastic" inside MIR each contain the code and notebooks for one assignment, and the report for each assignment is inside its notebook. In the third assignment, four different retrieval methods are implemented, all of which include query expansion (QE). In the fourth assignment, clustering and classification are implemented. In the fifth assignment, link prediction is implemented; this assignment did not have any specific result, so it is not used in the UI. In the final project, Elasticsearch is integrated.
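As a rough illustration of the kind of retrieval used in the third assignment (a minimal sketch with made-up README texts, not the repository's actual code), a TF-IDF index can be queried like this:

```python
# Illustrative sketch only: TF-IDF retrieval over README texts,
# not the project's actual implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

readmes = [
    "A Django web framework tutorial with examples",
    "Fast text classification and word embeddings",
    "Distributed full-text search engine built on Lucene",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(readmes)

query = "full text search"
query_vector = vectorizer.transform([query])

# Rank documents by cosine similarity to the query.
scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranking = scores.argsort()[::-1]
print([(readmes[i], round(float(scores[i]), 3)) for i in ranking])
```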
Elasticsearch is a NoSQL, distributed, full-text database. Because it is NoSQL, it does not require structured data and does not use a standard structured query language for searching.
To set up Elasticsearch locally, you just have to download it and run the executable for your system. Make sure that Java is installed on your machine. Once the local environment is set up, you can verify that it is working by hitting http://localhost:9200 in your browser or via cURL. It should give you a JSON response like this:
{
  "name" : "1b415f1159de",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "xl2IFgFrSNOWCgov26hF5Q",
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
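The same check can also be done from Python, for example with the requests library (an illustrative sketch, assuming the default host and port):

```python
# Illustrative check of a local Elasticsearch node (default host/port assumed).
import requests

response = requests.get("http://localhost:9200")
response.raise_for_status()

info = response.json()
print(info["version"]["number"])  # e.g. "7.9.2"
print(info["tagline"])            # "You Know, for Search"
```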
Elasticsearch provides REST APIs to manage data, but to use it from Python conveniently, there is an official client library called elasticsearch.
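A minimal indexing-and-searching round trip with the official client might look like the sketch below. The index name "readmes" and the field names are hypothetical examples, and the call style shown is for a 7.x client, matching the 7.9.2 node in the sample response above.

```python
# Minimal sketch with the official elasticsearch client (7.x style);
# the "readmes" index and its field names are hypothetical examples.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a single document into the hypothetical "readmes" index.
es.index(index="readmes", id=1, body={
    "repo": "example/repo",
    "readme": "A distributed full-text search engine built on Lucene",
})
es.indices.refresh(index="readmes")

# Full-text search over the "readme" field.
result = es.search(index="readmes", body={
    "query": {"match": {"readme": "full text search"}}
})
for hit in result["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["repo"])
```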