A simple movie recommendation web application built with Python and Streamlit. Users pick a movie and get suggestions for similar titles from an item-based collaborative filtering model.
- Search Functionality: Look up any movie present in the MovieLens 100k dataset with fuzzy search.
- Similarity-Based Recommendations: Get 5 movie recommendations based on the cosine similarity of user ratings.
- Top Rated Homepage: The homepage displays the Top 10 highest-rated movies in the dataset (minimum 50 ratings).
- Rich UI: Fetches and displays movie posters from the OMDb API for a visually appealing experience.
- Interactive Interface: A clean and simple web interface powered by Streamlit.
- Backend: Python
- Web Framework: Streamlit
- Data Manipulation: Pandas, NumPy
- Machine Learning: Scikit-learn (for Cosine Similarity)
- APIs: OMDb API (for posters), Kaggle API (for dataset)
- Searching: rapidfuzz
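The Top Rated homepage rule listed above (the highest-rated movies among those with at least 50 ratings) can be sketched with a pandas group-by. This is a minimal illustration, not the app's actual code: the function name `top_rated` and the column names `title` and `rating` are assumptions about how the ratings have been merged with movie titles.

```python
import pandas as pd


def top_rated(ratings: pd.DataFrame, min_ratings: int = 50, n: int = 10) -> pd.DataFrame:
    """Return the n movies with the highest mean rating among those
    that have at least min_ratings ratings."""
    # Mean rating and number of ratings per movie title.
    stats = ratings.groupby("title")["rating"].agg(
        mean_rating="mean", num_ratings="count"
    )
    # Drop movies with too few ratings so a single 5-star vote cannot win.
    eligible = stats[stats["num_ratings"] >= min_ratings]
    return eligible.sort_values(
        ["mean_rating", "num_ratings"], ascending=False
    ).head(n)


# Tiny made-up sample (hypothetical titles), using a threshold of 2 instead of 50.
sample = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "rating": [4.0, 5.0, 3.0, 5.0, 4.5, 4.5],
})
top = top_rated(sample, min_ratings=2, n=10)
print(top)
```

In the sample, movie "B" has a perfect average but only one rating, so the threshold filters it out. That is the reason for a minimum rating count.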
The recommendation engine is built on the principle of collaborative filtering.
- Data Processing: The MovieLens 100k dataset is loaded and transformed into a movie-user rating matrix, where each row is a movie and each column is a user. A movie that a user has not rated gets a 0 in that user's column.
- Similarity Calculation: The cosine similarity is calculated between every pair of movies in the matrix. This score represents how similar two movies are based on the ratings they received from the same users.
- Recommendation: When a user selects a movie, the system retrieves the top 5 movies with the highest similarity scores to the selected movie, excluding the movie itself.
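The three steps above can be sketched as follows. This is a hedged illustration under stated assumptions, not the project's actual implementation: the function names `build_similarity` and `recommend` and the column names `userId`, `title`, and `rating` are hypothetical. The project lists scikit-learn for the similarity step; here the same quantity is computed with NumPy so the arithmetic is visible.

```python
from typing import List

import numpy as np
import pandas as pd


def build_similarity(ratings: pd.DataFrame) -> pd.DataFrame:
    """Build a movie-by-movie cosine similarity table from long-format ratings."""
    # Rows are movies, columns are users; unrated cells become 0.
    matrix = ratings.pivot_table(
        index="title", columns="userId", values="rating"
    ).fillna(0)
    values = matrix.to_numpy(dtype=float)
    # Scale every movie vector to unit length; dot products are then cosines.
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against movies with no ratings at all
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=matrix.index, columns=matrix.index)


def recommend(similarity: pd.DataFrame, title: str, n: int = 5) -> List[str]:
    """Return the n movies most similar to title, excluding title itself."""
    scores = similarity[title].drop(labels=title)
    return scores.sort_values(ascending=False).head(n).index.tolist()


# Made-up ratings from four users, for illustration only.
sample = pd.DataFrame({
    "userId": [1, 2, 1, 2, 3, 3, 4],
    "title": ["Toy Story", "Toy Story", "Finding Nemo", "Finding Nemo",
              "Finding Nemo", "Saw", "Saw"],
    "rating": [5, 4, 4, 5, 1, 5, 4],
})
similarity = build_similarity(sample)
print(recommend(similarity, "Toy Story", n=2))
```

With this sample, "Finding Nemo" ranks first for "Toy Story" because the same two users rated both highly, while "Saw" shares no raters with "Toy Story" and scores 0.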
In simple terms, cosine similarity is a way to measure how similar two items are by looking at their orientation or direction, not their size or magnitude.
Imagine you represent each movie as an arrow (a vector) in a giant, multi-dimensional space. The direction of this arrow is determined by the ratings from all the users.
- Each Movie is a Vector: Think of a movie like "Toy Story" as a long list of numbers, where each number is a rating from a specific user:
  - Toy Story Vector = [user1_rating, user2_rating, user3_rating, ...]
  - Toy Story Vector = [4, 5, 0, 3, 5, ...] (where 0 means the user didn't rate it)
- Comparing Directions: Cosine similarity calculates the angle ($\theta$) between the vectors of two movies.
  - High Similarity (Value is close to 1): If "Toy Story" and "Finding Nemo" are both rated highly by the same group of people, their rating patterns are very similar. Their vectors will point in almost the exact same direction. The angle between them is very small, so their cosine similarity is close to 1.
  - Low Similarity (Value is close to 0): If "Toy Story" and a horror movie like "Saw" are rated by completely different groups of people, their rating patterns are unrelated. Their vectors will point in very different directions, almost at a 90-degree angle to each other. Their cosine similarity will be close to 0.
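- Cosine similarity is useful because it depends on the direction of the rating vectors, not their length: multiplying all of a movie's ratings by a constant leaves the score unchanged. This makes it less sensitive to overall popularity than a raw dot product. A niche indie film and a blockbuster have very different vector "lengths", but they can still score as similar when the same type of people rated both. Because unrated entries count as 0, the score is highest when the two audiences largely overlap.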
In short, it helps you find movies with a similar "taste profile".
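To make the explanation above concrete, here is a small standard-library sketch of the formula $\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. The "Toy Story" vector reuses the example numbers from above; the other two vectors are made up for illustration, and the function is a plain-Python stand-in rather than the scikit-learn call the project uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a movie with no ratings is treated as similar to nothing
    return dot / (norm_a * norm_b)


# One entry per user; 0 means the user did not rate the movie.
toy_story = [4, 5, 0, 3, 5]
finding_nemo = [5, 4, 0, 4, 5]  # rated by the same users, similar scores
saw = [0, 0, 5, 0, 0]           # rated only by a user who skipped the others

print(round(cosine_similarity(toy_story, finding_nemo), 3))  # close to 1
print(cosine_similarity(toy_story, saw))                     # 0.0, no shared raters

# Doubling every rating changes the vector's length but not its direction.
doubled = [2 * r for r in toy_story]
print(round(cosine_similarity(toy_story, doubled), 3))       # 1.0
```

The last line shows the magnitude-invariance described above: the doubled vector is twice as long but points in the same direction, so the similarity stays at 1.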
Follow these instructions to get a copy of the project up and running on your local machine.
- Clone the repository:

      git clone https://github.com/NotRemit/Movie-Recommeder.git
      cd Movie-Recommeder

- Install the required libraries (creating a virtual environment first is recommended):

      pip install -r requirements.txt

- Download the dataset from Kaggle:

      import kagglehub

      # Download latest version
      path = kagglehub.dataset_download("shubhammehta21/movie-lens-small-latest-dataset")
      print("Path to dataset files:", path)

- Set up your API key by replacing the placeholder in the code with your OMDb API key:

      API_KEY = "YOUR_API_KEY_HERE"

- Run the Streamlit app:

      streamlit run movieRec.py

The application should now be running in your web browser!
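For readers curious how the OMDb key from the setup steps is used, the sketch below shows one plausible way to look up a poster URL by title with only the standard library. It is an assumption-laden illustration, not the app's code: the helper names are hypothetical, and reading the key from an `OMDB_API_KEY` environment variable is an optional alternative to editing the placeholder, not something the project is known to support. The `apikey` and `t` query parameters and the `Response` and `Poster` fields follow the OMDb API.

```python
import json
import os
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "https://www.omdbapi.com/"


def build_poster_request(title: str, api_key: str) -> str:
    """Build an OMDb title-lookup URL (query values are URL-encoded)."""
    return OMDB_URL + "?" + urlencode({"apikey": api_key, "t": title})


def extract_poster(payload: dict) -> Optional[str]:
    """Return the poster URL from an OMDb JSON payload, or None if absent."""
    if payload.get("Response") != "True":
        return None  # OMDb reports failed lookups with Response == "False"
    poster = payload.get("Poster")
    return poster if poster and poster != "N/A" else None


def fetch_poster(title: str, api_key: Optional[str] = None) -> Optional[str]:
    """Query OMDb over the network; falls back to the OMDB_API_KEY variable."""
    api_key = api_key or os.environ.get("OMDB_API_KEY", "")
    with urlopen(build_poster_request(title, api_key), timeout=10) as response:
        return extract_poster(json.load(response))


# Example call (requires a valid key and network access):
# print(fetch_poster("Toy Story"))
```

MovieLens titles usually carry a release year, as in "Toy Story (1995)", so a real lookup may need to strip or separately pass the year before querying.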

