This project analyzes the factors that influence domestic opening-weekend box office revenue for movies. It uses linear regression and Principal Component Analysis (PCA) to identify the key determinants of box office success.
The dataset includes information about movies such as:
- Box office revenue
- Production budget
- Star power
- Movie genre (e.g., action, horror, animated)
- MPAA rating
- Sequel status
- "Buzz" variables (addict, cmngsoon, fandango, cntwait3)
Data Preprocessing
- Log transformation of skewed variables
- Standardization of variables for PCA
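The two preprocessing steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column names `box` and `budget`, the choice of which variables count as skewed, and the use of `log1p` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data. "box" and "budget" are hypothetical column names;
# the buzz column names follow the variables listed in the dataset description.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "box": rng.lognormal(mean=16, sigma=1.0, size=n),
    "budget": rng.lognormal(mean=17, sigma=0.8, size=n),
    "addict": rng.lognormal(mean=8, sigma=1.2, size=n),
    "cmngsoon": rng.lognormal(mean=3, sigma=1.0, size=n),
    "fandango": rng.lognormal(mean=4, sigma=1.1, size=n),
    "cntwait3": rng.uniform(0, 1, size=n),
})

# Log-transform the right-skewed variables (assumed here to be these five).
# log1p computes log(1 + x), which also tolerates zeros.
skewed = ["box", "budget", "addict", "cmngsoon", "fandango"]
for col in skewed:
    df["log_" + col] = np.log1p(df[col])

# Standardize the buzz variables to zero mean and unit variance before PCA,
# so that no variable dominates just because of its scale.
buzz_cols = ["log_addict", "log_cmngsoon", "log_fandango", "cntwait3"]
buzz_scaled = StandardScaler().fit_transform(df[buzz_cols])

print(df[["box", "log_box"]].skew())
```

Comparing the skewness before and after the transform is a quick check that the log step did what it was meant to do.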
Linear Regression Analysis
- Models with traditional variables
- Models incorporating "buzz" variables
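A hedged sketch of the two model families, fitted with scikit-learn on synthetic data. The variable names, coefficients, and the data-generating process are invented for illustration and do not reproduce the project's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: all names and effect sizes below are assumptions.
rng = np.random.default_rng(1)
n = 300
log_budget = rng.normal(17, 0.8, n)
star_power = rng.normal(0, 1, n)
sequel = rng.integers(0, 2, n)      # 1 if the movie is a sequel
action = rng.integers(0, 2, n)      # 1 if the genre is action
log_fandango = rng.normal(4, 1, n)  # buzz variable
cntwait3 = rng.uniform(0, 1, n)     # buzz variable

log_box = (2.0 + 0.5 * log_budget + 0.3 * sequel + 0.2 * action
           + 0.6 * log_fandango + 1.0 * cntwait3
           + rng.normal(0, 0.5, n))

# Model 1: traditional variables only.
X_trad = np.column_stack([log_budget, star_power, sequel, action])
# Model 2: traditional variables plus buzz variables.
X_buzz = np.column_stack([X_trad, log_fandango, cntwait3])

# score() returns the in-sample R-squared of each fitted model.
r2_trad = LinearRegression().fit(X_trad, log_box).score(X_trad, log_box)
r2_buzz = LinearRegression().fit(X_buzz, log_box).score(X_buzz, log_box)

print(f"R^2 traditional: {r2_trad:.3f}")
print(f"R^2 with buzz:   {r2_buzz:.3f}")
```

Because the second model nests the first, its in-sample R-squared can never be lower; the size of the gap is what indicates how much the buzz variables add.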
Principal Component Analysis (PCA)
- Applied to "buzz" variables
- Applied to "buzz" variables and other continuous variables
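As an illustration of the first case, PCA on the buzz variables alone might look like the sketch below. The data are synthetic, generated from a single shared factor (an assumption made so that the variables are correlated), and the choice of two components is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic buzz data: four noisy measurements of one shared latent factor.
# This structure is an assumption for the demo, not a property of the dataset.
rng = np.random.default_rng(2)
n = 300
latent = rng.normal(size=n)
buzz = np.column_stack([latent + 0.3 * rng.normal(size=n) for _ in range(4)])

# Standardize first, then project onto the leading principal components.
buzz_scaled = StandardScaler().fit_transform(buzz)
pca = PCA(n_components=2)
scores = pca.fit_transform(buzz_scaled)

# Share of total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

The resulting `scores` columns are uncorrelated with each other and can replace the original buzz variables as regression inputs.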
Model Comparison
- Evaluation using R-squared and Adjusted R-squared
- Significance testing of variables
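Adjusted R-squared penalizes R-squared for the number of predictors, which matters when comparing models of different sizes. The helper below is a small sketch using the standard formula; the function name `adjusted_r2` is hypothetical. Significance testing is not shown, since scikit-learn's `LinearRegression` does not report p-values.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Illustrative numbers only: the same R-squared looks worse with more predictors.
small_model = adjusted_r2(0.50, n_obs=101, n_predictors=4)
large_model = adjusted_r2(0.50, n_obs=101, n_predictors=10)
print(small_model, large_model)
```

A larger model is only preferable under this criterion if its gain in R-squared outweighs the penalty for the extra predictors.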
Key Findings
- "Buzz" variables significantly improve model performance
- PCA reduces dimensionality while maintaining model performance
- Some traditional variables (e.g., star power) are, surprisingly, not statistically significant
- Importance of movie genre (action, animated) and MPAA rating (PG) in predicting box office success
Technologies Used
- Python
- Pandas (for data manipulation)
- Scikit-learn (for PCA and regression analysis)
- Matplotlib or Seaborn (for visualizations)
Conclusions
- Buzz generation is crucial for opening weekend success
- High budgets and star power don't guarantee success
- Action and animated movies with PG ratings tend to perform well
- Effective marketing strategies can significantly impact box office performance
Future Work
- Incorporate additional variables (e.g., release date, competition)
- Explore non-linear relationships and interaction effects
- Conduct time series analysis to understand trends over time