Developed by Pylance in the Streamlit
Car Price Analysis Dashboard is an interactive data exploration and hypothesis testing tool built using Streamlit. The app allows users to explore relationships between car specifications and prices, validate hypotheses, and visualise insights through interactive charts. It was developed as part of the Code Institute Hackathon – Dashboard Essentials, using Python and data analytics techniques to produce actionable insights.
The dataset used in this project is sourced from the Car Price Prediction Dataset on Kaggle.
It contains detailed information about various car models, including:
- Specifications: horsepower, engine size, wheelbase, weight, and dimensions
- Categorical attributes: fuel type, drive wheel, body style, manufacturer, and engine type
- Target variable: price (in USD)
After cleaning and feature engineering, the dataset included several derived variables such as:
- price_per_hp – price per horsepower
- power_to_weight_ratio – horsepower divided by curb weight
- engine_efficiency – horsepower divided by engine size
- avg_mpg – combined fuel efficiency based on city and highway mileage
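The derived variables above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual notebook code: the column names (`horsepower`, `curbweight`, `enginesize`, `citympg`, `highwaympg`) and the helper `add_derived_features` are assumptions, the sample rows are made up, and `avg_mpg` is assumed to be the simple mean of city and highway MPG.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four derived variables added.

    Hypothetical helper; column names are assumed, not taken from the project.
    """
    out = df.copy()
    out["price_per_hp"] = out["price"] / out["horsepower"]
    out["power_to_weight_ratio"] = out["horsepower"] / out["curbweight"]
    out["engine_efficiency"] = out["horsepower"] / out["enginesize"]
    # Assumption: "combined" efficiency is the plain average of city and highway MPG.
    out["avg_mpg"] = (out["citympg"] + out["highwaympg"]) / 2
    return out


# Illustrative rows only (not real records from the dataset).
sample = pd.DataFrame(
    {
        "price": [13495.0, 16500.0],
        "horsepower": [111, 154],
        "curbweight": [2548, 2823],
        "enginesize": [130, 152],
        "citympg": [21, 19],
        "highwaympg": [27, 26],
    }
)
features = add_derived_features(sample)
```

Working on a copy keeps the cleaned dataset unchanged, so the derived columns can be recomputed safely after filtering.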
The primary business goal is to identify which factors most strongly influence car prices and provide an interactive tool for exploring these relationships.
Specific business requirements include:
- Understand price drivers – Determine which car features (engine size, horsepower, fuel type, etc.) have the greatest impact on price.
- Compare fuel efficiency and cost – Analyse whether more fuel-efficient cars are cheaper or more expensive.
- Identify manufacturer trends – Compare average prices, engine efficiency, and performance metrics across manufacturers.
- Provide an interactive dashboard – Allow users to apply filters and drill down into subsets of data to explore patterns dynamically.
The following hypotheses were tested using statistical and visual analysis:
- Fuel type impacts car price.
  Validation: Independent samples T-test and boxplots comparing average prices between fuel types (petrol vs diesel).
- Fuel efficiency is inversely correlated with price.
  Validation: Correlation analysis, scatter plots, and Mann–Whitney U test on high vs low efficiency groups.
- Car body style influences car price.
  Validation: ANOVA and boxplots comparing mean prices across body types (sedan, hatchback, convertible, etc.).
- Front wheel drive cars are cheaper than rear wheel drive cars.
  Validation: T-test and group comparisons using bar and box plots.
- Top predictors of price can be identified through regression modelling.
  Validation: Multiple linear regression model trained on key numerical predictors to assess variable importance.
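As a rough sketch of how the group-comparison tests listed above might be run with `scipy.stats`, the example below uses synthetic data. The column names `fueltype` and `carbody` are assumptions, and `equal_var=False` (Welch's variant of the independent samples T-test) is a choice made here for illustration rather than a setting confirmed by the project.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data for illustration only; not drawn from the real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "fueltype": ["petrol"] * 30 + ["diesel"] * 30,
        "carbody": ["sedan", "hatchback", "convertible"] * 20,
        "price": rng.normal(15000, 3000, 60),
    }
)

petrol = df.loc[df["fueltype"] == "petrol", "price"]
diesel = df.loc[df["fueltype"] == "diesel", "price"]

# Two-group comparisons: parametric (Welch T-test) and non-parametric (Mann-Whitney U).
t_stat, t_p = stats.ttest_ind(petrol, diesel, equal_var=False)
u_stat, u_p = stats.mannwhitneyu(petrol, diesel, alternative="two-sided")

# Comparison across three or more groups: one-way ANOVA on price by body style.
body_groups = [g["price"].to_numpy() for _, g in df.groupby("carbody")]
f_stat, anova_p = stats.f_oneway(*body_groups)

print(f"T-test p={t_p:.3f}, Mann-Whitney p={u_p:.3f}, ANOVA p={anova_p:.3f}")
```

A p-value below the chosen threshold (0.05 in this project) would suggest that the group difference is unlikely to be due to chance alone.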
- Data Collection – Load dataset from Kaggle (car_prices.csv).
- Data Cleaning – Handle missing values, remove duplicates, and standardise categorical values (e.g. convert “gas” → “petrol”).
- Exploratory Data Analysis (EDA) – Explore distributions, correlations, and outliers.
- Feature Engineering – Create derived variables for improved interpretability.
- Hypothesis Testing – Perform statistical tests and visual validation for each hypothesis.
- Model Building – Train and evaluate a regression model to identify top predictors of car price.
- Dashboard Development – Build an interactive Streamlit app with filters and visualisations.
- Deployment – Deploy final dashboard to Streamlit Cloud.
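The data cleaning step in the list above could look roughly like the sketch below. The helper `clean_cars`, the `fueltype` column name, and the decision to drop rows with a missing price are assumptions for illustration; the project notebooks may handle missing values differently.

```python
import pandas as pd


def clean_cars(df: pd.DataFrame) -> pd.DataFrame:
    """Standardise fuel type labels, drop rows without a price, remove duplicates.

    Hypothetical helper sketching the cleaning step, not the project's code.
    """
    out = df.copy()
    # Normalise case and whitespace first so near-duplicate labels collapse together.
    out["fueltype"] = (
        out["fueltype"].str.strip().str.lower().replace({"gas": "petrol"})
    )
    # Assumption: rows with no target value are dropped rather than imputed.
    out = out.dropna(subset=["price"])
    return out.drop_duplicates().reset_index(drop=True)


# Illustrative rows only.
raw = pd.DataFrame(
    {
        "fueltype": ["gas", "Gas ", "diesel", "gas"],
        "price": [13495.0, 13495.0, 17450.0, None],
    }
)
cleaned = clean_cars(raw)
```

Standardising labels before removing duplicates matters here: `"gas"` and `"Gas "` only become identical rows once both are mapped to `"petrol"`.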
- Python (pandas, numpy) for data cleaning and transformation.
- Statistical testing (scipy) for hypothesis validation.
- Plotly and Seaborn for high-quality, interactive visualisation.
- scikit-learn for regression modelling and predictor ranking.
- Streamlit for interactivity and communication of results.
- The team collaborated primarily through Discord, using dedicated channels for daily discussions, progress updates, and file sharing. Regular virtual meetings were held to coordinate analysis, dashboard design, and documentation tasks.
- Version control and collaboration were managed through GitHub, where team members worked on feature branches, created pull requests, and resolved merge conflicts collaboratively.
- This approach ensured transparency, accountability, and efficient progress throughout the hackathon.
| Business Requirement | Visualisation Type | Rationale |
|---|---|---|
| Fuel type impacts car price | Boxplot & T-test | Compare price distributions by fuel type |
| Fuel efficiency vs price | Scatter plot & correlation | Show inverse trend between price and efficiency |
| Car body style influences price | Boxplot & ANOVA | Compare price averages across body styles |
| Drive type comparison | Grouped bar chart & T-test | Show whether FWD cars are cheaper than RWD |
| Identify top predictors | Feature importance plot | Show which variables most strongly predict price |
| Interactive dashboard | Streamlit filters & plots | Allow user-driven data exploration |
- Descriptive statistics – means, medians, and spreads for numeric variables.
- Correlation analysis – Pearson and Spearman coefficients.
- Inferential tests – T-tests, Mann–Whitney U tests, and ANOVA for categorical group comparisons.
- Regression modelling – Multiple linear regression to identify predictors.
- Visual analytics – Heatmaps, scatter plots, boxplots, and bar charts.
- Dataset size limited the complexity of models used.
- Price data may not account for regional or inflationary effects.
- Some body styles and drive types had few examples, limiting test power.
- Helped draft hypotheses, optimise code, and streamline dashboard layout.
- Supported the creation of structured test scripts and markdown documentation.
- Data privacy: Dataset is anonymised and publicly available.
- Bias: Some manufacturers are overrepresented, possibly biasing results.
- Fairness: Non-parametric tests were used to handle unequal variances.
- Transparency: All cleaning and transformation steps are documented in the notebooks.
- Dataset summary and KPIs (e.g., average price, average MPG).
- Correlation heatmap of numeric features.
- Box Plot showing the distribution of prices by fuel type.
- Violin Plot illustrating the spread of price by fuel type.
- KDE Plot showing the density distribution of car prices by fuel type, highlighting where price values are most concentrated and how they differ between fuel types.
- T-Test and Mann–Whitney U Test conducted to assess whether price differences between fuel types are statistically significant.
- Pearson and Spearman correlation tests performed to examine the strength and direction of relationships between fuel type and price.
- Summary statistics (mean and median prices) used to contextualise the findings.
- Scatter Plot with Regression line showing the relationship between fuel efficiency (average MPG) and car price, illustrating the strength and direction of correlation.
- Heatmap visualising correlations between key numerical features, highlighting how fuel efficiency relates to price and other performance variables.
- Bubble Plot displaying the combined effect of fuel efficiency, engine size, and price, providing a multivariate view of how these factors interact.
- Pearson and Spearman correlation tests conducted to evaluate both linear and monotonic relationships between fuel efficiency and price.
- T-Test and Mann–Whitney U Test performed to determine whether price differences between cars with varying fuel efficiency levels are statistically significant.
- Descriptive statistics (mean MPG and average price) used to support and contextualise findings.
- This page examines how different car body types (such as sedan, hatchback, coupe, convertible, and SUV) influence car prices.
- The data is grouped by car body type, and summary statistics (mean, median, minimum, maximum) are calculated for each group.
- Bar Plot visualises the average price per car body type, providing a clear comparison across categories.
- Box Plot illustrates the price distribution for each body type, showing variations and potential outliers.
- The analysis suggests that SUVs and coupes tend to have higher average prices, while hatchbacks and sedans are generally more affordable.
- This page analyses whether front-wheel drive (FWD) cars are generally cheaper than rear-wheel drive (RWD) cars.
- The data is filtered and grouped based on the drivewheel variable, and summary statistics (mean, median, minimum, maximum) are calculated for each drive type.
- Bar Plot shows the average car price for each drive type (FWD, RWD, 4WD), providing a clear visual comparison between them.
- Box Plot illustrates the price distribution across drive wheel types, showing variability and potential outliers.
- T-Test is performed to statistically compare prices between FWD and RWD cars.
- Displays the calculated t-statistic and p-value results.
- Determines whether the price difference between the two drive types is statistically significant (p < 0.05).
- Descriptive statistics are presented to support and contextualise the findings.
- Findings indicate whether FWD cars tend to have lower average prices than RWD cars.
- This page investigates whether any car features show a statistically significant relationship with price. It explains the hypothesis, the statistical approach, and presents the findings clearly.
- Continuous variables are analysed with the Spearman rank correlation test, highlighting which numeric features move in tandem with price.
- Categorical variables are assessed using Mann-Whitney U and Kruskal-Wallis H tests to identify group-level price differences.
- Results tables summarise test statistics and p-values, while heatmaps and violin plots visualise how features relate to car price.
- The analysis concludes with a refined list of ten key predictors, selected after accounting for multicollinearity and class imbalance, to inform future modelling.
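The non-parametric screening described above can be sketched as follows, using synthetic data. The feature names (`horsepower`, `carbody`) are assumptions, and the example covers only one numeric and one categorical feature rather than the full set analysed in the project.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n = 90
df = pd.DataFrame(
    {
        "horsepower": rng.uniform(50, 250, n),
        "carbody": rng.choice(["sedan", "hatchback", "wagon"], n),
    }
)
df["price"] = 100 * df["horsepower"] + rng.normal(0, 2000, n)

# Continuous feature: Spearman rank correlation with price.
rho, rho_p = stats.spearmanr(df["horsepower"], df["price"])

# Categorical feature with more than two levels: Kruskal-Wallis H test.
groups = [g["price"].to_numpy() for _, g in df.groupby("carbody")]
h_stat, h_p = stats.kruskal(*groups)

# Collect the results in a table, similar in spirit to the page's summary tables.
results = pd.DataFrame(
    [
        {"feature": "horsepower", "test": "Spearman", "statistic": rho, "p_value": rho_p},
        {"feature": "carbody", "test": "Kruskal-Wallis", "statistic": h_stat, "p_value": h_p},
    ]
)
print(results)
```

For a categorical feature with exactly two levels, `stats.mannwhitneyu` would take the place of `stats.kruskal`.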
- This page presents a demonstration model that estimates car prices based on key design and performance features.
- It begins with a short explanation of the model — a multiple linear regression trained on selected predictors — and displays its performance metrics and a visual comparison of predicted versus actual prices.
- Users can experiment with the model by entering their own feature values using interactive sliders and dropdown menus (for example, adjusting horsepower or choosing a car body type).
- After setting inputs, clicking “Predict Price” instantly generates an estimated car price, allowing users to explore how different specifications might influence market value.
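A minimal sketch of the kind of model this page describes is shown below, using scikit-learn on synthetic data. The three predictors, the coefficients used to generate the data, and the 80/20 split are assumptions for illustration; the project's selected predictors and reported metrics are not reproduced here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only; coefficients below are made up.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame(
    {
        "horsepower": rng.uniform(60, 200, n),
        "enginesize": rng.uniform(70, 300, n),
        "curbweight": rng.uniform(1500, 4000, n),
    }
)
y = (
    80 * X["horsepower"]
    + 40 * X["enginesize"]
    + 3 * X["curbweight"]
    + rng.normal(0, 1500, n)
)

# Hold out 20% of rows to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

# Predict for one user-specified car, as the dashboard does after "Predict Price".
new_car = pd.DataFrame([{"horsepower": 120, "enginesize": 140, "curbweight": 2500}])
predicted_price = float(model.predict(new_car)[0])
print(f"R^2 on held-out data: {r2:.2f}, predicted price: {predicted_price:.0f}")
```

In the dashboard, the values in `new_car` would come from the sliders and dropdown menus instead of being hard-coded.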
- Interactive visuals (Plotly) allow exploration by non-technical users.
- Technical insights are summarised with statistical output text and markdown explanations.
- NaN Error When Filters Exclude a Group:
  On both hypothesis pages, when user-applied filters remove all records for a category (e.g. one fuel type, drive wheel, or body style), the corresponding statistical test or group mean calculation can return NaN or raise a ValueError. This happens because functions such as ttest_ind() and groupby().mean() require non-empty sample groups to operate correctly. The planned fix is to add a validation step that checks whether both comparison groups contain data before running the test, and to display a user-friendly message if one group is empty. This issue does not affect other dashboard functionality or visuals.
- Page Headers:
  Some page headers are missing or inconsistently named: some read "Western Car Price System Analysis" while others read "Car Price Analytics Dashboard".
- Team Name:
  The notebooks do not record the team name and still show "tbc".
- Page Layout:
  The layout of some pages needs improvement.
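The planned validation step for the NaN issue could be sketched as below. The helper `safe_ttest`, its `min_n` parameter, and the column names are hypothetical; the idea is simply to check both groups before calling `ttest_ind()` and to return `None` when the test cannot be run.

```python
import pandas as pd
from scipy import stats


def safe_ttest(df, group_col, value_col, group_a, group_b, min_n=2):
    """Run a T-test only if both groups have at least min_n observations.

    Hypothetical helper: returns None instead of NaN when a group is too small.
    """
    a = df.loc[df[group_col] == group_a, value_col].dropna()
    b = df.loc[df[group_col] == group_b, value_col].dropna()
    if len(a) < min_n or len(b) < min_n:
        return None
    return stats.ttest_ind(a, b, equal_var=False)


# Illustrative case: filters have removed every diesel car.
filtered = pd.DataFrame(
    {"fueltype": ["petrol"] * 3, "price": [9000.0, 12000.0, 15000.0]}
)
result = safe_ttest(filtered, "fueltype", "price", "petrol", "diesel")

if result is None:
    # In the Streamlit app this could be shown with st.warning(...) instead of print.
    print("Not enough data in one of the groups for the selected filters.")
else:
    print(f"t={result.statistic:.2f}, p={result.pvalue:.3f}")
```

Returning `None` leaves the decision about how to inform the user to the page that calls the helper.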
- Repository Desynchronisation:
  One team member’s local repository fell significantly behind the others, causing version conflicts and missing updates. This required coordinated effort to resynchronise branches, rebase changes, and ensure all code and data files were correctly aligned before deployment. The team used GitHub’s pull request history and commit comparison tools to identify discrepancies and restore consistency. Although it delayed some progress, it improved everyone’s understanding of version control best practices.
- Filter-Related NaN Errors:
  Filters excluding all records from one group on hypothesis pages caused NaN or empty sample errors during statistical tests. This will be resolved in future updates by validating group data before running tests.
- Streamlit Session State Management:
  Maintaining consistent filters across multiple pages introduced complexity. The team used st.session_state to store global filters, though further optimisation is planned.
- Responsive Layout and Performance:
  Some larger visualisations caused temporary lag or layout stretching on smaller screens. Future iterations will include layout tuning and caching improvements.
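One way to keep global filters consistent across pages is to route every read and write through a small helper, as sketched below. The names `DEFAULT_FILTERS`, `get_global_filters`, and `update_filter`, and the filter keys themselves, are hypothetical. The helpers accept any dict-like object, so they are demonstrated with a plain dict; in the app, `st.session_state` would be passed instead.

```python
from collections.abc import MutableMapping

# Hypothetical default filter values shared by all pages.
DEFAULT_FILTERS = {"fueltype": [], "carbody": [], "price_range": (0, 50000)}


def get_global_filters(state: MutableMapping) -> dict:
    """Return the shared filters, creating them with defaults on first use."""
    if "global_filters" not in state:
        state["global_filters"] = dict(DEFAULT_FILTERS)
    return state["global_filters"]


def update_filter(state: MutableMapping, name: str, value) -> None:
    """Set one filter value and store the updated filters back in the state."""
    filters = get_global_filters(state)
    filters[name] = value
    state["global_filters"] = filters


# Demonstration with a plain dict standing in for st.session_state.
state = {}
update_filter(state, "fueltype", ["diesel"])
current = get_global_filters(state)
print(current)
```

Because every page reads from the same `global_filters` entry, a selection made on one page is still in place when the user navigates to another.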
- Version Control Mastery:
  Continue improving Git and GitHub collaboration practices, particularly resolving merge conflicts, using branching workflows, and managing pull requests effectively.
- Advanced Streamlit Techniques:
  Learn more about session state optimisation, dynamic page navigation, and responsive dashboard design.
- Machine Learning Modelling:
  Build on the regression model by experimenting with tree-based or ensemble models (e.g., Random Forest, XGBoost) to improve prediction accuracy.
- Performance Optimisation:
  Explore caching strategies, modularisation, and profiling to make Streamlit apps faster and more scalable.
The app was deployed using Streamlit Cloud.
Live Link: https://car-analytics-codeinstitute.streamlit.app/
- Push project repository to GitHub.
- Log in to Streamlit Cloud.
- Create a new app, select your repository and main branch.
- Set the entry point to dashboard_app.py.
- Include all dependencies in requirements.txt.
- Deploy the app — Streamlit builds and hosts automatically.
| Library | Purpose | Example Usage |
|---|---|---|
| pandas | Data manipulation | df.groupby('fueltype')['price'].mean() |
| numpy | Numeric operations | np.log(df['price']) |
| matplotlib | Static plots | plt.hist(df['price'], bins=20) |
| seaborn | Statistical plots | sns.boxplot(x='fueltype', y='price', data=df) |
| plotly.express | Interactive visuals | px.scatter(df, x='horsepower', y='price') |
| scipy.stats | Hypothesis testing | ttest_ind(group1, group2) |
| scikit-learn | Regression modelling | LinearRegression().fit(X_train, y_train) |
| streamlit | Dashboard interface | st.plotly_chart(fig) |
- Dataset: Car Price Prediction Dataset – Kaggle
- Code Institute Hackathon project structure
- Streamlit and Plotly documentation
- Stack Overflow for troubleshooting Streamlit state management
- Assistance and co-authoring support by ChatGPT (OpenAI GPT-5): ideation, code optimisation, hypothesis design, documentation checking, and Streamlit integration support
Special thanks to:
- Code Institute Hackathon mentors for their guidance and feedback
- Team members for collaboration, testing, and deployment
- OpenAI ChatGPT (GPT-5) for providing technical writing, code refinement, and analytical assistance
