Data-Detectives-101/Data-Detectives-Data-Science-Project

Alzheimer’s and Dementia Modelling

🧠 Alzheimer’s and Dementia Disease Prediction Model (Random Forest + CNN Hybrid)

This project forms part of a larger research initiative titled "Applying Data Science for Early Diagnosis of Alzheimer's and Dementia." This part of the work focuses on building machine learning models for dementia and Alzheimer's prediction. The project aims to develop predictive models for early detection of Alzheimer’s and dementia using a combination of clinical data (CSV-based patient features) and MRI image data (from Kaggle).

Before being merged into hybrid architectures, several models were first put into practice and evaluated separately. For better diagnostic performance, these designs combine deep learning (Convolutional Neural Network) and machine learning (Random Forest).

The models use data-driven methods to examine trends and clinical signs in order to support dementia and Alzheimer's risk assessment and early detection. The ultimate objective is to integrate these models into an interactive dashboard so that medical practitioners can monitor patients, view predictions, and advise on preventive therapy.

📘 Purpose of the Project

The goal of this research is to create machine-learning-based decision-support models that help medical professionals identify dementia and Alzheimer's disease early. It is part of the larger study, "Applying Data Science for Early Diagnosis of Alzheimer's and Dementia."

Early diagnosis makes better long-term health outcomes, better patient management, and prompt intervention possible. This notebook focuses on the Alzheimer's and Dementia datasets and builds prediction models that estimate a person's chance of developing dementia or Alzheimer's and express it as a risk level.

⚙️ How It Works

In a single workflow, the Alzheimer's and Dementia Disease Prediction Models integrate visual inspection, machine learning classification, and data preprocessing.

Step-by-Step Process:

1. Data Input:
  • The user uploads a CSV file containing Dementia-related patient data.
2. Data Preprocessing:
  • Missing values are imputed using SimpleImputer.

  • Categorical variables are encoded using OneHotEncoder.

  • Numerical features are standardized with StandardScaler.

  • The dataset is balanced using SMOTE (Synthetic Minority Oversampling Technique).

3. Model Training:
  • The models first use a Random Forest Classifier, tuned with RandomizedSearchCV for optimal performance.
  • Libraries: scikit-learn, numpy, pandas, matplotlib
  • Training/Testing Split: 80% training, 20% testing
  • Scaling: StandardScaler applied to numerical features
  • Evaluation metrics such as Accuracy, Balanced Accuracy, ROC-AUC, and Classification Report are calculated.
  • Optimization goal: Maximize accuracy and recall to minimize false negatives (critical in medical diagnosis).
4. Visualization:
  • Confusion Matrix and ROC Curve visualize performance.

  • Feature Importance chart highlights key indicators contributing to Dementia risk.

5. Predictions Output:
  • For each patient, the model predicts:

    • Dementia Risk Level (Low / Medium / High)

    • Model Reliability (Confidence Percentage)
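The steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not the project's actual notebook: the data is synthetic, the column names are borrowed from the expected-input list as assumptions, and the SMOTE step is left out because it needs the separate imbalanced-learn package (it would be applied to the training split only, after preprocessing).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the uploaded CSV (column names are assumptions).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(55, 90, n).astype(float),
    "Memory_Test_Score": rng.normal(25, 4, n),
    "Gender": rng.choice(["M", "F"], n),
    "Family_History": rng.choice(["Yes", "No"], n),
})
# Simulate missing values in one numeric column.
df.loc[rng.choice(n, 10, replace=False), "Memory_Test_Score"] = np.nan
# Toy label loosely tied to age so the model has something to learn.
y = (df["Age"] + rng.normal(0, 5, n) > 72).astype(int)

numeric = ["Age", "Memory_Test_Score"]
categorical = ["Gender", "Family_History"]

# Impute and scale numeric columns; one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 80% training, 20% testing, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=0)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
pred = model.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "balanced_accuracy": balanced_accuracy_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
```

Keeping the preprocessing inside the Pipeline means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into training.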

🎯 Objectives

  • Develop a data-driven prediction model for detecting Alzheimer's and dementia at early stages.
  • Evaluate model performance and reliability to ensure trustworthy predictions.
  • Visualize key patterns, correlations, and diagnostic insights.
  • Lay the foundation for a hybrid model combining Random Forest (RF) with Convolutional Neural Networks (CNN) for enhanced accuracy and understanding.
  • Phase 1: create individual models: an Alzheimer's Random Forest model, a Dementia Random Forest model, and an MRI scan-based CNN model.
  • Phase 2: create the hybrid models: Alzheimer's Hybrid Model = RF + CNN and Dementia Hybrid Model = RF + CNN.
  • Enable seamless integration into an interactive dashboard for medical staff.

🧾 Modelling Structure

Phase 1 : Individual Models

  1. Alzheimer’s Random Forest Model

    • Predicts Alzheimer’s risk using patient clinical features (e.g., age, cognitive score, health metrics).

    • Trained on structured CSV data.

  2. Dementia Random Forest Model

    • Uses the same approach as above, but is trained on a dementia-labeled dataset.
  3. MRI CNN Model

    • Convolutional Neural Network trained on MRI images to identify early neurological patterns.

    • Acts as the visual diagnostic component.

Phase 2 : Hybrid Models

  1. Alzheimer’s Hybrid Model (RF + CNN)

    • Combines Random Forest’s clinical predictions with CNN’s MRI-based probabilities.

    • Produces a joint risk prediction percentage and confidence score.

  2. Dementia Hybrid Model (RF + CNN)

    • Uses the same integration strategy for dementia detection.
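One simple way to combine the Random Forest's clinical prediction with the CNN's MRI-based probability is a weighted average (late fusion). The README does not state the exact combination rule, so the function below, its name, and the default weight are assumptions for illustration only.

```python
def fuse_probabilities(p_rf: float, p_cnn: float, w_rf: float = 0.5) -> float:
    """Weighted average of the two model probabilities (late fusion).

    w_rf is the weight given to the Random Forest; the CNN gets 1 - w_rf.
    The 0.5 default is an illustrative assumption, not a tuned value.
    """
    if not 0.0 <= w_rf <= 1.0:
        raise ValueError("w_rf must be between 0 and 1")
    for p in (p_rf, p_cnn):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    return w_rf * p_rf + (1.0 - w_rf) * p_cnn


joint = fuse_probabilities(p_rf=0.70, p_cnn=0.90, w_rf=0.5)
print(f"Joint risk: {joint:.0%}")
```

In practice the weight could be chosen on a validation set, for example by favouring whichever model has the better recall.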

🤖 Understanding the Hybrid Models

To increase accuracy and robustness, a hybrid model incorporates various machine learning and deep learning techniques.

Why Hybrid Approach?

  • MRI scans reveal changes in the structure of the brain that cannot be captured by clinical data alone.
  • Non-visual risk factors like age, blood pressure, family history, cognitive scores, and lifestyle may go unnoticed by MRI image analysis alone.

In this project:

  • Complex features from MRI brain scan pictures are extracted using Convolutional Neural Networks (CNNs).
  • Age, memory test results, medical history, and other structured tabular patient data are used in Random Forest (RF) models.

How they work together:

  • The CNN extracts useful numerical features from the MRI images.
  • These features are merged with the traditional patient data.
  • The Random Forest classifier uses both data sources to produce the final prediction.

By combining these two:

  • The CNN extracts meaningful features from MRI scans.
  • The Random Forest uses those extracted features, together with clinical data, to make a final prediction about dementia and alzheimer's risk.
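The feature-level combination described above can be sketched as a simple concatenation. In this sketch the CNN features are random stand-ins (a real run would take them from a late layer of the trained CNN), and the feature counts and variable names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_patients = 120

# Stand-in for CNN output: one 16-dimensional feature vector per MRI scan.
cnn_features = rng.normal(size=(n_patients, 16))
# Stand-in for preprocessed clinical columns (age, test scores, ...).
clinical_features = rng.normal(size=(n_patients, 5))
# Toy labels that depend on both sources.
labels = ((cnn_features[:, 0] + clinical_features[:, 0]) > 0).astype(int)

# Merge both sources column-wise: one row per patient.
hybrid_X = np.hstack([clinical_features, cnn_features])

# Train on the first 100 patients, predict risk for the remaining 20.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(hybrid_X[:100], labels[:100])
risk = rf.predict_proba(hybrid_X[100:])[:, 1]
print(hybrid_X.shape, risk[:3])
```

The rows of the two arrays must refer to the same patients in the same order, so a shared key such as Patient_ID is needed when the real data is joined.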

Compared to employing either model alone, this combination improves diagnostic precision and increases system reliability. Combining the two gives the model access to both biological and cognitive risk indicators, which results in a more comprehensive and precise diagnosis.

💻 What a User Can Do With It:

A user (e.g., researcher, healthcare data analyst, or clinician) can:

  • Upload patient datasets to predict Dementia risk levels.

  • Visualize feature importance and understand which factors contribute most to Dementia risk.

  • Evaluate model reliability for confidence in predictions.

  • Integrate the trained model into their own hospital dashboard or diagnostic platform.

  • Use the model for early screening and monitoring of patients over time.

🧩 How to Integrate It With Your System

To integrate the model with your system or dashboard:

  • Export the Model Output:

    • The trained Random Forest model and label encoders are saved as .pkl or .csv files.
  • Load the Model in Your Application:

    • Use Python to load the model.
  • Use in a Dashboard like this project (Example: Shiny or Streamlit):

    • Create a simple dashboard that allows users to upload a CSV file.

    • The backend reads and preprocesses the data.

    • The trained model predicts Dementia risk in real-time.

    • The dashboard displays:

      • Risk level (Low / Moderate / High)

      • Model reliability percentage

      • Feature importance visualization
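A minimal sketch of the save, load, and predict cycle behind such a dashboard is shown below. The file name, the tiny stand-in model, and the risk-level cut-offs are all assumptions; the project's own artifacts and thresholds may differ.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Train a tiny stand-in model so the example is self-contained.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)
trained = RandomForestClassifier(n_estimators=20, random_state=1).fit(X, y)


def risk_level(p: float) -> str:
    # Cut-offs are illustrative assumptions, not values from the project.
    if p < 0.33:
        return "Low"
    if p < 0.66:
        return "Moderate"
    return "High"


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "dementia_rf.pkl"  # hypothetical file name
    # Export step: serialize the trained model.
    with model_path.open("wb") as f:
        pickle.dump(trained, f)
    # Application step: load the model back.
    with model_path.open("rb") as f:
        loaded = pickle.load(f)

# Predict for new, already preprocessed patient rows.
new_patients = rng.normal(size=(3, 4))
probabilities = loaded.predict_proba(new_patients)[:, 1]
for p in probabilities:
    print(f"{p:.0%} -> {risk_level(p)} risk")
```

Pickle files can execute arbitrary code when loaded, so an application should only load model files from a trusted source.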

📂 Expected Input Data

The model expects CSV files with structured tabular data. Each row should represent a single patient record with both numerical and categorical attributes.

Expected Columns:

While the exact columns may vary, the model was trained on data similar to:

  • Patient_ID: Unique identifier for each patient
  • Name: Patient name
  • Surname: Patient surname
  • Age: Patient age
  • Gender: Male/Female (M/F)
  • Memory_Test_Score: Cognitive assessment score
  • Blood_Pressure: Measured in mmHg
  • Heart_Rate: Average resting heart rate
  • Education_Level: Education category or years of schooling
  • MRI_Results: MRI-based feature indicators (if available)
  • Family_History: Whether dementia runs in the family (Yes/No)
  • Target: Diagnosis outcome (Dementia / No Dementia)

Note: If your dataset contains image files (such as MRI scans), do not include them in this CSV. The CNN model handles images separately.
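Before passing an uploaded file to the model, it can help to check that the expected columns are present. The sketch below uses only the standard library; which columns count as required is an assumption, since the exact columns may vary between datasets.

```python
import csv
import io

# Hypothetical minimum set of columns; adjust to the dataset actually used.
REQUIRED_COLUMNS = {
    "Patient_ID", "Age", "Gender", "Memory_Test_Score", "Family_History",
}


def missing_columns(csv_text: str) -> set[str]:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])


sample = "Patient_ID,Age,Gender,Memory_Test_Score\n1,74,F,22\n"
print(missing_columns(sample))  # the sample lacks Family_History
```

A dashboard could show this set to the user and refuse to predict until it is empty.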

📊 Evaluation Strategy:

Each dataset was divided into:

  • Training Set (80%)

  • Testing Set (20%)

K-fold cross-validation (k=5) was applied for generalization assessment.

Performance was visualized using:

  • Confusion Matrix
  • ROC Curves
  • Precision-Recall Curves
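The split and cross-validation described above could be set up as follows. The data is synthetic, and two details are assumptions because the README does not specify them: the folds are stratified, and cross-validation runs on the training portion only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for a preprocessed dataset.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=7)

# 5-fold cross-validation on the training portion.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
rf = RandomForestClassifier(n_estimators=50, random_state=7)
cv_scores = cross_val_score(rf, X_train, y_train, cv=cv, scoring="roc_auc")
print(f"CV ROC-AUC: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final check on the held-out test set.
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

A large gap between the cross-validation scores and the test score would suggest that the model does not generalize well.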

🧩 Model Development Process:

1. Data Preprocessing

  • Missing values handled using SimpleImputer
  • Categorical variables encoded with OneHotEncoder
  • Numerical features standardized with StandardScaler
  • Dataset balanced using SMOTE (Synthetic Minority Oversampling Technique)

2. Model Building

  • Implemented a Random Forest Classifier wrapped in a Pipeline

  • Performed hyperparameter tuning using RandomizedSearchCV

  • Computed evaluation metrics including:

    • Accuracy

    • Balanced accuracy

    • ROC-AUC score

    • Classification report

3. Visualization

  • Confusion Matrix and ROC Curve generated for model interpretation
  • Feature importance visualizations created to identify key predictors

4. Future Hybrid Models

  • A CNN (Convolutional Neural Network) will be integrated with the Random Forest model to create a hybrid prediction framework.
  • The CNN will extract high-level features from brain scan images, which will then feed into the Random Forest classifier for final prediction.
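The model-building step in this process (a Random Forest wrapped in a Pipeline and tuned with RandomizedSearchCV) could look roughly like the sketch below. The search space, the number of iterations, and the use of recall as the scoring metric are assumptions chosen for illustration; the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric patient features.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=3)),
])

# Hypothetical search space; the "rf__" prefix targets the pipeline step.
param_distributions = {
    "rf__n_estimators": [50, 100, 200],
    "rf__max_depth": [None, 5, 10],
    "rf__min_samples_leaf": [1, 2, 4],
}

# Scoring on recall reflects the goal of minimizing false negatives.
search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=5, cv=3,
    scoring="recall", random_state=3)
search.fit(X, y)

print(search.best_params_)
print(f"Best cross-validated recall: {search.best_score_:.3f}")
```

After fitting, search.best_estimator_ holds the full pipeline refitted with the best parameters found, so it can be used directly for prediction.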

📊 Models Performance

  • The Random Forest models performed well with structured data but lacked image-level sensitivity.
  • The CNN model effectively captured spatial MRI patterns but underperformed on small datasets alone.
  • The hybrid models successfully merged the strengths of both, improving diagnostic reliability and achieving over 90% model reliability.

Predictions should be expressed as follows: “Patient X has a 78% probability of early-stage Alzheimer’s. Model Reliability: 91% (Low Risk)”
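A small helper can produce this output format consistently. The function name and its parameters are hypothetical; the risk label is passed in explicitly because the README does not define how it is derived.

```python
def format_prediction(patient: str, condition: str, probability: float,
                      reliability: float, risk: str) -> str:
    """Build the prediction sentence in the format shown above.

    probability and reliability are fractions between 0 and 1.
    """
    return (
        f"{patient} has a {probability:.0%} probability of {condition}. "
        f"Model Reliability: {reliability:.0%} ({risk} Risk)"
    )


message = format_prediction(
    "Patient X", "early-stage Alzheimer’s", 0.78, 0.91, "Low")
print(message)
```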
