Alzheimer’s and Dementia Modelling

🧠 Alzheimer’s and Dementia Disease Prediction Model (Random Forest + CNN Hybrid)

This project forms part of a larger research initiative titled "Applying Data Science for Early Diagnosis of Alzheimer's and Dementia." This part of the work focuses on building machine learning models for dementia and Alzheimer’s prediction. It aims to develop predictive models for early detection of Alzheimer’s and dementia using a combination of clinical data (CSV-based patient features) and MRI image data (from Kaggle).

Before being merged into hybrid architectures, several models were first implemented and evaluated separately. For better diagnostic performance, the hybrid designs combine deep learning (Convolutional Neural Network) and machine learning (Random Forest).

The models use data-driven methods to examine trends and clinical signs in order to support dementia and Alzheimer’s risk assessment and early detection. The ultimate objective is to integrate these models into an interactive dashboard where medical practitioners can monitor patients and view predictions, so that they can advise on preventive therapy.

📘 Purpose of the Project

The goal of this research is to create machine-learning-based decision support models that help medical professionals identify dementia and Alzheimer’s disease early. It is part of the larger study, "Applying Data Science for Early Diagnosis of Alzheimer's and Dementia."

Early diagnosis enables prompt intervention, better patient management, and better long-term health outcomes. This notebook focuses on the Alzheimer’s and dementia datasets and aims to build reliable prediction models that estimate a person's likelihood of developing dementia or Alzheimer’s and report it as a risk level.

⚙️ How It Works

In a single workflow, the Alzheimer’s and Dementia Disease Prediction Models integrate data preprocessing, machine learning classification, and visualization.

Step-by-Step Process:

1. Data Input:

The user uploads a CSV file containing Dementia-related patient data.

2. Data Preprocessing:

Missing values are imputed using SimpleImputer.
Categorical variables are encoded using OneHotEncoder.
Numerical features are standardized with StandardScaler.
The dataset is balanced using SMOTE (Synthetic Minority Oversampling Technique).
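The imputation, encoding, and scaling steps above can be sketched as a single scikit-learn `ColumnTransformer`. This is a minimal illustration, not the project's actual code: the column names are borrowed from the "Expected Columns" list, the values are made up, and the imputation strategies (`median`, `most_frequent`) are assumptions. SMOTE is left out of the snippet because it comes from the separate imbalanced-learn package; it is normally applied to the training split only, after preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame; only the column names come from the README.
df = pd.DataFrame({
    "Age": [72, 65, np.nan, 80],
    "Memory_Test_Score": [21.0, 28.0, 25.0, np.nan],
    "Gender": ["F", "M", "M", "F"],
    "Family_History": ["Yes", "No", "No", "Yes"],
})

numeric_cols = ["Age", "Memory_Test_Score"]
categorical_cols = ["Gender", "Family_History"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0.0 forces a dense NumPy array as output.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0.0,
)

X = preprocessor.fit_transform(df)
print(X.shape)  # 2 scaled numeric columns + 4 one-hot columns
```

Fitting the transformer on the training split and reusing it on the test split and on uploaded data keeps the scaling statistics from leaking across splits.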

3. Model Training:

The workflow first trains a Random Forest Classifier, tuned with RandomizedSearchCV.
Libraries: scikit-learn, numpy, pandas, matplotlib
Training/Testing Split: 80% training, 20% testing
Scaling: StandardScaler applied to numerical features
Evaluation metrics such as Accuracy, Balanced Accuracy, ROC-AUC, and Classification Report are calculated.
Optimization goal: Maximize accuracy and recall to minimize false negatives (critical in medical diagnosis).
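As a rough sketch of the split and the metrics listed above, the snippet below trains a Random Forest on synthetic data from `make_classification`, which stands in for the preprocessed patient features. The sample size, class weights, and hyperparameters are illustrative assumptions, not the values used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed patient features (not the project's data).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.7, 0.3], random_state=42
)

# 80% training / 20% testing, stratified so both splits keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),  # low recall means many false negatives
    "roc_auc": roc_auc_score(y_test, y_prob),
}
print(metrics)
print(classification_report(y_test, y_pred))
```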

4. Visualization:

Confusion Matrix and ROC Curve visualize performance.
Feature Importance chart highlights key indicators contributing to Dementia risk.
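The numbers behind these charts can be computed as in the sketch below. The data is synthetic and the four feature names are taken from the "Expected Columns" list purely as labels, so the resulting ranking says nothing about real dementia risk factors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Labels only; the synthetic columns have no clinical meaning.
feature_names = ["Age", "Memory_Test_Score", "Blood_Pressure", "Heart_Rate"]

X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Impurity-based importances, sorted from most to least important.
ranking = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

For the plots themselves, scikit-learn's `ConfusionMatrixDisplay.from_predictions` and `RocCurveDisplay.from_predictions` draw onto matplotlib axes, and the sorted importances can go straight into a bar chart.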

5. Prediction Output:

For each patient, the model predicts:
- Dementia Risk Level (Low / Medium / High)
- Model Reliability (Confidence Percentage)
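One way to turn a predicted probability into these two outputs is sketched below. The README does not state the cut-offs or how the confidence percentage is derived, so the thresholds (0.33 and 0.66) and the definition of confidence as the probability of the predicted class are assumptions, and both function names are hypothetical.

```python
def risk_level(probability, low_cutoff=0.33, high_cutoff=0.66):
    """Map a positive-class probability to Low / Medium / High.

    The cut-offs are illustrative assumptions, not values from the project.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be between 0 and 1, got {probability}")
    if probability < low_cutoff:
        return "Low"
    if probability < high_cutoff:
        return "Medium"
    return "High"


def confidence_percentage(probability):
    """Confidence as the probability of the predicted class, in percent (assumed definition)."""
    return round(max(probability, 1.0 - probability) * 100, 1)


for p in (0.10, 0.50, 0.78):
    print(p, risk_level(p), confidence_percentage(p))
```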

🎯 Objectives

Develop a data-driven prediction model for detecting Alzheimer’s and dementia at early stages.
Evaluate model performance and reliability to ensure trustworthy predictions.
Visualize key patterns, correlations, and diagnostic insights.
Lay the foundation for a hybrid model combining Random Forest (RF) with Convolutional Neural Networks (CNN) for enhanced accuracy and understanding.
Phase 1 created the individual models: the Alzheimer’s Random Forest model, the Dementia Random Forest model, and the MRI scan-based CNN model.
Phase 2 created the hybrid models: Alzheimer’s Hybrid Model = RF + CNN and Dementia Hybrid Model = RF + CNN.
Enable seamless integration into an interactive dashboard for medical staff.

🧾 Modelling Structure

Phase 1: Individual Models

Alzheimer’s Random Forest Model
- Predicts Alzheimer’s risk using patient clinical features (e.g., age, cognitive score, health metrics).
- Trained on structured CSV data.
Dementia Random Forest Model
- Uses the same approach as above, but trained on a dementia-labeled dataset.
MRI CNN Model
- Convolutional Neural Network trained on MRI images to identify early neurological patterns.
- Acts as the visual diagnostic component.

Phase 2: Hybrid Models

Alzheimer’s Hybrid Model (RF + CNN)
- Combines Random Forest’s clinical predictions with CNN’s MRI-based probabilities.
- Produces a joint risk prediction percentage and confidence score.
Dementia Hybrid Model (RF + CNN)
- Uses the same integration strategy for dementia detection.
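Combining the Random Forest's clinical prediction with the CNN's MRI-based probability can be read as prediction-level fusion. The sketch below uses a simple weighted average; the README does not say how the two outputs are weighted, so the function name `fuse_probabilities` and the default weight of 0.5 are assumptions.

```python
def fuse_probabilities(rf_prob, cnn_prob, rf_weight=0.5):
    """Weighted average of two positive-class probabilities.

    rf_weight is the share given to the Random Forest; the CNN gets the rest.
    The equal default weighting is an illustrative assumption.
    """
    for name, value in (("rf_prob", rf_prob), ("cnn_prob", cnn_prob), ("rf_weight", rf_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return rf_weight * rf_prob + (1.0 - rf_weight) * cnn_prob


joint = fuse_probabilities(rf_prob=0.80, cnn_prob=0.60)
print(f"Joint risk: {joint:.0%}")  # Joint risk: 70%
```

In practice the weight could be chosen on a validation set rather than fixed by hand.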

🤖 Understanding the Hybrid Models

To increase accuracy and robustness, a hybrid model incorporates various machine learning and deep learning techniques.

Why Hybrid Approach?

MRI scans reveal changes in the structure of the brain that cannot be captured by clinical data alone.
Non-visual risk factors like age, blood pressure, family history, cognitive scores, and lifestyle may go unnoticed by MRI image analysis alone.

In this project:

Convolutional Neural Networks (CNNs) extract complex features from MRI brain scan images.
Random Forest (RF) models use age, memory test results, medical history, and other structured tabular patient data.

How they work together:

The CNN extracts useful numerical features from MRI scans.
Those features are merged with the structured patient data.
The Random Forest classifier uses both data sources to make the final prediction about dementia and Alzheimer’s risk.
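The feature-level combination described here can be sketched as a column-wise concatenation followed by a Random Forest. Random arrays stand in for both the clinical features and the CNN output, and the dimensions (5 clinical features, 16 CNN features) are arbitrary assumptions; in the real workflow the second array would come from the trained CNN.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_patients = 40

# Stand-ins: preprocessed clinical features and CNN feature vectors, one row per patient.
clinical = rng.normal(size=(n_patients, 5))
cnn_features = rng.normal(size=(n_patients, 16))
labels = rng.integers(0, 2, size=n_patients)

# Merge both sources column-wise; rows must refer to the same patients in the same order.
fused = np.hstack([clinical, cnn_features])

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(fused, labels)
proba = rf.predict_proba(fused[:3])
print(fused.shape, proba.shape)
```

The step that matters most in practice is the row alignment: each MRI feature vector has to be matched to the right patient record (for example via Patient_ID) before the arrays are stacked.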

Compared to employing either model alone, this combination improves diagnostic precision and increases system reliability. Combining the two gives the model access to both biological and cognitive risk indicators, which results in a more comprehensive and precise diagnosis.

💻 What a User Can Do With It:

A user (e.g., researcher, healthcare data analyst, or clinician) can:

Upload patient datasets to predict Dementia risk levels.
Visualize feature importance and understand which factors contribute most to Dementia risk.
Evaluate model reliability for confidence in predictions.
Integrate the trained model into their own hospital dashboard or diagnostic platform.
Use the model for early screening and monitoring of patients over time.

🧩 How to Integrate It With Your System

To integrate the model with your system or dashboard:

Export the Model Output:
- The trained Random Forest model and label encoders are saved as .pkl or .csv files.
Load the Model in Your Application:
- Use Python to load the model.
Use in a Dashboard like this project (Example: Shiny or Streamlit):
- Create a simple dashboard that allows users to upload a CSV file.
- The backend reads and preprocesses the data.
- The trained model predicts Dementia risk in real time.
- The dashboard displays:
  - Risk level (Low / Medium / High)
  - Model reliability percentage
  - Feature importance visualization
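A minimal sketch of the export and load steps, assuming the model is serialized with Python's standard `pickle` module. The file name `dementia_rf.pkl` is hypothetical, and the model here is trained on synthetic data only so that the snippet runs on its own.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small stand-in model (the real one comes from the training notebook).
X, y = make_classification(n_samples=100, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Export: write the fitted model to a .pkl file (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "dementia_rf.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load: what a dashboard backend would do at start-up.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

# Predict on new rows, which must be preprocessed exactly like the training data.
probs = loaded.predict_proba(X[:5])[:, 1]
print(probs)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.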

📂 Expected Input Data

The model expects CSV files with structured tabular data. Each row should represent a single patient record with both numerical and categorical attributes.

Expected Columns:

While the exact columns may vary, the model was trained on data similar to:

Patient_ID: Unique identifier for each patient
Name: Patient name
Surname: Patient surname
Age: Patient age
Gender: Male/Female (M/F)
Memory_Test_Score: Cognitive assessment score
Blood_Pressure: Measured in mmHg
Heart_Rate: Average resting heart rate
Education_Level: Education category or years of schooling
MRI_Results: MRI-based feature indicators (if available)
Family_History: Whether dementia runs in the family (Yes/No)
Target: Diagnosis outcome (Dementia / No Dementia)

Note: If your dataset contains image files (such as MRI scans), do not upload them as part of this CSV; they are handled separately by the CNN model.
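Because the exact columns may vary, it can help to check an uploaded file's header before running predictions. The sketch below uses only the standard library; which columns are treated as mandatory is an assumption made for illustration, and `missing_columns` is a hypothetical helper name.

```python
import csv
import io

# Assumption: this subset of the expected columns is treated as mandatory.
REQUIRED_COLUMNS = ("Patient_ID", "Age", "Gender", "Memory_Test_Score", "Family_History")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header, in order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = {name.strip() for name in (reader.fieldnames or [])}
    return [name for name in required if name not in header]


good_csv = "Patient_ID,Age,Gender,Memory_Test_Score,Family_History\n1,72,F,21,Yes\n"
bad_csv = "Patient_ID,Age,Gender\n1,72,F\n"

print(missing_columns(good_csv))  # []
print(missing_columns(bad_csv))   # ['Memory_Test_Score', 'Family_History']
```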

📊 Evaluation Strategy:

Each dataset was divided into:

Training Set (80%)
Testing Set (20%)

K-fold cross-validation (k=5) was applied to assess generalization.

Performance was visualized using:
- Confusion Matrix
- ROC Curves
- Precision-Recall Curves
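The 5-fold cross-validation step can be sketched as follows. The data is synthetic, and both the use of stratified folds and ROC-AUC as the scoring metric are assumptions; the README only states that k=5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed patient features.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

clf = RandomForestClassifier(n_estimators=50, random_state=42)

# Five folds; stratification keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")

print(scores)
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

A large spread between folds is a hint that the model's performance depends heavily on which patients end up in the training set.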

🧩 Model Development Process:

1. Data Preprocessing

Missing values handled using SimpleImputer
Categorical variables encoded with OneHotEncoder
Numerical features standardized with StandardScaler
Dataset balanced using SMOTE (Synthetic Minority Oversampling Technique)

2. Model Building

Implemented a Random Forest Classifier wrapped in a Pipeline
Performed hyperparameter tuning using RandomizedSearchCV
Computed evaluation metrics including:
- Accuracy
- Balanced accuracy
- ROC-AUC score
- Classification report
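A sketch of a Random Forest wrapped in a `Pipeline` and tuned with `RandomizedSearchCV` is shown below. The search space, `n_iter`, the number of folds, and the choice of recall as the tuning metric are illustrative assumptions, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient features.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=42)),
])

# Pipeline parameters are addressed as "<step name>__<parameter>".
param_distributions = {
    "rf__n_estimators": [25, 50, 100],
    "rf__max_depth": [None, 5, 10],
    "rf__min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    pipe,
    param_distributions=param_distributions,
    n_iter=5,            # sample 5 of the 27 possible combinations
    cv=3,
    scoring="recall",    # assumed metric, chosen to penalize false negatives
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated recall: {search.best_score_:.3f}")
```

Keeping the scaler inside the pipeline means it is refitted on each training fold during the search, so the validation folds never influence the scaling.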

3. Visualization

Confusion Matrix and ROC Curve generated for model interpretation
Feature importance visualizations created to identify key predictors

4. Hybrid Models

A CNN (Convolutional Neural Network) is integrated with the Random Forest model to create a hybrid prediction framework.
The CNN extracts high-level features from brain scan images, which then feed into the Random Forest classifier for the final prediction.

📊 Model Performance

The Random Forest models performed well with structured data but lacked image-level sensitivity.
The CNN model effectively captured spatial MRI patterns but underperformed on small datasets alone.
The hybrid models merged the strengths of both, improving diagnostic reliability and achieving over 90% model reliability.

Predictions should be expressed as follows: “Patient X has a 78% probability of early-stage Alzheimer’s. Model Reliability: 91% (Low Risk)”

📚 References

Kaggle Alzheimer’s MRI Dataset: https://www.kaggle.com/datasets/ninadaithal/imagesoasis/data
Scikit-learn Documentation: https://scikit-learn.org/stable/
TensorFlow Documentation: https://www.tensorflow.org/
Building a hybrid model: https://www.mdpi.com/2072-4292/15/3/728
