iremozturk/Data-visualizer

Data Visualizer 3.0

A comprehensive data visualization tool with a modern graphical user interface for applying dimensionality reduction and clustering algorithms to CSV datasets.

Overview

Data Visualizer 3.0 provides a graphical interface for exploring high-dimensional data with dimensionality reduction and clustering algorithms. The application supports Principal Component Analysis (PCA) and four variants of Locally Linear Embedding (LLE) for dimensionality reduction, and K-Means for clustering.

Features

  • Graphical User Interface: Modern Tkinter-based GUI with clean, professional design
  • Multiple Algorithms: Support for PCA, K-Means, and LLE variants (Standard, Modified, Hessian, LTSA)
  • Flexible Data Input: Automatically detects and adapts to different CSV file structures
  • Visualization: Clean, readable scatter plots with color-coded clusters
  • Batch Processing: Option to run all algorithms in a single pass ("Display All")

Algorithms Supported

Dimensionality Reduction

  • PCA (Principal Component Analysis): Linear dimensionality reduction technique
  • Standard LLE: Locally Linear Embedding for nonlinear dimensionality reduction
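  • Modified LLE: LLE variant that uses multiple weight vectors per neighborhood to address the regularization problem of standard LLE
  • Hessian LLE: Hessian eigenmaps, an LLE variant based on a local Hessian estimate
  • LTSA LLE: Local Tangent Space Alignment, which preserves local geometry by aligning local tangent spaces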
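
The methods listed above are all available in scikit-learn, so a comparison can be sketched roughly as follows. This is a minimal illustration, not the project's own code: the synthetic Swiss roll stands in for the numeric columns of a CSV file, and the values of `n_neighbors` and `n_components` are assumptions chosen so that every variant runs.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic 3-D data as a stand-in for numeric CSV columns (assumption).
X, _ = make_swiss_roll(n_samples=150, random_state=0)

n_components = 2
# Hessian LLE requires n_neighbors > n_components * (n_components + 3) / 2,
# i.e. more than 5 neighbors for a 2-D embedding.
n_neighbors = 12

embeddings = {"PCA": PCA(n_components=n_components).fit_transform(X)}
for method in ("standard", "modified", "hessian", "ltsa"):
    lle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        method=method,
        random_state=0,
    )
    embeddings[f"{method} LLE"] = lle.fit_transform(X)

for name, Y in embeddings.items():
    print(name, Y.shape)
```

All four LLE variants go through the same `LocallyLinearEmbedding` class and differ only in the `method` argument, which is why a single loop covers them.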

Clustering

  • K-Means: K-Means clustering algorithm for data segmentation
  • PCA + K-Means: Combined dimensionality reduction followed by clustering
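
As a rough sketch of the combined mode, PCA can project the data to two dimensions before K-Means assigns cluster labels. The synthetic blobs, the choice of three clusters, and the variable names are assumptions for illustration; the project's `kMeansAndPCA.py` may do this differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Three well-separated 5-D blobs as a stand-in for real data (assumption).
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
X = np.vstack([c + rng.normal(size=(50, 5)) for c in centers])

# Step 1: reduce to two dimensions so the result can be plotted.
X_2d = PCA(n_components=2).fit_transform(X)

# Step 2: cluster in the reduced space; the labels can color the scatter plot.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

print(X_2d.shape, np.bincount(labels))
```

Clustering after the projection means the clusters are defined in the same space that gets plotted, at the cost of discarding variance outside the first two components.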

Requirements

  • Python 3.8 or higher
  • Tkinter (usually included with Python)
  • Required packages (see requirements.txt):
    • pandas
    • numpy
    • matplotlib
    • scikit-learn

Installation

  1. Clone the repository:
git clone <repository-url>
cd DataVisualizer3.0
  2. Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
# or
venv\Scripts\activate  # On Windows
  3. Install dependencies:
pip install -r requirements.txt

Usage

Running the Application

Option 1: Using the shell script (macOS/Linux)

./run.sh

Option 2: Direct execution (macOS/Linux)

source venv/bin/activate
python3 GUI.py

Option 3: On Windows

venv\Scripts\activate
python GUI.py

Using the Application

  1. Upload Data: Click "Browse Files" and select your CSV file
  2. Select Algorithm: Choose from the dropdown menu:
    • None (raw data visualization)
    • PCA
    • K-Means
    • PCA And K-Means
    • Standard LLE
    • Modified LLE
    • Hessian LLE
    • LTSA LLE
    • Display All (runs all algorithms)
  3. Run Visualization: Click "RUN VISUALIZATION" to execute the selected algorithm
  4. View Results: Matplotlib windows will open displaying the results

Data Format

The application automatically detects numeric columns in your CSV file. Supported formats:

  • CSV files with numeric columns
  • Any number of columns (the application adapts automatically)
  • Column names are preserved when possible
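
One way numeric-column detection can work is with pandas' `select_dtypes`. The sketch below is an assumption about the approach, not the application's actual loader; `load_numeric_columns` is a hypothetical helper name.

```python
import io

import pandas as pd


def load_numeric_columns(csv_source):
    """Read a CSV and keep only its numeric columns (hypothetical helper)."""
    df = pd.read_csv(csv_source)
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        raise ValueError("no numeric columns found in the CSV file")
    # Drop rows with missing values so downstream algorithms get clean input.
    return numeric.dropna(axis=0, how="any")


# In-memory CSV with two text columns and two numeric columns.
sample = io.StringIO(
    "name,height,weight,group\n"
    "a,1.5,60,x\n"
    "b,1.7,72,y\n"
    "c,1.6,65,x\n"
)
numeric_df = load_numeric_columns(sample)
print(list(numeric_df.columns))  # ['height', 'weight']
```

The original column names survive the filtering, so they can be reused as axis labels.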

Project Structure

DataVisualizer3.0/
├── GUI.py                 # Main graphical user interface
├── PCA_Algo.py           # PCA implementation
├── PCA_Runner.py         # PCA execution wrapper
├── kMeans.py             # K-Means clustering
├── kMeansAndPCA.py       # Combined PCA and K-Means
├── none.py               # Raw data visualization
├── Standard_LLE.py       # Standard LLE implementation
├── Modified_LLE.py       # Modified LLE implementation
├── Hessian_LLE.py        # Hessian LLE implementation
├── LTSA_LLE.py           # LTSA LLE implementation
├── displayAll.py         # Batch algorithm execution
├── requirements.txt      # Python dependencies
├── run.sh                # Execution script
└── README.md             # This file

Technical Details

Architecture

  • GUI Framework: Tkinter with custom styling
  • Data Processing: Pandas for data manipulation
  • Visualization: Matplotlib with TkAgg backend
  • Machine Learning: Scikit-learn for algorithms

Visualization Features

  • Color-coded scatter plots for categorical data
  • Colormap visualization for continuous data
  • Reduced label overlap (at most 20 point labels are drawn per plot)
  • Grid and axis labels for clarity
  • Responsive plotting with proper scaling
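
The label cap and colormap features above could be implemented along these lines. This is a sketch under assumptions: `pick_label_indices` and `labeled_scatter` are hypothetical names, evenly spaced subsampling is only one possible way to choose which points get labels, and the `Agg` backend is used here only so the example runs without a display (the application itself uses TkAgg).

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend for this sketch only
import matplotlib.pyplot as plt
import numpy as np

MAX_LABELS = 20


def pick_label_indices(n_points, max_labels=MAX_LABELS):
    """Return at most max_labels evenly spaced point indices."""
    if n_points <= max_labels:
        return np.arange(n_points)
    return np.unique(np.linspace(0, n_points - 1, max_labels).astype(int))


def labeled_scatter(points, names, values):
    """Scatter plot colored by a continuous value, with a capped label count."""
    fig, ax = plt.subplots()
    sc = ax.scatter(points[:, 0], points[:, 1], c=values, cmap="viridis")
    for i in pick_label_indices(len(points)):
        ax.annotate(str(names[i]), (points[i, 0], points[i, 1]), fontsize=8)
    ax.set_xlabel("Component 1")
    ax.set_ylabel("Component 2")
    ax.grid(True)
    fig.colorbar(sc, ax=ax)
    return fig, ax


rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 2))
fig, ax = labeled_scatter(pts, [f"p{i}" for i in range(100)], pts[:, 0])
print(len(ax.texts))  # number of labels actually drawn
```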

Troubleshooting

Tkinter Not Found

On macOS with Homebrew Python, if you encounter Tkinter import errors, install the Tk bindings:

brew install python-tk

ModuleNotFoundError

Ensure your virtual environment is activated and dependencies are installed:

source venv/bin/activate
pip install -r requirements.txt

Visualization Not Displaying

  • Ensure matplotlib is using the TkAgg backend (set automatically)
  • Check that your CSV file contains numeric data
  • Verify the file was uploaded successfully
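
A quick environment check can narrow down which of the problems above applies. The following sketch uses only the standard library to look for the required modules; `missing_modules` is a hypothetical helper, and the module list assumes the packages from the Requirements section (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the required packages; scikit-learn is imported as "sklearn".
REQUIRED = ["tkinter", "pandas", "numpy", "matplotlib", "sklearn"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    import matplotlib

    print("All modules found; matplotlib backend:", matplotlib.get_backend())
```

Run it with the same interpreter that launches `GUI.py`, inside the activated virtual environment, so the check reflects the environment the application actually uses.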

License

This project is provided as-is for educational and research purposes.

Contributing

Contributions are welcome. Please ensure code follows the existing style and includes appropriate error handling.

Author

Data Visualizer 3.0 - Developed for CSC805 Data Visualization course
