A comprehensive data visualization tool with a modern graphical user interface for applying dimensionality reduction and clustering algorithms to CSV datasets.
Data Visualizer 3.0 provides an intuitive interface for exploring high-dimensional data through various machine learning algorithms. The application supports Principal Component Analysis (PCA), K-Means clustering, and multiple variants of Locally Linear Embedding (LLE) for dimensionality reduction.
- Graphical User Interface: Modern Tkinter-based GUI with clean, professional design
- Multiple Algorithms: Support for PCA, K-Means, and LLE variants (Standard, Modified, Hessian, LTSA)
- Flexible Data Input: Automatically detects and adapts to different CSV file structures
- Visualization: Clean, readable scatter plots with color-coded clusters
- Batch Processing: "Display All" option to run every algorithm in a single pass
- PCA (Principal Component Analysis): Linear dimensionality reduction technique
- Standard LLE: Locally Linear Embedding for nonlinear dimensionality reduction
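- Modified LLE: Variant of LLE that uses multiple weight vectors per neighborhood to address the regularization problem of standard LLE
- Hessian LLE: Hessian eigenmaps variant of LLE, based on a local Hessian estimate in each neighborhood
- LTSA LLE: Local Tangent Space Alignment, which aligns the tangent spaces of local neighborhoods to preserve local geometry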
- K-Means: K-Means clustering algorithm for data segmentation
- PCA + K-Means: Combined dimensionality reduction followed by clustering
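The algorithms listed above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the code in this repository: the synthetic data, `n_neighbors=12`, and `n_clusters=3` are assumptions chosen for the example, and the project's own modules may configure these steps differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

# Synthetic stand-in for a numeric CSV: 150 rows, 5 features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))

# PCA: linear projection onto the two directions of highest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# In scikit-learn the four LLE variants differ only in the `method` argument.
# Hessian LLE requires n_neighbors > n_components * (n_components + 3) / 2,
# which means more than 5 neighbors for a 2-D embedding.
lle_embeddings = {}
for method in ("standard", "modified", "hessian", "ltsa"):
    lle = LocallyLinearEmbedding(
        n_neighbors=12, n_components=2, method=method, random_state=0
    )
    lle_embeddings[method] = lle.fit_transform(X)

# PCA + K-Means: cluster the points in the reduced 2-D space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
```

Each embedding is an array with one 2-D point per input row, so it can be passed directly to a scatter plot, with `labels` supplying the cluster colors.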
- Python 3.8 or higher
- Tkinter (usually included with Python)
- Required packages (see requirements.txt):
  - pandas
  - numpy
  - matplotlib
  - scikit-learn
- Clone the repository:
git clone <repository-url>
cd DataVisualizer3.0
- Create a virtual environment (recommended):
python3 -m venv venv
source venv/bin/activate # On macOS/Linux
# or
venv\Scripts\activate # On Windows
- Install dependencies:
pip install -r requirements.txt

Option 1: Using the shell script (macOS/Linux)
./run.sh

Option 2: Direct execution
source venv/bin/activate
python3 GUI.py

Option 3: On Windows
venv\Scripts\activate
python GUI.py

- Upload Data: Click "Browse Files" and select your CSV file
- Select Algorithm: Choose from the dropdown menu:
  - None (raw data visualization)
  - PCA
  - K-Means
  - PCA And K-Means
  - Standard LLE
  - Modified LLE
  - Hessian LLE
  - LTSA LLE
  - Display All (runs all algorithms)
- Run Visualization: Click "RUN VISUALIZATION" to execute the selected algorithm
- View Results: Matplotlib windows will open displaying the results
The application automatically detects numeric columns in your CSV file. Supported formats:
- CSV files with numeric columns
- Any number of columns (the application adapts automatically)
- Column names are preserved when possible
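One common way to detect numeric columns with pandas is `select_dtypes`. The sketch below shows the idea on a small inline CSV; the column names and values are made up for illustration, and the application's actual detection logic may differ.

```python
import io

import pandas as pd

# Hypothetical CSV content mixing text and numeric columns.
csv_text = (
    "name,height,weight,species\n"
    "a,1.2,3.4,x\n"
    "b,2.3,4.5,y\n"
    "c,3.1,5.0,x\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Keep only integer and float columns; original column names are preserved.
numeric = df.select_dtypes(include="number")
numeric_columns = list(numeric.columns)
```

Here `numeric_columns` contains `height` and `weight`, while the text columns `name` and `species` are left out of the data passed to the algorithms.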
DataVisualizer3.0/
├── GUI.py # Main graphical user interface
├── PCA_Algo.py # PCA implementation
├── PCA_Runner.py # PCA execution wrapper
├── kMeans.py # K-Means clustering
├── kMeansAndPCA.py # Combined PCA and K-Means
├── none.py # Raw data visualization
├── Standard_LLE.py # Standard LLE implementation
├── Modified_LLE.py # Modified LLE implementation
├── Hessian_LLE.py # Hessian LLE implementation
├── LTSA_LLE.py # LTSA LLE implementation
├── displayAll.py # Batch algorithm execution
├── requirements.txt # Python dependencies
├── run.sh # Execution script
└── README.md # This file
- GUI Framework: Tkinter with custom styling
- Data Processing: Pandas for data manipulation
- Visualization: Matplotlib with TkAgg backend
- Machine Learning: Scikit-learn for algorithms
- Color-coded scatter plots for categorical data
- Colormap visualization for continuous data
- Minimal label overlap (maximum 20 labels per plot)
- Grid and axis labels for clarity
- Responsive plotting with proper scaling
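As a rough illustration of the label cap and plot styling described above, the sketch below annotates at most 20 evenly spaced points on a color-coded scatter plot. The helper `label_indices`, the random data, and the axis names are hypothetical; the repository may select labels in a different way.

```python
import matplotlib.pyplot as plt
import numpy as np


def label_indices(n_points, max_labels=20):
    """Return at most `max_labels` evenly spaced point indices to annotate."""
    if n_points <= max_labels:
        return list(range(n_points))
    step = n_points / max_labels
    return [int(i * step) for i in range(max_labels)]


# Random 2-D points and cluster ids standing in for real results (assumption).
rng = np.random.default_rng(0)
points = rng.normal(size=(100, 2))
clusters = rng.integers(0, 3, size=100)

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1], c=clusters, cmap="viridis")

# Annotate only a subset of points so labels do not crowd the plot.
annotated = label_indices(len(points))
for i in annotated:
    ax.annotate(str(i), (points[i, 0], points[i, 1]))

ax.set_xlabel("Component 1")
ax.set_ylabel("Component 2")
ax.grid(True)
```

Calling `plt.show()` afterwards opens the figure in a window when an interactive backend such as TkAgg is active.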
On macOS, if you encounter Tkinter errors:
brew install python-tk

Ensure your virtual environment is activated and dependencies are installed:
source venv/bin/activate
pip install -r requirements.txt

- Ensure matplotlib is using the TkAgg backend (set automatically)
- Check that your CSV file contains numeric data
- Verify the file was uploaded successfully
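When working through the checks above, it can help to confirm that each required module is visible to the active interpreter. The following is a small standalone sketch using only the standard library; the function name `check_dependencies` is hypothetical and not part of this project.

```python
import importlib.util

# Import names for the packages listed in this README.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["tkinter", "pandas", "numpy", "matplotlib", "sklearn"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each module name to True if it can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the virtual environment activated; any module reported as missing points to either a skipped `pip install -r requirements.txt` or, for `tkinter`, a Python build without Tk support.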
This project is provided as-is for educational and research purposes.
Contributions are welcome. Please ensure code follows the existing style and includes appropriate error handling.
Data Visualizer 3.0 - Developed for CSC805 Data Visualization course