A statistical simulation project comparing PLS-VIP (Partial Least Squares - Variable Importance in Projection) and RWA (Relative Weight Analysis) methods for identifying important predictors in regression models.
Comparative Analysis of PLS-VIP and RWA Methods: A Comprehensive Simulation Study
This repository contains the implementation and reproducible code for our research paper, which compares PLS-VIP and RWA through simulation studies. The paper investigates method performance across sample sizes, predictor counts, correlation structures, and data types, and provides practical guidelines for method selection in applied research.
Key findings from the paper:
- Significant performance differences between methods in high-dimensional scenarios
- Impact of correlation structures on method effectiveness
- Practical recommendations for method selection based on data characteristics
- Comprehensive statistical analysis with ANOVA and cross-tabulation results
Contents:
- 📄 Research Paper
- Overview
- Quick Start
- Parameter Space
- Architecture
- Output Files
- Methods
- Reproducing Paper Results
This simulation study evaluates two competing methods for variable importance assessment:
- PLS-VIP: Uses cross-decomposition to score variable importance
- RWA: Johnson's Relative Weight Analysis using transformation matrices
The simulation tests performance across multiple parameter combinations using Monte Carlo experiments to determine which method performs better under different conditions.
pip install -r requirements.txt
python main.py
This will:
- Run 100 replications across multiple parameter combinations
- Save results to simulation_results.csv
- Display performance analysis
- Generate data that supports the findings in our research paper
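As a rough illustration of how results could be written to simulation_results.csv one row at a time, so that an interrupted run keeps what it has already computed, here is a minimal sketch. The helper name `append_result`, the column names, and the placeholder values are assumptions for illustration, not the actual code or schema in main.py.

```python
import csv
import os
import tempfile


def append_result(path, row, fieldnames):
    """Append one result row, writing the header only if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if new_file:
            writer.writeheader()
        writer.writerow(row)


# Hypothetical columns; the real CSV layout may differ.
fields = ["rep", "n", "p", "method", "score"]
out_path = os.path.join(tempfile.mkdtemp(), "simulation_results.csv")

for rep in range(3):
    # The score is a placeholder value, not a simulation result.
    append_result(
        out_path,
        {"rep": rep, "n": 100, "p": 10, "method": "PLS-VIP", "score": 0.0},
        fields,
    )

# Read the file back to confirm every row was persisted.
with open(out_path, newline="") as fh:
    rows = list(csv.DictReader(fh))
```

Opening the file in append mode for each row costs a little I/O per replication, but every completed replication is already on disk if the process stops.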
The simulation tests across:
- Sample sizes: 100-500
- Number of predictors: 10-50
- Correlation levels: 0.1-0.5
- Effect magnitudes: 0.1-0.5
- Noise ratios: 0.1-0.5
- Importance proportion α: 0.3 (default)
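One way to enumerate such a parameter space is a full factorial grid. The sketch below assumes three levels per factor. Only the ranges are stated above, so the specific grid points and the dictionary keys are hypothetical.

```python
from itertools import product

# Assumed levels: the README gives ranges only, so these grid points are illustrative.
param_levels = {
    "n_samples": [100, 300, 500],
    "n_predictors": [10, 30, 50],
    "correlation": [0.1, 0.3, 0.5],
    "effect_magnitude": [0.1, 0.3, 0.5],
    "noise_ratio": [0.1, 0.3, 0.5],
}

# One dict per parameter combination (Cartesian product of all levels).
grid = [dict(zip(param_levels, combo)) for combo in product(*param_levels.values())]

print(len(grid))  # 3**5 = 243 combinations
```

A fractional factorial design would run only a structured subset of these combinations.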
- factors.py: Statistical core implementing both methods and data generation
- main.py: Simulation engine orchestrating Monte Carlo experiments
- visualization.py: Performance visualization utilities
- logger.py: Logging configuration
- simulation_results.csv: Raw simulation data
- detailed_results.csv: Aggregated performance metrics
- fractional_factorial_results.csv: Fractional factorial design results
- comparison_plot.png and performance_comparison.png: Visual comparisons
- docs/: Analysis summaries and additional figures
- paper/: Complete research paper with LaTeX source, figures, and tables
Key parameters in main.py:
- alpha: Proportion of variables that are important (default 0.3)
- randomize_important: Whether to randomize important variables across replications
- Results are saved incrementally to prevent data loss
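To make the roles of alpha and randomize_important concrete, here is a minimal data-generation sketch. It assumes equicorrelated Gaussian predictors, equal coefficients for the important variables, and additive Gaussian noise. The function name `generate_data` and the arguments `rho`, `effect`, and `noise_sd` are hypothetical and may not match the generator in factors.py.

```python
import numpy as np


def generate_data(n, p, rho, effect, noise_sd, alpha=0.3,
                  randomize_important=True, seed=None):
    """Simulate (X, y) where a proportion alpha of the p predictors affects y."""
    rng = np.random.default_rng(seed)

    # Equicorrelated predictors: unit variances, pairwise correlation rho (assumption).
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)

    # Number of important predictors implied by alpha.
    k = max(1, int(round(alpha * p)))
    if randomize_important:
        idx = rng.choice(p, size=k, replace=False)  # new positions each call
    else:
        idx = np.arange(k)                          # always the first k columns

    beta = np.zeros(p)
    beta[idx] = effect
    y = X @ beta + noise_sd * rng.normal(size=n)
    return X, y, np.sort(idx)


X, y, important = generate_data(n=100, p=10, rho=0.3, effect=0.5,
                                noise_sd=0.5, alpha=0.3, seed=0)
```

With p = 10 and alpha = 0.3, three predictors receive a nonzero coefficient. Setting randomize_important to False fixes them to the first three columns.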
PLS-VIP: Uses partial least squares regression to decompose the predictor space and calculates Variable Importance in Projection scores based on the contribution of each variable to the latent components.
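The VIP calculation can be sketched with a NumPy-only PLS1 (single-response NIPALS) fit. This is an illustrative stand-in, not the code in factors.py. The function name `pls1_vip` and the default of two components are assumptions. For predictor j, the score is sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), where w_a is the unit-norm weight vector of component a and SS_a is the response variance that component explains.

```python
import numpy as np


def pls1_vip(X, y, n_components=2):
    """VIP scores from a single-response PLS fit (NIPALS), one score per predictor."""
    Xa = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
    ya = y - y.mean()                          # center response
    n, p = Xa.shape
    W = np.zeros((p, n_components))            # unit-norm weight vectors
    ss = np.zeros(n_components)                # y-variance explained per component

    for a in range(n_components):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)                 # weight vector for this component
        t = Xa @ w                             # latent scores
        tt = t @ t
        q = (ya @ t) / tt                      # y-loading
        p_load = (Xa.T @ t) / tt               # x-loading
        Xa = Xa - np.outer(t, p_load)          # deflate X
        ya = ya - q * t                        # deflate y
        W[:, a] = w
        ss[a] = q ** 2 * tt

    return np.sqrt(p * (W ** 2 @ ss) / ss.sum())


# Toy data: only the first predictor drives the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)
vip = pls1_vip(X, y)
```

Because each weight vector has unit norm, the squared VIP scores sum to the number of predictors, which is why a score above 1 is commonly read as "more important than average".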
RWA: Johnson's method, which uses transformation matrices to partition the variance in the outcome variable among correlated predictors, providing relative importance weights.
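A compact sketch of Johnson's relative weights, working from the correlation matrix, is shown below. The function name `relative_weights` and the toy data are assumptions, and the implementation in factors.py may differ in detail. The predictors are mapped to their closest orthogonal counterparts through the symmetric square root of the predictor correlation matrix, the outcome is regressed on those orthogonal variables, and the squared terms are combined into one weight per predictor.

```python
import numpy as np


def relative_weights(X, y):
    """Return (raw, rescaled) relative weights; the raw weights sum to R^2."""
    p = X.shape[1]
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    Rxx, rxy = R[:p, :p], R[:p, p]

    # Symmetric square root of Rxx: correlations between the predictors
    # and their orthogonal counterparts (the transformation matrix).
    evals, evecs = np.linalg.eigh(Rxx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T

    # Standardized coefficients of y on the orthogonal variables.
    beta = np.linalg.solve(lam, rxy)

    raw = (lam ** 2) @ (beta ** 2)
    return raw, raw / raw.sum()


# Toy data with correlated predictors sharing a common factor.
rng = np.random.default_rng(0)
n = 300
common = rng.normal(size=(n, 1))
X = 0.5 * common + rng.normal(size=(n, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

raw, rescaled = relative_weights(X, y)
```

The raw weights add up to the model R², so the rescaled weights can be read as each predictor's share of explained variance.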
To reproduce the exact results presented in the research paper:
- Run the full simulation study:
  python main.py
- Generate statistical analysis:
  python statistical_analysis.py
- Create visualizations:
  python visualize_results.py
- Reproduce all analyses at once:
  bash reproduce_results.sh
The paper's figures and tables are automatically generated from the simulation results and saved in the docs/ directory. All statistical analyses, including ANOVA results and cross-tabulations presented in the paper, can be reproduced using the provided scripts.
Note: The full simulation may take several hours to complete. For quick testing, modify the n_reps parameter in main.py to a smaller value (e.g., 10-50 replications).
If you use this code or findings in your research, please cite our paper:
@article{marinucci2024plsvip,
title={Comparative Analysis of PLS-VIP and RWA Methods: A Comprehensive Simulation Study},
author={Marinucci, Massimiliano},
year={2024},
url={https://github.com/m-marinucci/PLS_RWA}
}
This project is for research and educational purposes.