Releases: ombhojane/explainableai
# [0.10] Performance Optimization, Enhanced Preprocessing, and much more!
ExplainableAI v0.10 introduces significant performance improvements, enhanced data preprocessing capabilities, and a more robust logging system.
New Features
Dask Integration for Large Datasets
- Added support for Dask DataFrames to handle larger-than-memory datasets efficiently.
- Implemented `_preprocess_data_dask` method for parallel data preprocessing.
Enhanced `analyze` Function
- Added support for batch processing and parallel execution:
  - `batch_size`: Allows processing of large datasets in smaller chunks. Default is None (process all data at once).
  - `parallel`: Enables parallel processing of batches using multiprocessing. Default is False.
  - `instance_index`: Specifies the index of a particular instance for detailed interpretation. Default is 0.
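The batching behavior these parameters describe can be pictured with a minimal sketch. This is not the library's implementation: `iter_batches`, `analyze_in_batches`, and `summarize` are hypothetical names, and a thread pool stands in for the multiprocessing mentioned above so that the example stays portable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Optional, Sequence


def iter_batches(rows: Sequence, batch_size: Optional[int]) -> Iterator[Sequence]:
    """Yield consecutive slices of `rows`; None means a single batch with everything."""
    if batch_size is None:
        yield rows
        return
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def analyze_in_batches(
    rows: Sequence,
    worker: Callable[[Sequence], object],
    batch_size: Optional[int] = None,
    parallel: bool = False,
) -> List[object]:
    """Apply `worker` to every batch, optionally through a thread pool."""
    batches = list(iter_batches(rows, batch_size))
    if parallel:
        with ThreadPoolExecutor() as pool:
            # Executor.map returns results in the same order as the input batches.
            return list(pool.map(worker, batches))
    return [worker(batch) for batch in batches]


def summarize(batch: Sequence) -> float:
    """Hypothetical per-batch analysis: the mean of the batch."""
    return sum(batch) / len(batch)


data = list(range(10))
sequential = analyze_in_batches(data, summarize, batch_size=4)
threaded = analyze_in_batches(data, summarize, batch_size=4, parallel=True)
```

Both calls produce one result per batch in the original order, so switching `parallel` on or off should not change what downstream code receives.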
Enhanced Logging
- Implemented a more comprehensive logging system using Python's `logging` module.
- Added colorized console output for better readability using the `colorama` library.
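As a rough illustration of colorized log output, the sketch below attaches a color-aware formatter to a standard `logging` handler. It is an assumption about the general approach, not the package's code: `ColorFormatter` and `build_logger` are hypothetical names, and raw ANSI escape codes are used in place of `colorama` to keep the example dependency-free (`colorama` exposes the same sequences as `Fore.*` and `Style.RESET_ALL`).

```python
import logging

# ANSI escape sequences, one per log level.
COLORS = {
    logging.DEBUG: "\033[36m",     # cyan
    logging.INFO: "\033[32m",      # green
    logging.WARNING: "\033[33m",   # yellow
    logging.ERROR: "\033[31m",     # red
    logging.CRITICAL: "\033[35m",  # magenta
}
RESET = "\033[0m"


class ColorFormatter(logging.Formatter):
    """Wrap each formatted record in the color assigned to its level."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        color = COLORS.get(record.levelno, "")
        return f"{color}{message}{RESET}" if color else message


def build_logger(name: str = "xai_demo") -> logging.Logger:
    """Return a logger with a single colorized console handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(ColorFormatter("%(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.warning("Missing values found; imputing with the median")
```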
Expanded Documentation
- Created a new `/doc` directory for additional documentation:
  - API reference guide
  - User guide with detailed explanations and best practices
  - Installation and setup instructions
Use Cases
- Added an `/examples` directory showcasing various use cases:
  - Small code snippets for quick start
  - Comprehensive examples of ExplainableAI in larger projects
  - Jupyter notebooks demonstrating step-by-step analysis
Improvements
Core Functionality
- Refactored `XAIWrapper` class for improved performance and modularity.
- Enhanced error handling and added more informative error messages.
Data Preprocessing
- Improved categorical and numerical feature handling in the preprocessing pipeline.
- Added support for handling missing values and outliers.
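The release notes do not say which imputation or outlier rules the pipeline applies. As one common combination, the sketch below fills missing values with the column median and flags outliers with the 1.5 × IQR rule. The function names are hypothetical and only the standard library is used.

```python
import statistics
from typing import List, Optional, Tuple


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (low, high) fences located k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers(values: List[float], k: float = 1.5) -> List[bool]:
    """Mark every value that falls outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v < low or v > high for v in values]


column = [10.0, 12.0, None, 11.0, 13.0, 95.0]
filled = impute_median(column)   # the missing entry becomes the median, 12.0
flags = flag_outliers(filled)    # only 95.0 lies outside the fences
```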
Model Comparison
- Enhanced model comparison functionality with more detailed metrics.
- Improved selection of the best model based on cross-validation scores.
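Selecting a model by cross-validation score amounts to comparing mean fold scores. The sketch below shows that step in isolation, assuming higher scores are better. `select_best_model` is a hypothetical helper and the fold scores are illustrative placeholders, not measured results.

```python
from statistics import mean
from typing import Dict, List, Tuple


def select_best_model(cv_scores: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the model name with the highest mean fold score, and that mean."""
    if not cv_scores:
        raise ValueError("no models to compare")
    means = {name: mean(scores) for name, scores in cv_scores.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Illustrative fold scores only.
cv_scores = {
    "Random Forest": [0.90, 0.92, 0.91],
    "Logistic Regression": [0.85, 0.88, 0.86],
}
best_name, best_mean = select_best_model(cv_scores)
```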
Visualization
- Added new visualization options, including correlation heatmaps.
- Improved existing plots for better interpretability.
Report Generation
- Enhanced PDF report generation with more customizable options.
- Added ability to selectively include sections in the generated report.
Exploratory Data Analysis (EDA)
- Implemented a new `perform_eda` method in `XAIWrapper` for quick dataset insights.
- Added correlation analysis and outlier detection to the EDA process.
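To make the correlation-analysis step concrete, here is a small standard-library sketch of a Pearson correlation matrix over named columns. It illustrates the statistic only and is not the body of `perform_eda`. `pearson`, `correlation_matrix`, and the sample columns are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length columns with at least 2 values")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise correlations, keyed by column name on both axes."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


columns = {
    "age": [20.0, 30.0, 40.0, 50.0],
    "income": [25.0, 35.0, 45.0, 55.0],  # rises linearly with age
    "discount": [8.0, 6.0, 4.0, 2.0],    # falls linearly with age
}
matrix = correlation_matrix(columns)
```

A matrix in this shape is what a correlation heatmap would color, with values near 1 or -1 marking strongly related feature pairs.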
Bug Fixes
- Fixed issues related to feature importance calculation and visualization.
- Resolved compatibility issues with the latest versions of dependencies.
Performance Optimization
- Implemented more efficient data handling techniques for large datasets.
- Optimized SHAP value calculations and other computationally intensive operations.
Installation
```bash
pip install explainableai==0.10
```
Usage
```python
from explainableai import XAIWrapper
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Load your dataset
df = pd.read_csv('your_dataset.csv')
X = df.drop(columns=['target_column'])
y = df['target_column']

# Define the models to compare (fit expects a dictionary of models)
models = {'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)}

# Initialize XAIWrapper
xai = XAIWrapper()

# Fit and analyze models
xai.fit(models, X, y)
results = xai.analyze(batch_size=100, parallel=False, instance_index=0)

# Generate a comprehensive report
xai.generate_report('analysis_report.pdf')

# Make and explain predictions
new_data = {...}  # Dictionary of feature values
prediction, probabilities, explanation = xai.explain_prediction(new_data)
```
Analyze with batch processing and parallel execution
The call below will:
- Process the data in batches of 1000 samples
- Use parallel processing for faster computation
- Provide detailed interpretation for the 43rd instance (index 42, counting from 0)
```python
# Reuses models, X, and y from the Usage example above
xai = XAIWrapper()
xai.fit(models, X, y)
results = xai.analyze(batch_size=1000, parallel=True, instance_index=42)
```
Breaking Changes
- The `analyze` method now supports batch processing and parallel execution options.
- Some internal method signatures have been updated to accommodate new features.
We encourage users to update to this version for improved performance and new capabilities. As always, please report any issues or suggestions through our GitHub issue tracker.
For more detailed information, please refer to the documentation in the `/doc` directory and explore the explainableai use cases in the `/examples` directory.
# [0.1.6] Multi-Model Support, Enhanced Visualizations, and Improved Testing
ExplainableAI v0.1.6 introduces significant enhancements, including support for multiple model types, improved visualizations, and a more robust testing framework.
New Features
Multi-Model Support
- Now supports multiple model types including Random Forest, Logistic Regression, XGBoost, and Neural Networks.
- Automatically compares and selects the best-performing model based on cross-validation scores.
Enhanced Model Comparison
- Generates comparative analysis of all provided models.
- Includes ROC curves and performance metrics for each model type.
Improved Visualizations
- Enhanced feature importance plots with better readability.
- Interactive visualizations using Plotly for more engaging data exploration.
Robust Testing Framework
- Comprehensive test suite covering all major functionalities.
- Parameterized tests to ensure compatibility with various model types.
- Edge case handling and input validation tests.
Improvements
Core Functionality
- Refactored XAIWrapper class for better handling of multiple models.
- Improved data preprocessing pipeline with enhanced categorical variable handling.
Report Generation
- Added model comparison section to the PDF report.
- Improved layout and formatting for better readability.
LLM Integration
- Updated prompts for the Gemini model to provide more insightful explanations of multi-model results.
Performance Optimization
- Improved efficiency in handling large datasets.
- Optimized SHAP value calculations for faster processing.
Bug Fixes
- Fixed issues with feature importance calculation for certain model types.
- Resolved compatibility issues with the latest scikit-learn version.
- Corrected error handling in prediction explanation function.
Introducing Notebooks
- Added Colab notebooks demonstrating usage of the explainableai package.
- Feel free to use them with your own datasets and preferred models.
- Head to explainableai/notebooks for access.
Installation
```bash
pip install explainableai==0.1.6
```
Usage
```python
from explainableai import XAIWrapper
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.neural_network import MLPClassifier

# Initialize your models
models = {
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'XGBoost': XGBClassifier(n_estimators=100, random_state=42),
    'Neural Network': MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=42)
}

# Create XAIWrapper instance
xai = XAIWrapper()

# Fit the models and run analysis
# (X_train and y_train are placeholders for your training features and labels)
xai.fit(models, X_train, y_train)
results = xai.analyze()

# Generate report
xai.generate_report()

# Make and explain predictions (input_data is a placeholder for the feature values to score)
prediction, probabilities, explanation = xai.explain_prediction(input_data)
```
Breaking Changes
- The fit method now requires a dictionary of models instead of a single model.
- Some visualization function signatures have been updated to accommodate multiple models.
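Code written against the earlier single-model `fit` call needs to wrap its model in a dictionary. One possible migration shim is sketched below. `as_model_dict` is a hypothetical helper rather than part of the package, and `PlaceholderModel` stands in for any scikit-learn estimator.

```python
from typing import Any, Dict


def as_model_dict(model_or_models: Any) -> Dict[str, Any]:
    """Wrap a single estimator in a one-entry dict keyed by its class name."""
    if isinstance(model_or_models, dict):
        if not model_or_models:
            raise ValueError("the models dictionary is empty")
        return dict(model_or_models)
    return {type(model_or_models).__name__: model_or_models}


class PlaceholderModel:
    """Stand-in for an estimator such as RandomForestClassifier."""


legacy_model = PlaceholderModel()
models = as_model_dict(legacy_model)  # one entry, keyed 'PlaceholderModel'
# xai.fit(models, X_train, y_train)   # dictionary form, as this release expects
```

Passing an existing dictionary through the helper returns a shallow copy, so it can be applied unconditionally at call sites that may receive either form.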
We encourage users to update to this version for access to these new features and improvements. As always, please report any issues or suggestions through our GitHub issue tracker.
# [0.1.4] Offers seamless integration with explainableai, get reports of results, and much more...
ExplainableAI
ExplainableAI is a Python package that provides a comprehensive suite of tools for explainable machine learning. It wraps around popular machine learning models and offers various techniques to interpret and explain their predictions.
Features
- Model Agnostic: Works with any scikit-learn compatible model.
- Automated EDA: Performs exploratory data analysis on the input dataset.
- Feature Importance: Calculates and visualizes feature importance.
- SHAP Values: Computes SHAP (SHapley Additive exPlanations) values for in-depth feature impact analysis.
- Partial Dependence Plots: Generates partial dependence plots for top features.
- Learning Curve: Plots learning curves to assess model performance with varying training set sizes.
- ROC and Precision-Recall Curves: For classification tasks, generates ROC and Precision-Recall curves.
- Correlation Heatmap: Visualizes feature correlations.
- Cross-Validation: Performs k-fold cross-validation.
- LLM-powered Explanations: Utilizes Google's Gemini model to provide natural language explanations of model results and individual predictions.
- PDF Report Generation: Automatically generates a comprehensive PDF report of all analyses.
- Interactive Predictions: Allows users to input data and receive explained predictions.
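The SHAP Values feature in the list above rests on Shapley values: a feature's attribution is its marginal contribution to the prediction, averaged over every order in which features can be revealed. The brute-force sketch below computes that average exactly for a toy model. It illustrates the definition only. The SHAP library relies on faster approximations and model-specific algorithms, and `shapley_values`, `linear_model`, and the sample data are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[List[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values, averaging marginal contributions over all feature orderings."""
    n = len(instance)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)          # start with every feature at its baseline value
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]  # reveal this feature's real value
            value = predict(current)
            contributions[feature] += value - previous
            previous = value
    return [c / factorial(n) for c in contributions]


def linear_model(x: List[float]) -> float:
    """Toy model: for a linear model, each attribution is weight * (value - baseline)."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + 3.0


instance = [4.0, 2.0, 6.0]
baseline = [1.0, 1.0, 2.0]
phi = shapley_values(linear_model, instance, baseline)
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline. The cost grows factorially with the number of features, so this form is only practical for a handful of them.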
Implementation Details
Core Components
- XAIWrapper: The main class that encapsulates all functionality.
  - Handles data preprocessing, model fitting, and various analyses.
  - Integrates all explainability techniques.
- ReportGenerator: Generates PDF reports using ReportLab.
  - Creates structured reports with text, tables, and visualizations.
- Visualization Module: Contains functions for creating various plots and visualizations.
  - Uses matplotlib and seaborn for static visualizations.
- Model Evaluation: Includes functions for assessing model performance.
  - Computes metrics like accuracy, F1-score, MSE, R2, etc.
- Feature Analysis: Implements feature importance and SHAP value calculations.
- LLM Integration: Uses Google's Gemini model for natural language explanations.
  - Interprets model results and individual predictions.
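For reference, the metrics named under Model Evaluation can be written out from their definitions. The sketch below uses plain Python so the formulas are visible. It does not reproduce the package's `model_evaluation.py`, and the function names and sample labels are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Classification example
labels = [1, 0, 1, 1, 0]
predicted_labels = [1, 0, 0, 1, 1]
acc = accuracy(labels, predicted_labels)
f1_value = f1(labels, predicted_labels)

# Regression example
targets = [3.0, 5.0, 7.0]
predicted_targets = [2.0, 5.0, 8.0]
mse_value = mse(targets, predicted_targets)
r2_value = r2(targets, predicted_targets)
```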
Key Files
- `core.py`: Contains the XAIWrapper class.
- `report_generation.py`: Implements the ReportGenerator class.
- `visualizations.py`: Houses all visualization functions.
- `model_evaluation.py`: Contains model evaluation metrics.
- `feature_analysis.py`: Implements feature importance and SHAP calculations.
- `llm_explanations.py`: Handles integration with the Gemini model.
Workflow
- Data Preprocessing:
  - Handles categorical and numerical features.
  - Performs imputation and scaling.
- Model Fitting:
  - Fits the provided model to the preprocessed data.
- Analysis:
  - Calculates feature importance.
  - Generates various visualizations.
  - Computes SHAP values.
  - Performs cross-validation.
- LLM Explanation:
  - Sends analysis results to Gemini for interpretation.
  - Generates natural language explanations.
- Report Generation:
  - Compiles all analyses and visualizations into a PDF report.
- Interactive Predictions:
  - Allows users to input data for new predictions.
  - Provides explanations for individual predictions.
Installation
```bash
pip install explainableai
```
Usage
```python
from explainableai import XAIWrapper
from sklearn.ensemble import RandomForestClassifier

# Initialize your model
model = RandomForestClassifier()

# Create XAIWrapper instance
xai = XAIWrapper()

# Fit the model and run analysis
# (X_train and y_train are placeholders for your training features and labels)
xai.fit(model, X_train, y_train)
results = xai.analyze()

# Generate report
xai.generate_report()

# Make and explain predictions (input_data is a placeholder for the feature values to score)
prediction, probabilities, explanation = xai.explain_prediction(input_data)
```