Welcome to the comprehensive tutorial series for PlotSenseAI! These tutorials will guide you through each demo project step-by-step, helping you understand both the code and the concepts behind AI-powered data visualization.
By completing these tutorials, you will:
- Master PlotSenseAI's core functionality
- Understand ML explainability techniques
- Learn to build custom data visualization plugins
- Create interactive web applications for data storytelling
- Apply best practices for AI-driven data analysis
Each tutorial includes:
- 🎯 Objectives: What you'll learn
- ⚙️ Prerequisites: Required knowledge and setup
- 👣 Step-by-step guide: Detailed instructions
- 🧪 Exercises: Hands-on practice
- 🔧 Troubleshooting: Common issues and solutions
- 🚀 Next steps: How to extend and improve
Objectives (project_one):
- Load and preprocess real-world datasets
- Train machine learning models
- Use PlotSenseAI for automated visualization recommendations
- Generate and interpret AI explanations
- Explore advanced explainability techniques
Prerequisites (project_one):
- Basic Python knowledge
- Understanding of pandas and scikit-learn
- Jupyter Notebook setup (see SETUP.md)
cd project_one
pip install ucimlrepo scikit-learn pandas matplotlib plotsense
jupyter notebook ml_explainability_demo.ipynb
The UCI Breast Cancer Recurrence dataset contains:
- Features: Age, menopause status, tumor size, etc.
- Target: Recurrence (no-recurrence-events vs recurrence-events)
- Challenge: Imbalanced classes and categorical features
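Both challenges listed above can be inspected before any modeling. The sketch below uses a small hand-made frame (the values are illustrative stand-ins, not the real UCI data, and the helper names `class_balance` and `one_hot` are hypothetical) to show one way to measure class balance and one-hot encode categorical columns with pandas.

```python
import pandas as pd

# Toy stand-in for the UCI data: values are made up for illustration only.
toy = pd.DataFrame({
    "age": ["30-39", "40-49", "40-49", "50-59", "60-69", "50-59"],
    "menopause": ["premeno", "premeno", "ge40", "ge40", "ge40", "premeno"],
    "recurrence": ["no", "no", "no", "no", "yes", "yes"],
})


def class_balance(target: pd.Series) -> pd.Series:
    """Return the share of rows in each class, largest first."""
    return target.value_counts(normalize=True)


def one_hot(features: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode every column so no artificial ordering is implied."""
    return pd.get_dummies(features, columns=list(features.columns), dtype=int)


balance = class_balance(toy["recurrence"])
encoded = one_hot(toy[["age", "menopause"]])
print(balance)
print(encoded.columns.tolist())
```

Label encoding, as used later in this tutorial, is usually fine for tree-based models; one-hot encoding is a common alternative when a model would otherwise read the integer codes as an ordering. For imbalanced targets, passing `stratify=y` to `train_test_split` keeps the class ratio the same in both splits.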
# Load the dataset
import pandas as pd
from ucimlrepo import fetch_ucirepo
breast_cancer_recurrence = fetch_ucirepo(id=14)
X = breast_cancer_recurrence.data.features
y = breast_cancer_recurrence.data.targets

# Handle missing values (fill each column with its most frequent value)
X_cleaned = X.fillna(X.mode().iloc[0])
# Encode categorical variables
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for col in X_cleaned.select_dtypes(include=['object']).columns:
    X_cleaned[col] = le.fit_transform(X_cleaned[col])

# Train the model
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_cleaned, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train.values.ravel())  # flatten the one-column target to 1-D

from plotsense import recommender, plotgen, explainer
# Get visualization recommendations
recommendations = recommender(X_train, n=5)
print("PlotSenseAI Recommendations:")
print(recommendations)
# Generate visualization
plot = plotgen(X_train, 0, recommendations)  # Use first recommendation
plot.show()
# Get AI explanation
explanation = explainer(plot)
print("AI Explanation:", explanation)

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X_train.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
# Visualize with PlotSenseAI
importance_plot_recs = recommender(feature_importance, n=3)
importance_plot = plotgen(feature_importance, 0, importance_plot_recs)
importance_plot.show()
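The impurity-based `feature_importances_` used above are computed on the training data and can favor features with many distinct values. A common cross-check is permutation importance on held-out data. The sketch below is one way to do that with scikit-learn; it runs on synthetic data from `make_classification` (an assumption made so the example is self-contained), not on the UCI dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: with shuffle=False the first two columns are the informative ones.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one column at a time on the test set and measure the drop in accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```

Features whose shuffling barely changes the score contribute little to held-out predictions, whatever their impurity-based ranking says.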
Data Exploration:
- Try different datasets from UCI repository
- Experiment with various preprocessing techniques
- Compare PlotSenseAI recommendations for different data types
Model Comparison:
- Train different models (SVM, Logistic Regression, XGBoost)
- Use PlotSenseAI to visualize model performance comparisons
- Generate explanations for each model's behavior
Advanced Explainability:
- Implement SHAP values visualization
- Create partial dependence plots
- Explore feature interaction effects
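As a possible starting point for the model comparison exercise, the sketch below scores three scikit-learn classifiers with 5-fold cross-validation and collects the results in a DataFrame. It uses synthetic data, and XGBoost is left out only to avoid an extra dependency; the dictionary keys and column names are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed features and target.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Scale inputs for the models that are sensitive to feature magnitude.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

rows = []
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring="accuracy")
    rows.append({
        "model": name,
        "mean_accuracy": scores.mean(),
        "std_accuracy": scores.std(),
    })

comparison = pd.DataFrame(rows).sort_values("mean_accuracy", ascending=False)
print(comparison)
```

Because `comparison` is an ordinary DataFrame, it can in principle be handed to `recommender` in the same way as `feature_importance` earlier in this tutorial.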
Issue: UCI dataset not loading
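# Alternative: use a dataset bundled with scikit-learn
# (this is the Wisconsin Diagnostic dataset, not the UCI recurrence data)
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer(as_frame=True)  # as_frame=True returns pandas objects
X, y = data.data, data.target
Issue: PlotSenseAI recommendations seem irrelevant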
- Check data types and ensure proper preprocessing
- Try different subsets of your data
- Experiment with different values of n for recommendations
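For the first check in the list above, a quick per-column summary often reveals the problem (numbers stored as strings, unexpected missing values, or columns with a single value). The helper below is a hypothetical convenience function, not part of PlotSenseAI, and the sample columns are illustrative.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })


sample = pd.DataFrame({
    "tumor_size": ["10-14", "20-24", None, "20-24"],
    "deg_malig": [1, 3, 2, 3],
})
summary = summarize_columns(sample)
print(summary)
```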
Objectives (project_two):
- Create modular Python packages
- Implement statistical anomaly detection
- Integrate custom functionality with PlotSenseAI
- Write comprehensive unit tests
- Package and distribute Python modules
Prerequisites (project_two):
- Python packaging knowledge
- Understanding of statistical concepts (Z-score, standard deviation)
- Basic testing with pytest
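As a refresher on the Z-score prerequisite, here is a minimal worked example using only the standard library. The numbers are made up, and the 1.5 cutoff is chosen to match the default threshold used later in this tutorial.

```python
from statistics import mean, stdev

values = [10.0, 12.0, 11.0, 13.0, 12.0, 50.0]

mu = mean(values)      # 18.0
sigma = stdev(values)  # sample standard deviation (divides by n - 1), like pandas' default

# Z-score: how many standard deviations each value lies from the mean.
zscores = [(v - mu) / sigma for v in values]

# Flag values whose absolute Z-score exceeds the threshold.
flagged = [v for v, z in zip(values, zscores) if abs(z) > 1.5]
print(flagged)  # [50.0]
```

Note that the outlier itself inflates both the mean and the standard deviation, which is one reason Z-score detection can miss anomalies in very small samples.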
cd project_two
ls -la # Observe the package structure
The structure follows Python packaging best practices:
plotsense_anomaly/
├── __init__.py # Package initialization
├── detection.py # Core anomaly detection logic
└── visualization.py # PlotSenseAI integration
# File: plotsense_anomaly/detection.py
import numpy as np
import pandas as pd

def zscore_anomaly_detection(data, threshold=1.5):
    """
    Z-score based anomaly detection
    Anomaly if: |Z-score| > threshold
    Z-score = (value - mean) / standard_deviation
    """
    df = pd.DataFrame({"value": data})
    mean = df["value"].mean()
    std = df["value"].std()  # sample standard deviation (ddof=1)
    df["zscore"] = (df["value"] - mean) / std
    df["anomaly"] = np.abs(df["zscore"]) > threshold
    return df

# File: plotsense_anomaly/visualization.py
import pandas as pd
from plotsense import recommender, plotgen

def visualize_anomalies(data, anomalies):
    # Create visualization dataset
    viz_data = pd.DataFrame({
        'value': data,
        'anomaly': anomalies,
        'index': range(len(data))
    })
    # Get PlotSenseAI recommendations
    recommendations = recommender(viz_data, n=3)
    # Generate plot
    plot = plotgen(viz_data, 0, recommendations)
    return plot

python examples/demo_anomaly_detection.py
Expected output:
Detected 3 anomalies out of 100 data points
Anomalous values: [45.2, -32.1, 67.8]
[PlotSenseAI visualization appears]
python -m pytest tests/test_detection.py -v
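The contents of tests/test_detection.py are not shown in this tutorial, so the following is only a sketch of what pytest-style tests for the detector might look like. The function is copied inline so the example runs on its own; in the real test file it would be imported from plotsense_anomaly.detection. The test names and data are hypothetical.

```python
import numpy as np
import pandas as pd


def zscore_anomaly_detection(data, threshold=1.5):
    """Copy of the tutorial's detector so this sketch runs on its own."""
    df = pd.DataFrame({"value": data})
    df["zscore"] = (df["value"] - df["value"].mean()) / df["value"].std()
    df["anomaly"] = np.abs(df["zscore"]) > threshold
    return df


def test_flags_obvious_outlier():
    data = [10, 11, 9, 10, 12, 10, 11, 100]
    result = zscore_anomaly_detection(data, threshold=2.0)
    # Only the last value should be flagged.
    assert result["anomaly"].tolist() == [False] * 7 + [True]


def test_returns_expected_columns():
    result = zscore_anomaly_detection([1.0, 2.0, 3.0])
    assert list(result.columns) == ["value", "zscore", "anomaly"]


def test_constant_data_has_no_anomalies():
    # The standard deviation is 0, so every Z-score is NaN,
    # and a NaN comparison evaluates to False.
    result = zscore_anomaly_detection([5.0, 5.0, 5.0])
    assert not result["anomaly"].any()
```

pytest collects any function whose name starts with `test_`, so saving these in a file under tests/ is enough for the command above to pick them up.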
Algorithm Enhancement:
- Implement IQR-based anomaly detection
- Add support for multivariate anomaly detection
- Create ensemble methods combining multiple techniques
Visualization Improvements:
- Add color coding for different anomaly types
- Create interactive hover information
- Implement time-series specific visualizations
Package Extension:
- Add configuration files for different detection parameters
- Create CLI interface for the package
- Add support for streaming data
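For the IQR exercise, one possible starting point is the sketch below. It uses Tukey's fences (values outside Q1 - k*IQR and Q3 + k*IQR are flagged) and returns a DataFrame shaped like the Z-score detector's output. The function name, the `k` parameter, and the extra `lower`/`upper` columns are assumptions for illustration, not part of the package.

```python
import pandas as pd


def iqr_anomaly_detection(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    df = pd.DataFrame({"value": data})
    q1 = df["value"].quantile(0.25)
    q3 = df["value"].quantile(0.75)
    iqr = q3 - q1
    df["lower"] = q1 - k * iqr
    df["upper"] = q3 + k * iqr
    df["anomaly"] = (df["value"] < df["lower"]) | (df["value"] > df["upper"])
    return df


result = iqr_anomaly_detection([10, 11, 9, 10, 12, 10, 11, 100])
print(result[result["anomaly"]])
```

Because quartiles are much less affected by extreme values than the mean and standard deviation, this method tends to be more robust than the Z-score when the data contains large outliers.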
Issue: Import errors when running examples
# Install package in development mode
pip install -e .
Issue: Tests failing
- Check that all dependencies are installed
- Verify Python path includes the project directory
- Run tests with more verbose output: pytest -v -s
Objectives (project_three):
- Build responsive web applications with Streamlit
- Create interactive data exploration interfaces
- Integrate multiple PlotSenseAI features
- Handle user input and API key management
- Deploy data applications
Prerequisites (project_three):
- Basic web development concepts
- Streamlit framework basics
- Understanding of API integration
# File: app.py - Key components
import pandas as pd
import streamlit as st
from plotsense import recommender, plotgen, explainer

# Data Loading (with caching)
@st.cache_data
def load_data():
    return pd.read_csv("data/climate.csv")

df = load_data()

# Sidebar Controls
city = st.sidebar.selectbox("Select City", df["City"].unique())
variable = st.sidebar.selectbox("Select Variable", ["Temperature", "Humidity"])

# Main Content (filtered_data and choice are defined elsewhere in app.py)
recommendations = recommender(filtered_data, n=3)
plot = plotgen(filtered_data, choice, recommendations)
explanation = explainer(plot)

cd project_three
pip install -r requirements.txt
streamlit run app.py
Navigate to http://localhost:8501 in your browser.
Sidebar Components:
- API Key Input (hidden/password type)
- City Selection (dropdown)
- Variable Selection (dropdown)
- Number of recommendations (slider)
- Raw data toggle (checkbox)
Main Content:
- Data preview table
- PlotSenseAI recommendations table
- Interactive visualization
- AI-generated explanations
Data flow:
- User selects parameters in sidebar
- Data gets filtered based on selections
- PlotSenseAI generates recommendations
- User chooses a recommendation
- Visualization is generated and displayed
- AI explanation is generated (if API key provided)
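The sequence above can be sketched as a plain function that is independent of Streamlit, which also makes it easy to test. Everything here is illustrative: `run_flow` is a hypothetical name, the data is made up, and the `recommend`, `plot`, and `explain` arguments are stand-in callables where the real app would pass wrappers around PlotSenseAI's `recommender`, `plotgen`, and `explainer`.

```python
import pandas as pd


def run_flow(df, city, variable, choice, recommend, plot, explain=None, api_key=None):
    """Filter -> recommend -> plot -> (optionally) explain, mirroring the app's flow."""
    # Steps 1-2: apply the sidebar selections to the data.
    filtered = df.loc[df["City"] == city, ["City", variable]]
    # Step 3: ask for recommendations.
    recommendations = recommend(filtered)
    # Steps 4-5: build the plot for the recommendation the user chose.
    figure = plot(filtered, choice, recommendations)
    # Step 6: only request an explanation when an API key is available.
    explanation = explain(figure) if (explain and api_key) else None
    return {
        "data": filtered,
        "recommendations": recommendations,
        "plot": figure,
        "explanation": explanation,
    }


# Made-up data and stub callables, used only to exercise the flow.
demo = pd.DataFrame({
    "City": ["Lagos", "Lagos", "Accra"],
    "Temperature": [31.0, 30.5, 29.0],
    "Humidity": [80, 78, 75],
})
stubs = dict(
    recommend=lambda d: ["line", "histogram"],
    plot=lambda d, i, recs: f"{recs[i]} of {len(d)} rows",
    explain=lambda fig: f"explained: {fig}",
)

without_key = run_flow(demo, "Lagos", "Temperature", 0, **stubs)
with_key = run_flow(demo, "Lagos", "Temperature", 0, api_key="dummy-key", **stubs)
print(without_key["plot"], "|", without_key["explanation"])
print(with_key["explanation"])
```

Keeping the flow in a function like this leaves the Streamlit script responsible only for widgets and rendering.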
Adding New Variables:
# In the sidebar section
new_variable = st.sidebar.selectbox(
    "Select New Variable",
    ["Temperature", "Humidity", "Wind Speed", "Rainfall", "Pressure"]
)
Custom Filtering:
# Add date range filter
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df["Date"].min(), df["Date"].max()],
    min_value=df["Date"].min(),
    max_value=df["Date"].max()
)
# Filter data
filtered_data = df[
    (df["Date"] >= pd.to_datetime(date_range[0])) &
    (df["Date"] <= pd.to_datetime(date_range[1]))
]
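One edge case the date filter above does not cover: when `st.date_input` is given a range, it can return a tuple with only one date while the user has picked the start but not yet the end, so indexing `date_range[1]` fails. A minimal sketch of a guard, written as a plain pandas helper (the name `filter_by_date_range` and the sample data are hypothetical), might look like this:

```python
from datetime import date

import pandas as pd


def filter_by_date_range(df, date_range, column="Date"):
    """Return rows within [start, end]; return df unchanged if the range is incomplete."""
    if len(date_range) != 2:
        return df
    start, end = (pd.to_datetime(d) for d in date_range)
    # Convert the column too, in case the CSV stored dates as strings.
    dates = pd.to_datetime(df[column])
    return df[(dates >= start) & (dates <= end)]


frame = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-15", "2024-02-01"],
    "Temperature": [30.1, 29.4, 31.0],
})

complete = filter_by_date_range(frame, (date(2024, 1, 10), date(2024, 1, 31)))
partial = filter_by_date_range(frame, (date(2024, 1, 10),))
print(len(complete), len(partial))
```

In the app, the value returned by `st.sidebar.date_input` would be passed straight in as `date_range`.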
UI Enhancement:
- Add multiple city selection
- Implement data export functionality
- Create comparison views between cities
Advanced Features:
- Add real-time data updates
- Implement user authentication
- Create dashboard with multiple charts
Deployment:
- Deploy to Streamlit Cloud
- Create Docker container
- Set up environment variables for production
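For the environment variable item, a minimal sketch of reading an API key in production is shown below. The variable name `PLOTSENSE_API_KEY` and the helper `get_api_key` are assumptions for illustration; check the PlotSenseAI documentation for the name the library actually expects.

```python
import os


def get_api_key(env_var="PLOTSENSE_API_KEY", fallback=""):
    """Prefer the environment variable; fall back to a user-supplied value.

    Returns None when neither source provides a non-empty key.
    """
    key = os.environ.get(env_var, "").strip() or fallback.strip()
    return key or None


# The fallback would typically be the value typed into the sidebar's password field.
print("key found" if get_api_key() else "no key configured")
```

Reading the key from the environment keeps it out of source control and lets the same code run locally, in Docker, and on a hosted platform.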
Issue: App not loading data
- Check that data/climate.csv exists
- Verify file path in load_data() function
- Ensure data file has expected columns
Issue: PlotSenseAI not working
- Verify plotsense installation: pip show plotsense
- Check for API key requirements
- Test PlotSenseAI in isolation first
Project Integration Example:
# Combine anomaly detection with web app
import streamlit as st
from plotsense_anomaly import zscore_anomaly_detection
from plotsense import recommender, plotgen
# In your Streamlit app
anomalies = zscore_anomaly_detection(data, threshold=2.0)
anomaly_recs = recommender(anomalies, n=5)
anomaly_plot = plotgen(anomalies, 0, anomaly_recs)
st.pyplot(anomaly_plot)
Data Caching:
@st.cache_data(ttl=3600) # Cache for 1 hour
def expensive_computation(data):
    processed_data = data  # placeholder: replace with the real computation
    return processed_data
Lazy Loading:
if st.button("Generate Advanced Analysis"):
    with st.spinner("Computing..."):
        result = complex_analysis(data)  # complex_analysis stands in for your own function
    st.success("Analysis complete!")
Best Practices:
- Environment Variables: Use for API keys and configuration
- Error Handling: Wrap file loading and API calls in try/except blocks
- Logging: Add logging for debugging and monitoring
- Testing: Create integration tests for web components
- Security: Validate user inputs and sanitize data
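The error handling and logging items above can be combined in a small wrapper. This is a sketch under stated assumptions: `safe_call`, the logger name `climate_app`, and `flaky_explainer` are all hypothetical, with the last one standing in for a call such as an AI explanation request that can fail.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("climate_app")


def safe_call(func, *args, default=None, **kwargs):
    """Run func and return its result; on failure, log the traceback and return default."""
    try:
        return func(*args, **kwargs)
    except Exception:
        logger.exception("%s failed", getattr(func, "__name__", repr(func)))
        return default


def flaky_explainer(plot):
    # Stand-in for an external call that fails, e.g. because no API key is set.
    raise RuntimeError("API key missing")


explanation = safe_call(flaky_explainer, "plot", default="Explanation unavailable.")
print(explanation)
```

Catching a broad `Exception` is a deliberate choice at the boundary of the app, so one failing feature does not take down the whole page; narrower exception types are preferable inside library code.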
After completing these tutorials:
- Contribute: Submit improvements to the demo projects
- Create: Build your own PlotSenseAI applications
- Share: Present your work at the hackathon
- Learn: Explore advanced PlotSenseAI features
- Connect: Join the PlotSenseAI community
- PlotSenseAI API Documentation
- Streamlit Tutorials
- scikit-learn User Guide
- Pandas Documentation
- Python Packaging Guide
Happy Learning!