Welcome to the demo walkthroughs! These guides take you step by step through each demo project, showing what to expect and how to interact with each application.
Each walkthrough includes:
- 📖 Overview: What the demo does
- ⚡ Quick Start: Get running in 2 minutes
- 🎮 Interactive Guide: Step-by-step usage
- 🎯 Key Features: What to focus on
- 🧪 Experiments: Things to try
- 🔧 Customization: How to modify and extend
This Jupyter notebook demonstrates how PlotSenseAI can make machine learning models more interpretable by automatically generating visualizations and explanations for model predictions.
cd project_one
pip install ucimlrepo scikit-learn pandas matplotlib plotsense
jupyter notebook ml_explainability_demo.ipynb
Open your browser to http://localhost:8888 and click on ml_explainability_demo.ipynb.
When you run the first few cells, you'll see:
# Cell 1-2: Data Loading
from ucimlrepo import fetch_ucirepo
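breast_cancer_recurrence = fetch_ucirepo(id=14)
X = breast_cancer_recurrence.data.features  # feature columns as a DataFrame
y = breast_cancer_recurrence.data.targets   # target column as a DataFrame
What happens: Downloads the UCI Breast Cancer Recurrence dataset.
Look for: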
- Dataset shape and size
- Feature names and types
- Missing value patterns
# Cell 3-4: Initial Exploration
print(f"Dataset shape: {X.shape}")
print(f"Features: {list(X.columns)}")
print(f"Target distribution: {y.value_counts()}")
Expected Output:
Dataset shape: (286, 9)
Features: ['age', 'menopause', 'tumor-size', 'inv-nodes', ...]
Target distribution:
no-recurrence-events 201
recurrence-events 85
Key Insight: Notice the class imbalance - this is a real-world challenge!
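To make the imbalance concrete, it can help to compute the accuracy of a trivial classifier that always predicts the majority class. The sketch below uses only the class counts from the expected output above; the variable names are illustrative.

```python
# Class counts taken from the expected output above
counts = {"no-recurrence-events": 201, "recurrence-events": 85}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(f"Total samples: {total}")
print(f"Majority class: {majority_class}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Any model accuracy on this dataset is best read against this baseline of roughly 0.703 rather than against 0.5.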
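# Cell 5-6: Cleaning and Encoding
from sklearn.preprocessing import LabelEncoder
# Fill missing values with each column's most frequent value (mode)
X_cleaned = X.fillna(X.mode().iloc[0])
# Encode each categorical (object) column as integer codes
le = LabelEncoder()
for col in X_cleaned.select_dtypes(include=['object']).columns:
    X_cleaned[col] = le.fit_transform(X_cleaned[col])
Watch for: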
- How categorical variables get encoded
- Missing value handling strategy
- Data type transformations
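The two cleaning steps can be hard to picture from the DataFrame calls alone. The following is a minimal pure-Python sketch of what mode imputation and label encoding do to a single column; `fill_with_mode` and `label_encode` are hypothetical helpers written for illustration, and the toy values are not taken from the dataset.

```python
from collections import Counter

def fill_with_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def label_encode(values):
    """Map each distinct value to an integer, in sorted order of the labels."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[v] for v in values], mapping

# Toy column; the values are illustrative, not taken from the dataset
node_caps = ["no", "yes", None, "no", "no", None, "yes"]

filled = fill_with_mode(node_caps)
encoded, mapping = label_encode(filled)

print(filled)
print(mapping)
print(encoded)
```

Because the integer codes simply follow the sorted label order, they carry no meaningful magnitude. Tree-based models tend to tolerate this, while linear models may read the codes as an ordering.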
# Cell 7-8: First PlotSenseAI Usage
from plotsense import recommender, plotgen, explainer
recommendations = recommender(X_cleaned, n=5)
print("PlotSenseAI Recommendations:")
display(recommendations)
Expected Output:
Recommendation Confidence Chart_Type
0 Feature correlation heatmap 0.92 heatmap
1 Distribution comparison 0.87 boxplot
2 Feature importance ranking 0.83 barplot
3 Scatter plot matrix 0.78 scatter
4 Violin plot comparison 0.71 violin
💡 Key Point: PlotSenseAI automatically analyzes your data and suggests the most relevant visualizations!
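# Cell 9-10: ML Model
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Assumed split; the notebook's exact split parameters are not shown here
X_train, X_test, y_train, y_test = train_test_split(
    X_cleaned, y.values.ravel(), test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Model accuracy: {accuracy:.3f}")
Expected Output: Model accuracy: 0.754 (the exact value depends on the train/test split)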
# Cell 11-12: Generate Plot
plot = plotgen(X_train, 0, recommendations) # Use first recommendation
plot.show()
What you'll see: An automatically generated heatmap showing feature correlations, with:
- Professional styling and color schemes
- Proper axis labels and titles
- Clear correlation patterns highlighted
# Cell 13: Get Explanation
explanation = explainer(plot)
print("AI Explanation:")
print(explanation)
Sample Output:
"This correlation heatmap reveals important relationships in the breast cancer dataset.
Strong positive correlations appear between tumor-size and inv-nodes (0.67), suggesting
larger tumors are associated with more invasive nodes. The age feature shows weak
correlations with other variables, indicating it may be less predictive..."
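Each cell of a correlation heatmap is a pairwise correlation coefficient. Assuming the heatmap uses Pearson correlation (the default in pandas `DataFrame.corr`, though this walkthrough does not confirm which measure PlotSenseAI applies), a single cell could be computed as in this sketch. The `pearson` helper and the sample values are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative encoded values, not taken from the dataset
tumor_size = [1, 2, 3, 4, 5, 6]
inv_nodes = [0, 1, 1, 2, 2, 4]

r = pearson(tumor_size, inv_nodes)
print(f"r = {r:.2f}")
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one, which is the scale the heatmap colors encode.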
# Cell 14-15: Model Explainability
import pandas as pd
feature_importance = pd.DataFrame({
    'feature': X_train.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
# Use PlotSenseAI for feature importance visualization
importance_recs = recommender(feature_importance, n=3)
importance_plot = plotgen(feature_importance, 0, importance_recs)
importance_plot.show()
What you'll discover:
- Which features most influence model predictions
- How PlotSenseAI adapts recommendations to different data types
- Clear visual hierarchy of feature importance
Try changing the data subset:
# Try with different feature subsets
import numpy as np
numeric_only = X_train.select_dtypes(include=[np.number])
# X_train was label-encoded in Cell 5-6, so it no longer has object columns;
# take the categorical columns from the original X instead
categorical_only = X.select_dtypes(include=['object'])
recs_numeric = recommender(numeric_only, n=3)
recs_categorical = recommender(categorical_only, n=3)
Observation: Notice how recommendations change based on data types!
# Try different recommendation indices
for i in range(len(recommendations)):
    print(f"\n--- Visualization {i+1}: {recommendations.iloc[i]['Recommendation']} ---")
    plot = plotgen(X_train, i, recommendations)
    plot.show()
    explanation = explainer(plot)
    print(f"Explanation: {explanation}")
# Compare different models
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
models = {
    'Random Forest': RandomForestClassifier(random_state=42),
    # max_iter raised from the default of 100 to help the solver converge
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42),
    'SVM': SVC(random_state=42)
}
# Use a separate loop variable so the earlier `model` is not overwritten
for name, candidate in models.items():
    candidate.fit(X_train, y_train)
    score = candidate.score(X_test, y_test)
    print(f"{name}: {score:.3f}")
- Different Datasets: Replace with other UCI datasets
- Feature Engineering: Create new features and see how recommendations change
- Model Types: Try deep learning models and compare explanations
- Custom Thresholds: Experiment with different confidence thresholds for recommendations
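For the confidence-threshold experiment, this walkthrough does not show a built-in threshold parameter, so one option is to filter the recommendations after they are returned. The sketch below uses a plain list of dicts as a stand-in for the recommendations table, with rows copied from the sample output earlier in this walkthrough; `filter_by_confidence` is a hypothetical helper, not part of PlotSenseAI.

```python
# Rows mirror the sample recommendations table shown earlier in this walkthrough
recommendations = [
    {"Recommendation": "Feature correlation heatmap", "Confidence": 0.92, "Chart_Type": "heatmap"},
    {"Recommendation": "Distribution comparison", "Confidence": 0.87, "Chart_Type": "boxplot"},
    {"Recommendation": "Feature importance ranking", "Confidence": 0.83, "Chart_Type": "barplot"},
    {"Recommendation": "Scatter plot matrix", "Confidence": 0.78, "Chart_Type": "scatter"},
    {"Recommendation": "Violin plot comparison", "Confidence": 0.71, "Chart_Type": "violin"},
]

def filter_by_confidence(rows, min_confidence):
    """Keep rows whose Confidence is at least min_confidence."""
    return [row for row in rows if row["Confidence"] >= min_confidence]

for cutoff in (0.7, 0.8, 0.9):
    kept = filter_by_confidence(recommendations, cutoff)
    print(f"cutoff {cutoff}: {[row['Chart_Type'] for row in kept]}")
```

If the real table is a pandas DataFrame with a `Confidence` column, as the sample output suggests, the same filter would be a boolean mask on that column.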
This demo shows how to extend PlotSenseAI with custom functionality by building an anomaly detection plugin that integrates seamlessly with PlotSenseAI's visualization engine.
cd project_two
pip install -r requirements.txt
python examples/demo_anomaly_detection.py
File Structure Overview:
plotsense_anomaly/
├── __init__.py # Package initialization
├── detection.py # Core anomaly detection logic
└── visualization.py # PlotSenseAI integration
Key Concept: Modular design allows easy extension and testing
Open plotsense_anomaly/detection.py:
import numpy as np
import pandas as pd
def zscore_anomaly_detection(data, threshold=1.5):
    """
    Z-score based anomaly detection
    Anomaly if: |Z-score| > threshold
    """
    df = pd.DataFrame({"value": data})
    mean = df["value"].mean()
    std = df["value"].std()  # pandas default: sample standard deviation (ddof=1)
    df["zscore"] = (df["value"] - mean) / std
    df["anomaly"] = np.abs(df["zscore"]) > threshold
    return df
Understanding Z-score:
- Measures how many standard deviations a value lies from the mean
- threshold=1.5 means values more than 1.5 standard deviations from the mean are flagged as anomalies
- Common thresholds: 1.5 (moderate), 2.0 (standard), 3.0 (conservative)
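The effect of the threshold is easiest to see on a small sample. This is a minimal standard-library sketch of the same z-score rule, not the plugin's own code; `zscores` and `flag_anomalies` are hypothetical names and the data is made up for illustration.

```python
import statistics

def zscores(values):
    """Z-score of each value, using the sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std (n - 1), like pandas' default
    return [(v - mean) / std for v in values]

def flag_anomalies(values, threshold=1.5):
    """True where |z| is strictly greater than threshold."""
    return [abs(z) > threshold for z in zscores(values)]

# Illustrative data with one obvious outlier
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

for threshold in (1.5, 2.0, 3.0):
    flags = flag_anomalies(data, threshold)
    flagged = [v for v, is_anomaly in zip(data, flags) if is_anomaly]
    print(f"threshold {threshold}: {flagged}")
```

In this ten-point sample the outlier's z-score is about 2.84, so the 3.0 threshold flags nothing. A single extreme value inflates the standard deviation it is measured against; with the sample standard deviation, |z| cannot exceed (n - 1)/√n, which is about 2.85 for n = 10. Conservative thresholds therefore suit larger samples.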
python examples/demo_anomaly_detection.py
Expected Output:
🔍 PlotSense Anomaly Detection Demo
📊 Generated 100 data points with intentional anomalies
📈 Data range: [-2.45, 45.23]
📊 Data statistics:
Mean: 10.23
Std Dev: 8.67
🚨 Anomaly Detection Results:
Total anomalies detected: 7
Anomaly rate: 7.0%
🎯 Anomalous values:
Index 23: 45.23 (Z-score: 4.04)
Index 67: -2.45 (Z-score: -1.47)
Index 89: 38.91 (Z-score: 3.31)
...
📊 Generating PlotSenseAI visualization...
What happens next: A visualization window opens showing:
- Scatter plot of all data points
- Anomalies highlighted in red
- Normal points in blue
- Clear threshold boundaries
Open plotsense_anomaly/visualization.py:
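import pandas as pd
from plotsense import recommender, plotgen
def visualize_anomalies(data, anomalies):
    # Data preparation: one row per point, with its anomaly flag and position
    viz_data = pd.DataFrame({
        'value': data,
        'anomaly': anomalies,
        'index': range(len(data))
    })
    recommendations = recommender(viz_data, n=3)
    plot = plotgen(viz_data, 0, recommendations)
    return plot
Key Integration Points: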
- Data Preparation: Structures data for PlotSenseAI
- Recommendation: Gets visualization suggestions
- Generation: Creates the actual plot
python -m pytest tests/test_detection.py -v
Expected Output:
tests/test_detection.py::test_zscore_basic ✓
tests/test_detection.py::test_zscore_threshold ✓
tests/test_detection.py::test_zscore_edge_cases ✓
tests/test_detection.py::test_zscore_empty_data ✓
====== 4 passed in 0.23s ======
What's being tested:
- Basic functionality with normal data
- Different threshold values
- Edge cases (single value, all same values)
- Error handling (empty data)
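The test file itself is not reproduced in this walkthrough. As a rough idea of what the four cases could look like, here is a self-contained sketch built around a stand-in detector. The test names match the output above, but the test bodies and the edge-case policy (raising `ValueError` on empty input, returning no anomalies for a single value or zero spread) are assumptions, not the plugin's confirmed behavior.

```python
import statistics

def zscore_flags(values, threshold=1.5):
    """Stand-in detector: True where |z| > threshold.

    Assumed edge-case policy (not confirmed by the plugin source):
    empty input raises ValueError; a single point or zero spread
    yields no anomalies.
    """
    values = list(values)
    if not values:
        raise ValueError("data must not be empty")
    if len(values) < 2:
        return [False] * len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs((v - mean) / std) > threshold for v in values]

SAMPLE = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

def test_zscore_basic():
    assert zscore_flags(SAMPLE) == [False] * 9 + [True]

def test_zscore_threshold():
    # A higher threshold can never flag more points than a lower one
    assert sum(zscore_flags(SAMPLE, 3.0)) <= sum(zscore_flags(SAMPLE, 1.5))

def test_zscore_edge_cases():
    assert zscore_flags([5]) == [False]
    assert zscore_flags([7, 7, 7]) == [False, False, False]

def test_zscore_empty_data():
    try:
        zscore_flags([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")

if __name__ == "__main__":
    for test in (test_zscore_basic, test_zscore_threshold,
                 test_zscore_edge_cases, test_zscore_empty_data):
        test()
    print("all checks passed")
```

The functions use plain `assert` statements, so they can be collected by pytest or run directly as a script.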
import numpy as np
from plotsense_anomaly import zscore_anomaly_detection
# Generate test data
np.random.seed(42)
data = np.random.normal(0, 1, 100)
data = np.append(data, [5, -5, 6]) # Add obvious anomalies
# Test different thresholds
thresholds = [1.0, 1.5, 2.0, 2.5, 3.0]
for threshold in thresholds:
    result = zscore_anomaly_detection(data, threshold)
    anomaly_count = result['anomaly'].sum()
    print(f"Threshold {threshold}: {anomaly_count} anomalies")
Expected Pattern: Higher thresholds → fewer anomalies detected
# Simulate streaming data
import time
def streaming_anomaly_demo():
    data_stream = []
    for i in range(50):
        # Normal data, with a wider-spread draw every 10th step
        if i % 10 == 0:
            new_point = np.random.normal(0, 1) * 5  # Likely anomaly (5x the normal spread)
        else:
            new_point = np.random.normal(0, 1)  # Normal
        data_stream.append(new_point)
        if len(data_stream) >= 10:  # Need minimum data for stats
            result = zscore_anomaly_detection(data_stream, threshold=2.0)
            latest_anomaly = result.iloc[-1]['anomaly']
            if latest_anomaly:
                print(f"🚨 ANOMALY at step {i}: {new_point:.2f}")
            else:
                print(f"✅ Normal at step {i}: {new_point:.2f}")
        time.sleep(0.1)  # Simulate real-time delay
streaming_anomaly_demo()
# Extend to 2D data
def multivariate_zscore(data_2d, percentile=95):
    """
    2D anomaly detection using Mahalanobis distance.
    Flags points whose distance exceeds the given percentile of all distances.
    """
    from scipy.spatial import distance
    # Calculate Mahalanobis distance for each point
    mean = np.mean(data_2d, axis=0)
    # Invert the covariance matrix once rather than once per point
    inv_cov = np.linalg.inv(np.cov(data_2d.T))
    distances = np.array(
        [distance.mahalanobis(point, mean, inv_cov) for point in data_2d]
    )
    threshold_val = np.percentile(distances, percentile)  # Default: top 5% as anomalies
    return distances > threshold_val
# Test with 2D data
data_2d = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], 100)
anomalies_2d = multivariate_zscore(data_2d)
print(f"2D anomalies detected: {np.sum(anomalies_2d)}")
- Algorithm Comparison: Implement IQR-based detection and compare
- Parameter Tuning: Find optimal thresholds for different data types
- Integration Testing: Use with real datasets from other projects
- Performance Testing: Benchmark with large datasets
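For the algorithm comparison experiment, one possible starting point is the interquartile range (IQR) rule with Tukey's fences. This is a minimal standard-library sketch under that assumption; `iqr_anomalies` is a hypothetical function, not part of the plugin, and the multiplier `k=1.5` is the conventional default rather than a value taken from this project.

```python
import statistics

def iqr_anomalies(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = [v < lower or v > upper for v in values]
    return flags, (lower, upper)

# Illustrative data with one obvious outlier
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

flags, (lower, upper) = iqr_anomalies(data)
print(f"fences: [{lower}, {upper}]")
print("flagged:", [v for v, is_anomaly in zip(data, flags) if is_anomaly])
```

Quartiles are much less affected by extreme values than the mean and standard deviation, so the fences stay tight around the bulk of the data even when an outlier is present. That makes the IQR rule a useful contrast to the z-score approach.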
An interactive Streamlit web application that demonstrates how PlotSenseAI can be integrated into web applications for intuitive data exploration and storytelling.
cd project_three
pip install -r requirements.txt
streamlit run app.py
Browser opens to http://localhost:8501
Main Components:
- Sidebar: Controls and configuration
- Main Area: Data display and visualizations
- Status Bar: Real-time feedback
Initial State: App loads with Chicago temperature data displayed
The climate dataset contains:
- Cities: Chicago, New York, Phoenix, Los Angeles
- Variables: Temperature, Humidity, Wind Speed, Rainfall
- Time Range: Full year of daily data (2023)
- Format: Clean, structured CSV data
Quick Exercise: Check "Show raw data" in sidebar to see the data structure.
- Current: Chicago (default)
- Change to: New York
- Observe: Data updates automatically
- Notice: PlotSenseAI recommendations change based on new data patterns
- Current: Temperature
- Change to: Humidity
- Observe: Recommendations adapt to different data distribution
- Key Insight: Different variables → different optimal visualizations
- Current: 3 suggestions
- Change slider: 5 suggestions
- Observe: More visualization options appear
- Try: Different recommendation indices
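Behind the city and variable controls, each change amounts to filtering the dataset down to one city and one column before it is passed on for recommendations. The app's actual code is not shown in this walkthrough, so the following is only a sketch of that filtering step using the standard library. `select_series` is a hypothetical helper, the sample rows are made up, and the column names other than `City` are assumptions.

```python
import csv
import io

# Tiny stand-in for climate.csv; column names other than "City" are assumed
SAMPLE_CSV = """Date,City,Temperature,Humidity
2023-01-01,Chicago,28.0,71
2023-01-01,Phoenix,58.0,35
2023-01-02,Chicago,31.5,68
2023-01-02,Phoenix,61.0,33
"""

def select_series(csv_text, city, variable):
    """Return (date, value) pairs for one city and one numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Date"], float(row[variable]))
        for row in reader
        if row["City"] == city
    ]

print(select_series(SAMPLE_CSV, "Chicago", "Temperature"))
print(select_series(SAMPLE_CSV, "Phoenix", "Humidity"))
```

In a Streamlit app, the `city` and `variable` arguments would typically come from sidebar widgets, and the script reruns with the new values whenever a widget changes.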
When you change cities, watch the recommendations table:
Chicago Temperature might show:
Index Recommendation Confidence Chart_Type
0 Time series line plot 0.94 line
1 Distribution histogram 0.87 histogram
2 Seasonal decomposition 0.82 seasonal
Phoenix Humidity might show:
Index Recommendation Confidence Chart_Type
0 Box plot by month 0.91 boxplot
1 Scatter vs temperature 0.85 scatter
2 Violin plot seasonal 0.79 violin
Key Observation: PlotSenseAI adapts recommendations to:
- Data distribution characteristics
- Variable types and ranges
- Temporal patterns
- Correlation structures
- Select recommendation: Choose index from dropdown
- Auto-generation: Plot appears instantly
- Professional quality: Clean styling, proper labels
- Interactive elements: Hover, zoom, pan (depending on plot type)
- Get key: Visit Groq Console
- Format: gsk_xxxxxxxxxxxxxxxxxxxxx
- Enter: In sidebar password field
- Test: Generate a visualization
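An app can check the key's shape before attempting an explanation, so that a missing or mistyped key produces a friendly notice instead of an API error. The sketch below is one way to do that. Both helpers are hypothetical, the check is only a format heuristic based on the `gsk_` prefix shown above, and it does not verify that the key is actually valid.

```python
def has_usable_key(api_key):
    """Heuristic format check: non-empty and longer than the 'gsk_' prefix."""
    if not api_key:
        return False
    key = api_key.strip()
    return key.startswith("gsk_") and len(key) > len("gsk_")

def explanation_or_notice(api_key, explain):
    """Call explain() only when a plausible key is present."""
    if not has_usable_key(api_key):
        return "Add a Groq API key in the sidebar to enable AI explanations."
    return explain()

# The lambda stands in for a real explanation call
print(explanation_or_notice("", lambda: "explanation text"))
print(explanation_or_notice("gsk_example123", lambda: "explanation text"))
```

Passing the explanation step as a callable means it is never invoked when no key is available.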
Sample explanation for a temperature time series:
"This time series visualization of Chicago temperature data reveals clear seasonal patterns typical of continental climate zones. The data shows:
🌡️ Temperature Range: 15-85°F across the year
📈 Seasonal Trends: Clear winter lows (Jan-Feb) and summer highs (Jul-Aug)
📊 Variability: Higher day-to-day variation in spring/fall transition periods
🎯 Key Insights: The data suggests typical Midwest weather patterns with distinct seasonal cycles"
What to look for:
- Data interpretation: What the numbers mean
- Pattern recognition: Trends and anomalies identified
- Context: Real-world implications
- Actionable insights: What the patterns suggest
- Start: Chicago, Temperature
- Note: Visualization characteristics
- Switch: Phoenix, Temperature
- Compare: How do patterns differ?
- Insight: Desert vs. continental climate patterns
- Setup: Same city, different variables
- Example: Los Angeles
- Temperature: Mild variations
- Humidity: Inverse correlation with temperature
- Rainfall: Sparse, seasonal clusters
- Wind Speed: Consistent patterns
- Observation: Look for seasonal trends
- Comparison: Compare similar months across variables
- Correlation: Notice relationships between variables
- Desktop: Full sidebar layout
- Mobile: Collapsible sidebar
- Tablet: Optimized spacing
Test: Resize browser window to see adaptive layout
- Data filtering: Instant response to city changes
- Visualization refresh: Automatic plot updates
- Recommendation adaptation: Dynamic suggestion updates
Try these edge cases:
- Empty API key → Graceful degradation
- Network issues → Appropriate error messages
- Invalid selections → Auto-correction
Notice:
- Caching: Data loads only once (@st.cache_data)
- Lazy loading: Explanations only when API key provided
- Efficient updates: Only changed components re-render
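Streamlit reruns the script from top to bottom on each interaction, which is why caching the data loader matters. To show the idea without requiring Streamlit, this sketch swaps in the standard library's `functools.lru_cache` as an analogue. It is not how `st.cache_data` is implemented, and `load_data` with its return value is a made-up stand-in.

```python
import functools

calls = {"count": 0}  # tracks how often the loader body actually runs

@functools.lru_cache(maxsize=None)
def load_data(path):
    """Pretend to read a file; the body runs only on a cache miss."""
    calls["count"] += 1
    # Immutable stand-in for a loaded table
    return (("Chicago", 28.0), ("Phoenix", 58.0))

# Simulate three reruns of the app script requesting the same file
for _ in range(3):
    rows = load_data("climate.csv")

print(f"rows: {len(rows)}, loader executed {calls['count']} time(s)")
```

The loader body runs once and later calls with the same argument return the stored result, which is the behavior the caching bullet above describes.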
Replace climate.csv with your own dataset:
# Required columns: Date, Category, Numeric_Variable
# Example: sales.csv with Date, Region, Revenue
Add new sidebar controls:
# Date range picker
date_range = st.sidebar.date_input("Select Date Range")
# Multiple city selection
cities = st.sidebar.multiselect("Select Cities", df["City"].unique())
# Custom thresholds
threshold = st.sidebar.slider("Anomaly Threshold", 1.0, 3.0, 2.0)
# Add plot customization options
plot_style = st.sidebar.selectbox("Plot Style", ["default", "dark", "minimal"])
color_scheme = st.sidebar.color_picker("Choose Color")
# Combine with anomaly detection
from plotsense_anomaly import zscore_anomaly_detection
# Add anomaly detection toggle
if st.sidebar.checkbox("Detect Anomalies"):
    anomalies = zscore_anomaly_detection(filtered_data[variable])
    st.subheader("Anomaly Detection Results")
    st.write(f"Anomalies detected: {anomalies['anomaly'].sum()}")
- Update data: Add columns to CSV
- Update UI: Add to selectbox options
- Test: Verify PlotSenseAI handles new data types
# Add custom CSS
st.markdown("""
<style>
.main-header {
    color: #1f77b4;
    font-size: 2rem;
}
</style>
""", unsafe_allow_html=True)
# Custom plot function
def custom_plot_type(data, variable):
    # Your custom visualization logic
    fig, ax = plt.subplots()
    # ... plotting code ...
    return fig
After completing these walkthroughs, you've:
✅ Mastered PlotSenseAI Basics: Recommendations, generation, explanations
✅ Built Custom Extensions: Created anomaly detection plugin
✅ Developed Web Applications: Interactive data storytelling app
✅ Understood Integration Patterns: How to combine PlotSenseAI with other tools
✅ Explored Real-world Applications: Practical use cases and implementations
- Choose Your Path: Pick the demo that aligns with your interests
- Customize and Extend: Add your own features and improvements
- Combine Projects: Create hybrid applications using multiple demos
- Document Your Journey: Create your own walkthrough for your modifications
- Workshop Planning: Use these walkthroughs as guided workshop content
- Assessment: Check participant understanding at key checkpoints
- Troubleshooting: Reference common issues and solutions provided
- Extension Activities: Use experiment suggestions for advanced participants
- Modify color schemes in visualizations
- Add new cities to the climate dataset
- Change anomaly detection thresholds
- Integrate all three demos into one application
- Add real-time data streaming
- Implement user authentication and data persistence
- Create new PlotSenseAI plugin types
- Build mobile-responsive designs
- Add machine learning model comparison features
- PlotSenseAI Documentation: docs.plotsense.ai
- Streamlit Gallery: streamlit.io/gallery
- Jupyter Best Practices: jupyter-notebook.readthedocs.io
- Data Visualization Principles: Visual design principles for data viz
Happy Exploring! 🎉
Remember: The best way to learn is by doing. Don't hesitate to break things, experiment, and most importantly, have fun with your data!