diff --git a/anomaly_detection_system_diagrams.md b/anomaly_detection_system_diagrams.md new file mode 100644 index 0000000000..c649f4f6d8 --- /dev/null +++ b/anomaly_detection_system_diagrams.md @@ -0,0 +1,488 @@ +# Machine Learning Anomaly Detection System - Architecture Diagrams + +## 1. Main System Architecture + +```mermaid +graph TB + %% Data Input Layer + A[Real-time Metrics
15 KPIs @ 1min intervals] --> B[Feature Extraction
15-dimensional vectors] + B --> C[Data Normalization
StandardScaler] + C --> D[Isolation Forest Model
100 estimators, 10% contamination] + + %% Detection Pipeline + D --> E[Anomaly Detection
Prediction: -1/+1 + anomaly score] + E --> F[Statistical Analysis
2σ threshold detection] + F --> G[Severity Assessment
Critical/High/Medium/Low] + G --> H[LLM Analysis
Gemini AI Integration] + H --> I[Actionable Reports
Root cause + Recommendations] + + %% Training Pipeline + subgraph "Training Phase" + J[Historical Data
120 samples, 2 hours] --> K[Feature Matrix
120×15 dimensions] + K --> L[StandardScaler Fitting
μ and σ calculation] + L --> M[Isolation Forest Training
Normal pattern learning] + end + + %% Model Storage + M -.-> D + L -.-> C + + %% Feedback Loop + I --> N[Model Performance Monitoring] + N --> O{Retrain Needed?} + O -->|Yes| J + O -->|No| A + + %% Styling + classDef inputLayer fill:#e1f5fe + classDef processLayer fill:#f3e5f5 + classDef mlLayer fill:#e8f5e8 + classDef outputLayer fill:#fff3e0 + classDef trainingLayer fill:#fce4ec + + class A,J inputLayer + class B,C,F,G processLayer + class D,E,K,L,M mlLayer + class H,I,N outputLayer + class J,K,L,M trainingLayer +```
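The training and detection pipeline above maps almost one-to-one onto scikit-learn. A minimal sketch under that assumption; the placeholder arrays and variable names are illustrative, not the production code:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# --- Training phase: 120 one-minute samples of normal traffic, 15 features each ---
rng = np.random.default_rng(42)
train_matrix = rng.normal(size=(120, 15))      # placeholder for the real metric history

scaler = StandardScaler()                      # learns per-feature mu and sigma
X_train = scaler.fit_transform(train_matrix)

model = IsolationForest(
    n_estimators=100,                          # 100 trees, as configured above
    contamination=0.1,                         # 10% contamination
    random_state=42,
)
model.fit(X_train)

# --- Detection phase: score one new 15-dimensional metric vector ---
new_point = rng.normal(size=(1, 15))           # placeholder for a live metric vector
X_new = scaler.transform(new_point)            # reuse the training mu and sigma
prediction = model.predict(X_new)[0]           # -1 = anomaly, +1 = normal
score = model.decision_function(X_new)[0]      # lower = more anomalous
print(f"prediction={prediction}, score={score:.3f}")
```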
+ +## 2. ML Pipeline Architecture + +```mermaid +graph LR + %% Input Processing + subgraph "Data Ingestion" + A1[Raw Metrics
JSON Format] + A2[Time Series Data
1-minute intervals] + A3[Service Health
Indicators] + end + + %% Feature Engineering + subgraph "Feature Engineering" + B1[Performance Metrics
latency_p50/95/99/mean] + B2[Error Metrics
error_rate, error_count] + B3[Resource Metrics
cpu_usage, memory_usage] + B4[Connection Metrics
active_connections, connection_wait_time] + B5[Application Metrics
request_rate, cosmos_client_ops] + B6[Database Metrics
query_time, connection_errors] + B7[System Metrics
queue_depth] + end + + %% ML Processing + subgraph "ML Pipeline" + C1[Feature Vector
15 dimensions] + C2[StandardScaler
Normalization] + C3[Isolation Forest
Anomaly Detection] + C4[Decision Function
Anomaly Scoring] + end + + %% Analysis Layer + subgraph "Analysis Engine" + D1[Statistical Threshold
2σ Analysis] + D2[Affected Metrics
Identification] + D3[Severity Calculator
Multi-factor Assessment] + D4[Confidence Score
Calculation] + end + + %% AI Integration + subgraph "LLM Integration" + E1[Context Preparation
Metric History + Anomaly] + E2[Gemini AI Analysis
Root Cause Detection] + E3[Recommendation Engine
Actionable Insights] + E4[Impact Assessment
Business Impact] + end + + %% Data Flow + A1 --> B1 + A2 --> B2 + A3 --> B3 + A1 --> B4 + A2 --> B5 + A3 --> B6 + A1 --> B7 + + B1 --> C1 + B2 --> C1 + B3 --> C1 + B4 --> C1 + B5 --> C1 + B6 --> C1 + B7 --> C1 + + C1 --> C2 + C2 --> C3 + C3 --> C4 + + C4 --> D1 + C4 --> D2 + D1 --> D3 + D2 --> D3 + D3 --> D4 + + D4 --> E1 + E1 --> E2 + E2 --> E3 + E2 --> E4 + + %% Styling + classDef dataLayer fill:#e3f2fd + classDef featureLayer fill:#f1f8e9 + classDef mlLayer fill:#fff8e1 + classDef analysisLayer fill:#fce4ec + classDef aiLayer fill:#f3e5f5 + + class A1,A2,A3 dataLayer + class B1,B2,B3,B4,B5,B6,B7 featureLayer + class C1,C2,C3,C4 mlLayer + class D1,D2,D3,D4 analysisLayer + class E1,E2,E3,E4 aiLayer +```
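In code, the feature-engineering stage reduces to a fixed, ordered list of the 15 metric names and a helper that pulls them out of each raw payload. A sketch; the ordering and the `to_feature_vector` helper are assumptions (any fixed order works, as long as training and scoring agree):

```python
import numpy as np

# The 15 features named in the pipeline, grouped by category as in the diagram.
FEATURE_NAMES = [
    "latency_p50", "latency_p95", "latency_p99", "latency_mean",  # performance
    "error_rate", "error_count",                                  # errors
    "cpu_usage", "memory_usage",                                  # resources
    "active_connections", "connection_wait_time",                 # connections
    "request_rate", "cosmos_client_ops",                          # application
    "db_query_time", "db_connection_errors",                      # database
    "queue_depth",                                                # system
]

def to_feature_vector(metrics: dict) -> np.ndarray:
    """Flatten one raw metrics payload (JSON dict) into a 15-dim vector."""
    return np.array([float(metrics[name]) for name in FEATURE_NAMES])
```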
+ +## 3. Real-time Detection Flow + +```mermaid +sequenceDiagram + participant M as Metrics Collector + participant FE as Feature Extractor + participant SC as StandardScaler + participant IF as Isolation Forest + participant SA as Statistical Analyzer + participant SV as Severity Assessor + participant LLM as Gemini AI + participant AR as Alert Router + participant DH as Dashboard + + Note over M,DH: Real-time Anomaly Detection Flow + + M->>FE: Raw metrics (JSON) + Note right of M: 15 KPIs every minute + + FE->>SC: Feature vector [15 dims] + Note right of FE: Extract performance,
error, resource metrics + + SC->>IF: Normalized features + Note right of SC: Apply training
μ and σ values + + IF->>SA: Anomaly score + prediction + Note right of IF: Prediction: -1 (anomaly)
or +1 (normal); lower score = more anomalous + + SA->>SA: Identify affected metrics + Note right of SA: Compare with 2σ
threshold per metric + + SA->>SV: Affected metrics list + SV->>SV: Calculate severity + Note right of SV: Critical/High/Medium/Low
based on score + metrics + + alt Anomaly Detected + SV->>LLM: Anomaly context + history + Note right of SV: Include metric trends
and service context + + LLM->>LLM: Analyze root cause + Note right of LLM: Generate insights,
recommendations, impact + + LLM->>AR: Analysis report + Note right of LLM: Root cause +
actionable steps + + AR->>DH: Alert + recommendations + Note right of AR: Route to appropriate
teams and systems + else Normal Operation + SV->>DH: Status: Normal + Note right of SV: Update health
dashboard only + end + + Note over M,DH: Total latency: < 2 minutes +```
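The one step this sequence leaves abstract is the severity calculation ("based on score + metrics"). A hedged sketch of one plausible mapping; `assess_severity` and all of its cutoffs are assumptions, not the system's actual rule:

```python
def assess_severity(score: float, affected_metrics: list) -> str:
    """Map the isolation score and the affected-metric list onto the four
    severity levels. The diagram only says 'based on score + metrics', so
    every cutoff below is an assumption, not the deployed rule."""
    if score < -0.5 or len(affected_metrics) >= 5:
        return "Critical"
    if score < -0.3 or len(affected_metrics) >= 3:
        return "High"
    if score < -0.1 or affected_metrics:
        return "Medium"
    return "Low"
```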
+ +## 4. Component Interaction Architecture + +```mermaid +graph TB + %% External Systems + subgraph "External Systems" + EXT1[Service Metrics
Prometheus/Grafana] + EXT2[Application Logs
ELK Stack] + EXT3[Infrastructure
Monitoring] + end + + %% Core Detection System + subgraph "Anomaly Detection Core" + CORE1[Metrics Ingestion
API Gateway] + CORE2[Feature Store
Time Series DB] + CORE3[ML Model Registry
Trained Models] + CORE4[Detection Engine
Real-time Processing] + CORE5[Analysis Engine
Statistical + AI] + end + + %% AI/LLM Layer + subgraph "AI Analysis Layer" + AI1[Context Builder
Metric History + Metadata] + AI2[Gemini AI API
Root Cause Analysis] + AI3[Insight Generator
Recommendations Engine] + AI4[Impact Assessor
Business Impact Calculator] + end + + %% Output Systems + subgraph "Output & Integration" + OUT1[Alert Manager
Multi-channel Notifications] + OUT2[Dashboard
Real-time Visualization] + OUT3[Incident Management
JIRA/ServiceNow] + OUT4[Automated Actions
Self-healing Triggers] + end + + %% Storage Layer + subgraph "Data Storage" + DB1[(Training Data
Historical Metrics)] + DB2[(Model Artifacts
Scalers + Models)] + DB3[(Analysis History
Past Incidents)] + DB4[(Configuration
Thresholds + Rules)] + end + + %% Data Flow + EXT1 --> CORE1 + EXT2 --> CORE1 + EXT3 --> CORE1 + + CORE1 --> CORE2 + CORE2 --> CORE4 + CORE3 --> CORE4 + CORE4 --> CORE5 + + CORE5 --> AI1 + AI1 --> AI2 + AI2 --> AI3 + AI2 --> AI4 + + AI3 --> OUT1 + AI4 --> OUT1 + CORE5 --> OUT2 + OUT1 --> OUT3 + AI3 --> OUT4 + + %% Storage Connections + CORE2 --> DB1 + CORE3 --> DB2 + AI3 --> DB3 + CORE5 --> DB4 + + %% Feedback Loops + OUT2 -.->|Model Performance| CORE3 + OUT3 -.->|Incident Feedback| DB3 + DB3 -.->|Learning| CORE3 + + %% Styling + classDef external fill:#ffebee + classDef core fill:#e8f5e8 + classDef ai fill:#f3e5f5 + classDef output fill:#fff3e0 + classDef storage fill:#e1f5fe + + class EXT1,EXT2,EXT3 external + class CORE1,CORE2,CORE3,CORE4,CORE5 core + class AI1,AI2,AI3,AI4 ai + class OUT1,OUT2,OUT3,OUT4 output + class DB1,DB2,DB3,DB4 storage +```
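The AI analysis layer amounts to packaging the detection result plus recent metric history into a prompt and calling Gemini. A sketch assuming the `google-generativeai` Python client; the model name, prompt wording, and 30-point history window are all illustrative:

```python
import json
import google.generativeai as genai  # assumed client; call genai.configure(api_key=...) at startup

def analyze_anomaly(anomaly: dict, history: list) -> str:
    """Build the LLM context (detection result + recent metric history) and
    ask Gemini for root cause, recommendations, and business impact.
    The model name and prompt wording are assumptions."""
    prompt = (
        "You are analyzing a production service anomaly. From the detection "
        "result and recent metric history below, identify the likely root "
        "cause, concrete remediation steps, and the business impact.\n\n"
        f"Detection result:\n{json.dumps(anomaly, indent=2)}\n\n"
        f"Recent history (oldest first):\n{json.dumps(history[-30:], indent=2)}"
    )
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text
```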
+ +## 5. Data Structure and Metrics Flow + +```mermaid +graph TD + %% Input Data Structure + subgraph "Input Metrics Structure" + INPUT["{
timestamp: '2025-10-14T01:00:00Z',
latency_p50: 0.1,
latency_p95: 0.5,
latency_p99: 1.0,
cpu_usage: 45.0,
error_rate: 0.01,
active_connections: 50,
request_rate: 100,
...
}"] + end + + %% Metrics Categories + subgraph "Metrics Categories" + CAT1[Performance Metrics
• latency_p50
• latency_p95
• latency_p99
• latency_mean] + CAT2[Error Metrics
• error_rate
• error_count] + CAT3[Resource Metrics
• cpu_usage
• memory_usage] + CAT4[Connection Metrics
• active_connections
• connection_wait_time] + CAT5[Application Metrics
• request_rate
• cosmos_client_ops] + CAT6[Database Metrics
• db_query_time
• db_connection_errors] + CAT7[System Metrics
• queue_depth] + end + + %% Feature Matrix + subgraph "Feature Matrix Construction" + MATRIX["Training Matrix
120 samples × 15 features

Sample 1: [0.1, 0.5, 1.0, 0.4, 0.01, ...]
Sample 2: [0.12, 0.52, 1.1, 0.41, 0.009, ...]
...
Sample 120: [0.09, 0.48, 0.95, 0.38, 0.012, ...]"] + end + + %% Normalization Process + subgraph "Normalization Process" + NORM1["Calculate Statistics
μ = mean(feature)
σ = std(feature)"] + NORM2["Apply Transformation
normalized = (value - μ) / σ"] + NORM3[Scaled Feature Matrix
All features: μ=0, σ=1] + end + + %% Model Training + subgraph "Model Training Data" + TRAIN1[Training Configuration
• Volume: 120 data points
• Duration: 2 hours
• Frequency: 1-minute intervals
• Type: Normal patterns only] + TRAIN2[Isolation Forest Setup
• Estimators: 100 trees
• Contamination: 10%
• Random State: 42] + end + + %% Real-time Processing + subgraph "Real-time Processing" + RT1[New Metric Point
15-dimensional vector] + RT2[Apply Saved Scaler
Use training μ and σ] + RT3[Model Prediction
Score + Classification] + RT4[Statistical Analysis
Identify affected metrics] + end + + %% Output Structure + subgraph "Analysis Output" + OUTPUT["{
anomaly_detected: true,
isolation_score: -0.6,
severity: 'Critical',
confidence: 0.85,
affected_metrics: [
'latency_p99',
'db_query_time',
'cpu_usage'
],
llm_analysis: {
root_cause: '...',
recommendations: [...],
impact: '...'
}
}"] + end + + %% Data Flow + INPUT --> CAT1 + INPUT --> CAT2 + INPUT --> CAT3 + INPUT --> CAT4 + INPUT --> CAT5 + INPUT --> CAT6 + INPUT --> CAT7 + + CAT1 --> MATRIX + CAT2 --> MATRIX + CAT3 --> MATRIX + CAT4 --> MATRIX + CAT5 --> MATRIX + CAT6 --> MATRIX + CAT7 --> MATRIX + + MATRIX --> NORM1 + NORM1 --> NORM2 + NORM2 --> NORM3 + + NORM3 --> TRAIN1 + TRAIN1 --> TRAIN2 + + %% Real-time flow + INPUT --> RT1 + RT1 --> RT2 + NORM1 -.->|Saved Parameters| RT2 + RT2 --> RT3 + TRAIN2 -.->|Trained Model| RT3 + RT3 --> RT4 + RT4 --> OUTPUT + + %% Styling + classDef input fill:#e3f2fd + classDef category fill:#f1f8e9 + classDef process fill:#fff8e1 + classDef training fill:#fce4ec + classDef realtime fill:#f3e5f5 + classDef output fill:#fff3e0 + + class INPUT input + class CAT1,CAT2,CAT3,CAT4,CAT5,CAT6,CAT7 category + class MATRIX,NORM1,NORM2,NORM3 process + class TRAIN1,TRAIN2 training + class RT1,RT2,RT3,RT4 realtime + class OUTPUT output +``` + +## 6. Performance and Monitoring Dashboard + +```mermaid +graph TB + %% Performance Metrics + subgraph "System Performance" + PERF1[Detection Latency
+ +## 6. Performance and Monitoring Dashboard + +```mermaid +graph TB + %% Performance Metrics + subgraph "System Performance" + PERF1["Detection Latency
< 2 minutes
Target: Real-time"] + PERF2[Training Time
~30 seconds
For 2 hours of data] + PERF3[Memory Usage
~50MB per model
Scalable architecture] + PERF4[Accuracy Rate
~85% detection rate
~15% false positives] + end + + %% Model Health + subgraph "Model Health Monitoring" + HEALTH1[Prediction Accuracy
Track true/false positives] + HEALTH2[Model Drift Detection
Performance degradation] + HEALTH3[Feature Importance
Metric contribution analysis] + HEALTH4[Retraining Triggers
Weekly/monthly updates] + end + + %% Operational Metrics + subgraph "Operational Dashboard" + OPS1[Active Alerts
Current anomalies] + OPS2[Service Health
Per-service status] + OPS3[Trend Analysis
Historical patterns] + OPS4[Alert Resolution
MTTR tracking] + end + + %% Integration Status + subgraph "Integration Health" + INT1[Data Pipeline
Metrics ingestion status] + INT2[LLM API Status
Gemini AI availability] + INT3[Alert Channels
Notification delivery] + INT4[Storage Health
Database performance] + end + + %% Feedback Loop + subgraph "Continuous Improvement" + FEED1[Incident Feedback
Post-incident analysis] + FEED2[Model Updates
Retrain with new data] + FEED3[Threshold Tuning
Reduce false positives] + FEED4[Feature Engineering
Add new metrics] + end + + %% Connections + PERF1 --> HEALTH1 + PERF4 --> HEALTH2 + HEALTH2 --> FEED2 + HEALTH1 --> FEED3 + + OPS1 --> INT3 + OPS4 --> FEED1 + FEED1 --> FEED2 + + INT1 --> PERF1 + INT2 --> OPS1 + + FEED3 --> HEALTH1 + FEED4 --> HEALTH3 + + %% Styling + classDef performance fill:#e8f5e8 + classDef health fill:#fff3e0 + classDef operations fill:#f3e5f5 + classDef integration fill:#e1f5fe + classDef feedback fill:#fce4ec + + class PERF1,PERF2,PERF3,PERF4 performance + class HEALTH1,HEALTH2,HEALTH3,HEALTH4 health + class OPS1,OPS2,OPS3,OPS4 operations + class INT1,INT2,INT3,INT4 integration + class FEED1,FEED2,FEED3,FEED4 feedback +``` + +## Key Features Highlighted in the Diagrams + +### 1. **Comprehensive Data Flow** +- 15 KPIs processed every minute +- Real-time feature extraction and normalization +- ML-based anomaly detection with statistical validation + +### 2. **Advanced ML Pipeline** +- Isolation Forest with 100 estimators +- StandardScaler for feature normalization +- Multi-dimensional anomaly scoring + +### 3. **Intelligent Analysis** +- Statistical threshold analysis (2σ rule) +- Severity assessment (Critical/High/Medium/Low) +- LLM-powered root cause analysis + +### 4. **Scalable Architecture** +- Modular component design +- Independent service monitoring +- Automated model retraining + +### 5. **Operational Excellence** +- < 2-minute detection latency +- ~85% detection rate with ~15% false positives +- Comprehensive monitoring and feedback loops + +These diagrams provide a complete visual representation of the ML-based anomaly detection system, showing both the technical architecture and the operational workflows. \ No newline at end of file