|
| 1 | +# Sosreport Plugin |
| 2 | + |
| 3 | +Automate sosreport analysis for system diagnostics and troubleshooting. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The sosreport plugin provides AI-powered analysis of sosreport archives, which are diagnostic data collections from Linux systems. It automatically examines logs, resource usage, network configuration, and system state to identify issues and provide actionable recommendations. |
| 8 | + |
| 9 | +## What is sosreport? |
| 10 | + |
| 11 | +[sosreport](https://github.com/sosreport/sos) is a diagnostic data collection tool used primarily in Red Hat Enterprise Linux and related distributions. It gathers system configuration, logs, and diagnostic information into a single archive for troubleshooting purposes. |
| 12 | + |
| 13 | +## Commands |
| 14 | + |
| 15 | +### `/sosreport:analyze` |
| 16 | + |
| 17 | +Analyzes a sosreport archive or extracted directory. By default it runs every analysis area; `--only` and `--skip` restrict the run to selected areas. |
| 18 | + |
| 19 | +**Usage:** |
| 20 | +```bash |
| 21 | +/sosreport:analyze <path-to-sosreport> [--only <areas>] [--skip <areas>] |
| 22 | +``` |
| 23 | + |
| 24 | +**Arguments:** |
| 25 | +- `<path-to-sosreport>`: Path to the sosreport archive (`.tar.gz`, `.tar.xz`) or extracted directory |
| 26 | +- `--only <areas>`: (Optional) Run only specific analysis areas (comma-separated) |
| 27 | +- `--skip <areas>`: (Optional) Skip specific analysis areas (comma-separated) |
| 28 | + |
| 29 | +**Analysis Areas:** |
| 30 | + |
| 31 | +The analysis is organized into four specialized areas, each with detailed implementation guidance: |
| 32 | + |
| 33 | +1. **`logs`** - System and Application Logs Analysis |
| 34 | + - Analyzes journald logs, syslog, dmesg, and application logs |
| 35 | + - Identifies errors, warnings, and critical messages |
| 36 | + - Detects OOM killer events, kernel panics, segfaults |
| 37 | + - Counts and categorizes errors by severity |
| 38 | + - Provides timeline of critical events |
| 39 | + - **Skill**: [`skills/logs-analysis/SKILL.md`](skills/logs-analysis/SKILL.md) |
| 40 | + |
| 41 | +2. **`resources`** - System Resource Usage Analysis |
| 42 | + - Memory usage, swap, and pressure indicators |
| 43 | + - CPU information and load averages |
| 44 | + - Disk usage and filesystem capacity |
| 45 | + - Process analysis (top consumers, zombies) |
| 46 | + - Resource exhaustion patterns |
| 47 | + - **Skill**: [`skills/resource-analysis/SKILL.md`](skills/resource-analysis/SKILL.md) |
| 48 | + |
| 49 | +3. **`network`** - Network Configuration and Connectivity |
| 50 | + - Network interface status and IP addresses |
| 51 | + - Routing table and default gateway |
| 52 | + - Active connections and listening services |
| 53 | + - Firewall rules (firewalld/iptables/nftables) |
| 54 | + - DNS configuration and hostname resolution |
| 55 | + - Network error detection |
| 56 | + - **Skill**: [`skills/network-analysis/SKILL.md`](skills/network-analysis/SKILL.md) |
| 57 | + |
| 58 | +4. **`system-config`** - System Configuration and Security |
| 59 | + - OS version and kernel information |
| 60 | + - Installed package versions |
| 61 | + - Systemd service status and failures |
| 62 | + - SELinux/AppArmor configuration and denials |
| 63 | + - Kernel parameters and resource limits |
| 64 | + - **Skill**: [`skills/system-config-analysis/SKILL.md`](skills/system-config-analysis/SKILL.md) |
| 65 | + |
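The way `--only` and `--skip` narrow the four areas above can be sketched in a few lines of shell. This is a minimal illustration, not the plugin's implementation: the `resolve_areas` function and the `VALID_AREAS` variable are hypothetical names, and applying both flags in the same run is an assumption, since this README does not say how they combine.

```bash
# Hypothetical helper: turn --only / --skip values into the list of areas to run.
VALID_AREAS="logs resources network system-config"

resolve_areas() {
  local only="$1" skip="$2" area selected=""
  # Reject any name that is not one of the four documented areas.
  for area in $(printf '%s' "$only,$skip" | tr ',' ' '); do
    case " $VALID_AREAS " in
      *" $area "*) ;;
      *) echo "unknown area: $area" >&2; return 1 ;;
    esac
  done
  for area in $VALID_AREAS; do
    # With --only, keep just the listed areas; without it, start from all four.
    if [ -n "$only" ]; then
      case ",$only," in *",$area,"*) ;; *) continue ;; esac
    fi
    # With --skip, drop the listed areas (assumed to apply after --only).
    case ",$skip," in *",$area,"*) continue ;; esac
    selected="$selected $area"
  done
  echo "${selected# }"
}
```

Under these assumptions, `resolve_areas "logs,network" ""` yields `logs network`, and `resolve_areas "" "resources"` yields the other three areas in their documented order.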
| 66 | +**Output:** |
| 67 | +- Interactive summary categorized by severity (Critical, High, Medium, Low) |
| 68 | +- Resource utilization metrics (when `resources` is selected) |
| 69 | +- Top errors and their frequency (when `logs` is selected) |
| 70 | +- Failed services (when `system-config` is selected) |
| 71 | +- Network configuration status (when `network` is selected) |
| 72 | +- Actionable recommendations |
| 73 | +- File paths for detailed investigation |
| 74 | + |
| 75 | +**Examples:** |
| 76 | + |
| 77 | +```bash |
| 78 | +# Comprehensive analysis (all areas) |
| 79 | +/sosreport:analyze /tmp/sosreport-server01-2024-01-15.tar.xz |
| 80 | + |
| 81 | +# Analyze only logs and network |
| 82 | +/sosreport:analyze /tmp/sosreport.tar.xz --only logs,network |
| 83 | + |
| 84 | +# Skip resource analysis |
| 85 | +/sosreport:analyze /tmp/sosreport.tar.xz --skip resources |
| 86 | + |
| 87 | +# Quick log-only analysis |
| 88 | +/sosreport:analyze /tmp/sosreport.tar.xz --only logs |
| 89 | + |
| 90 | +# Analyze extracted directory |
| 91 | +/sosreport:analyze /tmp/sosreport-server01-2024-01-15/ --only system-config |
| 92 | +``` |
| 93 | + |
| 94 | +The command automatically extracts compressed archives to `.work/sosreport-analyze/` and performs the selected analysis. |
| 95 | + |
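The extraction step described above could look roughly like the following sketch. The `prepare_sosreport` function is a hypothetical name introduced for illustration; the only details taken from this README are the supported extensions, the acceptance of already-extracted directories, and the `.work/sosreport-analyze/` location. It also assumes the archive unpacks into a single top-level directory, as sosreport archives normally do.

```bash
# Hypothetical helper: print the path of an extracted sosreport tree.
prepare_sosreport() {
  local input="$1" workdir="${2:-.work/sosreport-analyze}" top
  if [ -d "$input" ]; then
    # Already extracted: use the directory as-is (trailing slash removed).
    echo "${input%/}"
    return 0
  fi
  case "$input" in
    *.tar.gz|*.tar.xz) ;;
    *) echo "unsupported input: $input" >&2; return 1 ;;
  esac
  mkdir -p "$workdir" || return 1
  # tar detects gzip/xz compression on its own when reading an archive.
  tar -xf "$input" -C "$workdir" || return 1
  # Assumption: the first archive entry names the single top-level directory.
  top="$(tar -tf "$input" | head -n 1 | cut -d/ -f1)"
  echo "$workdir/$top"
}
```

Keeping the extracted tree under a fixed work directory is what lets later follow-up questions and manual inspection reuse the same files without extracting again.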
| 96 | +## Analysis Skills |
| 97 | + |
| 98 | +The sosreport plugin uses specialized analysis skills for each area. Each skill contains detailed implementation guidance with bash commands, parsing logic, error handling, and output formats. |
| 99 | + |
| 100 | +| Skill | Description | Documentation | |
| 101 | +|-------|-------------|---------------| |
| 102 | +| **Logs Analysis** | Analyzes system logs, journald, dmesg, and application logs. Identifies errors, OOM events, kernel panics, and segfaults. | [`skills/logs-analysis/SKILL.md`](skills/logs-analysis/SKILL.md) | |
| 103 | +| **Resource Analysis** | Analyzes memory, CPU, disk usage, and processes. Identifies resource exhaustion and performance bottlenecks. | [`skills/resource-analysis/SKILL.md`](skills/resource-analysis/SKILL.md) | |
| 104 | +| **Network Analysis** | Analyzes network interfaces, routing, connections, firewall rules, and DNS configuration. | [`skills/network-analysis/SKILL.md`](skills/network-analysis/SKILL.md) | |
| 105 | +| **System Config Analysis** | Analyzes OS info, packages, systemd services, SELinux/AppArmor, and kernel parameters. | [`skills/system-config-analysis/SKILL.md`](skills/system-config-analysis/SKILL.md) | |
| 106 | + |
| 107 | +Each skill document includes: |
| 108 | +- Step-by-step implementation instructions |
| 109 | +- Bash command examples with actual sosreport file paths |
| 110 | +- Error handling guidance |
| 111 | +- Output format templates |
| 112 | +- Common patterns and severity classifications |
| 113 | +- Tips for effective analysis |
| 114 | + |
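To give a sense of the kind of command such a skill document might contain, here is a small sketch that counts OOM killer events in an extracted sosreport. It is not taken from the skill files: the `count_oom_events` function is hypothetical, and it assumes the report captured `var/log/messages` and that the kernel logged its usual `Out of memory:` lines there.

```bash
# Hypothetical helper: count OOM killer events in an extracted sosreport tree.
count_oom_events() {
  local sos_root="$1"
  local log="$sos_root/var/log/messages"
  # A missing log file is reported as zero events rather than an error.
  [ -f "$log" ] || { echo 0; return 0; }
  # grep -c prints the match count; it exits non-zero when the count is 0.
  grep -c 'Out of memory:' "$log" || true
}
```

For example, `count_oom_events .work/sosreport-analyze/<extracted-dir>` prints a single number that can feed a severity classification.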
| 115 | +## Installation |
| 116 | + |
| 117 | +### From Marketplace |
| 118 | + |
| 119 | +```bash |
| 120 | +# Add the ai-helpers marketplace |
| 121 | +/plugin marketplace add openshift-eng/ai-helpers |
| 122 | + |
| 123 | +# Install the sosreport plugin |
| 124 | +/plugin install sosreport@ai-helpers |
| 125 | +``` |
| 126 | + |
| 127 | +### Manual Installation |
| 128 | + |
| 129 | +```bash |
| 130 | +# Clone the repository |
| 131 | +git clone https://github.com/openshift-eng/ai-helpers.git |
| 132 | + |
| 133 | +# Add the ai-helpers marketplace from cloned directory |
| 134 | +/plugin marketplace add $(pwd)/ai-helpers |
| 135 | + |
| 136 | +# Install the sosreport plugin |
| 137 | +/plugin install sosreport@ai-helpers |
| 138 | +``` |
| 139 | + |
| 140 | +## Typical Workflows |
| 141 | + |
| 142 | +### Full Comprehensive Analysis |
| 143 | + |
| 144 | +1. **Obtain sosreport**: Get a sosreport archive from the affected system (usually generated with the `sos report` command, or with the legacy `sosreport` command on older versions) |
| 145 | + |
| 146 | +2. **Run comprehensive analysis**: |
| 147 | + ```bash |
| 148 | + /sosreport:analyze /path/to/sosreport.tar.xz |
| 149 | + ``` |
| 150 | + |
| 151 | +3. **Review findings**: Examine the interactive summary for critical issues and recommendations across all areas |
| 152 | + |
| 153 | +4. **Deep dive**: Ask follow-up questions about specific findings: |
| 154 | + ```text |
| 155 | + Can you show me more details about the OOM killer events? |
| 156 | + What caused the httpd service to fail? |
| 157 | + ``` |
| 158 | + |
| 159 | +5. **Take action**: Use the recommendations to troubleshoot and resolve issues |
| 160 | + |
| 161 | +### Targeted Investigation Workflow |
| 162 | + |
| 163 | +1. **Quick log scan** (fastest): |
| 164 | + ```bash |
| 165 | + /sosreport:analyze /path/to/sosreport.tar.xz --only logs |
| 166 | + ``` |
| 167 | + Quickly identify error patterns and critical events |
| 168 | + |
| 169 | +2. **Follow-up based on findings**: |
| 170 | + - If memory issues found: Run `--only resources` |
| 171 | + - If network errors found: Run `--only network` |
| 172 | + - If service failures found: Run `--only system-config` |
| 173 | + |
| 174 | +3. **Example iterative investigation**: |
| 175 | + ```bash |
| 176 | + # Start with logs to identify the problem area |
| 177 | + /sosreport:analyze /tmp/sos.tar.xz --only logs |
| 178 | + |
| 179 | + # Found network timeouts, analyze network configuration |
| 180 | + /sosreport:analyze /tmp/sos.tar.xz --only network |
| 181 | + |
| 182 | + # Network looks fine, check if it's a resource issue |
| 183 | + /sosreport:analyze /tmp/sos.tar.xz --only resources |
| 184 | + ``` |
| 185 | + |
| 186 | +### Performance-Focused Workflow |
| 187 | + |
| 188 | +When you know what you're looking for: |
| 189 | + |
| 190 | +```bash |
| 191 | +# Only interested in service configuration |
| 192 | +/sosreport:analyze /path/to/sos.tar.xz --only system-config |
| 193 | + |
| 194 | +# Need logs and network, skip the rest |
| 195 | +/sosreport:analyze /path/to/sos.tar.xz --only logs,network |
| 196 | + |
| 197 | +# Full analysis but skip time-consuming resource analysis |
| 198 | +/sosreport:analyze /path/to/sos.tar.xz --skip resources |
| 199 | +``` |
| 200 | + |
| 201 | +## Use Cases |
| 202 | + |
| 203 | +- **Incident response**: Quickly identify root causes of system failures |
| 204 | + - Start with `--only logs` for fastest initial assessment |
| 205 | + - Follow up with targeted analysis based on log findings |
| 206 | + |
| 207 | +- **Performance troubleshooting**: Find resource bottlenecks and optimization opportunities |
| 208 | + - Use `--only resources` to focus on memory, CPU, and disk metrics |
| 209 | + - Add the `logs` area (`--only resources,logs`) to correlate resource issues with errors |
| 210 | + |
| 211 | +- **Configuration review**: Verify system configuration and identify misconfigurations |
| 212 | + - Use `--only system-config` to audit packages, services, and security settings |
| 213 | + - Add the `network` area (`--only system-config,network`) to validate network configuration |
| 214 | + |
| 215 | +- **Network troubleshooting**: Diagnose connectivity and firewall issues |
| 216 | + - Use `--only network,logs` to see network config and related errors |
| 217 | + - This skips the `resources` and `system-config` areas for faster results |
| 218 | + |
| 219 | +- **Proactive monitoring**: Regular analysis of production system sosreports |
| 220 | + - Run comprehensive analysis for periodic health checks |
| 221 | + - Use selective analysis for quick daily checks |
| 222 | + |
| 223 | +- **Knowledge transfer**: Let AI explain complex system issues to team members |
| 224 | + - Use selective analysis to focus on specific areas for learning |
| 225 | + - Each skill provides detailed documentation for understanding |
| 226 | + |
| 227 | +## Prerequisites |
| 228 | + |
| 229 | +- **tar**: For extracting compressed archives (usually pre-installed) |
| 230 | +- **Disk space**: At least 2x the size of the compressed sosreport |
| 231 | + |
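The disk space prerequisite can be checked before extraction with a sketch like the one below. The `has_enough_space` function is a hypothetical name; the 2x factor comes from the prerequisite above, and the free-space figure is read from the portable `df -Pk` output format.

```bash
# Hypothetical helper: succeed if the target filesystem has at least
# 2x the compressed archive size available.
has_enough_space() {
  local archive="$1" target="${2:-.}" archive_kb free_kb
  # Archive size in KiB, rounded up.
  archive_kb=$(( ($(wc -c < "$archive") + 1023) / 1024 ))
  # Column 4 of the second line of `df -Pk` is the available space in KiB.
  free_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')
  [ "$free_kb" -ge $(( archive_kb * 2 )) ]
}
```

A typical use is `has_enough_space /tmp/sosreport.tar.xz . || echo "not enough free space"` before running the analysis. Highly compressible logs can expand by more than 2x, so treat a passing check as a minimum rather than a guarantee.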
| 232 | +## Tips |
| 233 | + |
| 234 | +- **Selective analysis**: Use `--only` or `--skip` to run specific analysis areas for faster results |
| 235 | + - `--only logs` is the fastest option for initial investigation |
| 236 | + - Combine multiple areas: `--only logs,network` |
| 237 | + - Valid areas: `logs`, `resources`, `network`, `system-config` |
| 238 | + |
| 239 | +- **Archive handling**: The command works with both compressed archives (`.tar.gz`, `.tar.xz`) and extracted directories |
| 240 | + |
| 241 | +- **Performance**: For large sosreports (>1GB): |
| 242 | + - Comprehensive analysis may take several minutes |
| 243 | + - Use selective analysis to reduce analysis time |
| 244 | + - Start with `--only logs` then add more areas as needed |
| 245 | + |
| 246 | +- **Interactive investigation**: You can ask follow-up questions to drill deeper into specific findings |
| 247 | + - "Show me more details about the OOM killer events" |
| 248 | + - "What caused the httpd service to fail?" |
| 249 | + - "Analyze the network timeouts in more detail" |
| 250 | + |
| 251 | +- **Workspace**: The extracted sosreport is preserved in `.work/sosreport-analyze/` for manual investigation |
| 252 | + |
| 253 | +- **Skills documentation**: Each analysis area has detailed implementation guidance |
| 254 | + - See `skills/logs-analysis/SKILL.md` for log analysis details |
| 255 | + - See `skills/resource-analysis/SKILL.md` for resource analysis details |
| 256 | + - See `skills/network-analysis/SKILL.md` for network analysis details |
| 257 | + - See `skills/system-config-analysis/SKILL.md` for system config details |
| 258 | + |
| 259 | +## Contributing |
| 260 | + |
| 261 | +See the main [CLAUDE.md](../../CLAUDE.md) guide for information on contributing to this plugin. |
| 262 | + |
| 263 | +## Resources |
| 264 | + |
| 265 | +- [sosreport GitHub](https://github.com/sosreport/sos) |
| 266 | +- [Red Hat sosreport guide](https://access.redhat.com/solutions/3592) |
| 267 | +- [AI Helpers Repository](https://github.com/openshift-eng/ai-helpers) |