Commit 25ddc94

Merge pull request #92 from arkadeepsen/sosreport
Add plugin for analyzing sosreport
2 parents 91eb7dc + 6923e70 commit 25ddc94

10 files changed: +2525 -0 lines changed

.claude-plugin/marketplace.json

Lines changed: 5 additions & 0 deletions
@@ -39,6 +39,11 @@
       "source": "./plugins/session",
       "description": "A plugin for Claude session management and persistence"
     },
+    {
+      "name": "sosreport",
+      "source": "./plugins/sosreport",
+      "description": "Analyze sosreport archives for system diagnostics and troubleshooting"
+    },
     {
       "name": "utils",
       "source": "./plugins/utils",

PLUGINS.md

Lines changed: 10 additions & 0 deletions
@@ -16,6 +16,7 @@ This document lists all available Claude Code plugins and their commands in the
 - [Openshift](#openshift-plugin)
 - [Prow Job](#prow-job-plugin)
 - [Session](#session-plugin)
+- [Sosreport](#sosreport-plugin)
 - [Utils](#utils-plugin)
 - [Yaml](#yaml-plugin)
 
@@ -184,6 +185,15 @@ A plugin to save and resume conversation sessions across long time intervals
 
 See [plugins/session/README.md](plugins/session/README.md) for detailed documentation.
 
+### Sosreport Plugin
+
+Analyze sosreport archives for system diagnostics and troubleshooting
+
+**Commands:**
+- **`/sosreport:analyze` `<path-to-sosreport> [--only <areas>] [--skip <areas>]`** - Analyze sosreport archive for system diagnostics and issues
+
+See [plugins/sosreport/README.md](plugins/sosreport/README.md) for detailed documentation.
+
 ### Utils Plugin
 
 A generic utilities plugin serving as a catch-all for various helper commands and agents

docs/data.json

Lines changed: 36 additions & 0 deletions
@@ -289,6 +289,42 @@
       "skills": [],
       "has_readme": true
     },
+    {
+      "name": "sosreport",
+      "description": "Analyze sosreport archives for system diagnostics and troubleshooting",
+      "version": "0.0.1",
+      "commands": [
+        {
+          "name": "analyze",
+          "description": "Analyze sosreport archive for system diagnostics and issues",
+          "synopsis": "/sosreport:analyze <path-to-sosreport> [--only <areas>] [--skip <areas>]",
+          "argument_hint": "<path-to-sosreport> [--only <areas>] [--skip <areas>]"
+        }
+      ],
+      "skills": [
+        {
+          "name": "Logs Analysis",
+          "id": "logs-analysis",
+          "description": "Analyze system and application log data from sosreport archives, extracting error patterns, kernel panics, OOM events, service failures, and application crashes from journald logs and traditional log files within the sosreport directory structure to identify root causes of system failures and issues"
+        },
+        {
+          "name": "Network Analysis",
+          "id": "network-analysis",
+          "description": "Analyze network configuration data from sosreport archives, extracting interface configurations, routing tables, active connections, firewall rules (firewalld/iptables), and DNS settings from the sosreport directory structure to diagnose network connectivity and configuration issues"
+        },
+        {
+          "name": "Resource Analysis",
+          "id": "resource-analysis",
+          "description": "Analyze system resource usage data from sosreport archives, extracting memory statistics, CPU load averages, disk space utilization, and process information from the sosreport directory structure to diagnose resource exhaustion, performance bottlenecks, and capacity issues"
+        },
+        {
+          "name": "System Configuration Analysis",
+          "id": "system-config-analysis",
+          "description": "Analyze system configuration data from sosreport archives, extracting OS details, installed packages, systemd service status, SELinux/AppArmor policies, and kernel parameters from the sosreport directory structure to diagnose configuration-related system issues"
+        }
+      ],
+      "has_readme": true
+    },
     {
       "name": "utils",
       "description": "A generic utilities plugin serving as a catch-all for various helper commands",

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "name": "sosreport",
+  "description": "Analyze sosreport archives for system diagnostics and troubleshooting",
+  "version": "0.0.1",
+  "author": {
+    "name": "github.com/arkadeepsen"
+  }
+}

plugins/sosreport/README.md

Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
1+
# Sosreport Plugin
2+
3+
Automate sosreport analysis for system diagnostics and troubleshooting.
4+
5+
## Overview
6+
7+
The sosreport plugin provides AI-powered analysis of sosreport archives, which are diagnostic data collections from Linux systems. It automatically examines logs, resource usage, network configuration, and system state to identify issues and provide actionable recommendations.
8+
9+
## What is sosreport?
10+
11+
[sosreport](https://github.com/sosreport/sos) is a diagnostic data collection tool used primarily in Red Hat Enterprise Linux and related distributions. It gathers system configuration, logs, and diagnostic information into a single archive for troubleshooting purposes.
12+
13+
## Commands
14+
15+
### `/sosreport:analyze`
16+
17+
Performs a comprehensive analysis of a sosreport archive, or a selective analysis limited to chosen areas.
18+
19+
**Usage:**
20+
```bash
21+
/sosreport:analyze <path-to-sosreport> [--only <areas>] [--skip <areas>]
22+
```
23+
24+
**Arguments:**
25+
- `<path-to-sosreport>`: Path to the sosreport archive (`.tar.gz`, `.tar.xz`) or extracted directory
26+
- `--only <areas>`: (Optional) Run only specific analysis areas (comma-separated)
27+
- `--skip <areas>`: (Optional) Skip specific analysis areas (comma-separated)
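
The following is a minimal sketch of how the `--only` and `--skip` values could be turned into a list of areas to run. It is not taken from the plugin: `resolve_areas` and `VALID_AREAS` are hypothetical names, the area names are the four this README lists, and both the rejection of unknown names and the handling of `--only` combined with `--skip` are assumptions.

```bash
# Hypothetical helper, not part of the plugin: resolve which areas to run.
# Area names come from this README; the validation rules are assumptions.
VALID_AREAS="logs resources network system-config"

resolve_areas() {
  local only="" skip="" area selected=""
  # Walk the arguments one at a time; the word after a flag is its value.
  # Anything else (such as the sosreport path) is ignored here.
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --only) only="${2-}" ;;
      --skip) skip="${2-}" ;;
    esac
    shift
  done

  # Reject any requested name that is not a known area.
  for area in $(printf '%s,%s' "$only" "$skip" | tr ',' ' '); do
    case " $VALID_AREAS " in
      *" $area "*) ;;
      *) echo "unknown area: $area" >&2; return 1 ;;
    esac
  done

  for area in $VALID_AREAS; do
    # With --only, keep an area only if it was listed.
    if [ -n "$only" ]; then
      case ",$only," in *",$area,"*) ;; *) continue ;; esac
    fi
    # With --skip, drop any listed area.
    case ",$skip," in *",$area,"*) continue ;; esac
    selected="${selected:+$selected }$area"
  done
  printf '%s\n' "$selected"
}
```

Under these assumptions, `resolve_areas /tmp/sos.tar.xz --only logs,network` prints `logs network`, and omitting both flags selects all four areas.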
28+
29+
**Analysis Areas:**
30+
31+
The analysis is organized into four specialized areas, each with detailed implementation guidance:
32+
33+
1. **`logs`** - System and Application Logs Analysis
34+
- Analyzes journald logs, syslog, dmesg, and application logs
35+
- Identifies errors, warnings, and critical messages
36+
- Detects OOM killer events, kernel panics, segfaults
37+
- Counts and categorizes errors by severity
38+
- Provides timeline of critical events
39+
- **Skill**: [`skills/logs-analysis/SKILL.md`](skills/logs-analysis/SKILL.md)
40+
41+
2. **`resources`** - System Resource Usage Analysis
42+
- Memory usage, swap, and pressure indicators
43+
- CPU information and load averages
44+
- Disk usage and filesystem capacity
45+
- Process analysis (top consumers, zombies)
46+
- Resource exhaustion patterns
47+
- **Skill**: [`skills/resource-analysis/SKILL.md`](skills/resource-analysis/SKILL.md)
48+
49+
3. **`network`** - Network Configuration and Connectivity
50+
- Network interface status and IP addresses
51+
- Routing table and default gateway
52+
- Active connections and listening services
53+
- Firewall rules (firewalld/iptables/nftables)
54+
- DNS configuration and hostname resolution
55+
- Network error detection
56+
- **Skill**: [`skills/network-analysis/SKILL.md`](skills/network-analysis/SKILL.md)
57+
58+
4. **`system-config`** - System Configuration and Security
59+
- OS version and kernel information
60+
- Installed package versions
61+
- Systemd service status and failures
62+
- SELinux/AppArmor configuration and denials
63+
- Kernel parameters and resource limits
64+
- **Skill**: [`skills/system-config-analysis/SKILL.md`](skills/system-config-analysis/SKILL.md)
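
To illustrate the kind of scan the `logs` area describes, here is a rough sketch that counts log lines matching a few well-known kernel messages. The function names are hypothetical and the patterns are a small assumed subset; the actual skill documents may use different commands and patterns.

```bash
# Hypothetical helpers, not part of the plugin.
# count_matches <extended-regex> <file>...
# Prints the number of lines, across all given files, that match the regex
# (case-insensitive). Prints 0 when nothing matches or no file is readable.
count_matches() {
  local pattern="$1"
  shift
  cat -- "$@" 2>/dev/null | grep -Eic -- "$pattern" || true
}

# summarize_log_signatures <file>...
# Counts matching lines, not distinct events: a single OOM kill usually
# logs both an "invoked oom-killer" line and an "Out of memory" line.
summarize_log_signatures() {
  printf 'oom=%s\n' "$(count_matches 'out of memory: kill(ed)? process|invoked oom-killer' "$@")"
  printf 'panic=%s\n' "$(count_matches 'kernel panic' "$@")"
  printf 'segfault=%s\n' "$(count_matches 'segfault at' "$@")"
}
```

In an extracted sosreport, likely inputs are files such as `var/log/messages` or `sos_commands/kernel/dmesg`, though the exact layout depends on the sos version and the plugins enabled when the report was collected.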
65+
66+
**Output:**
67+
- Interactive summary categorized by severity (Critical, High, Medium, Low)
68+
- Resource utilization metrics (when `resources` is selected)
69+
- Top errors and their frequency (when `logs` is selected)
70+
- Failed services (when `system-config` is selected)
71+
- Network configuration status (when `network` is selected)
72+
- Actionable recommendations
73+
- File paths for detailed investigation
74+
75+
**Examples:**
76+
77+
```bash
78+
# Comprehensive analysis (all areas)
79+
/sosreport:analyze /tmp/sosreport-server01-2024-01-15.tar.xz
80+
81+
# Analyze only logs and network
82+
/sosreport:analyze /tmp/sosreport.tar.xz --only logs,network
83+
84+
# Skip resource analysis
85+
/sosreport:analyze /tmp/sosreport.tar.xz --skip resources
86+
87+
# Quick log-only analysis
88+
/sosreport:analyze /tmp/sosreport.tar.xz --only logs
89+
90+
# Analyze extracted directory
91+
/sosreport:analyze /tmp/sosreport-server01-2024-01-15/ --only system-config
92+
```
93+
94+
The command automatically extracts compressed archives to `.work/sosreport-analyze/` and performs the selected analysis.
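
The extraction step described above could look roughly like the sketch below. Only the `.work/sosreport-analyze/` location and the accepted inputs (`.tar.gz`, `.tar.xz`, or a directory) come from this README; the function name `prepare_sosreport` and the assumption that the archive unpacks into a single top-level directory are illustrative.

```bash
# Hypothetical helper, not part of the plugin.
# prepare_sosreport <archive-or-dir> [workdir]
# Prints the directory that holds the sosreport contents.
prepare_sosreport() {
  local src="$1" workdir="${2:-.work/sosreport-analyze}" top

  # An already extracted directory is used as-is.
  if [ -d "$src" ]; then
    printf '%s\n' "${src%/}"
    return 0
  fi

  case "$src" in
    *.tar.gz|*.tar.xz) ;;
    *) echo "unsupported input: $src" >&2; return 1 ;;
  esac

  mkdir -p "$workdir" || return 1
  # GNU tar and bsdtar detect gzip/xz compression automatically on read.
  tar -xf "$src" -C "$workdir" || return 1

  # Assumption: the archive unpacks into one top-level directory,
  # so the first listed entry names it.
  top=$(tar -tf "$src" | sed -n '1p')
  printf '%s/%s\n' "$workdir" "${top%%/*}"
}
```

Returning the directory path lets later steps treat archives and pre-extracted directories the same way.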
95+
96+
## Analysis Skills
97+
98+
The sosreport plugin uses specialized analysis skills for each area. Each skill contains detailed implementation guidance with bash commands, parsing logic, error handling, and output formats.
99+
100+
| Skill | Description | Documentation |
101+
|-------|-------------|---------------|
102+
| **Logs Analysis** | Analyzes system logs, journald, dmesg, and application logs. Identifies errors, OOM events, kernel panics, and segfaults. | [`skills/logs-analysis/SKILL.md`](skills/logs-analysis/SKILL.md) |
103+
| **Resource Analysis** | Analyzes memory, CPU, disk usage, and processes. Identifies resource exhaustion and performance bottlenecks. | [`skills/resource-analysis/SKILL.md`](skills/resource-analysis/SKILL.md) |
104+
| **Network Analysis** | Analyzes network interfaces, routing, connections, firewall rules, and DNS configuration. | [`skills/network-analysis/SKILL.md`](skills/network-analysis/SKILL.md) |
105+
| **System Config Analysis** | Analyzes OS info, packages, systemd services, SELinux/AppArmor, and kernel parameters. | [`skills/system-config-analysis/SKILL.md`](skills/system-config-analysis/SKILL.md) |
106+
107+
Each skill document includes:
108+
- Step-by-step implementation instructions
109+
- Bash command examples with actual sosreport file paths
110+
- Error handling guidance
111+
- Output format templates
112+
- Common patterns and severity classifications
113+
- Tips for effective analysis
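
As a flavor of what a bash command against sosreport file paths can look like, the sketch below derives a memory-usage percentage from the `proc/meminfo` copy that sos normally collects. It is an illustration only, not an excerpt from the skill documents, and `mem_used_percent` is a hypothetical name.

```bash
# Hypothetical helper, not part of the plugin.
# mem_used_percent <sosreport-dir>
# Prints (MemTotal - MemAvailable) / MemTotal as a whole-number percentage.
# If MemAvailable is absent (very old kernels) it is treated as 0, so the
# result reads as 100; fails when MemTotal is missing.
mem_used_percent() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0) exit 1
      printf "%d\n", (total - avail) * 100 / total
    }
  ' "$1/proc/meminfo"
}
```

For example, a report with `MemTotal: 8000000 kB` and `MemAvailable: 2000000 kB` yields `75`.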
114+
115+
## Installation
116+
117+
### From Marketplace
118+
119+
```bash
120+
# Add the ai-helpers marketplace
121+
/plugin marketplace add openshift-eng/ai-helpers
122+
123+
# Install the sosreport plugin
124+
/plugin install sosreport@ai-helpers
125+
```
126+
127+
### Manual Installation
128+
129+
```bash
130+
# Clone the repository
131+
git clone https://github.com/openshift-eng/ai-helpers.git
132+
133+
# Add the ai-helpers marketplace from cloned directory
134+
/plugin marketplace add $(pwd)/ai-helpers
135+
136+
# Install the sosreport plugin
137+
/plugin install sosreport@ai-helpers
138+
```
139+
140+
## Typical Workflows
141+
142+
### Full Comprehensive Analysis
143+
144+
1. **Obtain sosreport**: Get a sosreport archive from a system (usually generated with the `sos report` command, or the legacy `sosreport` command on older releases)
145+
146+
2. **Run comprehensive analysis**:
147+
```bash
148+
/sosreport:analyze /path/to/sosreport.tar.xz
149+
```
150+
151+
3. **Review findings**: Examine the interactive summary for critical issues and recommendations across all areas
152+
153+
4. **Deep dive**: Ask follow-up questions about specific findings:
154+
```text
155+
Can you show me more details about the OOM killer events?
156+
What caused the httpd service to fail?
157+
```
158+
159+
5. **Take action**: Use the recommendations to troubleshoot and resolve issues
160+
161+
### Targeted Investigation Workflow
162+
163+
1. **Quick log scan** (fastest):
164+
```bash
165+
/sosreport:analyze /path/to/sosreport.tar.xz --only logs
166+
```
167+
Quickly identify error patterns and critical events
168+
169+
2. **Follow-up based on findings**:
170+
- If memory issues found: Run `--only resources`
171+
- If network errors found: Run `--only network`
172+
- If service failures found: Run `--only system-config`
173+
174+
3. **Example iterative investigation**:
175+
```bash
176+
# Start with logs to identify the problem area
177+
/sosreport:analyze /tmp/sos.tar.xz --only logs
178+
179+
# Found network timeouts, analyze network configuration
180+
/sosreport:analyze /tmp/sos.tar.xz --only network
181+
182+
# Network looks fine, check if it's a resource issue
183+
/sosreport:analyze /tmp/sos.tar.xz --only resources
184+
```
185+
186+
### Performance-Focused Workflow
187+
188+
When you know what you're looking for:
189+
190+
```bash
191+
# Only interested in service configuration
192+
/sosreport:analyze /path/to/sos.tar.xz --only system-config
193+
194+
# Need logs and network, skip the rest
195+
/sosreport:analyze /path/to/sos.tar.xz --only logs,network
196+
197+
# Full analysis but skip time-consuming resource analysis
198+
/sosreport:analyze /path/to/sos.tar.xz --skip resources
199+
```
200+
201+
## Use Cases
202+
203+
- **Incident response**: Quickly identify root causes of system failures
204+
- Start with `--only logs` for fastest initial assessment
205+
- Follow up with targeted analysis based on log findings
206+
207+
- **Performance troubleshooting**: Find resource bottlenecks and optimization opportunities
208+
- Use `--only resources` to focus on memory, CPU, and disk metrics
209+
- Add `logs` to the list (`--only resources,logs`) to correlate resource issues with errors
210+
211+
- **Configuration review**: Verify system configuration and identify misconfigurations
212+
- Use `--only system-config` to audit packages, services, and security settings
213+
- Add `network` to the list (`--only system-config,network`) to validate network configuration
214+
215+
- **Network troubleshooting**: Diagnose connectivity and firewall issues
216+
- Use `--only network,logs` to see network config and related errors
217+
- Skip resource-intensive analysis for faster results
218+
219+
- **Proactive monitoring**: Regular analysis of production system sosreports
220+
- Run comprehensive analysis for periodic health checks
221+
- Use selective analysis for quick daily checks
222+
223+
- **Knowledge transfer**: Let AI explain complex system issues to team members
224+
- Use selective analysis to focus on specific areas for learning
225+
- Each skill provides detailed documentation for understanding
226+
227+
## Prerequisites
228+
229+
- **tar**: For extracting compressed archives (usually pre-installed)
230+
- **Disk space**: At least 2x the size of the compressed sosreport
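
The disk-space guideline above can be checked before extraction with something like the following sketch. The helper names are hypothetical, and the 2x factor is simply the rule of thumb stated in this list.

```bash
# Hypothetical helpers, not part of the plugin.
# required_kb <archive>: KiB needed, i.e. twice the archive size rounded up.
required_kb() {
  local bytes
  bytes=$(wc -c < "$1") || return 1
  echo $(( (bytes + 1023) / 1024 * 2 ))
}

# available_kb <dir>: free KiB on the filesystem that holds <dir>.
# df -P keeps each filesystem on one line; column 4 is "Available".
available_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space_for <archive> <dir>: succeeds when <dir> has enough free space.
has_space_for() {
  [ "$(available_kb "$2")" -ge "$(required_kb "$1")" ]
}
```

A call such as `has_space_for /tmp/sosreport.tar.xz . || echo "not enough free space"` would then guard the extraction step.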
231+
232+
## Tips
233+
234+
- **Selective analysis**: Use `--only` or `--skip` to run specific analysis areas for faster results
235+
- `--only logs` is the fastest option for initial investigation
236+
- Combine multiple areas: `--only logs,network`
237+
- Valid areas: `logs`, `resources`, `network`, `system-config`
238+
239+
- **Archive handling**: The command works with both compressed archives (`.tar.gz`, `.tar.xz`) and extracted directories
240+
241+
- **Performance**: For large sosreports (>1 GB):
242+
- Comprehensive analysis may take several minutes
243+
- Use selective analysis to reduce analysis time
244+
- Start with `--only logs` then add more areas as needed
245+
246+
- **Interactive investigation**: You can ask follow-up questions to drill deeper into specific findings
247+
- "Show me more details about the OOM killer events"
248+
- "What caused the httpd service to fail?"
249+
- "Analyze the network timeouts in more detail"
250+
251+
- **Workspace**: The extracted sosreport is preserved in `.work/sosreport-analyze/` for manual investigation
252+
253+
- **Skills documentation**: Each analysis area has detailed implementation guidance
254+
- See `skills/logs-analysis/SKILL.md` for log analysis details
255+
- See `skills/resource-analysis/SKILL.md` for resource analysis details
256+
- See `skills/network-analysis/SKILL.md` for network analysis details
257+
- See `skills/system-config-analysis/SKILL.md` for system config details
258+
259+
## Contributing
260+
261+
See the main [CLAUDE.md](../../CLAUDE.md) guide for information on contributing to this plugin.
262+
263+
## Resources
264+
265+
- [sosreport GitHub](https://github.com/sosreport/sos)
266+
- [Red Hat sosreport guide](https://access.redhat.com/solutions/3592)
267+
- [AI Helpers Repository](https://github.com/openshift-eng/ai-helpers)
