5 changes: 5 additions & 0 deletions .claude-plugin/marketplace.json
@@ -74,6 +74,11 @@
"source": "./plugins/must-gather",
"description": "A plugin to analyze and report on must-gather data"
},
{
"name": "lvms",
"source": "./plugins/lvms",
"description": "LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues"
},
{
"name": "hcp",
"source": "./plugins/hcp",
10 changes: 10 additions & 0 deletions PLUGINS.md
@@ -10,6 +10,7 @@ This document lists all available Claude Code plugins and their commands in the
- [Hcp](#hcp-plugin)
- [Hello World](#hello-world-plugin)
- [Jira](#jira-plugin)
- [Lvms](#lvms-plugin)
- [Must Gather](#must-gather-plugin)
- [Olm](#olm-plugin)
- [Openshift](#openshift-plugin)
@@ -105,6 +106,15 @@ A plugin to automate tasks with Jira

See [plugins/jira/README.md](plugins/jira/README.md) for detailed documentation.

### Lvms Plugin

LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues

**Commands:**
- **`/lvms:analyze` `[must-gather-path|--live] [--component storage|operator|volumes]`** - Comprehensive LVMS troubleshooting - analyzes LVMCluster, volume groups, PVCs, and storage issues on live clusters or must-gather

See [plugins/lvms/README.md](plugins/lvms/README.md) for detailed documentation.

### Must Gather Plugin

A plugin to analyze and report on must-gather data
21 changes: 21 additions & 0 deletions docs/data.json
@@ -534,6 +534,27 @@
],
"has_readme": true
},
{
"name": "lvms",
"description": "LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues",
"version": "0.1.0",
"commands": [
{
"name": "analyze",
"description": "Comprehensive LVMS troubleshooting - analyzes LVMCluster, volume groups, PVCs, and storage issues on live clusters or must-gather",
"synopsis": "/lvms:analyze [must-gather-path] [--live] [--component <component>]",
"argument_hint": "[must-gather-path|--live] [--component storage|operator|volumes]"
}
],
"skills": [
{
"name": "LVMS Analyzer",
"id": "lvms-analyzer",
"description": "Analyzes LVMS must-gather data to diagnose storage issues"
}
],
"has_readme": true
},
{
"name": "hcp",
"description": "Generate HyperShift cluster creation commands via hcp CLI from natural language descriptions",
8 changes: 8 additions & 0 deletions plugins/lvms/.claude-plugin/plugin.json
@@ -0,0 +1,8 @@
{
"name": "lvms",
"description": "LVMS (Logical Volume Manager Storage) plugin for troubleshooting and debugging storage issues",
"version": "0.1.0",
"author": {
"name": "github.com/openshift-eng"
}
}
251 changes: 251 additions & 0 deletions plugins/lvms/README.md
@@ -0,0 +1,251 @@
# LVMS Plugin

Comprehensive troubleshooting and debugging plugin for LVMS (Logical Volume Manager Storage).

## Overview

The LVMS plugin provides commands for diagnosing storage issues in OpenShift clusters that use LVMS. It analyzes LVMCluster resources, volume groups, PVCs, the TopoLVM CSI driver, and node-level storage configuration to identify the root causes of storage failures.

## Commands

### `/lvms:analyze`

Comprehensive LVMS troubleshooting: analyzes cluster health and storage resources, and identifies common issues.

**Works with:**
- Live OpenShift clusters (via `oc` CLI)
- LVMS must-gather data (offline analysis)

**Features:**
- LVMCluster health and readiness analysis
- Volume group status across all nodes
- PVC/PV binding issues and pending volumes
- LVMS operator and TopoLVM CSI driver health
- Node-level device availability and configuration (live clusters)
- Thin pool capacity and usage
- Pod log analysis with error deduplication
- Root cause analysis with specific remediation steps

**Usage Examples:**

```bash
# Analyze live cluster
/lvms:analyze --live

# Analyze must-gather data
/lvms:analyze ./must-gather/registry-ci-openshift-org-origin-4-18.../

# Focus on specific component
/lvms:analyze --live --component storage
/lvms:analyze ./must-gather/... check pending PVCs

# Analyze pod logs only
/lvms:analyze --live --component logs
/lvms:analyze ./must-gather/... --component logs
```

## Common Use Cases

### 1. PVCs Stuck in Pending State

When PVCs using LVMS storage classes are not binding:

```bash
/lvms:analyze --live check pending PVCs
```

The command will:
- Identify which PVCs are pending
- Check volume group free space
- Verify TopoLVM CSI driver is running
- Check for node affinity issues
- Provide specific remediation steps
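
The first step above (identifying pending PVCs) can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual implementation: it assumes the input is the parsed output of `oc get pvc --all-namespaces -o json`, and the function name `pending_pvcs`, the sample data, and the storage class name `lvms-vg1` are hypothetical.

```python
def pending_pvcs(pvc_list):
    """Return (namespace, name, storage_class) for every PVC whose phase is Pending.

    `pvc_list` is assumed to be the parsed JSON of a Kubernetes PVC list
    (a dict with an "items" array).
    """
    pending = []
    for item in pvc_list.get("items", []):
        if item.get("status", {}).get("phase") == "Pending":
            meta = item.get("metadata", {})
            spec = item.get("spec", {})
            pending.append(
                (meta.get("namespace"), meta.get("name"), spec.get("storageClassName"))
            )
    return pending


# Illustrative sample only; names and storage class are made up.
sample = {
    "items": [
        {
            "metadata": {"namespace": "demo", "name": "data-0"},
            "spec": {"storageClassName": "lvms-vg1"},
            "status": {"phase": "Pending"},
        },
        {
            "metadata": {"namespace": "demo", "name": "data-1"},
            "spec": {"storageClassName": "lvms-vg1"},
            "status": {"phase": "Bound"},
        },
    ]
}

for namespace, name, storage_class in pending_pvcs(sample):
    print(f"{namespace}/{name} is Pending (storage class: {storage_class})")
```

Filtering on `status.phase` only finds the symptom; the remaining checks in the list (free space, driver health, node affinity) are what explain why a PVC stays pending.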

### 2. LVMCluster Not Ready

When the LVMCluster resource does not reach the Ready state:

```bash
/lvms:analyze --live --component operator
```

The command will:
- Check LVMCluster status and conditions
- Identify which nodes have volume group issues
- Verify device availability and configuration
- Check for conflicting filesystems on devices
- Provide steps to clean devices and recreate VGs

### 3. Volume Group Creation Failures

When volume groups are not being created on nodes:

```bash
/lvms:analyze --live --component volumes
```

The command will:
- Show volume group status per node
- Identify missing or failed volume groups
- Check device selector configuration
- Detect devices already in use
- Provide commands to wipe devices and retry

### 4. Must-Gather Analysis

When analyzing a must-gather from a failed cluster:

```bash
/lvms:analyze ./must-gather/path/
```

The command will:
- Perform offline analysis of all LVMS resources
- Generate comprehensive health report
- Identify critical issues and warnings
- Provide prioritized remediation recommendations
- Suggest which logs to review

## Installation

### From Marketplace

```bash
# Add the marketplace
/plugin marketplace add openshift-eng/ai-helpers

# Install LVMS plugin
/plugin install lvms@ai-helpers

# Use the command
/lvms:analyze --live
```

### Manual Installation

```bash
# Clone the repository
git clone https://github.com/openshift-eng/ai-helpers.git

# Link to Claude Code plugins directory
ln -s "$(pwd)/ai-helpers/plugins/lvms" ~/.claude/plugins/lvms
```

## Prerequisites

**For Live Cluster Analysis:**
- `oc` CLI installed and configured
- Active cluster connection
- Read access to the `openshift-lvm-storage` namespace (or the older `openshift-storage` namespace)
- Ability to read cluster-scoped resources

**For Must-Gather Analysis:**
- Python 3.6+ (for the analysis script)
- PyYAML library: `pip install pyyaml`

## What the Plugin Checks

### LVMCluster Resources
- Overall state (Ready, Progressing, Failed, Degraded)
- Status conditions (ResourcesAvailable, VolumeGroupsReady)
- Device class configurations
- Node coverage and readiness
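
As a rough sketch of the condition check, the snippet below lists every status condition that is not `True`. It assumes the LVMCluster object exposes `status.state` and a `status.conditions` list in the usual Kubernetes condition shape (`type`, `status`, `reason`, `message`); that shape is an assumption here, and the function name, reason, and message in the sample are hypothetical.

```python
def failing_conditions(lvmcluster):
    """Return (type, reason, message) for each condition whose status is not "True".

    `lvmcluster` is assumed to be one LVMCluster object parsed from JSON or YAML.
    """
    status = lvmcluster.get("status", {})
    return [
        (cond.get("type"), cond.get("reason", ""), cond.get("message", ""))
        for cond in status.get("conditions", [])
        if cond.get("status") != "True"
    ]


# Illustrative object only; the reason and message are invented for the example.
sample_cluster = {
    "status": {
        "state": "Degraded",
        "conditions": [
            {"type": "ResourcesAvailable", "status": "True"},
            {
                "type": "VolumeGroupsReady",
                "status": "False",
                "reason": "ExampleReason",
                "message": "example: volume group not ready on one node",
            },
        ],
    }
}

print("state:", sample_cluster["status"]["state"])
for cond_type, reason, message in failing_conditions(sample_cluster):
    print(f"{cond_type}: {reason} ({message})")
```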

### Volume Groups
- Volume group creation status per node
- Physical volume availability
- Free space and capacity
- Thin pool configuration and usage
- Missing or failed volume groups

### Storage (PVCs/PVs)
- PVC binding status
- Pending volume provisioning failures
- Storage class configuration
- Capacity issues
- Node affinity constraints

### Operator Health
- LVMS operator deployment status
- TopoLVM controller readiness
- TopoLVM node daemonset coverage
- VG-manager daemonset status
- Pod crashes and restarts
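
The restart check can be illustrated with the standard Kubernetes pod status fields. This is a minimal sketch that assumes the input is the parsed output of `oc get pods -n openshift-lvm-storage -o json`; the function name `restarting_pods`, the threshold parameter, and the pod names in the sample are hypothetical.

```python
def restarting_pods(pod_list, threshold=1):
    """Map pod name to total container restarts for pods at or above `threshold`."""
    flagged = {}
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(cs.get("restartCount", 0) for cs in statuses)
        if restarts >= threshold:
            flagged[name] = restarts
    return flagged


# Illustrative sample only; pod names are made up.
sample_pods = {
    "items": [
        {
            "metadata": {"name": "vg-manager-abcde"},
            "status": {"containerStatuses": [{"restartCount": 4}]},
        },
        {
            "metadata": {"name": "lvms-operator-12345"},
            "status": {"containerStatuses": [{"restartCount": 0}]},
        },
    ]
}

for pod_name, count in restarting_pods(sample_pods).items():
    print(f"{pod_name}: {count} restarts")
```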

### Node Devices
- Block device availability
- Existing filesystems on devices
- Device selector matches
- Disk capacity and usage

### Pod Logs
- Error and warning messages from vg-manager pods
- Error and warning messages from the lvms-operator pod
- Deduplication of repeated errors from reconciliation loops
- JSON log parsing with timestamps and context
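
Deduplicating errors that repeat on every reconciliation loop might look like the sketch below. It assumes each log line is a JSON object with `level`, `ts`, `msg`, and optionally `error` fields; these field names are an assumption about the log format, and the function name and sample lines are hypothetical rather than taken from the plugin's script.

```python
import json


def dedupe_errors(lines):
    """Collapse repeated error/warning log entries into one record with a count.

    Lines that are not JSON objects, or whose level is not error/warning,
    are skipped. Records are returned in order of first appearance.
    """
    seen = {}
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("level") not in ("error", "warn", "warning"):
            continue
        key = (entry.get("level"), entry.get("msg"), entry.get("error"))
        if key not in seen:
            seen[key] = {
                "level": entry.get("level"),
                "msg": entry.get("msg"),
                "error": entry.get("error"),
                "count": 0,
                "first_ts": entry.get("ts"),
            }
        seen[key]["count"] += 1
        seen[key]["last_ts"] = entry.get("ts")
    return list(seen.values())


# Illustrative log lines only.
sample_lines = [
    '{"level": "error", "ts": "t1", "msg": "reconcile failed", "error": "device busy"}',
    '{"level": "error", "ts": "t2", "msg": "reconcile failed", "error": "device busy"}',
    "plain text line that is not JSON",
    '{"level": "info", "ts": "t3", "msg": "reconcile ok"}',
]

for record in dedupe_errors(sample_lines):
    print(f'{record["count"]}x [{record["level"]}] {record["msg"]}: {record["error"]}')
```

Keying on the message and error text, rather than the whole line, is what lets entries that differ only in timestamp collapse into one record.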

## Output Format

The plugin provides structured, color-coded output:

- ✓ Green checkmarks for healthy components
- ⚠ Yellow warnings for non-critical issues
- ❌ Red errors for critical problems
- ℹ Blue info for additional context

Reports include:
- Component-by-component health status
- Root cause analysis
- Prioritized recommendations
- Specific remediation commands
- Links to relevant documentation
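
The status markers described above could be rendered with standard ANSI color codes, as in this minimal sketch. The level names (`ok`, `warn`, `error`, `info`) and the function `format_status` are hypothetical and are not taken from the plugin's script.

```python
# Standard ANSI foreground colors: green, yellow, red, blue.
ANSI = {"ok": "\033[32m", "warn": "\033[33m", "error": "\033[31m", "info": "\033[34m"}
SYMBOL = {"ok": "✓", "warn": "⚠", "error": "❌", "info": "ℹ"}
RESET = "\033[0m"


def format_status(level, message, color=True):
    """Prefix `message` with the marker for `level`, optionally wrapped in color."""
    text = f"{SYMBOL[level]} {message}"
    return f"{ANSI[level]}{text}{RESET}" if color else text


print(format_status("ok", "LVMCluster is Ready"))
print(format_status("warn", "Thin pool usage is high", color=False))
```

Passing `color=False` is useful when the report is written to a file or piped, where escape sequences would show up as raw characters.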

## Troubleshooting the Plugin

**Script not found:**
```bash
# Verify script exists
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py

# Make executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
```

**Cannot connect to cluster:**
```bash
# Verify oc is configured
oc whoami
oc cluster-info

# Check LVMS namespace
oc get namespace openshift-lvm-storage
```

**Must-gather path errors:**
```bash
# Pass the subdirectory whose name ends in the image hash; it contains namespaces/
ls must-gather/registry-ci-*/namespaces/openshift-lvm-storage

# Do not pass the parent must-gather/ directory
```
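
Finding the right subdirectory can also be automated. The sketch below checks the given path and its immediate children for a `namespaces/<lvms-namespace>` directory, accepting either namespace name mentioned in the prerequisites. The function name `find_lvms_root` is hypothetical, and the layout is assumed from the `ls` command above.

```python
from pathlib import Path

# Current and older LVMS namespace names, as listed under Prerequisites.
LVMS_NAMESPACES = ("openshift-lvm-storage", "openshift-storage")


def find_lvms_root(must_gather):
    """Return the directory that contains namespaces/<lvms-namespace>, or None.

    Accepts either the correct subdirectory or its parent must-gather directory.
    """
    base = Path(must_gather)
    if not base.is_dir():
        return None
    candidates = [base] + sorted(p for p in base.iterdir() if p.is_dir())
    for candidate in candidates:
        for namespace in LVMS_NAMESPACES:
            if (candidate / "namespaces" / namespace).is_dir():
                return candidate
    return None
```

Calling `find_lvms_root("./must-gather")` returns the matching subdirectory when one exists, so a caller can accept the parent directory and still analyze the right path.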

## Related Resources

- [LVMS GitHub Repository](https://github.com/openshift/lvm-operator)
- [LVMS Troubleshooting Guide](https://github.com/openshift/lvm-operator/blob/main/docs/troubleshooting.md)
- [TopoLVM Documentation](https://github.com/topolvm/topolvm)
- [OpenShift Storage Documentation](https://docs.openshift.com/container-platform/latest/storage/index.html)

## Contributing

Contributions are welcome! Please see the main repository's [CLAUDE.md](../../CLAUDE.md) for guidelines on:
- Adding new commands
- Extending analysis capabilities
- Improving diagnostic checks
- Adding helper scripts

## Support

For issues or feature requests:
- GitHub Issues: https://github.com/openshift-eng/ai-helpers/issues
- Repository: https://github.com/openshift-eng/ai-helpers