ichoosetoaccept/kubernetes-docset

Kubernetes Docset Generator for Dash

A Python tool to generate high-quality Dash docsets for Kubernetes documentation with enhanced semantic indexing.

Features

  • Enhanced Indexing: Extracts 8,000+ searchable entries, including:

    • API Types (PodSpec, DeploymentStatus, Container, etc.)
    • Sections (Volumes, Scheduling, Security Context, etc.)
    • Guides (concepts, tasks, tutorials)
    • Glossary terms
    • API Resources
    • kubectl Commands
    • Components
  • Sitemap-Based Scraping: Uses kubernetes.io/en/sitemap.xml to download ~900 valid doc pages with zero 404 errors

  • Multiple Input Sources:

    • Scrape directly from kubernetes.io (--scrape)
    • Use existing Dash-generated docset (--source)
  • Validation: Built-in docset validation script and pre-commit hooks

  • Intelligent Parsing: Specialized parsers for different documentation types:

    • API reference pages
    • kubectl command documentation
    • Conceptual guides
    • Glossary entries
    • Code samples
    • Embedded Dash anchors

Requirements

  • Python >= 3.14 (or adjust in pyproject.toml)
  • uv package manager

Installation

git clone https://github.com/ichoosetoaccept/kubernetes-docset.git
cd kubernetes-docset

Usage

Option 1: Scrape from kubernetes.io (Recommended)

uv run python main.py --scrape

This will:

  1. Fetch the sitemap and download ~900 documentation pages from kubernetes.io
  2. Download CSS, JS, and image assets
  3. Cache everything in .cache/kubernetes-docs/
  4. Generate a Dash docset in ./output/ with relative asset paths
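
The first step above can be sketched roughly as follows. This is a minimal illustration of sitemap-based URL discovery, not the code in scraper.py: the function name doc_urls_from_sitemap and the /docs/ prefix filter are assumptions made for the example, and the sample XML stands in for the real sitemap so the snippet runs offline.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def doc_urls_from_sitemap(xml_text: str,
                          prefix: str = "https://kubernetes.io/docs/") -> list[str]:
    """Return the sorted, de-duplicated <loc> URLs that start with `prefix`.

    The prefix filter is an assumption for this sketch; the real tool may
    select pages differently.
    """
    root = ET.fromstring(xml_text)
    urls = set()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url.startswith(prefix):
            urls.add(url)
    return sorted(urls)


# Tiny stand-in for the real sitemap so the example needs no network access.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://kubernetes.io/docs/concepts/workloads/pods/</loc></url>
  <url><loc>https://kubernetes.io/blog/</loc></url>
  <url><loc>https://kubernetes.io/docs/tasks/</loc></url>
</urlset>"""

doc_urls = doc_urls_from_sitemap(SAMPLE_SITEMAP)
```

Because only URLs listed in the sitemap are requested, the scraper never has to guess at links, which is consistent with the "zero 404 errors" claim above.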

Option 2: Use Existing Dash-Generated Source

uv run python main.py --source ~/Library/Application\ Support/Dash/Docset\ Generator/Kubernetes/Kubernetes.docset

This option uses a pre-generated docset from Dash's built-in scraper as the source, which may provide additional embedded anchors.

Custom Options

# Specify output directory
uv run python main.py --scrape --output ./my-docsets

# Specify Kubernetes version
uv run python main.py --scrape --version 1.34

# Use custom cache directory
uv run python main.py --scrape --cache-dir ./cache

Installation in Dash

After generating the docset:

  1. Open Dash
  2. Go to Preferences > Docsets
  3. Click the + button
  4. Select the generated .docset file

Or simply double-click the .docset file.

How It Works

Specialized Parsers

The tool uses multiple specialized parsers that analyze the HTML structure:

  1. DashAnchorParser: Extracts embedded dashAnchor entries from Dash-generated HTML
  2. EnhancedAPIReferenceParser: Extracts Types, Sections, and Properties from API reference HTML
  3. EnhancedKubectlParser: Extracts kubectl commands and subcommands
  4. CodeSampleParser: Identifies Kubernetes manifests in code blocks
  5. APIResourceParser: Identifies API resources (Pod, Deployment, Service, etc.)
  6. KubectlCommandParser: Extracts kubectl command documentation
  7. GuideParser: Indexes conceptual guides and tutorials
  8. GlossaryParser: Extracts glossary terms
  9. ComponentParser: Identifies Kubernetes components (kube-apiserver, kubelet, etc.)
  10. FallbackParser: Catches any unmatched documentation pages
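
To illustrate what the first parser in the list looks for: Dash table-of-contents anchors are `<a>` tags with class `dashAnchor` and a name of the form `//apple_ref/cpp/<Type>/<percent-encoded name>`. The sketch below collects them using only the standard library. The class name DashAnchorCollector is hypothetical, and the project's own DashAnchorParser may be implemented quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class DashAnchorCollector(HTMLParser):
    """Collect (name, type) pairs from embedded dashAnchor tags."""

    def __init__(self) -> None:
        super().__init__()
        self.entries: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        name = attr.get("name") or ""
        if "dashAnchor" not in classes or not name.startswith("//apple_ref/"):
            return
        # "//apple_ref/cpp/Section/Pod%20Lifecycle" splits into
        # ['', '', 'apple_ref', 'cpp', 'Section', 'Pod%20Lifecycle'].
        parts = name.split("/", 5)
        if len(parts) == 6:
            entry_type, entry_name = parts[4], unquote(parts[5])
            self.entries.append((entry_name, entry_type))


SAMPLE_HTML = (
    '<h2><a name="//apple_ref/cpp/Section/Pod%20Lifecycle" class="dashAnchor"></a>'
    'Pod Lifecycle</h2><a name="unrelated">skip me</a>'
)

collector = DashAnchorCollector()
collector.feed(SAMPLE_HTML)
anchors = collector.entries
```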

Parser Priority

All matching parsers run on each file, allowing multiple entry types per page. For example, an API reference page might contribute:

  • A Resource entry for the API object
  • Multiple Type entries for embedded types (PodSpec, Container, etc.)
  • Multiple Section entries for field groups
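
The "all matching parsers run" behavior can be sketched as a simple loop. Everything named here (Entry, Parser, collect_entries, the path checks, and the toy parse functions) is hypothetical and only illustrates the idea; the actual classes in parsers.py and enhanced_parsers.py may expose a different interface.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entry:
    name: str
    type: str
    path: str


@dataclass
class Parser:
    # Hypothetical interface: a predicate on the file path plus a parse function.
    matches: Callable[[str], bool]
    parse: Callable[[str, str], list[Entry]]


def collect_entries(path: str, html: str, parsers: list[Parser]) -> list[Entry]:
    """Run every parser that matches `path` and merge the results in order."""
    entries: list[Entry] = []
    seen: set[Entry] = set()
    for parser in parsers:
        if not parser.matches(path):
            continue  # no early exit: later parsers still get their turn
        for entry in parser.parse(path, html):
            if entry not in seen:  # drop exact duplicates across parsers
                seen.add(entry)
                entries.append(entry)
    return entries


def is_api_page(path: str) -> bool:
    return "/kubernetes-api/" in path


resource_parser = Parser(
    matches=is_api_page,
    parse=lambda path, html: [Entry("Pod", "Resource", path)],
)
type_parser = Parser(
    matches=is_api_page,
    parse=lambda path, html: [
        Entry(name, "Type", f"{path}#{name}")
        for name in re.findall(r'<h2 id="(\w+)"', html)
    ],
)
guide_parser = Parser(
    matches=lambda path: "/concepts/" in path,
    parse=lambda path, html: [Entry("Guide page", "Guide", path)],
)

page = '<h1>Pod</h1><h2 id="PodSpec">PodSpec</h2><h2 id="Container">Container</h2>'
entries = collect_entries(
    "docs/reference/kubernetes-api/pod-v1/index.html",
    page,
    [resource_parser, type_parser, guide_parser],
)
```

In this toy run the page yields one Resource entry and two Type entries, while the guide parser is skipped because its path check does not match.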

Validation

# Validate the generated docset
uv run python verify.py -v

# Run pre-commit validation hook
prek run --hook-stage manual validate-docset
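
For a quick manual check alongside verify.py, the docset's search index can also be inspected directly. Dash docsets store their index in a SQLite file at Contents/Resources/docSet.dsidx with a searchIndex table (name, type, path). The helper below is a sketch and is not part of this repository; it counts entries per type.

```python
import sqlite3


def count_entries_by_type(index_path: str) -> dict[str, int]:
    """Return {entry type: count} from a Dash docSet.dsidx file."""
    conn = sqlite3.connect(index_path)
    try:
        rows = conn.execute(
            "SELECT type, COUNT(*) FROM searchIndex GROUP BY type ORDER BY type"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Assuming the generated bundle is named Kubernetes.docset (the name is a guess based on the examples above), it could be called as count_entries_by_type("output/Kubernetes.docset/Contents/Resources/docSet.dsidx").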

Project Structure

.
├── main.py                      # CLI entry point
├── verify.py                    # Docset validation script
├── k8s_docset/
│   ├── __init__.py
│   ├── builder.py               # Docset builder (path fixing, TOC injection)
│   ├── parsers.py               # Core HTML parsers
│   ├── enhanced_parsers.py      # Enhanced parsers for raw HTML
│   └── scraper.py               # Sitemap-based web scraper
├── contrib/                     # Dash contribution files
│   ├── docset.json
│   ├── icon.png
│   └── icon@2x.png
├── .pre-commit-config.yaml      # Pre-commit hooks (ruff, validation)
└── pyproject.toml               # Project dependencies

License

MIT License - see LICENSE for details.
