Kubernetes Docset Generator for Dash

A Python tool to generate high-quality Dash docsets for Kubernetes documentation with enhanced semantic indexing.

Features

- Enhanced Indexing: Extracts ~8,000+ searchable entries including:
  - API Types (PodSpec, DeploymentStatus, Container, etc.)
  - Sections (Volumes, Scheduling, Security Context, etc.)
  - Guides (concepts, tasks, tutorials)
  - Glossary terms
  - API Resources
  - kubectl Commands
  - Components
- Sitemap-Based Scraping: Uses kubernetes.io/en/sitemap.xml to download ~900 valid doc pages with zero 404 errors
- Multiple Input Sources:
  - Scrape directly from kubernetes.io (--scrape)
  - Use existing Dash-generated docset (--source)
- Validation: Built-in docset validation script and pre-commit hooks
- Intelligent Parsing: Specialized parsers for different documentation types:
  - API reference pages
  - kubectl command documentation
  - Conceptual guides
  - Glossary entries
  - Code samples
  - Embedded Dash anchors

Requirements

Python >= 3.14 (or adjust in pyproject.toml)
uv package manager

Installation

git clone https://github.com/ichoosetoaccept/kubernetes-docset.git
cd kubernetes-docset

Usage

Option 1: Scrape from kubernetes.io (Recommended)

uv run python main.py --scrape

This will:

- Fetch the sitemap and download ~900 documentation pages from kubernetes.io
- Download CSS, JS, and image assets
- Cache everything in .cache/kubernetes-docs/
- Generate a Dash docset in ./output/ with relative asset paths
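
The sitemap step above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in scraper.py. The helper name extract_doc_urls, the /docs/ prefix filter, and the sample XML are assumptions made for the example, and it assumes the sitemap is a plain urlset in the sitemaps.org format.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_doc_urls(sitemap_xml: str,
                     prefix: str = "https://kubernetes.io/docs/") -> list[str]:
    """Return every <loc> URL in the sitemap that starts with `prefix`.

    Hypothetical helper: the real scraper may filter pages differently.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url.startswith(prefix):
            urls.append(url)
    return urls


# Tiny stand-in for the real sitemap, which would be fetched over HTTP.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://kubernetes.io/docs/concepts/workloads/pods/</loc></url>
  <url><loc>https://kubernetes.io/blog/</loc></url>
</urlset>"""

doc_urls = extract_doc_urls(SAMPLE)
print(doc_urls)
```

Working from the sitemap rather than following links is what avoids dead URLs: only pages the site itself advertises are requested.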

Option 2: Use Existing Dash-Generated Source

uv run python main.py --source ~/Library/Application\ Support/Dash/Docset\ Generator/Kubernetes/Kubernetes.docset

This option uses a pre-generated docset from Dash's built-in scraper as the source, which may provide additional embedded anchors.

Custom Options

# Specify output directory
uv run python main.py --scrape --output ./my-docsets

# Specify Kubernetes version
uv run python main.py --scrape --version 1.34

# Use custom cache directory
uv run python main.py --scrape --cache-dir ./cache
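
For orientation, the flags shown above could be declared with argparse along these lines. This is a sketch, not the contents of main.py: the flag names and the ./output and .cache/kubernetes-docs locations come from this README, while treating --scrape and --source as mutually exclusive, using those locations as defaults, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Generate a Kubernetes docset for Dash",
    )
    # Assumption: exactly one input source must be chosen.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--scrape", action="store_true",
                        help="scrape pages directly from kubernetes.io")
    source.add_argument("--source", metavar="DOCSET",
                        help="path to an existing Dash-generated docset")
    # Assumption: defaults taken from the paths mentioned in this README.
    parser.add_argument("--output", default="./output",
                        help="directory for the generated docset")
    parser.add_argument("--version", default=None,
                        help="Kubernetes version, e.g. 1.34")
    parser.add_argument("--cache-dir", default=".cache/kubernetes-docs",
                        help="directory for cached pages and assets")
    return parser


args = build_parser().parse_args(["--scrape", "--version", "1.34"])
print(args.scrape, args.version, args.output, args.cache_dir)
```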

Installation in Dash

After generating the docset:

Open Dash
Go to Preferences > Docsets
Click the + button
Select the generated .docset file

Or simply double-click the .docset file.

How It Works

Specialized Parsers

The tool uses multiple specialized parsers that analyze the HTML structure:

DashAnchorParser: Extracts embedded dashAnchor entries from Dash-generated HTML
EnhancedAPIReferenceParser: Extracts Types, Sections, and Properties from API reference HTML
EnhancedKubectlParser: Extracts kubectl commands and subcommands
CodeSampleParser: Identifies Kubernetes manifests in code blocks
APIResourceParser: Identifies API resources (Pod, Deployment, Service, etc.)
KubectlCommandParser: Extracts kubectl command documentation
GuideParser: Indexes conceptual guides and tutorials
GlossaryParser: Extracts glossary terms
ComponentParser: Identifies Kubernetes components (kube-apiserver, kubelet, etc.)
FallbackParser: Catches any unmatched documentation pages

Parser Priority

All matching parsers run on each file, allowing multiple entry types per page. For example, an API reference page might contribute:

A Resource entry for the API object
Multiple Type entries for embedded types (PodSpec, Container, etc.)
Multiple Section entries for field groups
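
The "all matching parsers run" behavior can be illustrated with a small dispatch loop. The Entry and Parser classes, the regex-based extractors, and the sample HTML below are hypothetical stand-ins, not the classes in parsers.py or enhanced_parsers.py.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entry:
    name: str
    type: str  # Dash entry type, e.g. "Resource" or "Section"
    path: str


@dataclass
class Parser:
    matches: Callable[[str], bool]               # does this parser apply to the file?
    extract: Callable[[str, str], list[Entry]]   # (path, html) -> entries


def resource_entries(path: str, html: str) -> list[Entry]:
    """Index the page title as a Resource entry (toy extractor)."""
    m = re.search(r"<h1[^>]*>(.*?)</h1>", html)
    return [Entry(m.group(1), "Resource", path)] if m else []


def section_entries(path: str, html: str) -> list[Entry]:
    """Index every <h2 id="..."> heading as a Section entry (toy extractor)."""
    found = re.findall(r'<h2 id="([^"]+)">(.*?)</h2>', html)
    return [Entry(name, "Section", f"{path}#{anchor}") for anchor, name in found]


def run_parsers(path: str, html: str, parsers: list[Parser]) -> list[Entry]:
    """Run every parser whose predicate matches; do not stop at the first hit."""
    entries: list[Entry] = []
    for parser in parsers:
        if parser.matches(path):
            entries.extend(parser.extract(path, html))
    return entries


PARSERS = [
    Parser(lambda p: "/reference/kubernetes-api/" in p, resource_entries),
    Parser(lambda p: p.endswith(".html"), section_entries),
]

html = '<h1>Pod</h1><h2 id="PodSpec">PodSpec</h2><h2 id="volumes">Volumes</h2>'
entries = run_parsers("docs/reference/kubernetes-api/pod-v1.html", html, PARSERS)
for e in entries:
    print(e.type, e.name, e.path)
```

Because the loop never breaks on the first match, a single page yields entries of several types, which is the property the section describes.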

Validation

# Validate the generated docset
uv run python verify.py -v

# Run pre-commit validation hook
prek run --hook-stage manual validate-docset
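
For a quick manual check alongside the commands above, the docset's search index can be inspected directly. Dash docsets store their index in a SQLite file at Contents/Resources/docSet.dsidx with a searchIndex table. The sketch below counts entries per type; it is not what verify.py does, the helper name count_entries_by_type is hypothetical, and the in-memory rows are sample data so the example runs without a generated docset.

```python
import sqlite3
from collections import Counter


def count_entries_by_type(conn: sqlite3.Connection) -> Counter:
    """Return the number of searchIndex rows for each entry type."""
    rows = conn.execute("SELECT type, COUNT(*) FROM searchIndex GROUP BY type")
    return Counter({entry_type: n for entry_type, n in rows})


# Sample data in an in-memory database, using the standard Dash index schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE searchIndex("
    "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)"
)
conn.executemany(
    "INSERT INTO searchIndex(name, type, path) VALUES (?, ?, ?)",
    [
        ("Pod", "Resource", "docs/pod.html"),
        ("PodSpec", "Type", "docs/pod.html#PodSpec"),
        ("Container", "Type", "docs/pod.html#Container"),
    ],
)

counts = count_entries_by_type(conn)
print(dict(counts))
conn.close()
```

To inspect a real docset, replace the in-memory connection with sqlite3.connect() pointed at the docSet.dsidx file inside the generated .docset bundle.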

Project Structure

.
├── main.py                      # CLI entry point
├── verify.py                    # Docset validation script
├── k8s_docset/
│   ├── __init__.py
│   ├── builder.py               # Docset builder (path fixing, TOC injection)
│   ├── parsers.py               # Core HTML parsers
│   ├── enhanced_parsers.py      # Enhanced parsers for raw HTML
│   └── scraper.py               # Sitemap-based web scraper
├── contrib/                     # Dash contribution files
│   ├── docset.json
│   ├── icon.png
│   └── icon@2x.png
├── .pre-commit-config.yaml      # Pre-commit hooks (ruff, validation)
└── pyproject.toml               # Project dependencies

License

MIT License - see LICENSE for details.
