
Migrate documentation away from GitBook #17281

@gortiz

Apache Pinot documentation is currently written in GitBook, a proprietary platform built around its own web application. The documentation is stored in a Git repository (link) as Markdown, but contributing new documentation through GitHub PRs is problematic.

This document proposes an alternative way to write documentation where:

  1. The documentation is written in Markdown files. Anyone can write these files easily, including non-developers. Because the current GitBook content is already stored as Markdown, this also makes the migration easier.
  2. The documentation is stored in the main Pinot repository (https://github.com/apache/pinot). Keeping it in the same repository as the code lets reviewers verify that code changes are correctly documented, which makes it easier to keep code and documentation in sync.
  3. The documentation tool is free from vendor lock-in. We don’t want to be dependent on what a single vendor decides, as happened with GitBook.
  4. The documentation can be versioned. It should be possible to view the documentation for different Pinot versions.
  5. The documentation can be tested. Specifically, CI/CD should verify that there are no broken links, missing images, etc.
  6. The documentation tool is accepted by the Apache Software Foundation.

The main disadvantage of keeping the documentation in the same repository as the code is that any documentation change would require a +1 from a Pinot committer. That can also be seen as a feature: the documentation is an essential part of the product and should be treated the same way (and with the same protections) as the code.

Proposed solution: MkDocs

MkDocs is similar to what GitBook was before it shifted its focus to its own web application. It converts Markdown files into HTML pages. It was created more than 10 years ago and has a large user community. MkDocs includes a plugin system for adding new features. For example, Material for MkDocs provides extra features and simple configuration.

The site MkDocs generates can be self-hosted or hosted by third parties such as Read the Docs or GitHub Pages. It is easy to create a GitHub Action that runs MkDocs, and MkDocs can be configured to validate the documentation (see validation in the MkDocs documentation). This validation focuses mainly on detecting broken internal links. Therefore, we could create a GitHub Action that runs on every PR that touches the documentation folder and fails if any internal link is broken.
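MkDocs performs this check itself: running `mkdocs build --strict` makes the build fail on the warnings produced by the validation settings, which is what a CI job would rely on. To make the idea concrete, the following is a minimal stand-alone Python sketch of such a check, not MkDocs' own implementation. The names `find_broken_links` and `main` and the default `docs` folder are hypothetical, and only inline relative links and images are inspected.

```python
import re
from pathlib import Path

# Inline Markdown links and images: [text](target) or ![alt](target "title").
# Reference-style links and raw HTML tags are intentionally not handled in this sketch.
LINK_RE = re.compile(r'!?\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def find_broken_links(docs_dir: Path) -> list[tuple[Path, str]]:
    """Return (source file, target) pairs whose relative target does not exist."""
    broken = []
    for md_file in sorted(docs_dir.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # External URLs, mail links, in-page anchors and site-absolute paths are skipped.
            if "://" in target or target.startswith(("mailto:", "#", "/")):
                continue
            # Drop any "#section" suffix and resolve relative to the linking page.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken


def main(docs_dir: str = "docs") -> int:
    """Print every broken link and return a process exit code (non-zero fails CI)."""
    problems = find_broken_links(Path(docs_dir))
    for source, target in problems:
        print(f"{source}: broken link -> {target}")
    return 1 if problems else 0
```

In a (hypothetical) script file, ending with `raise SystemExit(main())` would turn the return value into the exit status of the CI step.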

Last year, I opened #14346, which proposes using MkDocs for developer documentation. It includes a GitHub Action that verifies on each PR that all pages are correctly linked.

MkDocs in Apache Software Foundation projects

MkDocs is used by several Apache Software Foundation projects. There is even a short page (link) from the Foundation explaining how to keep the .asf.yaml file when using MkDocs. Some examples of Apache projects using MkDocs:

Migration path

How to migrate pages

Because GitBook still allows exporting the documentation as Markdown files, there is no need to rewrite the documentation to migrate to MkDocs. What is needed is to convert the GitBook-specific Markdown extensions. For example, Material for MkDocs uses a different syntax for info panels (which it calls admonitions). See the Material for MkDocs documentation for reference.
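As an illustration of the kind of conversion involved, the sketch below rewrites GitBook hint blocks (`{% hint style="info" %} ... {% endhint %}`) into Material for MkDocs admonitions (`!!! info` followed by content indented by four spaces). It assumes the exported pages use exactly this hint syntax; the style mapping and the function names are hypothetical. Admonitions also require the `admonition` Markdown extension to be enabled in `mkdocs.yml`.

```python
import re

# GitBook hint block as assumed in the export: {% hint style="info" %} ... {% endhint %}
HINT_RE = re.compile(
    r'\{%\s*hint\s+style="(\w+)"\s*%\}\n?(.*?)\n?\{%\s*endhint\s*%\}',
    re.DOTALL,
)

# Hypothetical mapping from GitBook hint styles to Material admonition types;
# any unknown style falls back to "note".
STYLE_MAP = {"info": "info", "success": "success", "warning": "warning", "danger": "danger"}


def _to_admonition(match: re.Match) -> str:
    kind = STYLE_MAP.get(match.group(1), "note")
    body = match.group(2).strip("\n")
    # Admonition content must be indented by four spaces; blank lines stay empty.
    indented = "\n".join(("    " + line) if line.strip() else "" for line in body.splitlines())
    return f"!!! {kind}\n\n{indented}"


def convert_hints(markdown: str) -> str:
    """Rewrite every GitBook hint block in a page into a Material for MkDocs admonition."""
    return HINT_RE.sub(_to_admonition, markdown)
```

Running `convert_hints` over every exported `.md` file would cover this one extension; other GitBook-specific blocks would need similar rules.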

IBM created a page describing how to migrate from GitBook to MkDocs. Some of the steps are specific to the disk layout they use for their docs, but in general the process looks straightforward.

I plan to open a PR in https://github.com/pinot-contrib/pinot-docs that uses MkDocs to serve the current Markdown pages locally, so we can see the difference.

How to host pages

I don't know who currently hosts the Apache Pinot documentation website. It is probably GitBook, so we would need to decide where to host the documentation once we migrate. The easiest solution would be GitHub Pages, which is what Apache Sedona and Apache Iceberg use. The process is very simple (see how it is done in Sedona or in Iceberg).
