@jacobtomlinson
Contributor

jacobtomlinson commented Dec 11, 2025

#10978 added llms.txt support. However, this repo makes heavy use of extensions and directives in the docs, including autodoc, autosummary, nbsphinx and more, so I wanted to point out that the extension you're currently using, sphinx-llms-txt, doesn't support these in the llms.txt output. It simply copies the docs source through in its original format.
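To make the difference concrete: with a pass-through approach, directives such as `autosummary` or `jupyter-execute` end up in llms.txt as unrendered markup rather than the API tables or cell outputs they would produce. A quick check for that could scan the generated file for RST directive lines. This is a minimal sketch; `find_raw_directives` is a hypothetical helper, not part of either extension.

```python
import re

# Matches an RST directive line such as ".. autosummary::" or ".. jupyter-execute::".
# Domain-qualified names like ".. py:function::" are matched too; link targets
# like ".. _label:" are not, because they end in a single colon.
DIRECTIVE_RE = re.compile(r"^\s*\.\.\s+([\w:-]+)::", re.MULTILINE)


def find_raw_directives(text: str) -> list[str]:
    """Return the names of RST directives that appear verbatim in `text`."""
    return DIRECTIVE_RE.findall(text)


# A stand-in for a pass-through llms.txt page (illustrative content only).
sample = """\
Quick overview
==============

.. autosummary::
   :toctree: generated/

   Dataset.to_netcdf

.. jupyter-execute::

    ds.to_netcdf("example.nc")
"""

leaked = find_raw_directives(sample)
print(leaked)  # ['autosummary', 'jupyter-execute']
```

A non-empty result means a reader (or an LLM) of that file sees directive syntax instead of the content the directive generates.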

I've also been working on an extension called sphinx-llm that generates llms.txt files, but it does so by running a parallel Sphinx build in the background using the markdown builder. It builds Markdown versions of every page, including autogenerated pages, with all extensions and directives rendered. You can append .md to any docs URL to see the Markdown version of the page. Not every extension is supported by the markdown builder, but it does a pretty good job. It then concatenates these pages into llms-full.txt and generates the llms.txt sitemap too. As Sphinx builds are single-threaded by default, running the second build in parallel has no impact on overall build time on systems with at least two cores.
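The final concatenation step described above can be sketched roughly as follows. This is a hypothetical illustration, not sphinx-llm's actual code: it assumes `md_dir` already holds the output of a Markdown build with one `.md` file per docs page, and it assumes llms.txt links to pages by their relative `.md` paths.

```python
import tempfile
from pathlib import Path


def page_title(text: str, fallback: str) -> str:
    """Use the first top-level Markdown heading as the page title, if any."""
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return fallback


def write_llms_files(md_dir: Path, out_dir: Path) -> None:
    """Concatenate all Markdown pages into llms-full.txt and index them in llms.txt."""
    pages = sorted(md_dir.rglob("*.md"))
    out_dir.mkdir(parents=True, exist_ok=True)
    texts = [p.read_text(encoding="utf-8").strip() for p in pages]

    # llms-full.txt: every page's Markdown, one after another.
    full = "\n\n".join(texts) + "\n"
    (out_dir / "llms-full.txt").write_text(full, encoding="utf-8")

    # llms.txt: a sitemap with one link per page (relative .md paths assumed).
    lines = ["# Documentation", ""]
    for path, text in zip(pages, texts):
        rel = path.relative_to(md_dir).as_posix()
        lines.append(f"- [{page_title(text, rel)}]({rel})")
    (out_dir / "llms.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


# Usage with two fake pages standing in for the Markdown build output.
with tempfile.TemporaryDirectory() as tmp:
    md_dir, out_dir = Path(tmp, "md"), Path(tmp, "out")
    (md_dir / "user-guide").mkdir(parents=True)
    (md_dir / "index.md").write_text("# xarray\n\nN-D labeled arrays.\n", encoding="utf-8")
    (md_dir / "user-guide" / "io.md").write_text("# Reading and writing files\n", encoding="utf-8")
    write_llms_files(md_dir, out_dir)
    sitemap = (out_dir / "llms.txt").read_text(encoding="utf-8")
    full_text = (out_dir / "llms-full.txt").read_text(encoding="utf-8")

print(sitemap)
```

Because the pages are rendered before this step, whatever autodoc, autosummary or executed cells produced is already in the text being concatenated.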

The author of sphinx-llms-txt and I have discussed this, and I proposed merging the two projects. However, they have said that passing the source through is by design, and they aren't interested in collaborating.

Since xarray relies on these extensions and directives, I'd like to propose switching to my extension. No problem either way, though.

Disclaimer: in case it's not clear, I am the author of sphinx-llm, so this is a self-serving PR, but I think it will benefit xarray.

cc @dcherian @VeckoTheGecko

@jacobtomlinson
Contributor Author

I suspect the first build failed because the two parallel builds collided in some way. I've run it a few times locally and it sometimes builds and sometimes fails, so this is certainly some kind of race condition. Perhaps notebooks being executed concurrently clobbered each other's files. I can think of a couple of ways forward:

  • Add a setting to sphinx-llm to allow you to configure parallel/sequential builds
  • Tweak the notebooks to use unique filenames so that docs builds are thread-safe
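The first option amounts to a switch between running the builds concurrently or one after another. A minimal sketch, assuming each build is wrapped in a callable; `run_builds` and its `parallel` flag are hypothetical names, not sphinx-llm's real config option.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_builds(builds: list[Callable[[], str]], parallel: bool) -> list[str]:
    """Run each build callable concurrently or sequentially, returning results in order.

    Hypothetical helper: in a real setup each callable might launch a
    sphinx-build subprocess for one builder (e.g. html or markdown).
    """
    if not parallel:
        # Sequential: the next build starts only after the previous one
        # finishes, so the builds can't clobber each other's scratch files.
        return [build() for build in builds]
    # Parallel: builds overlap, which is faster on multi-core machines, but any
    # file they share (like example.nc in doc/) can be written or removed twice.
    with ThreadPoolExecutor(max_workers=max(1, len(builds))) as pool:
        futures = [pool.submit(build) for build in builds]
        return [future.result() for future in futures]


results = run_builds([lambda: "html", lambda: "markdown"], parallel=False)
print(results)  # ['html', 'markdown']
```

The trade-off is the one discussed here: sequential mode is safe with non-thread-safe cells but costs wall-clock time.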

@jacobtomlinson
Contributor Author

jacobtomlinson commented Dec 11, 2025

Yeah, it looks like some jupyter-execute cells are not thread-safe. For example, in the quickstart an example.nc file is created in the doc/ directory and then removed. When the builds run concurrently, there may be write collisions, read-after-remove errors or duplicate-remove errors.

.. jupyter-execute::

    ds.to_netcdf("example.nc")
    reopened = xr.open_dataset("example.nc")
    reopened

.. jupyter-execute::
    :hide-code:

    import os

    reopened.close()
    os.remove("example.nc")
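The three failure modes can be reproduced by replaying one unlucky interleaving of two builds that run the cells above in the same directory. This is a simplified, deterministic sketch: plain text stands in for the netCDF file, and the build names and ordering are assumptions chosen to hit every failure.

```python
import os
import tempfile


def interleaved_builds(workdir: str) -> list[str]:
    """Replay one bad interleaving of two builds sharing `workdir`."""
    path = os.path.join(workdir, "example.nc")  # same name in both builds
    events = []
    # Write collision: both builds write the file before either reads it.
    for build in ("html", "markdown"):
        with open(path, "w") as f:
            f.write(f"written by {build}")
    # The html build reads back the markdown build's content.
    with open(path) as f:
        events.append(f"html read: {f.read()!r}")
    os.remove(path)  # html build's cleanup cell
    # Read after remove: the markdown build tries to reopen its file.
    try:
        with open(path) as f:
            f.read()
    except FileNotFoundError:
        events.append("markdown read: FileNotFoundError")
    # Duplicate remove: the markdown build's cleanup cell runs too.
    try:
        os.remove(path)
    except FileNotFoundError:
        events.append("markdown remove: FileNotFoundError")
    return events


with tempfile.TemporaryDirectory() as tmp:
    events = interleaved_builds(tmp)
for event in events:
    print(event)
```

In a real build the interleaving depends on timing, which is why it builds on some runs and fails on others.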

To fix this, it would be best if these cells used unique temporary directories, but that might also overcomplicate the examples. Caring about thread safety in your docs builds might just add a maintenance burden you don't want.
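A sketch of the unique-tempdir approach using the standard library's `tempfile.TemporaryDirectory`. Plain text files stand in for netCDF so it runs without xarray; in the docs the write would be `ds.to_netcdf(path)` and the read `xr.open_dataset(path)`. The function name `write_and_reopen` is hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_and_reopen(label: str) -> str:
    """Write, read back and clean up a scratch file in a private temp dir."""
    # Each call gets its own directory, so the shared filename is harmless,
    # and leaving the with-block deletes the directory (no os.remove needed).
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.nc")
        with open(path, "w") as f:
            f.write(f"written by {label}")
        with open(path) as f:
            return f.read()


# Run four overlapping "builds"; each one reads back only its own content.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(write_and_reopen, ["html", "markdown"] * 2))
print(results)
```

One design wrinkle: the quickstart displays `reopened` in one cell and closes it in a later hidden cell, so a single `with` block can't span both. Calling `tempfile.mkdtemp()` in the first cell and `shutil.rmtree()` in the hidden cleanup cell would be the equivalent there.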

I've just released sphinx-llm==0.2.1, which has a config option to enable sequential builds and so avoids these race conditions. I've updated this PR to use it, and it looks like the RTD build time increased from 8 minutes to 10 minutes. If you'd prefer to make these cells thread-safe and switch back to a parallel build, let me know and I'd be happy to update this PR to do that instead.

@dcherian
Contributor

Thanks Jacob, seems like an improvement to me!

Making those cells thread-safe would be a good improvement too IMO.

@jacobtomlinson
Contributor Author

OK, great. It might be best to merge this as is; then I can make a follow-up PR to make those cells thread-safe and change the config back to a parallel build.

@dcherian dcherian merged commit 172ba1e into pydata:main Dec 11, 2025
40 checks passed
@jacobtomlinson jacobtomlinson deleted the doc/switch-to-sphinx-llm branch December 11, 2025 17:09
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 11, 2025
* main:
  Switch from `sphinx-llms-txt` => `sphinx-llm` (pydata#11003)
  Backend fastpath (pydata#10937)
  Change behavior of `keep_attrs` in `xr.where` when x is a scalar (pydata#10997)
  make error message more clear (pydata#10994)
  Address pandas-related upstream errors in pydata#10973 (pydata#10988)
  new release section (pydata#10985)
  set the release date (pydata#10984)
  release v2025.12.0 (pydata#10981)
  Add sphinx-llms-txt (pydata#10978)