Migration of PDF documentation to ReadTheDocs #1

Open
mmatera opened this issue Feb 6, 2025 · 5 comments

@mmatera
Contributor

mmatera commented Feb 6, 2025

At this point, I am able to build both the HTML and the PDF documentation using Sphinx, except for including the output of the DocTests. For that, there are two possibilities: use Asymptote to produce PDF output, or use SVG to produce images for the HTML build. Maybe we need to produce both.
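A minimal sketch of how the two output formats could be chosen per build, assuming one image file per format already exists on disk. The helper name `image_directive` and the mapping are hypothetical; only the builder names `html` and `latex` come from Sphinx itself.

```python
# Hypothetical mapping from Sphinx builder name to the image format to reference.
# "html" and "latex" are real Sphinx builder names; the choice of formats is an assumption.
BUILDER_IMAGE_FORMAT = {
    "html": "svg",   # browsers render SVG directly
    "latex": "pdf",  # e.g. a PDF produced by Asymptote for the LaTeX/PDF build
}


def image_directive(stem: str, builder: str) -> str:
    """Return an RsT image directive pointing at the format suited to `builder`."""
    fmt = BUILDER_IMAGE_FORMAT.get(builder)
    if fmt is None:
        raise ValueError(f"no image format configured for builder {builder!r}")
    return f".. image:: {stem}.{fmt}"


print(image_directive("images/plot-001", "html"))
print(image_directive("images/plot-001", "latex"))
```

Sphinx also accepts a wildcard extension (`.. image:: images/plot-001.*`) and picks a format that the active builder supports, so an explicit mapping like this may be unnecessary once both files are generated.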

@mmatera
Contributor Author

mmatera commented Feb 6, 2025

Here is the resulting PDF file:

mathics.pdf

@rocky
Member

rocky commented Feb 6, 2025

Pretty cool!

@mmatera
Contributor Author

mmatera commented Feb 9, 2025

How to continue

At this point, I have an almost fully working Sphinx documentation:

  • Loads the documentation from docstrings and mdoc files from mathics.doc.documentation.
  • Handles all the currently used special tags from the Mathics documentation system,
    and converts them to proper RsT syntax.
  • Generates and loads the docpipeline LaTeX pcl file, and converts it to an RsT pcl file,
    to populate the DocTests. This can be made optional: we can use the text-line (OutputForm)
    output from the docstrings, but then we lose graphics and picture outputs.
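As a rough illustration of the tag conversion mentioned in the list above, here is a small sketch. The three tags below are placeholders chosen for the example, since this thread does not list the actual tag set, and `tags_to_rst` is a hypothetical name, not the function used in the repository.

```python
import re

# Assumed inline tags, for illustration only; the real translator handles a different,
# larger set of tags from the Mathics documentation system.
_INLINE_RULES = [
    (re.compile(r"<i>(.*?)</i>", re.DOTALL), r"*\1*"),          # italics -> RsT emphasis
    (re.compile(r"<b>(.*?)</b>", re.DOTALL), r"**\1**"),        # bold -> RsT strong
    (re.compile(r"<code>(.*?)</code>", re.DOTALL), r"``\1``"),  # code -> inline literal
]


def tags_to_rst(text: str) -> str:
    """Apply each tag rule in turn and return the converted text."""
    for pattern, replacement in _INLINE_RULES:
        text = pattern.sub(replacement, text)
    return text


print(tags_to_rst("Use <i>x</i> as the variable in <code>Plot[x^2, {x, 0, 1}]</code>."))
```

Keeping the rules in one table makes it easy to see which tags are covered and to drop rules later if docstrings move to plain RsT.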

Issues:

  • When using docpipeline (graphic) output, we lose the DocTest structure
        >>> input
          = output
    because it does not accept LaTeX / graphics as output.

  • Fancy tables including \multiline or \includegraphics cannot be rendered directly.
    A single \includegraphics can be translated to the .. image:: RsT syntax, but
    several combined pictures cannot be handled that way. What I did is use
    matplotlib to convert the complex LaTeX code into a picture, and then into a single
    \includegraphics instruction. It is not optimal, but it works decently for the few cases
    we have in the documentation.
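The simple case described above, one \includegraphics becoming one image directive, could be sketched as follows. The function name and the regular expression are assumptions made for this example, not the code in the repository; anything that is not exactly one graphic returns None so the caller can fall back to another strategy such as the matplotlib rendering.

```python
import re
from typing import Optional

# Matches a string that is exactly one \includegraphics[options]{path} command.
_SINGLE_GRAPHIC = re.compile(
    r"^\s*\\includegraphics(?:\[[^\]]*\])?\{([^{}]+)\}\s*$"
)


def latex_graphic_to_rst(latex: str) -> Optional[str]:
    """Return an RsT image directive, or None if `latex` is not a single graphic."""
    match = _SINGLE_GRAPHIC.match(latex)
    if match is None:
        return None  # combined or more complex output: needs a different approach
    return f".. image:: {match.group(1)}"


print(latex_graphic_to_rst(r"\includegraphics[width=4cm]{images/plot.png}"))
print(latex_graphic_to_rst(r"\includegraphics{a.png}\includegraphics{b.png}"))
```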

Questions:

  • Shall we privilege the DocTest syntax? Is there a better way to show the outputs?
  • Would it be better to use MathML output instead of LaTeX? Maybe we should have an RsT formatter
    for boxes?
  • Supposing all of this works, shall we consider allowing RsT syntax in docstrings?
  • Also, in that case, does it make any sense to keep the latex_doc.py module in the next version of Mathics?

@rocky
Member

rocky commented Feb 9, 2025

I'd like to back up and start from the high-level goal and then work towards the details and a plan.

What is happening here is a bad pattern that we've seen many times: work bottom-up, hack some code, or extend existing hacked code and hope that we'll find a high-level design and pattern somehow in this process.

We want to use Sphinx in the way Sphinx is normally used because this is standard practice to provide program documentation. As we do with SymPy or Python, these tools improve on their own and we benefit from that work.

Sphinx extends reStructuredText to add more of the book-like features that LaTeX has: a table of contents and book indexing. While this is good, we already have this via LaTeX. And Sphinx provides online HTML browsing, which is sort of nice, but we have something in Django, and how this will interact with the front-end notebook interfaces is yet to be determined.

Finally, since Sphinx is written in Python and has a long history of working with Python, it has been integrated well into the Python ecosystem.

What we don't want to do is replace one set of hacks with another, more elaborate, set of hacks.

It is fine however to use some sort of hacky code as a means to translate from what we have to how we want to do things in the future though.

So looking at Sphinx's existing API and way of doing things, I direct your attention to https://www.sphinx-doc.org/en/master/tutorial/describing-code.html and https://www.sphinx-doc.org/en/master/tutorial/describing-code.html#other-languages-c-c-others . We should follow those patterns for defining a Mathics3 domain.

So a first step would be to write some sample code in a Mathics3 markup domain. Then discuss this and write the code to handle this domain. And then write a replacement sphinx.ext.doctest extension to do what is currently done for Python testing but for Mathics3 instead.
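To make the last step more concrete, here is a minimal sketch of the checking logic such a doctest replacement would need. It is not a Sphinx extension: it uses only the standard library, it assumes the simple `>>> input` / `= output` shape shown earlier in this thread, and `evaluate` is a stand-in for a real Mathics3 session. Multi-line input, messages, and graphics output are deliberately left out.

```python
from typing import Callable, List, Optional, Tuple


def parse_doctests(block: str) -> List[Tuple[str, str]]:
    """Collect (input, expected_output) pairs from '>>> input' / '= output' lines."""
    pairs = []
    current_input: Optional[str] = None
    for raw in block.splitlines():
        line = raw.strip()
        if line.startswith(">>>"):
            current_input = line[3:].strip()
        elif line.startswith("=") and current_input is not None:
            pairs.append((current_input, line[1:].strip()))
            current_input = None
    return pairs


def run_doctests(
    block: str, evaluate: Callable[[str], Optional[str]]
) -> List[Tuple[str, str, Optional[str]]]:
    """Return the (input, expected, actual) triples whose output did not match."""
    failures = []
    for source, expected in parse_doctests(block):
        actual = evaluate(source)
        if actual != expected:
            failures.append((source, expected, actual))
    return failures


# Stand-in evaluator: a lookup table instead of a Mathics3 session.
fake_results = {"1 + 1": "2", "Sqrt[4]": "2"}
sample = """
    >>> 1 + 1
      = 2
    >>> Sqrt[4]
      = 3
"""
print(run_doctests(sample, fake_results.get))
```

Separating parsing from evaluation keeps the markup question (what the Mathics3 domain looks like) independent of how results are computed and compared.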

Then after that is done we can start converting the existing docstrings into this new format.

As for overall work planning, splitting the documentation building from Mathics Kernel was a step long asked for.

One reason this is needed is that in the Mathics3 Kernel right now, we don't list dependencies on:

  • documentation (LaTeX, PDF, or in the future Sphinx)
  • asymptote or inkscape graphics, or
  • the Mathics3 modules that we use to build the docs

and if we were honest, we would.

So I am glad to see a start here in a new GitHub repository. For now, I'd be happy if we just were able to build the documentation as is and then we can remove it from Mathics3 Kernel. Including additional experimental stuff for producing RsT I guess is fine to have here as well. I looked at the RST produced and it is a bit lacking were one to write in this form initially. I also note that while it is cool we can build some sort of Sphinx doc, the resulting doc is not as good as what we currently have via LaTeX.

So now let me close with the matter of timing. At this point, we said we'd be working on Boxing. While replacing the documentation system was always a long-term goal, it has always been a lower-priority one. It is not as important as Boxing right now in my opinion. The work involved in extending the existing conversion code hack to produce a better or a more complete Sphinx doc is in my opinion not a good use of resources right now.

@mmatera
Contributor Author

mmatera commented Feb 9, 2025

@rocky, let me answer your comments in a slightly different order. First the timing and motivation:

So now let me close with the matter of timing. At this point, we said we'd be working on Boxing. While replacing the documentation system was always a long-term goal, it has always been a lower-priority one. It is not as important as Boxing right now in my opinion. The work involved in extending the existing conversion code hack to produce a better or a more complete Sphinx doc is in my opinion not a good use of resources right now.

Working on the Sphinx documentation does not interfere too much with the release, and it also helps to detect typos and inconsistencies in the current documentation. So, if it is low-priority, I was right about the timing: the documentation is now in better shape than it was in the 8.0.0 release. On the other hand, it took just a few days to get a working prototype.

I'd like to back up and start from the high-level goal and then work towards the details and a plan.

I think that drawing up a very detailed plan at this stage would be hard, but what I have in mind is:

  1. See whether it is possible, using the latest release of Mathics3, to build in a separate repo a Sphinx RsT documentation tree that produces PDF documentation of a quality close to the Mathics.pdf produced with our old documentation system.
  2. If this is possible (and I am close to getting it working), consider replacing the old PDF documentation with the Sphinx-generated documentation system. This would not break the latest release or the current testing mechanism in mathics-core.
  3. Assuming both previous steps are done, modify the core to allow standard RsT docstrings. At that stage, we can drop the translation step in the Sphinx documentation, but we should do the translation in the docpipeline tests and in the Mathics-Django documentation.
  4. Eventually, if we see that it is possible to migrate to standard RsT docstrings everywhere, we could undertake a full migration to standard RsT docstrings. Finally, we could get rid of all the existing hacky code.

What is happening here is a bad pattern that we've seen many times: work bottom-up, hack some code, or extend existing hacked code and hope that we'll find a high-level design and pattern somehow in this process.

The plan above goes in exactly the opposite direction: it tries to break the problem into several steps that are achievable over a finite (but not necessarily short) period of time.

We want to use Sphinx in the way Sphinx is normally used because this is standard practice to provide program documentation. As we do with SymPy or Python, these tools improve on their own and we benefit from that work.

Sphinx extends reStructuredText to add more of the book-like features that LaTeX has: a table of contents and book indexing. While this is good, we already have this via LaTeX. And Sphinx provides online HTML browsing, which is sort of nice, but we have something in Django, and how this will interact with the front-end notebook interfaces is yet to be determined.

Finally, since Sphinx is written in Python and has a long history of working with Python, it has been integrated well into the Python ecosystem.

What we don't want to do is replace one set of hacks with another, more elaborate, set of hacks.

Again, the plan is to remove, step by step, the entanglement between all these blocks of hacky code.

It is fine however to use some sort of hacky code as a means to translate from what we have to how we want to do things in the future though.

So looking at Sphinx's existing API and way of doing things, I direct your attention to https://www.sphinx-doc.org/en/master/tutorial/describing-code.html and https://www.sphinx-doc.org/en/master/tutorial/describing-code.html#other-languages-c-c-others . We should follow those patterns for defining a Mathics3 domain.

So a first step would be to write some sample code in a Mathics3 markup domain. Then discuss this and write the code to handle this domain. And then write a replacement sphinx.ext.doctest extension to do what is currently done for Python testing but for Mathics3 instead.

Actually, I looked at it, but I saw that making the graphics doctests work with it would require me to understand better how Sphinx parses RsT. I tried to write the code that translates our docstrings to RsT in a way that makes it clear which functions must be modified to produce more standard RsT that uses Sphinx extensions to handle doctests.

Then after that is done we can start converting the existing docstrings into this new format.

This path would be more direct if we had some experts in writing these extensions who were committed to migrating the whole documentation system (including the one in Mathics-Django). My guess is that we are not in that position, and this is why I propose the plan I mentioned at the beginning of the post.

As for overall work planning, splitting the documentation building from Mathics Kernel was a step long asked for.

One reason this is needed is that in the Mathics3 Kernel right now, we don't list dependencies on:

  • documentation (LaTeX, PDF, inkscape, or in the future Sphinx)
  • graphics, or
  • the Mathics3 modules that we use to build the docs

With this change, Sphinx handles almost all those dependencies. If we are able to get rid of latex_doc.py and doc2latex, then these dependencies would be removed from mathics-core too.

and if we were honest, we would.

So I am glad to see a start here in a new GitHub repository. For now, I'd be happy if we just were able to build the documentation as is and then we can remove it from Mathics3 Kernel. Including additional experimental stuff for producing RsT I guess is fine to have here as well. I looked at the RST produced and it is a bit lacking were one to write in this form initially. I also note that while it is cool we can build some sort of Sphinx doc, the resulting doc is not as good as what we currently have via LaTeX.

Indeed, the RsT code that I am generating now is far from being optimal. However, once it is generated, we can modify it before rendering the documentation, to see how to improve the translator.
