Finish LLM text exporter #1417
Niceeee! How does this work? Do we generate the final Markdown from an intermediate representation/AST? It's important that the final file we produce has the same content as the rendered HTML, that is, with substitutions resolved, etc.
Sadly, we do not. That was the initial plan, but it would be too time-consuming to implement due to a quirk in our parser's handling of TrackTrivia and loose list continuations.
In the end we do get the same content. We might need to go over this again when we implement more dynamic
I guess converting from the final HTML to Markdown would be too primitive / slow? I used that approach in the past on a Next.js project to generate the llms.txt file, and it wasn't too bad.
Yeah, potentially. It's also more labor-intensive, projecting everything back with proper indentation etcetera. I would worry too much about lists of lists etcetera. Not closing the door on doing that, but what we have now is good enough.
I thought this was a prerequisite for LLM text, by using Markdig's round-trip serializer as the base for our own. However, that relies heavily on parsing with trivia, which we cannot do because it breaks list continuations. See #435.
We now manually parse includes and re-evaluate substitutions on the full included files. Adding the exporter does not add much overhead on top of the HTML exporter.
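For readers unfamiliar with the approach, here is a rough sketch of what "inline the includes, then re-evaluate substitutions over the full text" can look like. It is not the exporter's actual code: the `{include} path` line syntax, the `{{key}}` placeholder syntax, and all function names are assumptions made up for this illustration.

```python
import re

# Hypothetical syntax, assumed for illustration only:
#   include line:  "{include} path/to/file.md"
#   substitution:  "{{key}}"
INCLUDE_RE = re.compile(r"^\{include\}\s+(\S+)\s*$")
SUBSTITUTION_RE = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def inline_includes(text, files, seen=()):
    """Replace each include line with the recursively expanded file content."""
    out = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            out.append(line)
            continue
        path = match.group(1)
        if path in seen:
            raise ValueError(f"circular include: {path}")
        out.append(inline_includes(files[path], files, seen + (path,)))
    return "\n".join(out)


def resolve_substitutions(text, substitutions):
    """Replace known {{key}} placeholders; leave unknown ones untouched."""
    def replace(match):
        return substitutions.get(match.group(1), match.group(0))
    return SUBSTITUTION_RE.sub(replace, text)


def export_markdown(path, files, substitutions):
    """Inline includes first, then substitute over the fully expanded text."""
    expanded = inline_includes(files[path], files)
    return resolve_substitutions(expanded, substitutions)
```

Running the substitution pass after the includes are inlined means placeholders inside included snippets get resolved too, which is what keeps the Markdown output in line with the rendered HTML.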
This emits a `filename.md` next to each `filename/index.html`. In addition, it emits an `llm.zip` that can be used to download everything at once.
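As a rough sketch of that output layout (not the actual implementation), the mapping from `filename/index.html` to `filename.md` and the bundling into a single archive could look like the following. The function names and the in-memory zip are assumptions for illustration, and how a top-level `index.html` is handled is not covered by the description above, so the sketch rejects it.

```python
import io
import zipfile
from pathlib import PurePosixPath


def markdown_path_for(html_path):
    """Map 'guide/setup/index.html' to 'guide/setup.md' (assumed layout)."""
    path = PurePosixPath(html_path)
    if path.name != "index.html" or len(path.parts) < 2:
        raise ValueError(f"expected '<name>/index.html', got {html_path!r}")
    # Append rather than swap a suffix, so 'v1.2/index.html' becomes 'v1.2.md'.
    return str(path.parent) + ".md"


def build_llm_zip(pages):
    """Bundle {html_path: markdown_text} into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for html_path, markdown in sorted(pages.items()):
            archive.writestr(markdown_path_for(html_path), markdown)
    return buffer.getvalue()
```

The bytes returned by `build_llm_zip` would then be written out as `llm.zip` alongside the rest of the generated site.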