Docker image is huge #70
Replies: 12 comments, 8 replies
-
@Oliver-Hanikel Yes, I know. The image size can be reduced drastically, but I haven't looked into that yet. For now I used pandoc because all my notes had math and LaTeX embedded in them, and pandoc was the one converter at the time that fully supported all my needs. If there are other options that support LaTeX etc., I'll be happy to hear about them!
-
Is LaTeX support even needed in the Markdown converter? Isn't MathJax used in the frontend for the conversion? If LaTeX is not needed, we could switch to pymd4c. According to its authors, md4c is also the fastest Markdown converter there is.
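If MathJax typesets the LaTeX in the browser, the server-side converter only has to pass math through untouched. A minimal sketch of one common way to do that, assuming math is written as `$...$` or `$$...$$` and that the frontend's MathJax config enables `$` delimiters (MathJax's default inline delimiter is `\(...\)`): stash the math spans behind placeholders, convert, then restore. `protect_math`, `restore_math`, `render_with_math` and the `convert` callable are hypothetical names, not wikmd or pymd4c API.

```python
import html
import re
import uuid
from typing import Callable, Dict, Tuple

# Display math ($$...$$) is tried before inline math ($...$) so a "$$" pair is
# not read as two empty inline spans. Escaped dollars (\$) are not handled.
MATH_RE = re.compile(r"\$\$.+?\$\$|\$[^$\n]+?\$", re.DOTALL)


def protect_math(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap each math span for an alphanumeric placeholder the converter leaves alone."""
    stash: Dict[str, str] = {}

    def _stash(match: "re.Match[str]") -> str:
        key = f"MATH{uuid.uuid4().hex}"
        stash[key] = match.group(0)
        return key

    return MATH_RE.sub(_stash, text), stash


def restore_math(rendered: str, stash: Dict[str, str]) -> str:
    """Put the LaTeX back, HTML-escaped; the browser decodes &lt; etc. before MathJax reads it."""
    for key, latex in stash.items():
        rendered = rendered.replace(key, html.escape(latex, quote=False))
    return rendered


def render_with_math(text: str, convert: Callable[[str], str]) -> str:
    """`convert` is any Markdown-to-HTML function (a hypothetical stand-in here)."""
    protected, stash = protect_math(text)
    return restore_math(convert(protected), stash)
```

With this split, the Markdown converter never sees LaTeX at all, which is what would make a converter without LaTeX support viable.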
-
After switching to
But this does not work for armhf, because the
-
I wrote the math in all my notes in LaTeX syntax, and I use references (https://github.com/Linbreux/wikmd/blob/main/wiki/How%20to%20use%20the%20wiki.md); image sizing is also easy. Once we have developed a macro system, I wouldn't mind switching from pandoc to another converter. But for now I personally use too much of pandoc's functionality. I'll take a look at pymd4c, though. Thanks for the suggestion!
-
That's quite an improvement!
Hmm, could we build one ourselves from source?
-
I managed to remove BeautifulSoup, Markdown and Pandoc from the dependencies and added PyMD4C as a replacement. They didn't promise too much: it really is blazingly fast. The example documents took 400-800ms to render with pandoc on my laptop; MD4C renders them in 2-8ms.
Most of these are probably fixable with the DOM Parser and a bit of work. Currently I am using the HTMLRenderer, which is implemented almost entirely in C, so switching to the DOM Parser will probably make rendering a bit slower.
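Per-document figures like 400-800ms vs 2-8ms can be reproduced with a small timing harness. A minimal sketch, assuming each converter is wrapped as a plain `str -> str` callable (for example one wrapper that calls pandoc and one that calls pymd4c; those wrappers are not shown and would be hypothetical). `time_renderer` is a hypothetical helper, and the numbers it reports depend entirely on the machine.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_renderer(
    render: Callable[[str], str],
    documents: Iterable[str],
    repeat: int = 5,
) -> Dict[str, float]:
    """Return the fastest and slowest per-document render time, in milliseconds.

    Each document is rendered `repeat` times and the median is kept, which
    smooths out one-off hiccups. Hypothetical helper, not part of wikmd.
    """
    docs = list(documents)
    if not docs:
        raise ValueError("need at least one document to time")
    per_doc_ms = []
    for doc in docs:
        samples = []
        for _ in range(repeat):
            start = time.perf_counter()
            render(doc)
            samples.append((time.perf_counter() - start) * 1000.0)
        per_doc_ms.append(statistics.median(samples))
    return {"min_ms": min(per_doc_ms), "max_ms": max(per_doc_ms)}
```

Running it once per converter over the same set of example documents gives a min/max range per converter that can be compared directly.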
-
@Oliver-Hanikel Interesting! Like I said, this would be an interesting implementation. @kura implemented a cache system, which should speed up loading times drastically. If it's possible to keep all the features, Pandoc is not the best option, but it supports tons of features: https://pandoc.org/MANUAL.html#pandocs-markdown
-
I personally don't see a problem with an image that is 400MB+ in size, given that the documents and uploads in my wiki are already 200MB+. As for replacing pandoc, I think Markdown would be a good alternative, since it supports the ToC feature currently in development and is already used in the Whoosh search feature. Removing BeautifulSoup would require a tool that can convert Markdown directly to plaintext, replacing the Markdown -> HTML -> plaintext step the search module uses to make content indexable and searchable. Markdown does have an extension that could be used to handle the LaTeX, which may mean everything in
-
So, I just checked: even the smallest LaTeX library that can be used by the Markdown-LaTeX extension is 160MB on its own, so it's not much of an image size reduction.
-
Yeah, my branch definitely isn't ready for use; too many features are missing. It is more of an experiment.
Well, new images are much faster to download, and the image generally builds faster now. I am running wikmd on a Raspberry Pi 3B with fairly small Markdown files, so I prefer a leaner Docker image. A smaller image causes less wear on the SD card, so the card lasts longer.
You can do this either with pyMD4C as shown here or with the HTMLParser from the standard library, which is also the parser BeautifulSoup uses in the current version of wikmd. Here is a working version of that. I am now looking into using TinyTeX in the Docker image to make it smaller while still using pandoc.
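A minimal sketch of the standard-library route for the search step: turn rendered HTML into plain text with `html.parser.HTMLParser` instead of BeautifulSoup. `TextExtractor` and `html_to_text` are hypothetical names, not the code from the linked branch, and the set of block-level tags that get a separating space is an assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML fragment for a search index."""

    SKIP = {"script", "style"}  # content inside these is never indexed
    # Assumed block-level tags: a space is emitted around them so words from
    # adjacent blocks ("<p>a</p><p>b</p>") don't run together.
    BLOCK = {"p", "div", "li", "ul", "ol", "br", "pre", "blockquote",
             "tr", "td", "th", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; arrives as "&"
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace so the indexer sees clean tokens.
        return " ".join("".join(self._chunks).split())


def html_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Because inline tags such as `<em>` add no space, words split by inline markup stay whole, while block boundaries still separate words.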
-
Just FYI, I made a very small set of changes that replaces 90% of the pandoc functionality using the python-markdown library. The only thing that isn't working properly is the LaTeX functionality. I tried using a ~400MB install of texlive to handle the LaTeX, but it isn't properly detecting and handling things like
As a note, it also hooks into some of the Markdown extensions to add things like Table of Contents support using the built-in
Maybe something like TinyTeX, as mentioned, would be a better solution and might fix some of the LaTeX issues? I may give it a try later. I had not thought about using the built-in HTMLParser for search... I'll give that a whirl now.
-
The TOC should be flexible, I guess. It would be nice to have it visible in the wiki's sidebar in the future (with interactive scrolling), but that's not on the agenda yet.
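For the sidebar idea, the TOC data could be built from the Markdown headings independently of whichever converter is used. A minimal sketch, assuming ATX-style (`#`) headings only; the slug rule merely approximates what python-markdown's `toc` extension does, so the anchors would have to be checked against the heading IDs the renderer actually emits. All names here (`extract_toc`, `slugify`, `toc_html`, the `toc-hN` CSS classes) are hypothetical.

```python
import html
import re
from typing import Dict, List, Tuple

# ATX heading: 1-6 '#', whitespace, the title, optional closing '#' run.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")


def slugify(title: str, seen: Dict[str, int]) -> str:
    """Hypothetical slug rule: drop punctuation, lowercase, hyphenate spaces.

    Repeated titles get a numeric suffix so every anchor stays unique.
    """
    slug = re.sub(r"[^\w\s-]", "", title).strip().lower()
    slug = re.sub(r"[\s-]+", "-", slug)
    if slug in seen:
        seen[slug] += 1
        return f"{slug}_{seen[slug]}"
    seen[slug] = 0
    return slug


def extract_toc(markdown_text: str) -> List[Tuple[int, str, str]]:
    """Return (level, title, slug) for each heading outside fenced code."""
    toc: List[Tuple[int, str, str]] = []
    seen: Dict[str, int] = {}
    in_fence = False
    for line in markdown_text.splitlines():
        # Simplification: any ``` or ~~~ line toggles the fenced-code state.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            title = match.group(2)
            toc.append((len(match.group(1)), title, slugify(title, seen)))
    return toc


def toc_html(toc: List[Tuple[int, str, str]]) -> str:
    """Flat <ul>; a sidebar stylesheet can indent items by their toc-hN class."""
    items = [
        f'<li class="toc-h{level}"><a href="#{slug}">{html.escape(title)}</a></li>'
        for level, title, slug in toc
    ]
    return '<ul class="toc">' + "".join(items) + "</ul>"
```

Keeping the list flat and encoding depth as a class leaves the nesting and scrolling behaviour entirely to the sidebar's CSS and JavaScript.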
-
The image is way too big for a wiki that tries to be lean.
I'll try to improve this by using Alpine as the base image.
But there are also other ways, like reducing the number of dependencies or swapping them out. For example, the installed size of pandoc is 100MB on arm64, but markdown would only take up 57KB.
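To compare dependency footprints on the Python side, the standard library's `importlib.metadata` can sum the files an installed distribution recorded. A minimal sketch; `installed_size_bytes` is a hypothetical helper. It only covers Python distributions such as `Markdown`: pandoc is a standalone binary installed through the OS package manager, so its size has to be checked there, and the result need not match the figures quoted above.

```python
import os
from importlib import metadata


def installed_size_bytes(dist_name: str) -> int:
    """Sum the on-disk size of the files recorded for an installed distribution.

    Raises metadata.PackageNotFoundError if the distribution is not installed.
    Hypothetical helper, not part of wikmd.
    """
    dist = metadata.distribution(dist_name)
    total = 0
    # `files` is None when the distribution shipped no RECORD/file manifest.
    for path in dist.files or []:
        full_path = dist.locate_file(path)
        # Recorded files can be missing on disk (e.g. .pyc never generated).
        if os.path.isfile(full_path):
            total += os.path.getsize(full_path)
    return total
```

For example, `installed_size_bytes("Markdown") // 1024` inside the image gives the Markdown library's footprint in KB.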