Now that the devdocs scraper is ready, we need to create recipes for every ZIM we want to create with it.
To do this, we need:
- a script tool to create all these recipes on the Zimfarm (repurposing the one done for TED)
- a definition of which devdocs ZIMs we want to create

On the first point, I've started to create a generic tool (for developers) capable of creating and maintaining a set of Zimfarm recipes (Stackoverflow, TED, devdocs, libretexts, ...).
On the second point, I've done a short analysis and I need help. The analysis data is here: https://docs.google.com/spreadsheets/d/1WYVUmYGHdTKKCuTpBXcoCe7XI7yfmefpTI-qpGGHWyI/edit?usp=sharing (mind the two tabs).
The initial idea was to create one ZIM per slug, also because that is what the scraper is capable of. This would give us, for instance, python 3.10, python 3.11, python 3.12, ... The fact is that there are 716 slugs, and I'm no longer convinced that creating so many ZIMs is really a good solution.
Another approach would be to create one ZIM per "Name", e.g. one for Python, one for Lua, ... with all versions inside. At least as an end user, I can imagine one might prefer to have one ZIM for Python with all versions inside, so that we do not have to switch to another ZIM every time we switch Python versions. This would give us only 221 ZIMs.
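To make the two counts concrete, here is a minimal sketch of how the per-slug and per-"Name" totals can be derived from a list of doc entries. The `DOCS` sample, the field names (`name`, `slug`) and the `python~3.12` slug format are assumptions for illustration only; the real numbers (716 and 221) come from the spreadsheet, not from this code.

```python
from collections import defaultdict

# Hypothetical sample of doc entries; field names and slug format are assumed.
DOCS = [
    {"name": "Python", "slug": "python~3.12"},
    {"name": "Python", "slug": "python~3.11"},
    {"name": "Python", "slug": "python~3.10"},
    {"name": "Lua", "slug": "lua~5.4"},
    {"name": "Lua", "slug": "lua~5.3"},
    {"name": "Bash", "slug": "bash"},
]


def group_by_name(docs):
    """Map each "Name" to the list of slugs (versions) it covers."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["name"]].append(doc["slug"])
    return dict(groups)


def zim_counts(docs):
    """Number of ZIMs to produce under each strategy."""
    return {
        "per_slug": len({doc["slug"] for doc in docs}),  # one ZIM per slug
        "per_name": len(group_by_name(docs)),            # one ZIM per "Name"
    }


if __name__ == "__main__":
    print(zim_counts(DOCS))
    for name, slugs in group_by_name(DOCS).items():
        print(name, slugs)
```

The same grouping could also drive recipe generation: one recipe per slug in the first strategy, or one recipe per name (listing all its slugs) in the second, once the scraper supports it.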
But the fact is that the scraper is not (yet) capable of creating these "mega-ZIMs" (mega does not mean it is going to take long to create or consume a lot of space, just that there are multiple things inside, and "über-ZIM" looks too German 🤣), and I'm pretty sure that it will make searching (via suggestions or full-text search) even harder because we will often have duplicates across versions.
I am not considering creating a ZIM only for the most recent version (e.g. only Python 3.13 for Python), because it does not look very handy (e.g. I might still be forced to use Python 3.10 for whatever reason and need the docs for that version).
My recommendation so far would be to stick to the original idea of creating these 716 ZIMs, despite the fact that it is "many ZIMs". But I'm not really sold on the idea.
WDYT?