Consider more reliable and stable detection methods for distinguishing zimit classic and zimit2 #1203

Open
Jaifroid opened this issue Jan 25, 2024 · 5 comments
Labels
backend enhancement zimit Code relating to the support of Zimit-style archives

@Jaifroid
Member

As suggested here, we could look for the 'warc2zim' AND 'zimit' strings in the Scraper metadata (we currently look only for 'warc2zim', which is not guaranteed to be stable). Then, if '_sw:yes' is not in the tags, the archive is zimit2. If it is there, there is a Service Worker, meaning the archive is zimit classic.
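
The proposed check could be sketched roughly as follows. This is only an illustration, not the actual kiwix-js code: the function name `classifyZimitArchive`, the `ZimitKind` type, and the idea that the Scraper and Tags metadata arrive as plain strings (with Tags separated by semicolons) are all assumptions. It also reads "AND" literally, requiring both strings; if either one alone was meant to suffice, the condition would be an `||` of the two checks instead.

```typescript
type ZimitKind = 'zimit-classic' | 'zimit2' | 'not-zimit';

// Hypothetical helper: `scraper` and `tags` are assumed to be the raw
// Scraper and Tags metadata strings already read from the ZIM archive.
function classifyZimitArchive(scraper: string, tags: string): ZimitKind {
    const name = scraper.toLowerCase();
    // Literal reading of the proposal: both strings must be present
    const hasWarc2zim = name.indexOf('warc2zim') !== -1;
    const hasZimit = name.indexOf('zimit') !== -1;
    if (!(hasWarc2zim && hasZimit)) {
        return 'not-zimit';
    }
    // Tags are assumed to be a semicolon-separated list, e.g. "_ftindex:yes;_sw:yes"
    const tagList = tags.split(';').map((tag) => tag.trim());
    // '_sw:yes' signals a Service Worker, which means zimit classic
    return tagList.indexOf('_sw:yes') !== -1 ? 'zimit-classic' : 'zimit2';
}
```

Matching whole tags rather than searching the raw Tags string avoids a false positive on a longer tag that merely contains '_sw:yes' as a substring.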

We currently rely on finding 'warc-headers' in the declared MIME type to identify zimit classic. But it's possible (if currently unlikely) that such headers could be reintroduced if they are needed in future versions of zimit2, so it would be good to have other options as outlined above.
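
A MIME-based check of this kind could look like the sketch below. It is not the actual kiwix-js implementation: the function name `hasWarcHeaders` and the assumption that the caller already has the archive's declared MIME types as an array of strings are hypothetical.

```typescript
// Hypothetical helper: `mimeTypes` is assumed to be the list of MIME types
// declared by the ZIM archive, already read by the caller.
function hasWarcHeaders(mimeTypes: string[]): boolean {
    // True if any declared MIME type mentions 'warc-headers' (case-insensitive)
    return mimeTypes.some((mime) => mime.toLowerCase().indexOf('warc-headers') !== -1);
}
```

Because a future zimit2 could in principle declare such a type again, a result of `true` here would be safer to treat as one signal among several than as proof of zimit classic.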

@Jaifroid Jaifroid added enhancement backend zimit Code relating to the support of Zimit-style archives labels Jan 25, 2024
@Jaifroid Jaifroid added this to the v4.2 milestone Jan 25, 2024
Jaifroid added a commit to kiwix/kiwix-js-pwa that referenced this issue Jan 25, 2024
Jaifroid added a commit that referenced this issue Jan 25, 2024
@Jaifroid
Member Author

c528c94 addresses the first part of this issue (it adds a test for 'zimit' in the scraper name).

@kelson42
Collaborator

kelson42 commented Jan 25, 2024

@Jaifroid The recommended way of doing it is to rely on the _sw ZIM tag. Zimit2 should not need anything special at reader level AFAIK.
@benoit74 I wonder whether this is not explicit in the documentation of warc2zim.

@Jaifroid
Member Author

Thanks, @kelson42. I agree, but I can't use that method yet because all the zimit2 ZIMs produced so far have '_sw:yes'. Until that's fixed as requested by rgaudin, I have to use the current method.

There is a specific requirement in the reader to detect links and PDFs that cannot be opened in the webview or iframe due to sandboxing / CSP. Kiwix Serve has already been patched via libkiwix, and other readers that use libkiwix will have the patch. The issue is that Wombat aggressively rewrites such links, so they can't be detected without either temporarily disabling Wombat or using other workarounds. I've patched both KJS readers.

@benoit74

Both changes have been done:

Not all test ZIMs have been rebuilt with this latest code change yet, but at least you have a few to test.

@Jaifroid
Member Author

@benoit74 Excellent, thanks!

@Jaifroid Jaifroid modified the milestones: v4.3, v4.4 Jul 14, 2024