
Produce a scrape of a smallish Wiktionary with dev-1.14 for testing #2098

Open
Jaifroid opened this issue Nov 4, 2024 · 9 comments

@Jaifroid
Collaborator

Jaifroid commented Nov 4, 2024

In general, it would be a good idea to road-test Wikimedia ZIMs other than Wikipedia with the new API (assuming other types also use it). There are also a few issues where longstanding problems may have been fixed by dev (or new ones potentially introduced). For example, #2073 and the very similar #1033.

A smallish one with full features might be wiktionary_es_all_maxi (the latest version we have is https://library.kiwix.org/viewer#wiktionary_es_all_maxi_2024-06, which was produced by 1.13). This is just 890MB, so it seems good for testing.

@audiodude
Member

Do we need to do this through zimfarm, or is it okay if I just do the scrape on my local machine and send you the ZIM?

@Jaifroid
Collaborator Author

Jaifroid commented Nov 4, 2024

It depends whether it needs wider testing on other clients, I suppose. I only really test thoroughly on KJS Browser Extension and on Kiwix PWA. If I find an issue, I try to corroborate that the issue is also on other clients (Kiwix Desktop and Kiwix Android), or whether it's a problem with the JS client(s) only, so I do test a bit more widely, and then report if I find something significant that needs fixing at scraper level or in other clients.

In this case I'll be particularly keen to see whether #2073 is solved or not, and will report back on that.

So, whatever is easiest for you!

@audiodude
Member

So I'm unable to produce an es.wiktionary.org ZIM right now because of #2003.

I'm getting:

[error] [2024-11-05T19:14:56.416Z] Cannot render [] into an article
[error] [2024-11-05T19:14:56.417Z] Error downloading article awalk
[error] [2024-11-05T19:14:56.417Z] Failed to run mwoffliner after [2611s]: {
	"name": "Error",
	"message": "Cannot render [] into an article"
}
[error] [2024-11-05T19:14:56.417Z] 

**********

Cannot render [] into an article

**********

@Jaifroid
Collaborator Author

Jaifroid commented Nov 5, 2024

Thanks for trying @audiodude!

@kelson42
Collaborator

kelson42 commented Dec 4, 2024

Tried again with wiktionary_es on Zimfarm and got the same result; see https://farm.openzim.org/pipeline/b25161de-d764-4337-844e-19ec0a12afe4/debug

@Jaifroid
Collaborator Author

Jaifroid commented Dec 4, 2024

Thanks @kelson42! Hmm, weird error. Who would know what it means?! Looks like some empty array was passed somewhere.

@audiodude
Member

> Thanks @kelson42! Hmm, weird error. Who would know what it means?! Looks like some empty array was passed somewhere.

As linked above, it's #2003. The wiki itself is returning an empty page in the API response.
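
To illustrate how an empty API response could surface as this exact message, here is a minimal sketch. It is not mwoffliner's actual code: `isEmptyPayload` and `renderOrThrow` are hypothetical names, and the assumption is that the payload is interpolated into the error text with `JSON.stringify`, which would turn an empty array into `[]`.

```typescript
// Hypothetical sketch only: these helpers are NOT taken from mwoffliner.
// They show how an empty payload could produce "Cannot render [] into an article".

type RenderPayload = unknown;

// True when the payload carries nothing that could be turned into article HTML.
function isEmptyPayload(payload: RenderPayload): boolean {
  if (payload === null || payload === undefined) return true;
  if (typeof payload === "string") return payload.trim().length === 0;
  if (Array.isArray(payload)) return payload.length === 0;
  if (typeof payload === "object") return Object.keys(payload as object).length === 0;
  return false;
}

// Returns a string form of the payload, or throws with the payload embedded in the message.
function renderOrThrow(payload: RenderPayload): string {
  if (isEmptyPayload(payload)) {
    // JSON.stringify([]) yields "[]", matching the text seen in the log above.
    throw new Error(`Cannot render ${JSON.stringify(payload)} into an article`);
  }
  return typeof payload === "string" ? payload : JSON.stringify(payload);
}
```

Under that assumption, an empty array coming back from the wiki for a single article would be enough to abort the run with the message quoted earlier in this thread.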

@kelson42
Collaborator

kelson42 commented Dec 5, 2024

I have made a test run with a "small" Wikipedia: wikipedia_ca.

Unfortunately it died pretty quickly, see https://farm.openzim.org/pipeline/d75b3ef3-3fec-4914-8bed-050be72960f7/debug. I suspect another small bug here.

@kelson42
Collaborator

kelson42 commented Dec 9, 2024

> I have made a test run with a "small" Wikipedia: wikipedia_ca.
>
> Unfortunately it died pretty quickly, see https://farm.openzim.org/pipeline/d75b3ef3-3fec-4914-8bed-050be72960f7/debug. I suspect here another small bug.

I have opened an issue for this last one: #2113. I'm more concerned about Wikipedia not being scrapable (for the moment) than about Wiktionary.
