Revisit handling of can_finish and high level exceptions #129

Open
benoit74 opened this issue Jan 10, 2025 · 0 comments
Labels
bug Something isn't working

@benoit74
Contributor

The scrape at https://farm.openzim.org/pipeline/0736eb17-c4a6-4065-9774-e155c81300f5 ended in a very weird way:

[mindtouch2zim::Thread-4 (worker)::2025-01-10 01:17:12,990] DEBUG:Fetched directly from S3 cache
[mindtouch2zim::Thread-4 (worker)::2025-01-10 01:17:12,990] DEBUG:Adding asset to bio.libretexts.org/@api/deki/files/10689/Figure_11_01_03.jpg?revision=1&size=bestfit&width=910&height=803 in the ZIM
[mindtouch2zim::Thread-5 (worker)::2025-01-10 01:17:13,041] DEBUG:Fetching from online
[mindtouch2zim::Thread-3 (worker)::2025-01-10 01:17:13,056] DEBUG:Fetched directly from S3 cache
[mindtouch2zim::Thread-3 (worker)::2025-01-10 01:17:13,056] DEBUG:Adding asset to bio.libretexts.org/@api/deki/files/11609/base_pairing_labeled.png?revision=2&size=bestfit&width=741&height=347 in the ZIM
[mindtouch2zim::Thread-10 (worker)::2025-01-10 01:17:13,057] DEBUG:Fetched directly from S3 cache
[mindtouch2zim::Thread-10 (worker)::2025-01-10 01:17:13,057] DEBUG:Adding asset to bio.libretexts.org/@api/deki/files/10691/Figure_11_01_05.jpg?revision=1&size=bestfit&width=1030&height=1445 in the ZIM
[mindtouch2zim::Thread-6 (worker)::2025-01-10 01:17:13,070] DEBUG:Fetched directly from S3 cache
[mindtouch2zim::Thread-6 (worker)::2025-01-10 01:17:13,070] DEBUG:Adding asset to bio.libretexts.org/@api/deki/files/10692/Figure_11_01_06.jpg?revision=1&size=bestfit&width=1021&height=942 in the ZIM
[mindtouch2zim::Thread-5 (worker)::2025-01-10 01:17:13,173] DEBUG:Optimizing
[mindtouch2zim::Thread-5 (worker)::2025-01-10 01:17:13,194] DEBUG:Uploading to S3
[mindtouch2zim::Thread-5 (worker)::2025-01-10 01:17:13,359] DEBUG:Adding asset to bio.libretexts.org/@api/deki/files/21513/mindtouch.page%23thumbnail?revision=1 in the ZIM
[mindtouch2zim::Thread-2 (worker)::2025-01-10 01:17:24,368] DEBUG:Request error, starting backoff of 12.4 seconds after 4 tries
[mindtouch2zim::Thread-7 (worker)::2025-01-10 01:17:28,196] DEBUG:Request error, starting backoff of 8.6 seconds after 4 tries
[mindtouch2zim::Thread-2 (worker)::2025-01-10 01:17:36,761] WARNING:Exception while processing asset from https://search.openverse.engineering/static/img/cc_icon.svg?media_id=ac219762-a26d-45fd-823d-4ff90c5f3706 used by page ID 84593 (https://bio.libretexts.org/Sandboxes/tholmberg_at_nwcc.edu/Introduction_to_Environmental_Science/11%3A_Conventional_and_Sustainable_Energy/10.2%3A_Forms_of_Energy): HTTPSConnectionPool(host='search.openverse.engineering', port=443): Max retries exceeded with url: /static/img/cc_icon.svg?media_id=ac219762-a26d-45fd-823d-4ff90c5f3706 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f4128a51490>: Failed to resolve 'search.openverse.engineering' ([Errno -5] No address associated with hostname)"))
[mindtouch2zim::MainThread::2025-01-10 01:17:36,766] INFO:  Progress 74646 / 74648
[mindtouch2zim::Thread-7 (worker)::2025-01-10 01:17:36,801] WARNING:Exception while processing asset from https://search.openverse.engineering/static/img/cc-by_icon.svg used by page ID 84593 (https://bio.libretexts.org/Sandboxes/tholmberg_at_nwcc.edu/Introduction_to_Environmental_Science/11%3A_Conventional_and_Sustainable_Energy/10.2%3A_Forms_of_Energy): HTTPSConnectionPool(host='search.openverse.engineering', port=443): Max retries exceeded with url: /static/img/cc-by_icon.svg (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7f411dd181a0>: Failed to resolve 'search.openverse.engineering' ([Errno -5] No address associated with hostname)"))
[mindtouch2zim::MainThread::2025-01-10 01:17:36,809] WARNING:1413 bad assets have been ignored
[mindtouch2zim::MainThread::2025-01-10 01:17:36,819] ERROR:ZIM creation failed
[mindtouch2zim::MainThread::2025-01-10 01:17:36,819] INFO:  Progress 74648 / 74648

Looking at the code, I have no clue how this could happen: the exception should have been re-raised, since we modify can_finish in only one place. I also have no clue how I intended this to work (if creator.can_finish is never supposed to be false...). I'm glad I added this code, though.
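
The expectation that the exception "should have been re-raised" can be checked against one generic Python mechanism that silently suppresses exceptions: a context manager whose __exit__ returns a truthy value. This is a purely illustrative sketch. Creator and run are hypothetical stand-ins, not zimscraperlib's Creator or mindtouch2zim's code, and nothing in this issue confirms that this is the actual cause.

```python
from __future__ import annotations

import logging
from types import TracebackType

logger = logging.getLogger("sketch")


class Creator:
    """Hypothetical stand-in, NOT the real zimscraperlib Creator API."""

    def __init__(self) -> None:
        self.can_finish = True

    def __enter__(self) -> Creator:
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc_val: BaseException | None,
        exc_tb: TracebackType | None,
    ) -> bool:
        if exc_type is not None:
            # The flag is updated in a single place...
            self.can_finish = False
        # ...but returning True tells Python the exception was handled,
        # so it is swallowed instead of propagating to the caller.
        return True


def run() -> str:
    with Creator() as creator:
        raise RuntimeError("simulated failure while processing assets")
    # Execution continues here despite the exception raised above.
    if not creator.can_finish:
        logger.error("ZIM creation failed")  # no traceback is ever logged
        return "failed silently"
    return "ok"
```

Calling run() logs a single bare "ZIM creation failed" error and returns normally, which is the same shape as the log excerpt above.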

Anyway, there is something to fix here.

Note that the "ZIM creation failed" line is the only ERROR-level log in the whole task.
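
One way to make the failure path explicit is to log the failure together with its traceback, always re-raise, and finalize the ZIM only on success. This sketch assumes the scraper's main work can be wrapped in a single callable. run_scrape and FakeCreator are hypothetical names for illustration, not mindtouch2zim or zimscraperlib APIs.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("sketch")


class FakeCreator:
    """Hypothetical stand-in for a ZIM creator, not a real library API."""

    def __init__(self) -> None:
        self.finished = False

    def finish(self) -> None:
        self.finished = True


def run_scrape(creator: FakeCreator, work: Callable[[], None]) -> None:
    """Run work(), finalizing the ZIM only if it completed without error."""
    try:
        work()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback,
        # so the root cause is never hidden behind a bare failure message.
        logger.exception("ZIM creation failed")
        # Always propagate so the caller sees the failure.
        raise
    creator.finish()
```

With this shape there is no separate flag to keep in sync. Every ERROR line comes with a traceback, and the exception always propagates to the caller, so a task cannot end with only a bare "ZIM creation failed" message.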

@benoit74 benoit74 added the bug Something isn't working label Jan 10, 2025
@benoit74 benoit74 added this to the 0.2.0 milestone Jan 10, 2025
@benoit74 benoit74 self-assigned this Jan 10, 2025