IPWB-Compatible collection archiving task proof-of-concept #5
Comments
When generating WARCs using user-agents other than browsers, the capture may not be comprehensive enough for accurate replay. For example, if I … For the sake of a demo, it might be useful to first examine the potential for missed URIs when dereferencing them while creating the WARCs (the lib might need to handle this).
Interesting. Would you recommend applying user-agent spoofing at all? I'm thinking of this approach. Either way, noted! Part of me thinks we should build or seek out some sort of "archiving obstacle course" to run tests against. If this doesn't already exist, it seems like it'd be worth having around for a number of different projects.
@b5 It's not necessarily the user-agent string but the capability of the agent. If the agent does not execute JS, some resource representations may not be surfaced and thus not archived by the tool. A while back I put together the Archival Acid Test (more info in the short paper) to evaluate existing crawlers and archival tools, but that was a few years ago. Since then, I know the UK Web Archive started writing an evaluation mechanism, and I believe @N0taN3rd is in the process of rewriting and extending my previous tests.
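To illustrate the point about agent capability, here is a minimal sketch (not taken from IPWB or any archiving tool) of what a non-JS agent can see. It assumes a hypothetical page, `PAGE`, that embeds one image directly in markup and builds a second image URI inside a script. The helper `static_uris` is also hypothetical; it only uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collects URIs that appear in src/href attributes of the raw markup.

    No scripts are executed, which mimics a crawler without a JS engine.
    """

    URI_ATTRS = {"src", "href"}

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URI_ATTRS and value:
                self.uris.append(value)


# Hypothetical page: one static image, one image whose URI only exists
# after the script runs in a browser.
PAGE = """
<html><body>
<img src="/static/logo.png">
<script>
  var img = new Image();
  img.src = "/dynamic/" + "banner.png";
</script>
</body></html>
"""


def static_uris(html):
    """Return the URIs discoverable without executing JavaScript."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    return parser.uris


found = static_uris(PAGE)
print(found)  # only the statically embedded image is discovered
```

A browser replaying this page would also request `/dynamic/banner.png`, so a WARC built only from the statically discovered URIs would be missing that resource at replay time.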
@machawk1 @b5 Yes, I am currently compiling a … iframe madness is currently unarchivable (Internet Archive) for all non-high-fidelity archives. IPWB is high-fidelity 👍
Connecting @machawk1 & @oduwsdl: oduwsdl/ipwb#211
We should define a task that: