Try to implement HTML redirect processing #3

Open
AMDmi3 opened this issue Dec 22, 2017 · 1 comment


AMDmi3 commented Dec 22, 2017

For instance:

% curl http://secureideas.sourceforge.net/
<html><head><meta HTTP-EQUIV="REFRESH" content="0; url=http://base.secureideas.net"></head></html>

The link checker sees this as 200, but in reality the link is dead. We could probably download response bodies when their size (as reported by the HEAD request) is below some sane limit, then parse them and check for HTML redirects. In this case, though, the HEAD response does not contain a Content-Length header.
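A minimal sketch of the parsing step, assuming the response body has already been downloaded and decoded to text. The names `MetaRefreshParser` and `find_html_redirect` are hypothetical and not part of the link checker; only the Python standard library is used.

```python
import re
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://example.com"
        match = re.search(
            r"url\s*=\s*['\"]?([^'\";]+)", attr.get("content", ""), re.IGNORECASE
        )
        if match:
            self.target = match.group(1).strip()


def find_html_redirect(body):
    """Return the meta refresh target URL found in body, or None."""
    parser = MetaRefreshParser()
    parser.feed(body)
    return parser.target


sample = (
    '<html><head><meta HTTP-EQUIV="REFRESH" '
    'content="0; url=http://base.secureideas.net"></head></html>'
)
print(find_html_redirect(sample))
```

A checker could then treat a non-empty result as a redirect and check the target URL instead of reporting 200. JavaScript-based redirects are not covered by this sketch.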

PS. It turns out that the Content-Length header in responses obtained via Python requests often contains a bogus value of 20. This should be fixed as well.


AMDmi3 commented Oct 13, 2018

A partial but bulletproof solution would be to store a flag for each link indicating whether it is a homepage or a download. Downloads that reply with a text/html Content-Type can be treated as erroneous.
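The flag idea could be sketched roughly as follows. `LinkKind` and `download_looks_broken` are hypothetical names chosen for illustration, and the assumption is that the checker already has the Content-Type header value of the response at hand.

```python
from enum import Enum


class LinkKind(Enum):
    # Hypothetical per-link flag: what the link is supposed to point to.
    HOMEPAGE = "homepage"
    DOWNLOAD = "download"


def download_looks_broken(kind, content_type):
    """Return True for a download link that answered with an HTML page."""
    if kind is not LinkKind.DOWNLOAD or not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


print(download_looks_broken(LinkKind.DOWNLOAD, "text/html; charset=utf-8"))
print(download_looks_broken(LinkKind.HOMEPAGE, "text/html"))
```

This only catches dead download links; homepages legitimately reply with text/html, so they would still need the HTML redirect check.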

AMDmi3 transferred this issue from repology/repology-updater on Mar 30, 2019
AMDmi3 changed the title from "Deal with html redirects somehow" to "Try to implement HTML redirect processing" on Mar 30, 2019