Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

knowledge_graph.py: check outbound links #7288

Merged
merged 5 commits into from
Feb 19, 2024
Merged

knowledge_graph.py: check outbound links #7288

merged 5 commits into from
Feb 19, 2024

Conversation

m4rc1e
Copy link
Collaborator

@m4rc1e m4rc1e commented Feb 16, 2024

Some caveats I need to mention:

  • I'm only checking for 404 response codes. Many servers will send 403s, 406s etc when using Python requests but are ok when using a webbrowser. If we want to check all 400 codes, I'll need to add more urls to the whitelist.
  • We're checking ~500 urls so running the knowledge graph now takes 2 minutes. It used to take < 1 sec. If we want a speed up, we can async the code. Personally, 2 minutes seems fine since we rarely update knowledge items.
  • Since it's slow, it's off by default. Include cli arg --check_outbound_links in order for it to run.
  • SSL warnings do not fail the CI since these pages do exist.

Fixes #7277

@m4rc1e
Copy link
Collaborator Author

m4rc1e commented Feb 16, 2024

Results I get are:

INVALID url 'https://fonts.google.com/specimen/Source+Serif+Pro?sort=popularity&amp;category=Serif' returned response status code '404'
INVALID url 'https://fonts.google.com/specimen/Source+Serif+Pro?sort=date&amp;category=Serif' returned response status code '404'
INVALID url 'https://fonts.google.com/specimen/Source+Sans+Pro' returned response status code '404'
INVALID url 'https://fonts.google.com/specimen/Source+Serif+Pro' returned response status code '404'
INVALID url 'http://www.typografi.org/justering/gut_hz/gutenberghzenglish.html' returned response status code '404'
INVALID SSL 'https://www.very-able-fonts.com'
INVALID SSL 'https://kupferschrift.de/cms/2012/03/on-classifications/'

Copy link
Collaborator

@rsheeter rsheeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit's but generally LGTM

Edit; ofc please don't submit until CI passes, either by adding a known-failure list or by fixing bad urls.

@davelab6
Copy link
Member

by fixing bad urls.

All fixed in #7291

@m4rc1e
Copy link
Collaborator Author

m4rc1e commented Feb 19, 2024

Thanks both of you. Rod, I've implemented as much as your feedback as possible. Dave, thanks for fixing the links!

@m4rc1e m4rc1e merged commit 84e4801 into main Feb 19, 2024
9 checks passed
@m4rc1e m4rc1e deleted the knowledge-graph branch February 19, 2024 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI for GFK link breakage prevention
3 participants