feat(academy): add advanced crawling section with sitemaps and search #1217
Will process the lint issues soon.
@TC-MO If we change the URL of an article, do I need to contact the web team to set up a hard redirect?
I think we do redirects in the nginx.conf file; I'm not sure if there is any other way.
TODO redirect
sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md
sources/academy/webscraping/advanced_web_scraping/crawling/sitemaps-vs-search.md
Will review, but I think I will wait for @TC-MO's comments to be addressed first.
@TC-MO I addressed the remarks and rebased onto master, please take a look. I will look into the tests.
🔴 I think the front matter contains double semicolons, which could be a syntax error or a silent error causing unexpected problems.
🟠 I made a few suggestions for improvements. My aim was to improve the readability and flow of the sentences.
- I steered the text towards being language-agnostic, but I didn't check the whole context of the course to see whether that's necessary.
- I suggested removing certain emotional phrases, such as "fortunately".
- Saying "next" together with "lessons" didn't feel right, so I suggest using "following lessons" and keeping "next" for a single lesson.
- You mention "framework" several times, and while that's perfectly valid, my idea would be to keep that word only for Crawlee (or Scrapy and others), and to find other words for the rest, such as algorithm, tool, or approach. I think clearly separating the two will remove some possible confusion.
- When linking to content, I prefer to use the actual text and names as the link text instead of "see this article", and I think that's also a web accessibility rule.
⚪️ I could imagine a bit more polishing of the flow of the sentences, and sometimes they could be briefer, but I don't think that's something we should do in PR reviews; it's highly subjective and involves a lot of back and forth. For a minimalistic Czenglish check, I can highly recommend the "Is this correct English?" prompt for ChatGPT, followed by a Markdown block of whatever you want to check. It keeps your writing as is, but provides basic grammar and flow fixes. The text then still sounds like you, but is smoother. I use it a lot.
sources/academy/webscraping/advanced_web_scraping/crawling/crawling-sitemaps.md
sources/academy/webscraping/advanced_web_scraping/crawling/sitemaps-vs-search.md
sources/academy/webscraping/api_scraping/general_api_scraping/handling_pagination.md
Commit all suggestions from Honza
Co-authored-by: Honza Javorek <[email protected]>
sources/academy/webscraping/advanced_web_scraping/crawling/crawling-with-search.md
sources/academy/webscraping/advanced_web_scraping/crawling/sitemaps-vs-search.md
sources/academy/webscraping/api_scraping/general_api_scraping/handling_pagination.md
@TC-MO @honzajavorek Thanks for the suggestions, I hope I applied everything.
LGTM
No description provided.