Commit 3a4fae0

style: English
1 parent 32953f3 commit 3a4fae0

File tree: 1 file changed (+9, -9 lines)


sources/academy/webscraping/scraping_basics_python/12_framework.md

Lines changed: 9 additions & 9 deletions
@@ -408,7 +408,7 @@ In the next lesson, we'll use a scraping platform to set up our application to r
 
 ### Build a Crawlee scraper of F1 Academy drivers
 
-Scrape information about all [F1 Academy](https://en.wikipedia.org/wiki/F1_Academy) drivers listed on the official [Drivers](https://www.f1academy.com/Racing-Series/Drivers) page. Each item you push to the Crawlee's default dataset should contain the following data:
+Scrape information about all [F1 Academy](https://en.wikipedia.org/wiki/F1_Academy) drivers listed on the official [Drivers](https://www.f1academy.com/Racing-Series/Drivers) page. Each item you push to Crawlee's default dataset should include the following data:
 
 - URL of the driver's f1academy.com page
 - Name
@@ -417,7 +417,7 @@ Scrape information about all [F1 Academy](https://en.wikipedia.org/wiki/F1_Acade
 - Date of birth (as a `date()` object)
 - Instagram URL
 
-If you export the dataset as a JSON, you should see something like this:
+If you export the dataset as JSON, it should look something like this:
 
 <!-- eslint-skip -->
 ```json
@@ -444,8 +444,8 @@ If you export the dataset as a JSON, you should see something like this:
 
 Hints:
 
-- Use Python's native `datetime.strptime(text, "%d/%m/%Y").date()` to parse the `DD/MM/YYYY` date format. See [docs](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime) to learn more.
-- Use the attribute selector `a[href*='instagram']` to locate the Instagram URL. See [docs](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors) to learn more.
+- Use Python's `datetime.strptime(text, "%d/%m/%Y").date()` to parse dates in the `DD/MM/YYYY` format. Check out the [docs](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime) for more details.
+- To locate the Instagram URL, use the attribute selector `a[href*='instagram']`. Learn more about attribute selectors in the [MDN docs](https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors).
 
 <details>
 <summary>Solution</summary>
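
The two hints in this hunk fit together naturally in a Crawlee detail-page handler. The sketch below is not part of this commit or the lesson's solution; the `crawlee.crawlers` import path follows recent Crawlee for Python releases, and the CSS selectors marked as hypothetical are assumptions about the page markup:

```py
# A minimal sketch, not the lesson's solution. Selectors marked "hypothetical"
# are assumptions about the f1academy.com markup.
import asyncio
from datetime import datetime

from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main():
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handle_listing(context: BeautifulSoupCrawlingContext):
        # Hypothetical selector for links to individual driver pages.
        await context.enqueue_links(selector=".driver-card a", label="DRIVER")

    @crawler.router.handler("DRIVER")
    async def handle_driver(context: BeautifulSoupCrawlingContext):
        soup = context.soup

        # Hint 1: parse the DD/MM/YYYY date of birth with strptime().
        dob_text = soup.select_one(".driver-dob").text.strip()  # hypothetical selector
        date_of_birth = datetime.strptime(dob_text, "%d/%m/%Y").date()

        # Hint 2: attribute selector matching any link whose href contains "instagram".
        instagram_link = soup.select_one("a[href*='instagram']")

        await context.push_data({
            "url": context.request.url,
            "name": soup.select_one("h1").text.strip(),  # hypothetical selector
            "date_of_birth": date_of_birth.isoformat(),  # date() serialized for JSON export
            "instagram_url": instagram_link["href"] if instagram_link else None,
        })

    await crawler.run(["https://www.f1academy.com/Racing-Series/Drivers"])


if __name__ == "__main__":
    asyncio.run(main())
```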
@@ -495,15 +495,15 @@ Hints:
 
 </details>
 
-### Use Crawlee to find rating of the most popular Netflix films
+### Use Crawlee to find the ratings of the most popular Netflix films
 
-The [Global Top 10](https://www.netflix.com/tudum/top10) page contains a table of the most currently popular Netflix films worldwide. Scrape the movie names, then search for each movie at the [IMDb](https://www.imdb.com/). Assume the first search result is correct and find out what's the film's rating. Each item you push to the Crawlee's default dataset should contain the following data:
+The [Global Top 10](https://www.netflix.com/tudum/top10) page has a table listing the most popular Netflix films worldwide. Scrape the movie names from this page, then search for each movie on [IMDb](https://www.imdb.com/). Assume the first search result is correct and retrieve the film's rating. Each item you push to Crawlee's default dataset should include the following data:
 
 - URL of the film's imdb.com page
 - Title
 - Rating
 
-If you export the dataset as a JSON, you should see something like this:
+If you export the dataset as JSON, it should look something like this:
 
 <!-- eslint-skip -->
 ```json
@@ -522,7 +522,7 @@ If you export the dataset as a JSON, you should see something like this:
 ]
 ```
 
-For each name from the Global Top 10, you'll need to construct a `Request` object with IMDb search URL. Take the following code snippet as a hint on how to do it:
+To scrape IMDb data, you'll need to construct a `Request` object with the appropriate search URL for each movie title. The following code snippet gives you an idea of how to do this:
 
 ```py
 ...
@@ -544,7 +544,7 @@ async def main():
 ...
 ```
 
-When following the first search result, you may find handy to know that `context.enqueue_links()` takes a `limit` keyword argument, where you can specify the max number of HTTP requests to enqueue.
+When navigating to the first search result, you might find it helpful to know that `context.enqueue_links()` accepts a `limit` keyword argument, letting you specify the max number of HTTP requests to enqueue.
 
 <details>
 <summary>Solution</summary>
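
The `Request` snippet referenced in this hunk is elided from the diff, but a possible overall shape of the crawler is sketched below. This is not part of the commit or the lesson's solution; the IMDb search URL parameters and all CSS selectors are assumptions for illustration, and the import paths follow recent Crawlee for Python releases:

```py
# A sketch under stated assumptions, not the lesson's solution. The IMDb search
# URL parameters and every selector below are illustrative guesses.
import asyncio
from urllib.parse import quote_plus

from crawlee import Request
from crawlee.crawlers import BeautifulSoupCrawler, BeautifulSoupCrawlingContext


async def main():
    crawler = BeautifulSoupCrawler()

    @crawler.router.default_handler
    async def handle_top10(context: BeautifulSoupCrawlingContext):
        # Hypothetical selector for film names in the Global Top 10 table.
        for cell in context.soup.select("td.title"):
            title = cell.text.strip()
            # Build a Request pointing at the IMDb search page for this title.
            search_url = f"https://www.imdb.com/find/?q={quote_plus(title)}"
            await context.add_requests([Request.from_url(search_url, label="SEARCH")])

    @crawler.router.handler("SEARCH")
    async def handle_search(context: BeautifulSoupCrawlingContext):
        # Follow only the first search result, using the `limit` argument from the hint.
        await context.enqueue_links(
            selector="section.findSection a",  # hypothetical selector for result links
            label="FILM",
            limit=1,
        )

    @crawler.router.handler("FILM")
    async def handle_film(context: BeautifulSoupCrawlingContext):
        await context.push_data({
            "url": context.request.url,
            "title": context.soup.select_one("h1").text.strip(),  # hypothetical selector
            "rating": context.soup.select_one("[class*='rating']").text.strip(),  # hypothetical selector
        })

    await crawler.run(["https://www.netflix.com/tudum/top10"])


if __name__ == "__main__":
    asyncio.run(main())
```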
