Commit 4189c0d

style: make the text more concise
1 parent d22855c commit 4189c0d

sources/academy/webscraping/scraping_basics_python/13_platform.md

Lines changed: 26 additions & 21 deletions
@@ -31,9 +31,9 @@ That said, the main goal of this lesson is to show how deploying to **any platfo

## Registering

- First, let's [create a new Apify account](https://console.apify.com/sign-up). The process includes several verifications that you're a human being and that your email address is valid. While annoying, these are necessary measures to prevent abuse of the platform.
+ First, let's [create a new Apify account](https://console.apify.com/sign-up). You'll go through a few checks to confirm you're human and your email is valid – annoying, but necessary to prevent abuse of the platform.

- Apify serves both as an infrastructure where to privately deploy and run own scrapers, and as a marketplace, where anyone can offer their ready scrapers to others for rent. But we'll overcome our curiosity for now and leave exploring the Apify Store for later.
+ Apify serves both as infrastructure where you can privately deploy and run your own scrapers, and as a marketplace where anyone can offer their ready-made scrapers to others for rent. But let's hold off on exploring the Apify Store for now.

## Getting access from the command line

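The login steps themselves aren't part of this diff; with the Apify CLI installed via npm, they would look roughly like this (a sketch, the exact prompts may differ):

```text
$ npm install -g apify-cli
$ apify login
...
Success: You are logged in to Apify as user1234!
```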
@@ -56,11 +56,11 @@ Success: You are logged in to Apify as user1234!

## Starting a real-world project

- Until now, we've kept our scrapers minimal, each represented by just a single Python module, such as `main.py`. Also, we've been adding dependencies to our project only by installing them with `pip` inside an activated virtual environment.
+ Until now, we've kept our scrapers simple, each with just a single Python module like `main.py`, and we've added dependencies only by installing them with `pip` inside a virtual environment.

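For reference, that `pip`-in-a-virtual-environment workflow is just the usual sequence (the package name is only an example):

```text
$ python3 -m venv .venv
$ source .venv/bin/activate
(.venv) $ pip install crawlee
```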
- If we were to send our code to a friend like this, they wouldn't know what they needed to install before running the scraper without import errors. The same applies if we were to deploy our code to a cloud platform.
+ If we sent our code to a friend, they wouldn't know what to install to avoid import errors. The same goes for deploying to a cloud platform.

- To share what we've built, we need a packaged Python project. The best way to do that is by following the official [Python Packaging User Guide](https://packaging.python.org/), but for the sake of this course, let's take a shortcut with the Apify CLI.
+ To share our project, we need to package it. The best way is to follow the official [Python Packaging User Guide](https://packaging.python.org/), but for this course, we'll take a shortcut with the Apify CLI.

Change to a directory where you start new projects in your terminal. Then, run the following command—it will create a new subdirectory called `warehouse-watchdog` for the new project, containing all the necessary files:

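The command itself is elided by this diff; with the Apify CLI it would be along the lines of `apify create warehouse-watchdog`, and the generated project typically looks roughly like this (a sketch, exact files depend on the chosen template):

```text
warehouse-watchdog/
├── .actor/           # Actor configuration, including input_schema.json
├── src/
│   ├── __main__.py   # entry point, runs the main() coroutine
│   └── main.py       # main() itself
└── requirements.txt  # Python dependencies
```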
@@ -80,7 +80,7 @@ Inside the `warehouse-watchdog` directory, we should see a `src` subdirectory co

The file contains a single asynchronous function, `main()`. At the beginning, it handles [input](https://docs.apify.com/platform/actors/running/input-and-output#input), then passes that input to a small crawler built on top of the Crawlee framework.

- Every program that runs on the Apify platform first needs to be packaged as a so-called Actor—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their detault dataset to the Actor output, but input needs to be explicitly handled in the code.
+ Every program that runs on the Apify platform first needs to be packaged as a so-called Actor—a standardized container with designated places for input and output. Crawlee scrapers automatically connect their default dataset to the Actor output, but input must be handled explicitly in the code.

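The adjusted `main.py` described in the next paragraph isn't shown in the diff; a minimal sketch of how it might look with the Apify Python SDK, assuming `src/crawler.py` exposes a `main()` coroutine:

```python
from apify import Actor

from .crawler import main as crawl  # the code we add to src/crawler.py below


async def main():
    # The Actor context manager wires logging, storage, and input/output
    # to the Apify platform (or to local storage when using `apify run`).
    async with Actor:
        actor_input = await Actor.get_input() or {}
        Actor.log.info(f"Actor input: {actor_input}")

        # Run the Crawlee crawler; its default dataset becomes the Actor's output.
        await crawl()
```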
We'll now adjust the template so it runs our program for watching prices. As a first step, we'll create a new empty file, `crawler.py`, inside the `warehouse-watchdog/src` directory. Then, we'll fill this file with the [final code](./12_framework.md#logging) from the previous lesson:

@@ -141,9 +141,9 @@ Run: /Users/course/Projects/warehouse-watchdog/.venv/bin/python3 -m src
...
```

- ## Deploying to platform
+ ## Updating the Actor configuration

- The Actor configuration from the template instructs the platform to expect input, so we should change that before running our scraper in the cloud.
+ The Actor configuration from the template tells the platform to expect input, so we need to update that before running our scraper in the cloud.

Inside `warehouse-watchdog`, there's a directory called `.actor`. Within it, we'll edit the `input_schema.json` file, which looks like this by default:

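The schema itself is elided here, but after the edit described below, the trimmed file would end up roughly like this (the title is illustrative):

```json
{
    "title": "Warehouse Watchdog input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {}
}
```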
@@ -190,7 +190,9 @@ Make sure there's no trailing comma after `{}`, or the file won't be valid JSON.

:::

- Now, we can proceed with deployment:
+ ## Deploying the scraper
+
+ Now we can proceed to deployment:

```text
$ apify push
@@ -203,9 +205,9 @@ Actor build detail https://console.apify.com/actors/a123bCDefghiJkLMN#/builds/0.
? Do you want to open the Actor detail in your browser? (Y/n)
```

- After agreeing to open the Actor details in our browser, assuming we're logged in, we'll see a **Start Actor** button. Clicking it takes us to a screen where we can specify Actor input and run options. Without changing anything, we'll continue by clicking **Start**, and we should immediately see the scraper's logs – similar to what we'd normally see in our terminal, but now running remotely on a cloud platform.
+ After agreeing to open the Actor details in our browser, assuming we're logged in, we'll see an option to **Start Actor**. Clicking it opens the execution settings. We won't change anything—just hit **Start**, and we should see logs similar to what we see locally, but this time our scraper is running in the cloud.

- When the run finishes, the interface should turn green. On the **Output** tab, we can preview the scraper's results as a table or JSON. There's even an option to export the data to various formats, including CSV, XML, Excel, RSS, and more.
+ When the run finishes, the interface will turn green. On the **Output** tab, we can preview the results as a table or JSON. We can even export the data to formats like CSV, XML, Excel, RSS, and more.

:::note Accessing data programmatically

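The note's body is elided by the diff; programmatic access usually goes through the Apify API, for example with the official Python client (the token and dataset ID are placeholders):

```python
from apify_client import ApifyClient

client = ApifyClient(token="YOUR_API_TOKEN")  # personal API token from the Apify Console

# Fetch the items that a run stored in its default dataset.
for item in client.dataset("YOUR_DATASET_ID").list_items().items:
    print(item)
```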
@@ -215,13 +217,13 @@ You don't need to click buttons to download the data. You can also retrieve it u

## Running the scraper periodically

- Let's say we want our scraper to collect sale price data daily. In the Apify web interface, we'll go to [Schedules](https://console.apify.com/schedules). Clicking **Create new** will open a setup screen where we can specify the frequency (daily is the default) and select the Actors that should be started. Once we're done, we can click **Enable**—that's it!
+ Now that our scraper is deployed, let's automate its execution. In the Apify web interface, we'll go to [Schedules](https://console.apify.com/schedules). Click **Create new**, review the periodicity (default: daily), and specify the Actor to run. Then click **Enable**—that's it!

- From now on, the Actor will run daily, and we'll be able to inspect every execution. For each run, we'll have access to its logs and the collected data. We'll also see stats, monitoring charts, and have the option to set up alerts that notify us under specific conditions.
+ From now on, the Actor will execute daily. We can inspect each run, view logs, check collected data, see stats, monitor charts, and even set up alerts.

## Adding support for proxies

- If our monitoring shows that the scraper frequently fails to reach the Warehouse Shop website, we're most likely getting blocked. In that case, we can use proxies to make requests from different locations, reducing the chances of detection and blocking.
+ If monitoring shows that our scraper frequently fails to reach the Warehouse Shop website, it's likely being blocked. To avoid this, we can configure proxies so our requests come from different locations, reducing the chances of detection and blocking.

Proxy configuration is a type of Actor input, so let's start by reintroducing the necessary code. We'll update `warehouse-watchdog/src/main.py` like this:

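The updated code is elided by the diff; a rough sketch of what the proxy handling in `main.py` might look like with the Apify Python SDK (the `proxyConfig` input field name and the extra `crawl()` parameter are assumptions):

```python
from apify import Actor

from .crawler import main as crawl  # assumes crawler.main() accepts a proxy configuration


async def main():
    async with Actor:
        actor_input = await Actor.get_input() or {}

        # Build a proxy configuration from the Actor input, if one was provided.
        proxy_config = await Actor.create_proxy_configuration(
            actor_proxy_input=actor_input.get("proxyConfig")
        )
        Actor.log.info(f"Using proxy: {'yes' if proxy_config else 'no'}")

        await crawl(proxy_config)
```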
@@ -289,7 +291,7 @@ Finally, we'll modify the Actor configuration in `warehouse-watchdog/src/.actor/
}
```

- Now, if we run the scraper locally, everything should work without errors. We'll use the `apify run` command again, but this time with the `--purge` option to ensure we're not reusing data from a previous run:
+ To verify everything works, we'll run the scraper locally. We'll use the `apify run` command again, but this time with the `--purge` option to ensure we're not reusing data from a previous run:

```text
$ apify run --purge
@@ -319,7 +321,7 @@ Run: /Users/course/Projects/warehouse-watchdog/.venv/bin/python3 -m src
...
```

- In the logs, we should see a line like `Using proxy: no`. When running the scraper locally, the Actor input doesn't include a proxy configuration, so all requests will be made from our own location, just as before. Now, let's update our cloud copy of the scraper with `apify push` to reflect our latest changes:
+ In the logs, we should see `Using proxy: no`, because local runs don't include proxy settings. All requests will be made from our own location, just as before. Now, let's update the cloud version of our scraper with `apify push`:

```text
$ apify push
@@ -331,7 +333,7 @@ Run: Building Actor warehouse-watchdog
? Do you want to open the Actor detail in your browser? (Y/n)
```

- After opening the Actor detail in our browser, we should see the **Source** screen. We'll switch to the **Input** tab, where we can now see the **Proxy config** input option. By default, it's set to **Datacenter - Automatic**, and we'll leave it as is. Let's click **Start**! In the logs, we should see the following:
+ Back in the Apify console, we'll go to the **Source** screen and switch to the **Input** tab. There we'll see the new **Proxy config** option, which defaults to **Datacenter - Automatic**. We'll leave it as is and click **Start**. This time, the logs should show `Using proxy: yes`, as the scraper uses proxies provided by the platform:

```text
(timestamp) ACTOR: Pulling Docker image of build o6vHvr5KwA1sGNxP0 from repository.
@@ -361,13 +363,16 @@ After opening the Actor detail in our browser, we should see the **Source** scre
...
```

- The logs should now include `Using proxy: yes`, confirming that the scraper is successfully using proxies provided by the Apify platform.
-
## Congratulations!

- You've reached the end of the course—congratulations! 🎉
+ You've reached the end of the course—congratulations! 🎉 Together, we've built a program that:

- Together, we've built a program that crawls a shop, extracts product and pricing data, and exports the results. We've also simplified our work using a framework and deployed our scraper to a cloud platform, enabling it to run periodically, collect data over time, and provide monitoring and anti-scraping protection.
+ - Crawls a shop and extracts product and pricing data
+ - Exports the results in several formats
+ - Uses concise code, thanks to a scraping framework
+ - Runs on a cloud platform with monitoring and alerts
+ - Executes periodically without manual intervention, collecting data over time
+ - Uses proxies to avoid being blocked

We hope this serves as a solid foundation for your next scraping project. Perhaps you'll even start publishing scrapers for others to use—for a fee? 😉
