
feat: add lesson about using the platform #1424



Closed
wants to merge 3,062 commits into from


honzajavorek
Collaborator

@honzajavorek honzajavorek commented Jan 22, 2025

Introducing the final lesson of the course, which is about deploying to the platform. This was quite challenging, because with every other sentence I grappled with bugs or behavior that wasn't really intuitive to me. On my journey I filed these:

I explored several approaches that turned out to be dead ends. The lesson now starts a new project from a template and replaces parts of the template with the original scraper. That completely avoids `apify init` and should be robust to possible future changes, such as a migration to uv.

I find the UI of the Apify Console rather confusing and very complex, especially the navigation, even as someone who has visited the interface regularly for the past year. The UI also seems to remember my last location or something like that, so every time I open it, it defaults to a different tab. Sometimes it's Input, other times it's Last run, etc.

I'm no UX designer, so I can't help with that; I'm just sharing it here as feedback and as a fact I took into account when creating the lesson. The only way to mitigate the confusion that came to my mind was to provide as many screenshots as possible. I also didn't dare to rely on where the student might land, so I make sure to reiterate which screen and which tab they should be on.

The lesson intentionally goes through updating the Actor so that the student knows how to push new changes and build and run the Actor again and again. I opted to keep the student using the Input tab as the place from which they start the Actor, even though in reality they could press the Start button from other tabs too. I feel that way it's less confusing, makes the most sense, and they won't get as distracted by all the other options.

I did my best to structure the lesson so that it leads from stating shortcomings of the current solution to understanding how the platform helps to solve them, because I think that's the most honest way to "sell" the platform.

Let me know what you think!

janbuchar and others added 30 commits October 30, 2024 17:16
We renamed the workflow in apify-sdk-python and apify-client-python.
Update LangChain integration to remove wrong links and information about
RAG-Web-Browser

@TC-MO please check English 🙏🏻

---------

Co-authored-by: Michał Olender <[email protected]>
This is a quite important part of sending Standby requests, and it was
missing.

I'm having a hard time coming up with some nice formatting, @TC-MO feel
free to restructure as you deem fit.

![Screenshot 2024-11-12 at 11 42 33](https://github.com/user-attachments/assets/5065abbc-a2d8-4187-8313-9ded8aa8f394)
Docs of new input schema property `resourceType`
Add new datepicker `dateType` property to input schema specification.
Co-authored-by: Jiri Spilka <[email protected]>
@honzajavorek honzajavorek force-pushed the honzajavorek/platform branch from a308047 to d44c772 March 14, 2025 14:15
@honzajavorek honzajavorek marked this pull request as ready for review March 14, 2025 14:35
Member

@metalwarrior665 metalwarrior665 left a comment


Thanks. Was there a discussion about where this content should live? There is quite a lot of duplication with both https://docs.apify.com/platform and https://docs.apify.com/academy/apify-platform. The approach in JS was to keep the scraping tutorial separate from the platform.

I'm not against having the whole thing follow in the Python course (as it can specialize to Python devs needs) but then we will have to maintain duplicate content which tends to be a bit annoying.

@honzajavorek
Collaborator Author

@metalwarrior665 The discussion happened here: #1015 (comment). I don't want duplicate content, but this is the logical ending of the course:

  1. basics in DevTools
  2. basics in Python
  3. use a framework to simplify your code and get some other benefits (Crawlee)
  4. deploy to a platform to get some other benefits (Apify)

The lesson is specific to the scraper we're building over the course of the lessons. You could say the same about the previous lesson about Crawlee, where the same content could be covered by Crawlee docs.

Contributor

@vdusek vdusek left a comment


Couldn't we use webp instead of png? The images would be about 5x smaller. Changes to the Python code are okay.
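For reference, the conversion itself is easy to script. This is a minimal sketch, assuming Pillow is installed with WebP support (the usual wheels include it); the function name `convert_pngs_to_webp` is hypothetical, and the actual savings depend on the screenshots:

```python
from pathlib import Path

# Assumption: Pillow is installed and built with WebP support.
from PIL import Image


def convert_pngs_to_webp(directory: Path, quality: int = 80) -> list[Path]:
    """Hypothetical helper: write a .webp next to every .png under `directory`.

    The original PNGs are kept so their sizes can be compared with the
    WebP files before anything is deleted. Matching is case-sensitive on
    most filesystems, so files named *.PNG are skipped.
    """
    written = []
    for png_path in sorted(directory.rglob("*.png")):
        webp_path = png_path.with_suffix(".webp")
        with Image.open(png_path) as image:
            # Lossy WebP; pass lossless=True instead to keep pixels exact.
            image.save(webp_path, "WEBP", quality=quality)
        written.append(webp_path)
    return written
```

`quality=80` is an arbitrary default; for screenshots with lots of text, `lossless=True` may be the safer trade-off at a somewhat larger size.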

@honzajavorek
Collaborator Author

I don't mind using webp, but it only affects the size of this repo. The site optimizes the images automatically, at least that's what I remember @B4nan saying somewhere else in the comments.

@B4nan
Member

B4nan commented Mar 18, 2025

The size of the repo is also important, if the difference is 5x let's just go with webp. We use them pretty much exclusively in the crawlee blog posts too for the same reason.

@honzajavorek
Collaborator Author

honzajavorek commented Mar 18, 2025 via email

@honzajavorek
Collaborator Author

I'm moving the discussion about images to a separate issue: #1549. Regarding this particular PR, the images here are technically already part of this git repo, so converting them would only add size, but if you want me to change them to webp, I'll do it.

@TC-MO
Contributor

TC-MO commented Apr 28, 2025

I think the structure @honzajavorek proposed is the correct way of creating this course and integrating it with the Academy. It aligns especially well with the current work on trimming down Apify Platform content in the Academy.

Contributor

@TC-MO TC-MO left a comment


A few questions and suggested changes, otherwise LGTM.


## Registering

First, let's [create a new Apify account](https://console.apify.com/sign-up). You'll go through a few checks to confirm you're human and your email is valid—annoying but necessary to prevent abuse of the platform.
Contributor


Might be an opportunity to link to our docs? (Though signing up should be trivial, so I'll just leave it up to your consideration.) ¯\_(ツ)_/¯

Collaborator Author

@honzajavorek honzajavorek Apr 29, 2025


Yup, it should be trivial, but it's not. I tried it in a private window.

With the extra sentence I wanted to provide some comfort to people who are following the lesson but aren't 100% sure they want to register with Apify. They may feel that the lesson manipulates them into registering with a random service, so they might already approach the process with suspicion. And suddenly, during account creation, the service also wants their phone number. Especially in Czechia, this is a big ask and IMHO would result in people just closing the tab and giving up on the lesson.

I cannot change your sign-up procedure and I kinda understand why it is like this, so I try to reduce the harm by reassuring the person, or at least preparing them for what they're about to go through: it's expected, and here's why it's necessary.

Collaborator Author


I also intentionally don't mention the phone number, but dance around it. I don't even know if you require it always or just if the platform suspects a bot or something.

```text
$ apify login
...
Success: You are logged in to Apify as user1234!
```
Contributor


We usually use `<YOUR_XYZ>` as placeholders; not sure what your experience with that is. Is a mock username in this format better received by users?

Collaborator Author


I don't know. I think it doesn't matter here, because the only purpose of the example is to have similar sentences and shape to what the user sees in their terminal. So if they glance at it, it'll look similar and they'll assume "I'm on the right path".

I don't expect them to compare it letter by letter or check whether the username matches (I don't expect them to juggle four Apify accounts in this case). In such cases I opt for low-key sample values for the variable parts rather than highly visible placeholders, which stand out and feel important.

There is no science behind it though, and I have no strong opinion on this. It's just my freestyle and this is how I think about it. Feel free to guide me.

Collaborator Author


It's similar to what I did earlier with version numbers: they weren't important, but I wanted the output to visually match what the user should get.

Screenshot 2025-04-29 at 9 26 07

@B4nan
Member

B4nan commented Apr 28, 2025

> the images here are technically already a part of this git repo now

They are part of your branch only, which will be wiped after we merge.

@honzajavorek
Collaborator Author

> They are part of your branch only, which will be wiped after we merge.

A thousand years ago I happened to stumble upon two chapters (out of many) covering Git internals, and something tells me it's not that simple. But that's just my hunch; I'm no Git expert. I consulted an LLM and it also says it's not that simple.

I'm still happy to provide webp images, if you don't mind the risk of added size.

@B4nan
Member

B4nan commented Apr 29, 2025

Well, it is that simple: nobody is going to pull your branch, especially given that it will be wiped on merge. They will pull master, and that's what we need to keep from getting fat.

edit: also, if you did this in a long-lived branch, you could just fix the history and force push; no need to "surrender because it's already there" :]

@honzajavorek
Collaborator Author

This didn't go well. I rewrote the history with git filter-repo, but it apparently rewrote more commits than necessary. I pushed the changes to this branch, GitHub immediately auto-closed the PR, and now it doesn't allow me to do anything with it. I can force push to the branch and fix it, but the UI of this PR seems to be stuck and disconnected. I'll have to create a new PR, hopefully with the same changes.

@honzajavorek
Collaborator Author

New PR: #1556

honzajavorek added a commit that referenced this pull request Apr 29, 2025
I messed up #1424 trying to
remove PNG files from commit history. This is a new PR with (hopefully)
all the original commits correctly rewritten and cherry-picked.

---------

Co-authored-by: Michał Olender <[email protected]>
Labels
t-academy Issues related to Web Scraping and Apify academies.