Medium article scrapper #227

Closed
wants to merge 2 commits into from
15 changes: 15 additions & 0 deletions Python/Medium-Article-Scraper/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
# Medium Article Scraper

### Getting started:
- Before getting started, install the required modules:
  - Run `pip3 install -r requirements.txt` or `pip install -r requirements.txt`
- You also need to install wkhtmltopdf on your OS:
  - You can do this with a package manager:
    - macOS: `brew install Caskroom/cask/wkhtmltopdf`
    - Debian/Ubuntu: `apt-get install wkhtmltopdf`
    - Windows: `choco install wkhtmltopdf`
  - Or you can download it from https://wkhtmltopdf.org/

- When you run the script, it asks for the article URL as input.

- The output is saved as `out.pdf`.
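
Review note, not part of the diff: the README asks the user to install wkhtmltopdf separately, and as far as I know pdfkit looks for the `wkhtmltopdf` executable on the `PATH` by default. A minimal sketch of a pre-flight check follows, assuming Python 3; the helper names `find_wkhtmltopdf` and `describe_wkhtmltopdf` are hypothetical and do not appear in the PR.

```python
import shutil


def find_wkhtmltopdf():
    """Return the full path to the wkhtmltopdf executable, or None if it is not on PATH."""
    return shutil.which("wkhtmltopdf")


def describe_wkhtmltopdf(path):
    """Turn the lookup result into a short message for the user."""
    if path is None:
        return "wkhtmltopdf not found on PATH; install it before running main.py"
    return f"wkhtmltopdf found at {path}"
```

Calling `print(describe_wkhtmltopdf(find_wkhtmltopdf()))` before the conversion would tell the user up front whether the external dependency is in place.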
2 changes: 2 additions & 0 deletions Python/Medium-Article-Scraper/main.py
@@ -0,0 +1,2 @@
import pdfkit
pdfkit.from_url(input('Input article url: '), 'out.pdf')
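
Review note, not part of the diff: the two-line script passes whatever the user types straight to `pdfkit.from_url`. One possible hardening is sketched below, assuming Python 3; `is_http_url` and `save_article` are hypothetical names introduced only for illustration. It checks that the input looks like an absolute http(s) URL before handing it to pdfkit.

```python
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Return True if url looks like an absolute http(s) URL."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def save_article(url: str, output_path: str = "out.pdf") -> str:
    """Render the page at url to a PDF at output_path and return that path."""
    if not is_http_url(url):
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    # Imported lazily so the URL check works even when pdfkit is not installed.
    import pdfkit
    # pdfkit signals a missing wkhtmltopdf or a failed conversion with OSError.
    pdfkit.from_url(url.strip(), output_path)
    return output_path
```

With these helpers, the script body could become `save_article(input('Input article url: '))`, which keeps the original prompt and output file name.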
1 change: 1 addition & 0 deletions Python/Medium-Article-Scraper/requirements.txt
@@ -0,0 +1 @@
pdfkit==0.6.1
2 changes: 2 additions & 0 deletions Python/README.md
@@ -1,3 +1,4 @@

| Sno. | Name | Description | Author |
|------|----------------------------|------------------------------------------------------------------------------------------|------------------|
| 1 | [JPG to PNG](/Python/Convert_jpg_to_png) | Convert the image from jpg to png | BatoolMM |
@@ -52,4 +53,5 @@
| 51 | [Related Hashtags](/Python/Related%20Hashtags) | Get popular hashtags from a particular topic | Riken-Shah |
| 52 | [Download Audio from Youtube](/Python/Download_Audio_From_Video) | Download an audio file from YouTube | gudeliauskaspam |
| 53 | [Reddit Scraper](/Python/Reddit_Scraper) | Scrape subreddits for some or all posts | nishan7 |
| 54 | [Website-cloner](/Python/website-cloner) | Downloads all the files used in the website. | Neel Patel |

21 changes: 21 additions & 0 deletions Python/website-cloner/README.md
@@ -0,0 +1,21 @@

# Website Cloner

### Getting started

- This script downloads all the files used by a website.

- Install the modules the script needs by running:
`pip install -r requirements.txt` or `pip3 install -r requirements.txt`

- Once the program has downloaded all the files, you can quit it with `Ctrl + C`.
- The files are saved in a folder named `site folder`, inside the directory you specify.

### After execution:
- The program will ask for the following:
  - The URL of the website.
  - The directory path where the files should be saved.

#### The program will save the files like this:

![Images](img.png)
Binary file added Python/website-cloner/img.png
12 changes: 12 additions & 0 deletions Python/website-cloner/main.py
@@ -0,0 +1,12 @@
from pywebcopy import save_webpage

kwargs = {'project_name': 'site folder'}

save_webpage(

# URL of the website
url=input("Paste the url here: "),
# folder where the copy will be saved
project_folder=input('Enter project path: '),
**kwargs
)
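
Review note, not part of the diff: the script above forwards both prompts to `save_webpage` unchecked. A minimal sketch of validating them first is shown below, assuming Python 3 and that `save_webpage` accepts the `url`, `project_folder` and `project_name` keyword arguments already used in this file; `build_clone_args` and `clone_site` are hypothetical helper names.

```python
import os
from urllib.parse import urlparse


def build_clone_args(url, project_folder, project_name="site folder"):
    """Validate the two user inputs and return keyword arguments for save_webpage."""
    url = url.strip()
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    if not project_folder.strip():
        raise ValueError("Project path must not be empty")
    return {
        "url": url,
        "project_folder": os.path.abspath(project_folder.strip()),
        "project_name": project_name,
    }


def clone_site(url, project_folder):
    """Download the page at url and its assets into project_folder."""
    kwargs = build_clone_args(url, project_folder)
    # Create the target directory if it does not exist yet.
    os.makedirs(kwargs["project_folder"], exist_ok=True)
    # Imported lazily so the validation above works without pywebcopy installed.
    from pywebcopy import save_webpage
    save_webpage(**kwargs)
```

The prompts from the original script could then be passed in as `clone_site(input("Paste the url here: "), input('Enter project path: '))`.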
1 change: 1 addition & 0 deletions Python/website-cloner/requirements.txt
@@ -0,0 +1 @@
pywebcopy==6.3.0