A tool to archive Substack newsletters you are currently subscribed to. This allows you to keep an offline copy of the content you have paid for, forever.
Important
This is NOT a piracy tool.
- It can only download content you already have access to.
- It does not bypass paywalls for newsletters you are not subscribed to.
- Its primary use case is archiving your library before you unsubscribe.
- 100% Local: Your cookies, session data, and downloaded articles are stored only on your computer. Nothing is sent to any server other than Substack itself.
- Safe: Your credentials are used strictly to authenticate with Substack for downloading your own content.
- Personal Archive: Download all posts from a newsletter to your local machine.
- Paid Content Support: Authenticates using your existing subscription to archive subscriber-only posts.
- Custom Domain Support: Includes a login helper to bypass bot protection on custom domains (e.g., lennysnewsletter.com).
- Offline Assets: Downloads images locally so you can view posts without an internet connection.
- Markdown Support: Converts posts to Markdown (.md) with local image links, perfect for Obsidian or Notion.
- Podcast Skipping: Option to skip podcast/audio episodes (--skip-podcasts).
- HTML Export: Saves clean, readable HTML files.
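To illustrate what "local image links" means in the Markdown output, here is a minimal sketch of rewriting remote image links so they point into a local assets folder. The function name `localize_images` and the regular expression are hypothetical and are not taken from the tool's source; the sketch also ignores file name collisions, which a real implementation would need to handle.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches Markdown image links with an http(s) URL: ![alt](https://...)
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown: str, assets_dir: str = "assets") -> str:
    """Rewrite remote image links to point at files in a local assets folder.

    Hypothetical helper: it only rewrites the links and assumes the image
    files are downloaded separately under the same file names.
    """

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        # Use the last path segment of the URL as the local file name.
        filename = PurePosixPath(urlparse(url).path).name
        if not filename:
            # No usable file name in the URL: leave the link untouched.
            return match.group(0)
        return f"![{alt}]({assets_dir}/{filename})"

    return IMAGE_LINK.sub(replace, markdown)
```

With links rewritten this way, a note opened in Obsidian resolves images relative to the post's own folder, so no network access is needed.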
- Clone the repository:
  git clone https://github.com/yourusername/substack-scraper.git
  cd substack-scraper
- Install dependencies:
  pip install -r requirements.txt
- Install Playwright browsers (required for the login helper):
  playwright install chromium
Substack uses complex "bot protection" for some domains. This tool provides a Login Helper (login.py) to make authentication easy.
For most newsletters, you only need to log in once.
- Run in your terminal:
  python login.py
- A Chrome window will open. Log in to substack.com.
- Go back to the terminal and press Enter to save your session.
- This creates substack_session.json, which works for all standard Substack newsletters.
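For readers curious how a saved session can be reused, the sketch below reads cookies out of a session file. It assumes the file follows Playwright's storage-state layout (a top-level `cookies` list whose entries carry `name`, `value`, and `domain`); this README does not confirm the exact format, and the helper name `cookies_for_host` is hypothetical.

```python
import json
from pathlib import Path


def cookies_for_host(session_path: str, host: str) -> dict:
    """Return {name: value} for cookies in the session file that apply to host.

    Assumption: the file uses Playwright's storage-state layout, i.e.
    {"cookies": [{"name": ..., "value": ..., "domain": ...}, ...], ...}.
    """
    state = json.loads(Path(session_path).read_text(encoding="utf-8"))
    jar = {}
    for cookie in state.get("cookies", []):
        # A leading dot (".substack.com") means the cookie covers subdomains.
        domain = cookie.get("domain", "").lstrip(".")
        if host == domain or host.endswith("." + domain):
            jar[cookie["name"]] = cookie["value"]
    return jar
```

A dictionary like this could then be handed to an HTTP client, for example through the `cookies=` argument of `requests.get`, to fetch subscriber-only pages with the saved login.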
Newsletters with their own domains are isolated "islands" and require their own login.
- Run the helper with the URL (all on one line):
  python login.py https://www.lennysnewsletter.com
- A Chrome window will open. Log in to that specific site.
- Go back to the terminal and press Enter to save your session.
- This saves a domain-specific session (e.g., substack_session_www.lennysnewsletter.com.json), which the scraper will automatically detect and use.
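The naming convention above suggests how the right session file could be chosen for a given newsletter URL. The following is a minimal sketch of that lookup, not the scraper's actual code: the function name `session_file_for` is hypothetical, and treating every `*.substack.com` host as "standard" is an assumption drawn from the file names shown in this README.

```python
from urllib.parse import urlparse

DEFAULT_SESSION = "substack_session.json"


def session_file_for(url: str) -> str:
    """Return the session file name expected for a newsletter URL.

    Assumption: hosts on substack.com share the default session, while
    custom domains use substack_session_<host>.json.
    """
    host = (urlparse(url).hostname or "").lower()
    if not host:
        raise ValueError(f"URL has no host (is the scheme missing?): {url!r}")
    if host == "substack.com" or host.endswith(".substack.com"):
        return DEFAULT_SESSION
    return f"substack_session_{host}.json"
```

For example, `session_file_for("https://www.lennysnewsletter.com")` yields the domain-specific file name shown above, while any `*.substack.com` URL maps to the shared file.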
Basic Scrape (HTML + Markdown + Images):
python scraper.py --url https://read.substack.com
Markdown Only (Best for Obsidian):
python scraper.py --url https://read.substack.com --md-only
Skip Podcasts:
python scraper.py --url https://newsletter.pragmaticengineer.com --skip-podcasts
Limit Number of Posts:
# Download only the 5 most recent posts
python scraper.py --url https://www.robkhenderson.com --limit 5
Downloaded posts are saved in the archive/ directory, organized by domain:
archive/
├── read.substack.com/
│   ├── assets/
│   │   ├── image1.jpg
│   │   └── ...
│   ├── 2023-10-01_some-post-title.md
│   └── 2023-10-01_some-post-title.html
└── ...
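The layout above can be expressed as a small path-building helper. This is a sketch under stated assumptions: the name `post_paths` is hypothetical, and the `<date>_<slug>` file naming is inferred from the example tree rather than taken from the scraper's source.

```python
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def post_paths(newsletter_url: str, published: date, slug: str, root: str = "archive") -> dict:
    """Build the output locations for one post under archive/<domain>/.

    Assumption: file names follow the <YYYY-MM-DD>_<slug> pattern seen in
    the example directory tree.
    """
    host = urlparse(newsletter_url).hostname
    if not host:
        raise ValueError(f"URL has no host: {newsletter_url!r}")
    base = Path(root) / host
    stem = f"{published.isoformat()}_{slug}"
    return {
        "markdown": base / f"{stem}.md",
        "html": base / f"{stem}.html",
        "assets": base / "assets",  # images shared by all posts of this domain
    }
```

Calling `post_paths("https://read.substack.com", date(2023, 10, 1), "some-post-title")` produces paths matching the entries in the tree, with Markdown and HTML files side by side and images under the domain's assets folder.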
This tool is for personal archiving purposes only. Please respect the copyright of the authors and do not redistribute paid content.