Skip to content

carlvellotti/substack-downloader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Substack Downloader

A tool to archive Substack newsletters you are currently subscribed to. This allows you to keep an offline copy of the content you have paid for, forever.

Important

This is NOT a piracy tool.

  • It can only download content you usually have access to.
  • It does not bypass paywalls for newsletters you are not subscribed to.
  • Its primary use case is archiving your library before you unsubscribe.

Privacy & Security

  • 100% Local: Your cookies, session data, and downloaded articles are stored only on your computer. Nothing is ever sent to any external server.
  • Safe: Your credentials are used strictly to authenticate with Substack for downloading your own content.

Features

  • Personal Archive: Download all posts from a newsletter to your local machine.
  • Paid Content Support: Authenticates using your existing subscription to archive subscriber-only posts.
  • Custom Domain Support: Includes a login helper to bypass bot protection on custom domains (e.g., lennysnewsletter.com).
  • Offline Assets: Downloads images locally so you can view posts without an internet connection.
  • Markdown Support: Converts posts to Markdown (.md) with local image links, perfect for Obsidian or Notion.
  • Podcast Skipping: Option to skip podcast/audio episodes (--skip-podcasts).
  • HTML Export: Saves clean, readable HTML files.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/substack-scraper.git
    cd substack-scraper
  2. Install dependencies:

    pip install -r requirements.txt
  3. Install Playwright browsers: (Required for the login helper)

    playwright install chromium

Authentication

Substack uses complex "bot protection" for some domains. This tool provides a Login Helper (login.py) to make authentication easy.

Method A: Standard Substacks (e.g., name.substack.com)

For most newsletters, you only need to log in once.

  1. Run in your terminal:
    python login.py
  2. A Chrome window will open. Log in to substack.com.
  3. Go back to the terminal and press Enter to save your session.
  4. This creates substack_session.json, which works for all standard Substack newsletters.

Method B: Custom Domains (e.g., robkhenderson.com, lennysnewsletter.com)

Newsletters with their own domains are isolated "islands" and require their own login.

  1. Run the helper with the URL (all on one line):
    python login.py https://www.lennysnewsletter.com
  2. A Chrome window will open. Log in to that specific site.
  3. Go back to the terminal and press Enter to save your session.
  4. This saves a domain-specific session (e.g., substack_session_www.lennysnewsletter.com.json) which the scraper will automatically detect and use.

Usage

Basic Scrape (HTML + Markdown + Images):

python scraper.py --url https://read.substack.com

Markdown Only (Best for Obsidian):

python scraper.py --url https://read.substack.com --md-only

Skip Podcasts:

python scraper.py --url https://newsletter.pragmaticengineer.com --skip-podcasts

Limit Number of Posts:

# Download only the 5 most recent posts
python scraper.py --url https://www.robkhenderson.com --limit 5

Output

Downloaded posts are saved in the archive/ directory, organized by domain:

archive/
├── read.substack.com/
│   ├── assets/
│   │   ├── image1.jpg
│   │   └── ...
│   ├── 2023-10-01_some-post-title.md
│   └── 2023-10-01_some-post-title.html
└── ...

Disclaimer

This tool is for personal archiving purposes only. Please respect the copyright of the authors and do not redistribute paid content.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages