Scraping Books from the 晋江文学城 Website

Download the non-V (non-VIP, i.e. free) chapters of any book on https://www.jjwxc.net

Language: Python · Creator: chenxing

Features:

  • CLI interface.
  • Output in .docx or .txt format.
  • Customizable output path.
  • ...................

If you have a suggestion or have found a bug, please open an issue.

The CLI was built with Rich.

Installation and Usage

Download the Source Code

On the repository page, click Code > Download ZIP to download the source code. Unzip the archive and rename the extracted folder to jjwxc-crawler (recommended).

Install Requirements

  • Python 3.9.15
  • Windows

With Python installed, open a terminal in the project root directory (\jjwxc-crawler) and run the following commands to create and activate a virtual environment.

python -m venv venv
venv\Scripts\activate # on Windows

On Linux, activate the environment with:

source venv/bin/activate # on Linux; the script is sourced, not executed, so no chmod is needed
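
Before installing anything, it can help to confirm that the virtual environment is really the one in use. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it relies only on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_environment() -> str:
    """Summarize the interpreter version and whether a venv is active."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    state = "active" if in_virtualenv() else "NOT active"
    return f"Python {version}, virtual environment {state}"


if __name__ == "__main__":
    print(describe_environment())
```

Saving this as a script and running it with `python` inside the activated terminal should report the environment as active; if it does not, repeat the activation step before running pip.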

Next, with the virtual environment still activated, install Scrapy and the other dependencies:

pip install -r requirements.txt

Run the Console App

cd jjcrawler

# Download all non-V chapters of the novel with the given ID into the directory .\novels
scrapy crawl novel -a id=ID

# For example:
scrapy crawl novel -a id=1
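
To download several novels in one go, the command above can be wrapped in a small script. This is a rough sketch under stated assumptions: `build_crawl_command` and `crawl_many` are hypothetical helpers that do not exist in the project, and the script is assumed to run from the repository root with the virtual environment activated, so that `jjcrawler` is the Scrapy project directory.

```python
import subprocess
from typing import Iterable, List


def build_crawl_command(novel_id: int) -> List[str]:
    """Build the same command line shown above for one novel ID."""
    return ["scrapy", "crawl", "novel", "-a", f"id={novel_id}"]


def crawl_many(novel_ids: Iterable[int], project_dir: str = "jjcrawler") -> None:
    """Run the crawl once per novel ID, one after another."""
    for novel_id in novel_ids:
        # cwd mirrors the `cd jjcrawler` step; check=True raises
        # CalledProcessError if scrapy exits with a non-zero status.
        subprocess.run(build_crawl_command(novel_id), cwd=project_dir, check=True)
```

For example, `crawl_many([1, 2, 3])` would run the crawl three times in sequence. Running the crawls sequentially rather than in parallel keeps the load on the website low.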

The default output format is .docx.

To download the chapters in .txt format instead, edit \jjcrawler\jjcrawler\spiders\config.py:

# docx | txt
format = "txt"
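
To illustrate what a setting like this typically controls, here is a hedged sketch of a writer that switches on the format string. It is not the project's actual code: `save_chapter` and its parameters are hypothetical, and the .docx branch assumes the third-party python-docx package, which may or may not be what the crawler uses internally.

```python
from pathlib import Path
from typing import List

SUPPORTED_FORMATS = ("docx", "txt")


def save_chapter(title: str, paragraphs: List[str], out_dir: str, fmt: str = "docx") -> Path:
    """Write one chapter to out_dir in the requested format and return its path."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{title}.{fmt}"

    if fmt == "txt":
        # Plain text: title, a blank line, then one paragraph per line.
        body = title + "\n\n" + "\n".join(paragraphs) + "\n"
        target.write_text(body, encoding="utf-8")
    else:
        # Imported lazily so the txt path works without python-docx installed.
        from docx import Document  # assumption: pip install python-docx

        document = Document()
        document.add_heading(title, level=1)
        for text in paragraphs:
            document.add_paragraph(text)
        document.save(str(target))

    return target
```

Calling `save_chapter("Chapter 1", ["First paragraph."], "novels", fmt="txt")` would create `novels/Chapter 1.txt`. Titles containing characters that are invalid in file names would need sanitizing first, which this sketch leaves out.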