Scraping Books from the 晋江文学城 Website

Download non-V chapters of any book on https://www.jjwxc.net


Features:

Command-line interface.
Output in .docx or .txt format.
Customizable output path.
...................

If you have a suggestion or have found a bug, please open an issue.

The CLI was built with Rich.


Installation and Usage

Download the Source Code

On the repository page, click Code, then Download ZIP, to download the source code. Unzip the archive and rename the extracted folder to jjwxc-crawler (recommended).

Install Requirements

Python 3.9.15
Windows

First, with Python installed, open a terminal in the project root directory (\jjwxc-crawler) and run the following commands to create and activate a virtual environment:

python -m venv venv
venv\Scripts\activate # on Windows

If you're on Linux, activate the virtual environment with the following commands instead:

chmod +x venv/bin/activate 
source venv/bin/activate

Second, with the virtual environment still activated, install Scrapy and the other dependencies:

pip install -r requirements.txt
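
If you are unsure whether the virtual environment is active before installing, a quick check along these lines may help. This is a minimal sketch, not part of the project: the function name in_virtualenv is hypothetical, and it relies only on the standard library's sys.prefix and sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run it with the same python command you used to create the environment. If it reports that no virtual environment is active, activate venv again before running pip.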

Run the Console App

cd jjcrawler

# Download all non-V chapters of the novel with the specified id into the directory .\novels
scrapy crawl novel -a id=ID

# For example:
scrapy crawl novel -a id=1
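
To download several novels in one go, the command above could be wrapped in a small script. The following is only a sketch under stated assumptions: build_crawl_command and crawl_many are hypothetical helpers that do not exist in the project, the script is assumed to be started from the project root so that jjcrawler is a valid working directory, and novel ids are assumed to be positive integers.

```python
import subprocess
from typing import Iterable, List


def build_crawl_command(novel_id: int) -> List[str]:
    """Build the argument list for `scrapy crawl novel -a id=ID`."""
    # Assumption: novel ids are positive integers.
    if novel_id <= 0:
        raise ValueError(f"novel id must be a positive integer, got {novel_id}")
    return ["scrapy", "crawl", "novel", "-a", f"id={novel_id}"]


def crawl_many(novel_ids: Iterable[int], project_dir: str = "jjcrawler") -> None:
    """Run the crawl command once per id, stopping at the first failure."""
    for novel_id in novel_ids:
        # cwd mirrors the `cd jjcrawler` step; check=True raises
        # CalledProcessError if scrapy exits with a non-zero status.
        subprocess.run(build_crawl_command(novel_id), cwd=project_dir, check=True)


if __name__ == "__main__":
    # Only print the command here; call crawl_many([...]) to actually download.
    print(" ".join(build_crawl_command(1)))
```

Running the crawls one after another, rather than in parallel, keeps the load on the site low.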

The default output format is .docx.

To download the chapters in .txt format instead, edit \jjcrawler\jjcrawler\spiders\config.py:

# docx | txt
format = "txt"
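
As an illustration of how a setting like this might be consumed, the sketch below validates the format value and derives an output file path. It is not the project's actual code: SUPPORTED_FORMATS and output_path are hypothetical names, and naming the file after the novel title is an assumption.

```python
from pathlib import Path

# The two values listed in config.py's comment: docx | txt
SUPPORTED_FORMATS = ("docx", "txt")


def output_path(novels_dir: str, title: str, fmt: str = "docx") -> Path:
    """Return the file path for a novel, rejecting unsupported formats."""
    # Accept "TXT" or ".txt" as well as "txt".
    normalized = fmt.lower().lstrip(".")
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
    # Assumption: one output file per novel, named after its title.
    return Path(novels_dir) / f"{title}.{normalized}"


if __name__ == "__main__":
    print(output_path("novels", "example-title", "txt"))
```

Failing early on an unknown value avoids crawling a whole book before discovering that the output cannot be written.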

