Releases: DaRealFreak/epub-scraper

2.1.1

22 Sep 20:51

Changelog

2afacc7 [CLEANUP] remove toc content since the epub format already has an internal toc
bf813bf [TASK] extend log message to give info about which chapter assets are getting imported
d919ff4 [WIP] add example configuration for custom template and waybackmachine usage
ee04333 [TASK] rename cleanup regex to strip regex since the functionality of it is to strip the capture group from unwanted text, related to #13
01686ea [TASK] add option to cleanup extracted titles/chapter content with regular expressions, fixes #13
3e0e0db [TASK] separate the non-existent redirect and blacklisted redirect URL cases for better clarity
602d88a [DOC] add cleanup regular expression to title-content and chapter-content options
a1d284e [BUGFIX] fix usage of cleanup options
e2407f8 [TASK] add cleanup regex for kanna no kanna
a3d343b [BUGFIX] change data type of redirects from json to yaml
a0c2a8e [TASK] implement configuration to use the wayback machine for hosts in case they got shut down, relates to #12
b611b61 [TASK] move wayback machine implementation to session package, use session as interface
eef22c2 [BUGFIX] fix faked original URL of request on using the wayback machine, resolves #12
16ff304 [TASK] make title selector optional in case there is actually no title, add prefix should be required if no title selector is configured, relates to #3
a3de07e [BUGFIX] fix application of cleanup options for title, extract method for application of cleanup options
43e385d [BUGFIX] prevent escaping the title which is HTML code
0fcb447 [TASK] update kanna no kanna for wayback machine and first chapters
86411ab [CLEANUP] remove log info for searching external asset URLs and just display the info on actually importing them
1e91be3 [DOC] add wayback machine options to site configuration
36a2bd8 [TASK] move sanitizing HTML code to the writer instead of in the extractor, sanitize every HTML code including titles
2d51209 [TASK] bump version to 2.1.1
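
The strip regex introduced in this release is described as stripping unwanted text around a capture group. A minimal sketch of one plausible reading follows: keep only the first capture group and leave the text untouched when nothing matches. The helper name `stripWithRegex` and the sample pattern are assumptions made for illustration, not the project's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// stripWithRegex is a hypothetical helper (not taken from the project).
// It returns the first capture group of pattern when it matches content,
// and returns content unchanged when there is no match or no capture group.
func stripWithRegex(pattern, content string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	match := re.FindStringSubmatch(content)
	if len(match) < 2 {
		return content, nil
	}
	return match[1], nil
}

func main() {
	// The pattern is an assumed example for a chapter title prefix.
	title, err := stripWithRegex(`^Chapter \d+:\s*(.+)$`, "Chapter 12: The Last Boss")
	if err != nil {
		panic(err)
	}
	fmt.Println(title) // prints "The Last Boss"
}
```

Returning the input unchanged on a non-match keeps titles that do not follow the expected pattern intact instead of emptying them.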

2.1.0

19 Sep 01:47

Changelog

e16ad91 [TASK] add package to extract unicode emoji data from unicode.org, add functions to strip/replace emojis
878cb02 [BUGFIX] strip unicode emojis since they are not supported by most readers, breaking the generated epub
3fa1085 [BUGFIX] add whitelist for emojis since even normal numbers are listed in the emoji codes
4ff545b [TASK] import and replace images from img[src] and a[href] which match allowed mimetypes for epub files, resolves #9
1acb521 [CLEANUP] lint imports
641b62a [TASK] increase chapter index by 1 to match image prefix and actual chapter prefix
9c7a1c4 [DOC] update link to spec reference to official w3.org docs
54d778c [TASK] add chapter cleanup regular expression option, resolves #8
b32f9b3 [BUGFIX] fix used configuration in case we get redirected to a different page from within the redirects
803f706 [TASK] add chapter cleanup regular expressions in last boss real usage example configuration
4da29c0 [TASK] add debug message for resource imports
80e4b90 [TASK] update cleanup regular expression and suffix selector to not cut some chapters off in the middle
4b7f900 [TASK] add some special cases for navigation variations to the cleanup regular expression
cfd01d3 [TASK] add URL equality to check ignoring the URL scheme
128e5cc [TASK] add option to change the used templates with a new configuration key Templates, resolves #4
c9e9c2b [TASK] update templates to prefer single quotes
81b4834 [TASK] move AltTitle into ToC since it is only displayed there
9ee910f [DOC] add section about templates and their usable variables, add regular expression capture group requirements to README, resolves #11
cbebc29 [BUGFIX] fix tables by adding blank line above tables
7f85c3c [TASK] rename emoji package to unicode, add functionality to sanitize spaces since it killed the regular expressions multiple times
783eb0e [CLEANUP] move blacklisted URL info log message into check function
a58f37e [TASK] extend cleanup regular expression for special chapter case at new year
594e743 [TASK] add rate limiter for importing external assets into the epub to prevent connection refusals due to too many requests
a860d98 [TASK] remove visible \n characters from example descriptions
5bcc2ce [TASK] bump version to 2.1.0
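
The URL equality check that ignores the scheme (cfd01d3) can be sketched roughly as below. The function name `equalIgnoringScheme` and the choice to compare host, path, and query are assumptions; the project may compare different parts.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// equalIgnoringScheme is a hypothetical helper (not taken from the project).
// It treats two URLs as equal when host, path and query match, regardless of
// whether they use http or https. Hosts are compared case-insensitively.
func equalIgnoringScheme(a, b string) bool {
	ua, errA := url.Parse(a)
	ub, errB := url.Parse(b)
	if errA != nil || errB != nil {
		return false
	}
	return strings.EqualFold(ua.Host, ub.Host) &&
		ua.Path == ub.Path &&
		ua.RawQuery == ub.RawQuery
}

func main() {
	fmt.Println(equalIgnoringScheme("http://example.com/chapter-1", "https://example.com/chapter-1")) // true
	fmt.Println(equalIgnoringScheme("http://example.com/chapter-1", "https://example.com/chapter-2")) // false
}
```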

2.0.3

16 Sep 02:07

Changelog

f2139e3 [TASK] skip files not ending on .yaml, fixes #6
d9d19b7 [TASK] use inline parsing for source content for sites too
78f787a [BUGFIX] use site configuration in case of cross site scraping, add not nil checks for optional selectors
0b32651 [TASK] update last boss example configuration for multiple hosts
65add8f [TASK] add BlackList to NovelConfig struct and add function to check if passed URL is blacklisted
b319e61 [TASK] add blacklist configuration option to avoid duplicates, fixes #10
c16fa42 [TASK] add blacklisted duplicated chapters
a0974dd [DOC] add section about blacklisted URLs
211b453 [TASK] bump version to 2.0.3
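
As an illustration of the blacklist check added in this release, here is a minimal sketch. The `NovelConfig` struct and `BlackList` field names come from the commit messages above; the field type, the method name `IsURLBlacklisted`, and the exact-match comparison are assumptions.

```go
package main

import "fmt"

// NovelConfig is reduced here to the single field relevant for the sketch.
// The []string type is an assumption.
type NovelConfig struct {
	BlackList []string
}

// IsURLBlacklisted is a hypothetical method name. It reports whether the
// passed URL is listed in the blacklist, using an exact string comparison.
func (c *NovelConfig) IsURLBlacklisted(rawURL string) bool {
	for _, blacklisted := range c.BlackList {
		if blacklisted == rawURL {
			return true
		}
	}
	return false
}

func main() {
	cfg := &NovelConfig{BlackList: []string{"https://example.com/chapter-5-duplicate"}}
	fmt.Println(cfg.IsURLBlacklisted("https://example.com/chapter-5-duplicate")) // true
	fmt.Println(cfg.IsURLBlacklisted("https://example.com/chapter-5"))           // false
}
```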

2.0.2

15 Sep 22:05

Changelog

996c861 [DOC] mention examples folder for example configurations
4cd8d52 [DOC] fix formatting of configuration options
275a2ed [DOC] fix indentation of title/chapter content and pagination
5a0c1a3 [TASK] resolve relative path in pagination URLs
d8f77a0 [TASK] add example using novelupdates as ToC
15e4eba [TASK] move function to retrieve site configuration from parser to novelconfig, add redirect to site configuration, make chapter selector single string instead of string slice
24f69ea [TASK] update example configurations for new redirect configuration
227e81a [TASK] update redirection logic to allow cross site redirects, fixes #5
7e47f1a [TASK] extract source content struct to use chapter and toc sources together
2a04fee [TASK] extract chapter data directly instead of only the chapter URL which we opened later again, also fixes #7
991db20 [CLEANUP] instead of writing info on final write process we write the info on extracting the chapter
043a5c2 [TASK] trim spaces for chapter titles
e4a674c [TASK] add info log message about current URL we are extracting the chapter from
037f6fc [CLEANUP] remove argument print
41474cb [TASK] add multiple log info messages for the meta data of the generated epub
30ebd00 [TASK] use absolute file path for writing the epub file
e406f93 [BUGFIX] use inline notation similar to json for composite key SourceContent
e956a75 [TASK] change chapter-selectors to single string
1babad1 [DOC] update chapter-selectors to chapter-selector, add new configuration of redirects to site configuration options
96fc406 [TASK] bump version to 2.0.2
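
Resolving relative paths in pagination URLs (5a0c1a3) can be done with Go's standard `net/url` package. The sketch below assumes a hypothetical helper `resolvePaginationURL` that receives the URL of the current page and the `href` found in the pagination link; it is not the project's implementation.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolvePaginationURL is a hypothetical helper (not taken from the project).
// It resolves href against the URL of the page it was found on, following
// the reference resolution rules implemented by url.ResolveReference.
func resolvePaginationURL(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	next, err := resolvePaginationURL("https://example.com/novel/toc/", "page/2/")
	if err != nil {
		panic(err)
	}
	fmt.Println(next) // prints "https://example.com/novel/toc/page/2/"
}
```

Absolute links pass through unchanged, so the same helper can be applied to every pagination link without checking its form first.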

2.0.0

12 Sep 23:03

Changelog

e1cc758 [TASK] update example ToC project to new structure, add additional example configuration using chapter lists
853242f [WIP] update YAML configuration to allow following redirects, add pagination options and mixing up chapters and toc elements
ca78173 [TASK] rename DNS to Host, implement merge of source configuration and site configuration
1f62378 [TASK] add nosec comment for ebook-polish command
6fc6de5 [WIP] update code to new config structure
46168be [TASK] use struct to encapsulate config functions
a6745e5 [TASK] add pseudo code for handling ToCs and Chapters
8be73e6 [TASK] implement recursive chapter selection from possible redirects
41a6271 [TASK] update chapter selector and pagination for rtd.moe ToC, add chapter 17 manually since not listed in ToC
4e493ef [TASK] implement pagination of ToC, respect ReversePosts, general clean up
979098d [BUGFIX] use OuterHtml function to allow non container HTML elements, only cut off from the first match instead of every match
7521fe7 [BUGFIX] use OuterHtml function for footer block too
f72883c [TASK] add and write chapters based on the new configuration structure
00a6893 [TASK] update example configuration for OP waifus to match real site structure and generate a valid epub
118fac6 [TASK] add description and language meta data of epub to configuration
66a1e3e [TASK] add custom section for title selection to allow cleanup and trimming suffix/prefixes too
205ac59 [TASK] update example configuration for cleanup-regex and title selectors
6ce2937 [TASK] use value instead of pointer for cleanup regex since newly initialized string equals to empty string anyways
02ae040 [TASK] implement cleanup of title
04b7b7c [TASK] fix link to translator, change order to match chapters
d809fea [BUGFIX] generate site config in case no matching site config with the passed host is found to still initialize values
6d96c27 [TASK] add description and language to general, set title-content values and rename author/footer settings
e7a732f [TASK] narrow down prefix and suffix selectors
30d98ba [TASK] pin variable for scope linting
32ab063 [TASK] directly parse chapters on navigating through the ToC pages, remove handleChapter function and directly call the extractChapterData function
c9d2541 [TASK] add info log about added chapters, update doc blocks for functions
0a54db4 [TASK] bump version to 2.0.0

1.0.2

09 Sep 10:17

Changelog

0adaf5f [TASK] add function to use Calibre's ebook-polish to fix encoding errors, compress images, etc.
19a8b91 [DOC] add README file
b70e82f [TASK] bump version to 1.0.2

1.0.1

09 Sep 02:06

Changelog

40d6e44 [TASK] update author note end selector for example config
6362862 [BUGFIX] use the outer HTML as separator instead of the inner HTML which may be empty
c53c90b [TASK] fix broken HTML code using net/html rendering and sanitize the generated HTML using bluemonday's UGCPolicy
8faa964 [TASK] don't center the ToC content since it looks bad on mobile