Partial implementation of fix for issue 1395 (#2206)
fatotak wants to merge 5 commits into mihonapp:main
Conversation
Fixed: new downloads. Working: reading. TODO: migrate old chapters to the new name + delete and force redownload of problem chapters (chapters within a manga where name and scanlator are the same value).
```kotlin
val newChapterName = sanitizeChapterName(chapterName)

fun getChapterDirName(chapter: Chapter): String {
    val newChapterName = sanitizeChapterName(chapter.name)
    val hash = md5(chapter.url).takeLast(6)
```
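For illustration, the scheme in this diff (sanitized chapter name plus the last six hex characters of the URL's MD5) could look like the following standalone sketch. The `sanitize` rules, the underscore separator, and the example URLs are hypothetical, not Mihon's actual code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ChapterDirName {
    // Hex-encode the MD5 digest of a string.
    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical sanitizer: replace FAT-invalid characters with '_'.
    static String sanitize(String name) {
        return name.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
    }

    // Rough equivalent of sanitizeChapterName(chapter.name) + md5(chapter.url).takeLast(6).
    static String chapterDirName(String chapterName, String chapterUrl) {
        String hash = md5Hex(chapterUrl);
        return sanitize(chapterName) + "_" + hash.substring(hash.length() - 6);
    }

    public static void main(String[] args) {
        // Two chapters with the same name but different URLs get distinct directories.
        System.out.println(chapterDirName("Bonus", "/manga/example/chapter/101"));
        System.out.println(chapterDirName("Bonus", "/manga/example/chapter/205"));
    }
}
```

Because the suffix depends only on the URL, it is deterministic across app restarts and library refreshes, which is what makes it usable as a directory name.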
What if the URL changes? A lot of sources change the URL periodically, and this would mean downloads from any of those sources won't stay downloaded.
I used the URL for 2 reasons:
- Given it's in the base build, not the extension, there are only two fields we can use for this change.
- No other field provides the level of uniqueness required.
`chapter.id` also changes when the URL changes; there is no way to provide the level of uniqueness you want unless it's done on the extension side.
Then maybe a pre-processor should be added on Mihon's side, before chapters are added to the database, that appends Alt1/Alt2/Alt3/etc. to the scanlator field when it finds duplicates, based on the order they appear in the list given by extensions. Changing the downloads scheme will cause more problems than not.
And how will the pre-processor determine which is the alt vs. the non-alt?
What exact problems?
As I said in my previous message,
Basically, a list of "Chapter 1", "Bonus", "Bonus", "Chapter 2", etc. is handed over by the extension, and you are suggesting to name them "Chapter 1.cbz", "Bonus (1).cbz", "Bonus (2).cbz", "Chapter 2.cbz", in the exact order given by the source.
That order is already in the db, but the source can, at any time, insert a chapter anywhere. Say the filesystem has Bonus = chapter 5.5 and Bonus (2) = chapter 9.5; then the source inserts a new "Bonus" at 4.5, and suddenly chapter 9.5 should be Bonus (3) while there is NO file for 4.5 - the existing Bonus (2) now points at the wrong chapter. Using the URL, we won't have such issues. I think suffix ordering is far less reliable than matching by URL.
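The renumbering hazard described above can be reproduced with a small sketch of order-based suffixing (hypothetical helper, Java for illustration only):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SuffixDemo {
    // Order-based disambiguation as proposed: the nth duplicate gets " (n)".
    static List<String> assignNames(List<String> chapterNames) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String name : chapterNames) {
            int n = seen.merge(name, 1, Integer::sum);
            out.add(n == 1 ? name : name + " (" + n + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Initial chapter list: the second "Bonus" is saved as "Bonus (2)".
        System.out.println(assignNames(List.of("Chapter 1", "Bonus", "Bonus")));
        // The source inserts another "Bonus" mid-list: the on-disk file
        // "Bonus (2)" now corresponds to a different chapter than before.
        System.out.println(assignNames(List.of("Chapter 1", "Bonus", "Bonus", "Bonus")));
    }
}
```

A URL-derived hash, by contrast, does not depend on the chapter's position in the list, so inserting a chapter cannot shift which file belongs to which chapter.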
As Syer10 said before, there are plenty of sources that (semi-)frequently change chapter URLs, which makes the URL unreliable as well.
Does the upload date change when they change the URL?
Your example uses the same upload date for both chapters.
I'm not sure, but I believe some sources don't have an upload date, in which case the extension falls back to (today) or something.
I know that... but I'm looking to determine some kind of pattern. We have no single reliable element right now (URL, timestamp, id, etc.), but if we can leverage the db and do 2 lookups instead of 1, we can still achieve it. Actually, in my example, if we can just store the filename in the db, then it doesn't matter if they change the URL later. It's the same chapter anyway, just a different quality, so what's important is to keep both files and have each row in the UI point to a different file. BUT the example in the ticket is the one that forces us to "get it right", because those are actually totally different chapters suffering the same symptoms.
Per @AntsyLich in #1395 (comment), using the URL should be fine, given that there's already data-fixing logic for the specific sources that have unstable URLs.
While that is true, having to rename each and every folder of all downloaded chapters for one source would be a huge number of filesystem operations; I don't think it's viable to trust the device to do it all in good time.
It's possible to have the reader check for both old-style and new-style downloads. This is in accordance with the reader logic Mihon already has to check for four(!) different formats of downloaded chapter, due in part to past migrations. Renaming everything therefore isn't something that has to be done as a synchronous migration, and furthermore, we don't have to enable it for every source.

In your alternative proposal, can you elaborate on how you would plan to handle cases where chapters are added or removed in the middle of a series? At first glance, it seems like this would catastrophically break database correctness, since already-downloaded chapters would be renumbered and thus interchanged with each other, an even worse outcome than chapters needing to be redownloaded.
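A non-synchronous migration of the kind described could probe candidate directory names in order. This is only a sketch (the names and candidate list are hypothetical), mirroring the idea of checking several download formats at read time:

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ChapterDirLookup {
    // Return the first candidate directory name that exists on disk.
    static Optional<String> findChapterDir(Set<String> existingDirs, List<String> candidates) {
        return candidates.stream().filter(existingDirs::contains).findFirst();
    }

    public static void main(String[] args) {
        Set<String> onDisk = Set.of("Bonus"); // an old-style download
        List<String> candidates = List.of(
            "Bonus_a1b2c3", // new style: name + URL-hash suffix (hypothetical)
            "Bonus"         // old style: plain chapter name
        );
        // The reader finds the old-style folder, so nothing needs renaming up front.
        System.out.println(findChapterDir(onDisk, candidates).orElse("<not downloaded>"));
    }
}
```

Because the lookup falls back gracefully, old downloads keep working and folders can be renamed lazily (or never), avoiding the bulk filesystem operations raised as a concern above.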
I think you misunderstood what I was saying. I know my solution is not perfect, but I feel it's better than this, since this comes with a lot of caveats. It also must be said that the issue this is fixing is quite small, since almost no sources have duplicate chapter names and scanlators; it's only a small few that have this issue.
This code doesn't proactively rename old files (as it can't, due to not being able to identify the old problem chapters). It only:
Nope, a file move isn't a huge number of filesystem ops. Mihon has a bigger issue with filesystem IOPS elsewhere (download to temp -> zip -> delete temp).
How would you feel about having the hash-based chapter names be a flag that can be enabled or disabled per source? The default can be the present behavior of not disambiguating chapters with the same names, and sources that have duplicate chapters can set the flag to append a hash based on the URL.

I think this would address everyone's concerns, yes? No need to worry about sources whose URLs are unstable, since we would not need to enable the flag for those, and no need to worry about a perception of how "small" the problem is, because the solution can be applied only to the sources that are problematic, without affecting the others.

IMO, we would also want to add a check to the general downloader code that would at least detect when there has been a filename collision and report it as an error to the user, so that they know to go to the issue tracker and report that a source probably should have this flag enabled for correct downloads.
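The suggested downloader-side check could be as simple as detecting duplicate (name, scanlator) pairs in a fetched chapter list, since those are what determine the download directory name today. A sketch with hypothetical types:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CollisionCheck {
    // Flag chapter names whose (name, scanlator) pair, and therefore the
    // download directory name, would collide with an earlier chapter.
    static Set<String> findCollisions(List<String[]> chapters) {
        Set<String> seen = new HashSet<>();
        Set<String> collisions = new HashSet<>();
        for (String[] c : chapters) {
            String key = c[0] + "\u0000" + c[1]; // name + scanlator
            if (!seen.add(key)) collisions.add(c[0]);
        }
        return collisions;
    }

    public static void main(String[] args) {
        List<String[]> chapters = List.of(
            new String[]{"Chapter 1", "GroupA"},
            new String[]{"Bonus", "GroupA"},
            new String[]{"Bonus", "GroupA"} // same name and scanlator: collision
        );
        // Anything returned here would be surfaced to the user as an error.
        System.out.println(findCollisions(chapters));
    }
}
```

Running this once per chapter-list refresh is O(n) and needs no extra db I/O, so it could serve as the "report it as an error" path even for sources where the hash flag is off.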
How will the source know? It's not like they can predict the future. Rather than that, an indicator on the source about the stability of the URL may be viable (a static/hard-coded value in the extension)?
The main issue with that is that the interface between Mihon and extensions may require modification.
I was intending to introduce an "unavailable chapter" state to counter chapter removal, which should also affect this, as we won't need to migrate downloads to another re-added chapter.
If you do this, does it mean
I suppose yeah
…able, and equally unique.
Updated.
This means that we won't be able to move the downloads to a new Mihon install, though.
Doesn't export/import transfer the db?
That won't work if the other db already has a chapter with id 1, though.
Can we add the last known file name to the db? A dedicated db field + deterministic suffix should be good for all cases, after your fix for chapter shift.
…d is more appropriate for new file naming format.
Any progress on the review/merge?
@AntsyLich or @MajorTanya, would one of you perhaps be able to comment on what is needed from contributors to move towards a merged fix for this bug? I'm happy to help out.
I was on hiatus, but other than that I did leave some comments in the Discord server. TL;DR: I don't like the extra db I/O these changes result in.
Ah, sorry, I was unaware that discussion was happening there. For future reference, the mentioned discussion can be viewed at https://discord.com/channels/1195734228319617024/1197870876557848647/1391637343038996482 and subsequent messages.
No need to be sorry
@fatotak and @AntsyLich - I've filed fatotak#4 against this PR. The additional changes:
The present code should be backwards compatible with existing downloaded chapters, but uses the new scheme to download new chapters. As mentioned in the original PR description, we could also take action to correct existing, already-downloaded chapters that were affected by this bug and therefore contain incorrect contents. This is not yet implemented.

I tested this by compiling a development build of Mihon with the changes (as well as a custom Hiveworks extension to work around keiyoushi/extensions-source#9648) and downloading some chapters that have the same title. They render correctly and distinctly, unlike before. This is what the filesystem looks like: (There are two chapters called
@raxod502 awesome. From a quick glance I'm OK with the implementation, but if possible we should take the opportunity of changing the download name schema to address some other related issues (e.g. #1517, #1633, #2280, and potentially #2132, though I'm not married to the idea). The general plan would be to:
I wish this were followed by a change to Local Source to use the metadata included with chapters, to avoid both clipped titles and the hash suffix in the chapter list :nyoron:
I didn't know about local sources! Reading the documentation for that, it looks like we should continue to support the non-hash-appended filename structure as something Mihon is able to read, because it wouldn't make sense to ask people to compute hashes when providing their own files. The character set and length restrictions can be applied on download, but we can still allow Mihon to read a more general set of file structures for the local source. I will add the suggested changes to fatotak#4 if @fatotak doesn't get to it first.
I think you misunderstood; the effect on Local Source would be on the chapter names when transplanting chapters downloaded from extensions. It wouldn't need to recalculate the hash. In fact, it couldn't, since the chapter URL is the chapter's file/folder name itself.
I think the length cap is already in place: |
So the use case you're referring to is when downloading manga using one source and then viewing it again via the local source, instead of the original source?
Correct.
…ters_with_same_name Improvements to PR#2206
Friends - I have filed fatotak#5 against this PR. It incorporates two substantive changes:

1. Correct filename truncation

Existing logic in Leave the old implementation of As was pointed out by

Note that #2280 shows the exact truncation behavior implemented by the upstream code linked above. With the new implementation, truncation will occur at an earlier stage by Mihon, which will prevent that issue.

Based on discussion with @BrutuZ and See #2206 (comment) for where this was requested.

Note that I opted not to add hashes to the manga directories, as I don't think duplicate manga names under the same source are nearly as common as duplicate chapter names under the same manga, and using only one hash in the directory structure simplifies things. I also opted to leave the existing restrictions on length and character set alone (standard FAT metacharacters disallowed, max length 240 bytes), since I think those are reasonable and the issue was actually that the maximum-length constraint wasn't being enforced correctly.

I believe this resolves all the issues linked in the comment above, except for #2132, since it seems basically unrelated, and since the feature request and its motivation remain rather unclear to me even after reading most of the discussion.

2. Use XML metadata in local source

When the chapter list on a local-source manga entry is refreshed, each chapter is checked for an embedded

There is a performance implication: refreshing the chapter list for a local-source manga will take longer, as Mihon must check to see if XML files are included. There is an existing feature in Mihon where including a file in the manga directory called

Some minor refactoring in See #2206 (comment) for where this was requested.
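The byte-budget truncation in change 1 can be sketched as follows (a minimal illustration under the stated 240-byte FAT-style limit, not Mihon's actual code): truncating a UTF-8 filename to a byte budget must back up to a code-point boundary so a multi-byte character is never cut in half.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Truncate {
    // Truncate a string so its UTF-8 encoding fits in maxBytes,
    // without splitting a multi-byte code point.
    static String truncateUtf8(String s, int maxBytes) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) return s;
        int end = maxBytes;
        // 0b10xxxxxx marks a UTF-8 continuation byte; back up past them
        // so the cut lands on the start of a code point.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) end--;
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(truncateUtf8("héllo", 2)); // cutting mid-'é' is avoided
        System.out.println(truncateUtf8("abc", 240)); // unchanged: already under budget
    }
}
```

Truncating by `String.length()` instead would count UTF-16 code units, not encoded bytes, which is one way a "240-character" cap can still produce filenames over a filesystem's byte limit.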
fatotak@4febec5 and fatotak@e2f0337 implement changes suggested by @Syer10 on Discord to use a more memory-efficient string truncation algorithm and to eschew
Per request of @AntsyLich in Discord, I've filed my own PR #2305 to incorporate all code changes in one place. Further updates will take place there (but can also be merged into the original PR by its author, if they so desire).
Superseded by #2305
Fixed: new downloads work perfectly.
Working: reading works as far as I can tell.
Not done/TODO:
Test artefacts as below: