Image dumps fail when an invalid character is present #472

Open
TrashBandatcoot opened this issue Aug 1, 2023 · 3 comments
Comments

@TrashBandatcoot

I tried to download a pretty messy wiki in the hope of transferring it to Miraheze, but the dump failed on the very first image because its name contains characters that Windows does not allow in file or folder names, such as question marks or quotation marks.

IOError: [Errno 22] invalid mode ('wb') or filename: u'C:\Users\XXXXX\Desktop\PTW-2/images/3LameStudio what did you do to my boys?.png'

Removing the file from the wiki altogether fixed it.

@nemobis
Member

nemobis commented Aug 7, 2023 via email

@ohei2

ohei2 commented Oct 23, 2024

I recently ran into the same problem, but you don't need to remove the file from the wiki.
Instead, use the wiki's "move" option to rename the file without the offending characters. In your case above that is the question mark, which has always been a wildcard in the DOS/Windows world and therefore cannot be used in file names.

@ohei2

ohei2 commented Oct 25, 2024

You can use something like this to sanitize filenames for Windows:

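def mwSanitizeFname4Windows(fname):
    """ Returns a sanitized filename for Windows """
    # Windows invalid chars in filenames
    # < (less than)  # > (greater than)  # : (colon)
    # " (double quote)  # / (forward slash)  # \ (backslash)
    # | (vertical bar or pipe)  # ? (question mark)  # * (asterisk)

    # Note: fname must be a unicode string. In Python 2 the dict form of
    # translate() only works on unicode objects, not on plain str.
    # The list contains a colon (not a semicolon) and an escaped backslash.
    sanName = fname.translate({ord(i): None for i in u'<>:"/\\|?*'})
    return sanName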
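
For anyone running a Python 3 port, the same idea could be sketched as below. This is only an illustration, not code from the project: the name sanitize_windows_filename and the fallback parameter are hypothetical. Beyond the nine characters listed above, it also drops ASCII control characters and strips trailing dots and spaces, which Windows does not handle well in file names either.

```python
# Characters Windows rejects in file names, plus ASCII control characters 0-31.
_WINDOWS_INVALID = '<>:"/\\|?*' + ''.join(chr(c) for c in range(32))
_TABLE = str.maketrans('', '', _WINDOWS_INVALID)


def sanitize_windows_filename(fname, fallback='_'):
    """Return fname without characters that Windows rejects (hypothetical helper)."""
    # Remove the invalid characters, then trailing spaces and dots.
    cleaned = fname.translate(_TABLE).rstrip(' .')
    # If nothing is left, return a placeholder instead of an empty name.
    return cleaned or fallback


print(sanitize_windows_filename('3LameStudio what did you do to my boys?.png'))
```

With the file name from the original report, this prints the name with only the question mark removed.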

In generateImageDump I added the following:

# At the start
import platform  # only needed if it is not imported already
MyOS = platform.system()

[...]

        # In the for loop:
        if MyOS == 'Windows':
            # Characters Windows does not allow in file names
            if any(i in u'<>:"/\\|?*' for i in filename2):
                fnametemp = filename2
                filename2 = mwSanitizeFname4Windows(fnametemp)
                print '(Windows) Illegal character(s) in file name. Sanitizing ', fnametemp
                logerror(
                    config=config,
                    text=u'(Windows) Illegal character(s) found. Changing %s to %s' % (fnametemp, filename2)
                )

The logging is done so you will be able to easily access which filenames have been changed.
This worked for me. I got a lot of errors before.
I don't use the XML functionality, but you could sanitize the filenames there too I suppose.
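
One thing to watch for: two different wiki file names can end up identical after sanitizing (for example "a?.png" and "a*.png"), so the second download would overwrite the first. A rough way to guard against that is sketched below in Python 3. The names unique_sanitized_names and strip_invalid are made up for this example and are not part of the dump script, and the " (2)" suffix scheme is just one possible choice.

```python
import os


def strip_invalid(fname):
    """Remove the characters Windows rejects in file names (hypothetical helper)."""
    return fname.translate(str.maketrans('', '', '<>:"/\\|?*'))


def unique_sanitized_names(names, sanitize=strip_invalid):
    """Map each original name to a sanitized name, adding a suffix on collision."""
    seen = set()
    mapping = {}
    for name in names:
        candidate = sanitize(name)
        stem, ext = os.path.splitext(candidate)
        n = 1
        # Compare case-insensitively, since Windows file systems usually are.
        while candidate.lower() in seen:
            n += 1
            candidate = '%s (%d)%s' % (stem, n, ext)
        seen.add(candidate.lower())
        mapping[name] = candidate
    return mapping


print(unique_sanitized_names(['a?.png', 'a*.png', 'b.png']))
```

The returned mapping could also be written to the error log, so the changed names stay traceable in the same way as described above.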

P.S.: I usually don't do Python, so please forgive if I did something unconventional.
