
File name sanitation #97

@geoo89

Description

Gitsync generates filenames from question and category titles.
I believe it does this by replacing characters from a blacklist, but in my experience this still sometimes causes problems.
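To show why a blacklist is fragile, here is a minimal sketch of a blacklist sanitizer. The `BLACKLIST` contents and the `sanitize_blacklist` name are hypothetical, since I don't know exactly which characters Gitsync replaces. The point is that any character nobody thought to list, such as °, passes straight through into the filename.

```python
# Hypothetical blacklist: Gitsync's actual list may differ.
BLACKLIST = set('/\\:*?"<>|')

def sanitize_blacklist(title: str, replacement: str = "_") -> str:
    """Replace blacklisted characters; every other character passes through unchanged."""
    return "".join(replacement if ch in BLACKLIST else ch for ch in title)

title = "Finding the Trigonometry Ratios in a 30°, 60° and 90° Triangle"
name = sanitize_blacklist(title)
print(name == title)                       # True: nothing in this title is blacklisted
print(all(ord(ch) < 128 for ch in name))   # False: the non-ASCII degree signs survive
print(sanitize_blacklist("a/b:c"))         # a_b_c
```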

For example, degree symbols appear to be problematic (e.g. "Finding the Trigonometry Ratios in a 30°, 60° and 90° Triangle"). We had a category with degree symbols in its title, and while create_repo did create the corresponding files, they do not appear in the manifest. I've also had issues with degree symbols (and sometimes other symbols) when opening these files, e.g. in Python scripts, or when referencing them in an XML attribute. (In one case, for example: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in position 122: surrogates not allowed)
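The '\udcc2' in that error is a clue. Python's `surrogateescape` error handler maps an undecodable byte 0xNN (0x80 to 0xFF) to the lone surrogate U+DCNN, and 0xC2 is the first byte of the UTF-8 encoding of ° (0xC2 0xB0). One plausible cause, which is only an assumption on my part, is that the filename bytes were decoded with a non-UTF-8 encoding (e.g. ASCII under a C/POSIX locale) somewhere and later re-encoded strictly as UTF-8. This sketch reproduces the same kind of error on a short string, so the position differs from the 122 above:

```python
# UTF-8 bytes of a title containing a degree sign: b'30\xc2\xb0'
raw = "30°".encode("utf-8")

# Assumption: somewhere the bytes are decoded as ASCII with surrogateescape,
# which turns each non-ASCII byte into a lone surrogate (0xC2 -> '\udcc2').
name = raw.decode("ascii", errors="surrogateescape")
print(repr(name))  # '30\udcc2\udcb0'

# Encoding that string strictly as UTF-8 fails the same way as in the report.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc)  # 'utf-8' codec can't encode character '\udcc2' in position 2: surrogates not allowed

# Reversing the same handler recovers the original bytes and the intended text.
fixed = name.encode("ascii", errors="surrogateescape").decode("utf-8")
print(fixed == "30°")  # True
```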

What about using a whitelist of allowed characters instead of a blacklist (though that may drop accented letters altogether rather than just removing the accent), or an existing slugify library or function (which would handle international characters nicely)?
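A minimal sketch of the whitelist approach, combined with Unicode NFKD normalization so that accented letters keep their base letter instead of being dropped. The `slugify` and `naive_whitelist` helpers, the allowed character set and the `"untitled"` fallback are my own assumptions, not anything Gitsync currently provides. `naive_whitelist` shows the caveat above: without normalization, "É" and "é" are simply lost.

```python
import re
import unicodedata

# Hypothetical whitelist: ASCII letters, digits, underscore and hyphen.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_-]+")

def naive_whitelist(title: str) -> str:
    """Whitelist only: accented letters are treated like any other disallowed character."""
    return NOT_ALLOWED.sub("-", title).strip("-_")

def slugify(title: str, max_length: int = 100) -> str:
    """Whitelist after NFKD normalization (hypothetical helper, not Gitsync's API)."""
    # NFKD splits "é" into "e" + a combining accent; dropping non-ASCII then keeps the "e".
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of disallowed characters (spaces, commas, ...) into one hyphen.
    slug = NOT_ALLOWED.sub("-", ascii_only)[:max_length].strip("-_")
    return slug or "untitled"

print(slugify("Finding the Trigonometry Ratios in a 30°, 60° and 90° Triangle"))
# Finding-the-Trigonometry-Ratios-in-a-30-60-and-90-Triangle
print(naive_whitelist("Équations différentielles"))  # quations-diff-rentielles
print(slugify("Équations différentielles"))          # Equations-differentielles
```

NFKD only helps for letters that decompose into an ASCII base letter plus combining marks; a title written entirely in, say, Greek or Cyrillic would collapse to the fallback name, which is where an existing library such as python-slugify, which can transliterate non-Latin text, would likely do better. With either approach, two different titles can map to the same filename, so collisions would still need to be detected.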
