Gitsync generates filenames based on question/category titles.
I believe this is done by replacing characters from a blacklist, but in my experience this still sometimes causes trouble.
For example, degree symbols appear to be problematic (e.g. in the title "Finding the Trigonometry Ratios in a 30°, 60° and 90° Triangle"). We had a category with degree symbols, and while `create_repo` did create the corresponding files, they do not appear in the manifest. I've also had issues with degree symbols (and sometimes other symbols) when opening these files, e.g. in Python scripts, or when referencing them in an XML attribute. In that case I got errors such as `UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in position 122: surrogates not allowed`.
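For what it's worth, here is one plausible way this error arises (a sketch, not a diagnosis of gitsync itself). The degree sign `°` is encoded in UTF-8 as the two bytes `C2 B0`. If a file name containing those bytes is decoded with a non-UTF-8 encoding and the `surrogateescape` error handler (which Python uses for undecodable bytes in file names), each non-ASCII byte becomes a lone surrogate such as `'\udcc2'`, and strict UTF-8 encoding then fails exactly as above. The ASCII decode below is an assumption standing in for a non-UTF-8 filesystem/locale encoding.

```python
# UTF-8 encodes the degree sign U+00B0 as the two bytes C2 B0.
raw = "30°".encode("utf-8")

# Assumption: the file name is decoded with a non-UTF-8 encoding (ASCII here)
# and the surrogateescape handler, so each undecodable byte 0xNN becomes the
# lone surrogate U+DCNN (0xC2 -> '\udcc2', 0xB0 -> '\udcb0').
name = raw.decode("ascii", errors="surrogateescape")

# Strict UTF-8 encoding refuses lone surrogates, which reproduces the error.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# The original title is still recoverable by reversing the escaping.
recovered = name.encode("utf-8", errors="surrogateescape").decode("utf-8")
print(recovered)
```

This only illustrates why such characters are fragile in file names; it does not by itself explain the missing manifest entries.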
What about using a whitelist of allowed characters instead of a blacklist (though that may strip accented letters altogether rather than just removing the accent), or an existing slugify library or function (which handles international characters nicely)?
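A minimal sketch of the whitelist idea that also avoids dropping accented letters: decompose the title with Unicode NFKD normalization, discard the combining accent marks, and then keep only ASCII letters, digits, `_` and `-`. The function name `slugify`, its `max_length` parameter and the `"untitled"` fallback are hypothetical choices for illustration, not part of gitsync.

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 100) -> str:
    """Hypothetical whitelist-based file name generator (illustrative only)."""
    # Decompose accented letters, e.g. "é" -> "e" + combining acute accent,
    # so the base letter survives the whitelist below.
    normalized = unicodedata.normalize("NFKD", title)
    # Drop the combining marks left over from the decomposition.
    stripped = "".join(ch for ch in normalized if not unicodedata.combining(ch))
    # Whitelist: any run of characters other than ASCII letters, digits,
    # underscore and hyphen (e.g. "°", ",", spaces) becomes a single hyphen.
    whitelisted = re.sub(r"[^A-Za-z0-9_-]+", "-", stripped)
    # Collapse repeated hyphens and trim them from both ends.
    slug = re.sub(r"-{2,}", "-", whitelisted).strip("-")
    # Truncate, and fall back to a placeholder if nothing was left.
    return slug[:max_length].rstrip("-") or "untitled"


print(slugify("Finding the Trigonometry Ratios in a 30°, 60° and 90° Triangle"))
print(slugify("Café crème"))
```

Because the output is plain ASCII, it should be safe to open from scripts and to put in XML attributes. One design consequence to keep in mind: different titles can map to the same slug (e.g. ones differing only in symbols), so collisions would still need handling. Libraries such as `python-slugify` implement a similar approach with more options.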