Detect and handle charsets #18

Open
fxnn opened this issue Jan 2, 2016 · 0 comments

fxnn commented Jan 2, 2016

Currently, not all human-readable (i.e. non-binary) files can be edited. This is because

  • our heuristic for detecting editable MIME types in the http/editor package is unreliable, and
  • we have no mechanism for converting to and from UTF-8, which can cause encoding problems in browsers.

As proposed in #17, we should use github.com/saintfish/chardet to detect the charset and, with the same result, to decide whether a file is editable.

Furthermore, we can take the detected charset's IANA identifier and use the golang.org/x/text/encoding packages (together with the ianaindex subpackage) to decode the file to UTF-8 before displaying it. When saving the file, we could

  • just save it as UTF-8, or
  • detect the file's charset again and encode back to that charset.

The first method might suffer from errors when the charset is misdetected; here, we should make use of the fact that chardet reports a confidence value for each detection. If the confidence is not high enough, we should store the file as UTF-8. Also, all of this should be configurable so that it can be disabled.
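
The confidence rule and the configuration switch could be sketched as follows. Everything here is an assumption for illustration: `Detection` only mirrors the charset and confidence that a detector such as chardet reports (confidence assumed to be on a 0 to 100 scale), and `Config`, `MinConfidence` and `saveCharset` are hypothetical names, not existing code.

```go
package main

import "fmt"

// Detection holds the parts of a charset detection result used here.
type Detection struct {
	Charset    string // IANA name, e.g. "ISO-8859-1"
	Confidence int    // assumed scale: 0 (pure guess) to 100 (certain)
}

// Config is a hypothetical configuration section for charset handling.
type Config struct {
	Enabled       bool // false turns off all charset conversion
	MinConfidence int  // below this, the detection is not trusted
}

// saveCharset decides which charset to write a file in.
// convert is false when conversion is disabled and the bytes
// should be written unchanged.
func saveCharset(cfg Config, d Detection) (charset string, convert bool) {
	if !cfg.Enabled {
		return "", false
	}
	if d.Charset == "" || d.Confidence < cfg.MinConfidence {
		// Detection too uncertain: store as UTF-8 instead.
		return "UTF-8", true
	}
	return d.Charset, true
}

func main() {
	cfg := Config{Enabled: true, MinConfidence: 80}
	detections := []Detection{
		{Charset: "ISO-8859-1", Confidence: 95},
		{Charset: "windows-1252", Confidence: 40},
	}
	for _, d := range detections {
		charset, convert := saveCharset(cfg, d)
		fmt.Printf("detected %s (%d): save as %q, convert=%v\n",
			d.Charset, d.Confidence, charset, convert)
	}
}
```

The threshold of 80 is arbitrary; a sensible default would have to be found by testing against real files.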

@fxnn fxnn modified the milestone: 1.1.0 Jan 16, 2016
@fxnn fxnn modified the milestones: 1.0.0, 1.1.0 Apr 3, 2016