-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto guess encoding #21416
Auto guess encoding #21416
Conversation
@katainaka0503, thanks for your PR! By analyzing the history of the files in this pull request, we identified @bpasero and @egamma to be potential reviewers. |
@katainaka0503, |
3bc0245
to
6e673eb
Compare
@katainaka0503 |
@buzzzzer |
Thanks for this PR, current status is that we are reviewing the implications of JSChardets LGPL license for VS Code. |
0e327c7
to
d2993c8
Compare
Rebased. |
d2993c8
to
81fc6a4
Compare
src/vs/base/node/encoding.ts
Outdated
@@ -92,4 +93,24 @@ export function detectEncodingByBOMFromBuffer(buffer: NodeBuffer, bytesRead: num | |||
*/ | |||
export function detectEncodingByBOM(file: string): TPromise<string> { | |||
return stream.readExactlyByFile(file, 3).then(({buffer, bytesRead}) => detectEncodingByBOMFromBuffer(buffer, bytesRead)); | |||
} | |||
|
|||
const IGNORE_ENCODINGS = ['ascii', 'utf-8', 'utf-16', 'urf-32']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This meant to be "utf-32" instead of "urf-32", right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Just a typo. I am so sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
I expect this feature to be merged into VSCode. Thanks very much. |
No status update yet from what I wrote here. |
Any progress? Thanks. |
I think it is no use rushing. |
05a72cd
to
63866ff
Compare
Rebased |
@katainaka0503, |
@katainaka0503 have you thought about allowing a user to ad-hoc detect the encoding of an opened file from the encoding picker? From a file, click on the encoding and pick the option to Reopen with Encoding: In this list, the first entry (maybe with a separator) could be to automatically detect the encoding from the file: One thing to keep in mind is that currently we only allow the first 512 bytes of the file to be used when detecting the encoding. This might cause some encodings to not get detected properly. If we have an explicit action for detection, we could maybe increase the buffer size to give the detection a higher chance of finding an encoding. Otherwise this PR seems well programmed, good job. Still waiting on the license approval. |
@bpasero Thanks! I am very pleased. I think it's very good if it is able to select I think one of the problems is the word And also In the description of What's do you think? |
@katainaka0503 yes, this would be an optional thing to "guess" the encoding from the picker in case the user does not want to opt in to the setting for all files. Since we already have As for the setting, maybe it would be better to rename it to I would then also suggest to rename the flag from As you probably already figured out, we are using iconv-lite to convert to and from encodings. If Finally I am running into some issues when testing this and I think this exposes a problem with encoding auto detection in general. After making changes to a file and reopening it, the detected encoding is suddenly different and characters cannot be displayed anymore. Steps to reproduce:
Contents of file:
After reopening: |
@bpasero Ok, I understand. Then, the remaining tasks are
Thanks for precise test! |
@bpasero I found there are two ways to add entry in encoding picker. One is to add I think the latter is better. Because users can decide based on result of guessing encoding. |
Maybe we go with what we have now and wait for more user information on how this behaves in the specific cases. |
fa508f2
to
a68d703
Compare
I added a guessed encoding entry in encoding picker as a prototype. |
e160f24
to
d812384
Compare
Merged. did some slight changes to encoding picker to prevent showing encodings multiple times. |
@bpasero Thanks! |
@katainaka0503 just so that you know, there is a number of cases already where encoding guessing does not work: aadsm/jschardet#31, aadsm/jschardet#29, aadsm/jschardet#30 |
@bpasero Thanks for letting me know! I'll check them. |
This related to #5388.
This PR is modified one of #10013.
From #10013,
Added
'files.autoDetectEncoding'
insettings.json
to decide whether to automatically detect encoding.