
Txt file not recognized as txt file #10

Open
Petitgnoll6 opened this issue Feb 7, 2024 · 2 comments

Comments

@Petitgnoll6

Hello,

First of all, thanks for your work!

I need to batch-convert 2,000 .txt files in a folder structure that also contains other files (jpg, mp4, etc.).
Files in UTF-8 and UTF-8 without BOM are added perfectly, but I can't add ANSI/Windows-1252 files: the program says they are not recognized as text files.
And I can't use "no filter", since that would also add all the mp4, jpg, etc. files.

When I add a single ANSI file (manually or with no filter), I have to set Windows-1252 by hand; then the file is recognized and converts fine.
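The manual step above (falling back to Windows-1252 when a file is not UTF-8) can be sketched as a strict decode chain. This is a minimal illustration, not how SmartCharsetConverter works internally; the function name `decode_with_fallback` is hypothetical.

```python
def decode_with_fallback(data: bytes) -> tuple[str, str]:
    """Hypothetical helper: try strict UTF-8 first, then fall back to Windows-1252."""
    # The 'utf-8-sig' codec strips a UTF-8 BOM if present and also
    # decodes plain UTF-8 without a BOM.
    try:
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Windows-1252 leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    # undefined, so this strict decode can still raise UnicodeDecodeError;
    # the error is deliberately left to propagate to the caller.
    return data.decode("cp1252"), "windows-1252"
```

Note that this fallback succeeds for almost any byte sequence, including files in other legacy code pages such as Windows-1250, which would then be silently misread. That is why a general-purpose tool cannot simply assume Windows-1252 for everything that fails UTF-8.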

Test file: https://file.io/P3H5EkM4CY2B

@tomwillow
Owner

Thanks for your feedback.

I downloaded and tried the test file, and the program behaves as you describe. The reason is that this program uses two encoding detection engines (uchardet and icu-dt). Each engine reports a confidence value during detection, and I combine their results into a single encoding result. Encoding detection never succeeds 100% of the time: if I lowered the confidence threshold, the program would return more incorrect results and users would end up converting their files to the wrong encoding.
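The threshold trade-off described above can be illustrated with a toy combiner. This is a hypothetical sketch: it assumes each engine returns one `(encoding, confidence)` pair with confidence in [0, 1] and simply averages the scores; it is not the program's actual combination logic, and `combine_detections` is an invented name.

```python
from collections import defaultdict
from typing import Optional


def combine_detections(
    results: list[tuple[str, float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Hypothetical scheme: average each candidate's confidence over all
    engines and accept the best candidate only if it clears the threshold."""
    if not results:
        return None
    scores: dict[str, float] = defaultdict(float)
    for encoding, confidence in results:
        # Normalize names so "UTF-8" and "utf-8" count as one candidate.
        scores[encoding.lower()] += confidence
    best, total = max(scores.items(), key=lambda kv: kv[1])
    # Dividing by the number of engines penalizes disagreement: a candidate
    # reported by only one engine gets at most half the credit here.
    average = total / len(results)
    return best if average >= threshold else None
```

With two engines that agree on UTF-8 (0.99 and 0.95), the average is 0.97 and the file is accepted. With engines split between `windows-1252` (0.6) and `ISO-8859-1` (0.5), nothing clears 0.8, so the file is reported as unrecognized; dropping the threshold to 0.25 would accept `windows-1252`, but the same lower bar would also accept wrong guesses on other files.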

I have worked on improving the precision of encoding detection, but have made no significant progress so far.

For your situation, you could use File Explorer to sort the files by extension, select all the .txt files, and drag them into SmartCharsetConverter in "no filter" mode.

Once the precision of encoding detection improves, I will let you know.

@Petitgnoll6
Author

Yes, thanks for your answer and explanations.
I understand the choice. From the outside, you would never imagine that encoding detection is so hard!

Here is the workaround I used:

  • Import with the filter on: this adds all the UTF-8 files.
  • Search for all your .txt files (I used Everything for this).
  • Add them with "no filter"; only the unrecognized files get added, since the UTF-8 ones are already in the list.
  • Shift + click to select them all, and enjoy.
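If Everything is not available, the search step above can be scripted. This is a minimal sketch using Python's standard `pathlib`; the function name `find_txt_files` is hypothetical, and the resulting paths still have to be selected or dragged into the program by hand.

```python
from pathlib import Path


def find_txt_files(root: str) -> list[Path]:
    """Hypothetical helper: recursively list .txt files under root,
    skipping everything else (.jpg, .mp4, ...)."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        # Compare the suffix case-insensitively so "NOTES.TXT" also matches;
        # is_file() excludes directories whose names happen to end in .txt.
        if p.is_file() and p.suffix.lower() == ".txt"
    )
```

For example, `for path in find_txt_files(r"D:\my_folder"): print(path)` prints one path per line, which can then be used to locate and select the files.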

Thanks again for your software.
