Unicode character cannot be truncated correctly #9
Comments
I would like to fix this issue.
Go ahead with it.
On Tue, 15 Aug 2017 at 18:37, Hrishi Hiraskar wrote:
> I would like to fix this issue.
Okay :)
Hi. The library works perfectly fine for Unicode characters. But in the case of Chinese, there is no defined word boundary as we have in other languages (the
I don't know the language, so I don't know what the two above mean. The inference is that, because words can't be counted without a word boundary, the library fails on Chinese characters. The workaround I could propose is to split on the number of characters in the case of Chinese.
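The workaround proposed above could look roughly like the following. This is only a minimal sketch in Python, not the library's actual code: the function name `truncate`, its parameters, and the check against the basic CJK Unified Ideographs range are all assumptions made for illustration.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
# is checked. A real fix would likely need to cover more ranges.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def truncate(text, limit, ellipsis="..."):
    """Keep `limit` words, or `limit` characters if the text contains CJK.

    Hypothetical helper: it counts whitespace-separated words for most
    text and falls back to counting characters when a CJK ideograph is
    found, because such text has no whitespace word boundaries.
    """
    if CJK_PATTERN.search(text):
        units = list(text)   # one unit per code point
        joiner = ""
    else:
        units = text.split() # one unit per whitespace-separated word
        joiner = " "

    if len(units) <= limit:
        return text          # nothing to cut
    return joiner.join(units[:limit]) + ellipsis
```

With this sketch, `truncate("one two three four", 2)` keeps two words, while `truncate("你好世界你好", 2)` keeps two characters. Mixed text (Chinese plus Latin words) is treated as character-counted here, which may or may not be what the library wants.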
Unicode characters, such as Chinese characters, don't work.
Please fix it.