fix setting pause symbol for non-kana symbol #8

Open
sophiefy wants to merge 1 commit into r9y9:1.11 from sophiefy:1.11


@sophiefy sophiefy commented Sep 18, 2023


Maybe this is more of a problem with the dictionary...

njd_set_pronunciation sets read, pron, and other features for symbols whose mora size is 0. Specifically, non-kana symbols are set to 読点 (the Japanese comma, 、).

In the following example, ~ is incorrectly parsed as 名詞 (noun) by MeCab with naist-jdic, whereas it should be 助詞 (particle).

1933年~1937年
1933	名詞,数,*,*,*,*,*
年	名詞,接尾,助数詞,*,*,*,年,ネン,ネン,1/2,C3
~	名詞,サ変接続,*,*,*,*,*
1937	名詞,数,*,*,*,*,*
年	名詞,接尾,助数詞,*,*,*,年,ネン,ネン,1/2,C3

Since its mora size is 0, its read and pron are set to 、 and its pos is set to 記号 (symbol). Consequently, its features become the following, which is inconsistent: pos says 記号, but the pos group still says サ変接続.

~,記号,サ変接続,*,*,*,*,~,、,、,0,0,*,0

So I think pos_group, ctype, and cform should also be updated, so that its features become:

~,記号,読点,*,*,*,*,~,、,、,0,0,*,0
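The difference between the two feature strings above can be sketched in a few lines of Python. This is only an illustration, not the Open JTalk code: the Node class, its field names, and set_pause_symbol are hypothetical stand-ins for the NJD node and for the zero-mora branch of njd_set_pronunciation. The field order is assumed from the 14-column feature strings shown above, and the initial read and pron values of "*" are also an assumption.

```python
from dataclasses import dataclass, astuple, replace


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an NJD node; field order follows the feature strings above."""
    string: str
    pos: str
    pos_group1: str
    pos_group2: str
    pos_group3: str
    ctype: str
    cform: str
    orig: str
    read: str
    pron: str
    acc: int
    mora_size: int
    chain_rule: str
    chain_flag: int

    def features(self) -> str:
        # Join all fields with commas, matching the layout used in this PR description.
        return ",".join(str(value) for value in astuple(self))


def set_pause_symbol(node: Node, reset_subfields: bool) -> Node:
    """Turn a zero-mora symbol into a pause symbol (hypothetical helper).

    reset_subfields=False mimics the behavior described above (only pos, read, pron change);
    reset_subfields=True additionally rewrites the pos groups, ctype, and cform, as proposed.
    """
    if node.mora_size != 0:
        return node
    updated = replace(node, pos="記号", read="、", pron="、")
    if reset_subfields:
        updated = replace(
            updated,
            pos_group1="読点",
            pos_group2="*",
            pos_group3="*",
            ctype="*",
            cform="*",
        )
    return updated


# The tilde node as MeCab returned it (名詞,サ変接続); read/pron start as "*" by assumption.
tilde = Node("~", "名詞", "サ変接続", "*", "*", "*", "*", "~", "*", "*", 0, 0, "*", 0)

print(set_pause_symbol(tilde, reset_subfields=False).features())
print(set_pause_symbol(tilde, reset_subfields=True).features())
```

With reset_subfields=False the leftover サ変接続 group survives next to 記号; with reset_subfields=True every part-of-speech field agrees with 読点.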

tsukumijima pushed a commit to tsukumijima/open_jtalk that referenced this pull request Jul 29, 2024
* Add: add -q to dictionary compilation

* Delete: remove the quiet option from functions

* Fix: remove a leftover quiet option

* Fix: fix code that was still using the quiet argument

* Fix: add a fallback for thread_local

* Fix: fix behavior on MSVC

* Update src/mecab/src/dictionary_generator.cpp

Co-authored-by: Hiroshiba <hihokaruta@gmail.com>

* Delete: remove thread_local

* Change: make CHECK_DIE print even in quiet mode

* Fix: suppress output that was still being printed

* Restore cerr in two places

---------

Co-authored-by: Hiroshiba <hihokaruta@gmail.com>