Japanese medical (1 of 3): append "+" or "-" to the preceding Concept if followed by a space #138
Comments
In a separate e-mail, I've asked @Rei-hub to check his data to find out exactly which characters are used as the minus sign. By default, the following characters will be considered to be the minus sign for this issue:
single-width hyphen minus (-): U+002D
There are other characters that "look like" the minus sign, and depending on the author's preference (or a typo), it is quite possible that such characters are used instead, especially in informal notes. For example:
minus (−): U+2212
If we consider only the single- and double-width hyphen minus characters as the minus sign, the requested change is quite simple and relatively innocuous to other types of text. We already suspect that the cho-on characters (half- and full-width) may sometimes be used unintentionally. Once I hear from Dr. Noguchi on the character usage within his data, I'll look into any possible ramifications of including such characters in the change.
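To make the character question concrete, here is a minimal sketch for counting minus-like characters in a sample of notes. The function name `count_candidates` and the table `CANDIDATES` are hypothetical, and the code points for the double-width hyphen minus and the two cho-on characters are my assumptions (the comment above names those characters but does not list their code points).

```python
# Candidate characters that may be typed as a minus sign.
# Only U+002D and U+2212 are listed explicitly in the comment above;
# the other code points are assumptions added for illustration.
CANDIDATES = {
    "\u002D": "single-width hyphen minus",
    "\uFF0D": "double-width hyphen minus (assumed code point)",
    "\u2212": "minus",
    "\u30FC": "full-width cho-on (assumed code point)",
    "\uFF70": "half-width cho-on (assumed code point)",
}


def count_candidates(text):
    """Return {"U+XXXX": count} for every candidate character found in text."""
    counts = {}
    for ch in text:
        if ch in CANDIDATES:
            key = f"U+{ord(ch):04X}"
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    sample = "心雑音- 浮腫\u2212 カテーテル"
    for code_point, n in count_candidates(sample).items():
        print(code_point, n)
```

Note that the cho-on in カテーテル is counted as well, which illustrates why including the cho-on characters in the rule could affect ordinary Katakana words.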
Thank you for your valuable feedback, and sorry for the late response.
Considering how they are used in actual sentences, there seems to be no problem with concatenating them with the preceding Concept as a single entity in all cases. I would like your feedback on the possibility of unintended effects on other expressions.
The implementation will involve the following:
-公道での自動運転実施における技術・ノウハウ、ガイドラインを共有- In this case, the "-" at the end of the line should not be concatenated with 共有. To handle this, we need a rule for sentences whose first and last characters are both hyphens. For such sentences, the hyphen will not be appended to the word before it. I do not know whether this solution still interferes with clinical text, but it is the best solution for now.
海洋細菌で見つけた新しい光エネルギー利用機構-塩化物イオンを輸送するポンプの発見- Such cases can potentially be handled, but a hyphen within a sentence often serves another purpose, for example: HCCを疑いますが、経胆管的に肝門部-総肝管まで進展し、4cm長の腫瘤を形成し、両側肝内胆管は拡張しています。(FromTo) This particular case, i.e., a hyphen both in the middle and at the end of a sentence, will not be dealt with in this issue. The hyphen at the end of the sentence will be appended to the last character for the time being, although that is not ideal as an output.
20代、スマホからの予約が急増- The hyphen at the end of the sentence will be appended to the last character for the time being, although that is not ideal as an output.
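The handling of the three example sentences above can be sketched as a small decision function. This is only an illustration on plain strings; the name `trailing_hyphen_action` and its return labels are hypothetical and do not reflect the actual implementation, and only U+002D is treated as the hyphen here.

```python
HYPHEN = "-"  # U+002D only; other minus-like characters are ignored in this sketch


def trailing_hyphen_action(sentence):
    """Decide what to do with a hyphen at the end of a sentence.

    Returns:
      "keep_separate" - sentence both starts and ends with a hyphen,
                        so the final hyphen is not appended to the last word
      "append"        - sentence ends with a hyphen (but does not start with one),
                        so the hyphen is appended to the last word for now
      "none"          - sentence does not end with a hyphen
    """
    s = sentence.strip()
    if len(s) >= 2 and s.startswith(HYPHEN) and s.endswith(HYPHEN):
        return "keep_separate"
    if s.endswith(HYPHEN):
        return "append"
    return "none"


if __name__ == "__main__":
    examples = [
        "-公道での自動運転実施における技術・ノウハウ、ガイドラインを共有-",
        "海洋細菌で見つけた新しい光エネルギー利用機構-塩化物イオンを輸送するポンプの発見-",
        "20代、スマホからの予約が急増-",
    ]
    for sentence in examples:
        print(trailing_hyphen_action(sentence), sentence)
```

The second example shows the known limitation: the sentence has a hyphen in the middle and at the end, so the final hyphen still falls under "append".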
@Rei-hub - this issue has been addressed as described above. There are some cases that cannot be addressed, but I believe the general output is as expected or better, even in non-medical/clinical text. Please have a look when you get a chance. Thanks for your feedback as always.
NOTE: This is the first of the three items reported by @Rei-hub in issue #137, to enable analysis of EMR's daily progress notes that are less formal than discharge summaries.
Extract a word followed by +/- without parentheses as a single entity
The previous improvement (#31) enabled Katakana or numbers enclosed in parentheses to be concatenated with the preceding Concept as a single entity. This works in many cases, especially in stylized documents, and is useful for identifying the relation of negation. (e.g. heart murmur(-) → no heart murmur)
However, informal text such as daily progress notes poses a problem: some entities are followed by +/- without parentheses. Even in these cases, the +/- symbol should be concatenated with the preceding Concept as a single entity, because doctors write it with the same intention, and this lets us clarify the relation of negation. Is this improvement technically possible?
Importantly, in many of these cases there is no space between an entity and +/-, whereas there is often a half-width or full-width space after +/- to separate it from the next entity.
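As a rough illustration of the requested pattern (not of how the product tokenizes text), the following sketch finds a run of non-space characters immediately followed by + or -, which in turn is followed by a half-width or full-width space. The names `PATTERN` and `find_signed_concepts` are hypothetical, and treating a "Concept" as any run of non-space characters is a simplifying assumption.

```python
import re

# Half-width space (U+0020) or full-width space (U+3000).
SPACE = "[ \u3000]"

# concept: one or more characters that are not whitespace, "+", or "-"
# sign:    "+" or "-" directly after the concept (no space in between)
# The lookahead requires a half- or full-width space after the sign.
PATTERN = re.compile(rf"(?P<concept>[^\s+\-]+)(?P<sign>[+\-])(?={SPACE})")


def find_signed_concepts(text):
    """Return (concept, sign) pairs for entities written like '心雑音- '."""
    return [(m.group("concept"), m.group("sign")) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    note = "心雑音- 浮腫+\u3000発熱-"
    print(find_signed_concepts(note))
```

Because the sign must be followed by a space, a hyphen between two words (as in 肝門部-総肝管) is not matched, and neither is a sign at the very end of the text.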