Japanese: Request for some improvements of entity extraction algorithm in terms of more accurate analysis of medical colloquial text #137

Closed
Rei-hub opened this issue May 26, 2021 · 3 comments

Rei-hub commented May 26, 2021

I’m Rei Noguchi from Gunma University Hospital, and I really appreciate the prompt implementation of “negation expansion” in Japanese (#33). I’m now trying to analyze daily progress notes in electronic medical records. Unlike discharge summaries, which are written as stylized documents, progress notes are often written in a colloquial or narrative style and include incomplete sentences, which causes some problems.
To analyze this kind of casual medical text more accurately, I would like to propose the following three improvements.

1. Extract a word followed by +/- without parentheses as a single entity
2. Resolve the different entity extraction results depending on punctuation mark (Japanese period “。” or just a space)
3. Detect time expression

The details are as follows.


1. Extract a word followed by +/- without parentheses as a single entity

The previous improvement (#31) enabled Katakana or numbers enclosed in parentheses to be concatenated with the preceding Concept as a single entity. This works in many cases, especially in stylized documents, and is useful for identifying negation relations (e.g. heart murmur(-) → no heart murmur).
However, informal text such as daily progress notes poses a problem: some entities are followed by +/- without parentheses. Even in these cases, the +/- symbol should be concatenated with the preceding Concept as a single entity, because doctors write it with the same intention, and doing so would let us clarify the negation relation. Is this improvement technically possible?
Importantly, in many of these cases there is no space between the entity and +/-, whereas there is often a half-width or full-width space after +/- to separate it from the next entity.
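
To make the pattern concrete, here is a minimal Python sketch of the spacing rule described above. It is only an illustration and not how iKnow works internally: the names `SIGN_SUFFIX` and `find_signed_terms` are hypothetical, the sample words are made up, and the sketch assumes that only half-width and full-width plus/minus characters are used.

```python
import re

# Hypothetical pattern: a run of characters that are neither spaces nor signs,
# immediately followed by + or - (half-width or full-width), and then a
# half-width/full-width space or the end of the text.
SIGN_SUFFIX = re.compile(r"([^\s\u3000+\-＋－]+)([+\-＋－])(?=[\s\u3000]|$)")


def find_signed_terms(text):
    """Return (term, is_negated) pairs for words written like 'word-' or 'word+'."""
    return [(m.group(1), m.group(2) in "-－") for m in SIGN_SUFFIX.finditer(text)]


# Made-up sample: heart murmur-, edema+, fever- (the second gap is a full-width space)
print(find_signed_terms("心雑音- 浮腫+\u3000発熱-"))
```

Because the sign must be followed by a space or the end of the text, a numeric range such as "0.5-1.0" is not treated as a negated term in this sketch.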

2. Resolve the different entity extraction results depending on punctuation mark (Japanese period “。” or just a space)

“熱はなし” (no fever) is currently extracted as a single entity, probably because the phrase contains the all-hiragana homonym “はなし”. In contrast, if there is a punctuation mark (i.e. the Japanese period “。”) at the end of the phrase, as in “熱はなし。”, the phrase is divided into multiple entities. The latter result seems preferable for identifying the negation relation.
However, because doctors often end a sentence with just a space in place of the Japanese period “。”, I think a phrase ending with a space should be divided into multiple entities in the same manner as one ending with “。”.
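
As a rough illustration of the requested behavior, the sketch below shows a preprocessing workaround in Python that turns such spaces into “。” before the text is analyzed. This is an assumption about one possible approach, not a description of iKnow itself: `spaces_to_periods` is a hypothetical helper, the second sample phrase is made up, and the sketch assumes that any half-width or full-width space directly after a hiragana, katakana, or kanji character ends a sentence.

```python
import re

# Assumption: hiragana, katakana, and the basic CJK ideograph block are
# enough to recognize "a Japanese character" for this sketch.
JP_CHAR = r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]"

# One or more half-width or full-width spaces right after a Japanese character.
SPACE_AS_PERIOD = re.compile(rf"(?<={JP_CHAR})[ \u3000]+")


def spaces_to_periods(text):
    """Replace spaces that follow a Japanese character with the period '。'."""
    return SPACE_AS_PERIOD.sub("。", text)


print(spaces_to_periods("熱はなし 咳あり"))
```

Spaces that follow Latin letters or digits, as in "BP 120", are left untouched, so this rule would need refinement for notes that mix Japanese and alphanumeric items on one line.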

3. Detect time expression

Medical progress notes contain many time expressions, so it would be very useful if they could be identified, for example with markers.
Some examples:

  • 2015-06-16 12:47:42 -> 2015-06-16 (Date) + 12:47:42 (Time) or 2015-06-16 12:47:42 (Datetime)
  • 「12月ごろ花粉症の内服処方」 (extracted as a single entity) -> 12月ごろ花粉症の内服処方 (Month)
    (in English: Around December prescription of medication for hay fever)
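
The two examples above could be detected with patterns along the following lines. This is a minimal Python sketch under stated assumptions, not a proposal for iKnow's actual implementation: the names `DATETIME`, `MONTH`, and `find_time_expressions` are hypothetical, dates are assumed to be in `YYYY-MM-DD` form, and months are assumed to be written with half-width digits followed by “月”.

```python
import re

# ISO-like date, optionally followed by a time separated by a space or 'T'.
DATETIME = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})(?:[ T](?P<time>\d{2}:\d{2}:\d{2}))?"
)

# Month number 1-12 (not preceded by another digit) followed by 月.
MONTH = re.compile(r"(?<!\d)(?:1[0-2]|[1-9])月")


def find_time_expressions(text):
    """Return (label, matched_text) pairs for dates, times, and months."""
    found = []
    for m in DATETIME.finditer(text):
        found.append(("Date", m.group("date")))
        if m.group("time"):
            found.append(("Time", m.group("time")))
    for m in MONTH.finditer(text):
        found.append(("Month", m.group(0)))
    return found


print(find_time_expressions("2015-06-16 12:47:42"))
print(find_time_expressions("12月ごろ花粉症の内服処方"))
```

Durations such as "1か月" are not matched here because the digit is not directly followed by “月”.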

iKnow is an indispensable tool, especially in the medical field, where there are many unknown words.
I recognize the great value of iKnow and look forward to further improvements.
Thank you for your help.

@makorin0315 makorin0315 self-assigned this May 26, 2021
@makorin0315
Collaborator

@Rei-hub - thank you very much for your requests. As discussed in our conversation yesterday, I believe most of these can be accommodated to your liking. I think it would be best to create an issue for each request, so that we can have focused discussions. With your permission, I would like to close this issue and open 3 new ones. Please let me know.

@Rei-hub
Author

Rei-hub commented May 27, 2021

@makorin0315 Thank you for your quick response. I really appreciate your positive consideration of my requests. Regarding your suggestion to create an issue for each request, I completely agree, and it's actually better that way. Thank you for your help.

@makorin0315
Collaborator

This issue has been split into 3 issues (#138, #140, #139). Closing.
