Replies: 2 comments
This is a thorough analysis of the issue. WinkNLP relies on a pre-trained model plus a small set of rules. We had to introduce rules because only limited open-source training data was available under a permissive license.

One change we suggest is updating the rule that forces capitalized NOUN and ADJ tokens to be tagged as PROPN. Specifically, if you modify line feature.js:191 to:

you can obtain the following POS tags and lemmas:

// POS Tags
[
'NOUN', 'ADP', 'NOUN',
'AUX', 'VERB', 'PUNCT',
'NOUN', 'ADP', 'PROPN',
'PUNCT', 'VERB', 'ADP',
'NOUN', 'ADJ', 'PUNCT',
'NUM'
]
// Lemmas
[
'line', 'for', 'march',
'be', 'form', '!',
'sight', 'of', 'london',
',', 'begin', 'on',
'march', '15th', ',',
'2025'
]

You may want to fork the model, apply this change, and see whether it works for your use case. Note that it may slightly reduce overall POS-tagging accuracy.
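To make the effect of such a rule concrete, here is a toy illustration. It is not WinkNLP's actual code, and it does not reproduce the change at feature.js:191; the function name `applyCapitalizationRule` and the input tags are assumptions chosen only to show why a capitalization rule also catches sentence-initial common nouns such as Sights.

```javascript
// Toy illustration only: NOT WinkNLP's implementation.
// Re-tags any capitalized NOUN or ADJ token as PROPN, which is the
// behaviour the reply attributes to the rule.
function applyCapitalizationRule(tokens, tags) {
  return tags.map((tag, i) => {
    const startsUpper = /^[A-Z]/.test(tokens[i]);
    const isNounOrAdj = tag === 'NOUN' || tag === 'ADJ';
    return startsUpper && isNounOrAdj ? 'PROPN' : tag;
  });
}

// Hypothetical tags before the rule runs (an assumption for this sketch).
const tokens = ['Sights', 'of', 'London'];
const tagsBeforeRule = ['NOUN', 'ADP', 'PROPN'];

console.log(applyCapitalizationRule(tokens, tagsBeforeRule));
// [ 'PROPN', 'ADP', 'PROPN' ]
```

Because the check looks only at capitalization, it cannot tell a sentence-initial common noun from a true proper noun, which is consistent with the behaviour reported in the question.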
@rachnachakraborty Thank you for the detailed response. It is very much appreciated. I'll give it a try.
I'm seeing first words of sentences being evaluated as PROPN, so they are not lemmatized as expected. Here's an example.

This produces these parts of speech. Notice that Lines and Sights are evaluated as PROPN.

Because of this, lemmas for Lines and Sights are not derived from them being just nouns:

If I lowercase the whole sentence, I lose London being evaluated as PROPN, but lemmas will be correct. These are the lemmas and parts of speech for the same sentence lowercased via .toLowerCase(). Notice that lemmas are now as expected, but London is no longer evaluated as PROPN.

Is there a way to parse a document in such a way that the first word of a sentence is not automatically considered a PROPN?

So far, the only solution I found is to locate the first word of each sentence and lowercase it, but this will mess up cases where the first word is a proper noun, like "London is beautiful."

Any help would be appreciated. Thank you!
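For reference, the first-word workaround described above could be sketched roughly as follows. This is a minimal sketch in plain JavaScript, independent of WinkNLP; the helper name `lowercaseSentenceStarts` and the regex-based sentence detection are assumptions, and the sample text is made up for illustration.

```javascript
// Naive sketch: lowercase the first word of each sentence before tagging.
// Assumption: a sentence starts at the beginning of the text or after
// '.', '!' or '?' followed by whitespace. Only words shaped like
// "Capitalized" are changed; words such as "I" or "USA" are left alone.
function lowercaseSentenceStarts(text) {
  return text.replace(
    /(^\s*|[.!?]\s+)([A-Z][a-z]+)\b/g,
    (match, boundary, word) => boundary + word.toLowerCase()
  );
}

console.log(lowercaseSentenceStarts('Sights of London are nice. London is beautiful.'));
// sights of London are nice. london is beautiful.
```

The second sentence shows the drawback: a sentence-initial proper noun is lowercased as well, so it would no longer have the capitalization cue when the text is tagged.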