Usage of `ignore_token` parameter to `word_segmentation` not documented enough, does not work #87

sbhaktha · 2021-01-19T20:15:07Z

I have phrases containing named entities that I want `word_segmentation` to leave untouched. I replaced the named entities in each phrase with placeholders (`SPECIAL_TOKEN_1`, `SPECIAL_TOKEN_2`, etc.), then passed those placeholders as `ignore_token` in the call to `word_segmentation`. I cannot get this to work.

phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(phrase)

phrase_suggestions looks like this:

Composition(segmented_string='Hello **SPECIAL _TOKEN_ 1,** I am happy to meet you tomorrow morning. Thanks, **SPECIAL_ TOKEN_2**', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)

Notice how `SPECIAL_TOKEN_1` and `SPECIAL_TOKEN_2` get split apart, and in `corrected_string` they are "corrected" to "Special token of" and "Special Token".

I tried using the `ignore_token` argument, but I cannot get it to work:

phrase = "Hello SPECIAL_TOKEN_1, I am happyto meet you tomorrowmorning. Thanks, SPECIAL_TOKEN_2"
phrase_suggestions = sym_spell.word_segmentation(phrase, ignore_token='SPECIAL_TOKEN_1')

I get back the same `phrase_suggestions` as before. I'm also not sure how to pass multiple tokens to ignore.
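Assuming, as the `r"SPECIAL_TOKEN_\d"` attempt below does, that `ignore_token` takes a single regular-expression string rather than a list, one way to cover several literal tokens is to join them into one alternation. This is a minimal sketch: `build_ignore_pattern` is a hypothetical helper, and it only shows how to express multiple tokens as one pattern. It does not explain why the calls in this issue still split the tokens.

```python
import re
from typing import Iterable


def build_ignore_pattern(tokens: Iterable[str]) -> str:
    """Combine literal tokens into a single regex alternation (hypothetical helper)."""
    # Longest first, so SPECIAL_TOKEN_10 is tried before its prefix SPECIAL_TOKEN_1.
    ordered = sorted(set(tokens), key=len, reverse=True)
    # re.escape keeps any regex metacharacters inside a token from being interpreted.
    return "|".join(re.escape(t) for t in ordered)


pattern = build_ignore_pattern(["SPECIAL_TOKEN_1", "SPECIAL_TOKEN_2", "SPECIAL_TOKEN_10"])
# Assumed call shape, mirroring the attempts in this issue:
# sym_spell.word_segmentation(phrase, ignore_token=pattern)
```

For a whole family of numbered placeholders, a single pattern such as `r"SPECIAL_TOKEN_\d+"` would do the same job; note that `\d` on its own matches only one digit.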

I also tried:

phrase_suggestions = sym_spell.word_segmentation(phrase, ignore_token=r"SPECIAL_TOKEN_\d")

and I get the following returned as phrase_suggestions:

Composition(segmented_string='Hello **SPECIAL _TOKEN_ 1**, I am happy to meet you tomorrow morning. Thanks, **SPECIAL_ TOKEN_2**', corrected_string='Hello Special token of I am happy to meet you tomorrow morning Thanks Special Token', distance_sum=14, log_prob_sum=-55.6460931972679)

Could you please help and also add more documentation on using this parameter?

What's the recommended way to deal with named entities?
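One possible workaround, offered as an assumption rather than the library's documented approach, is to keep the placeholders away from the segmenter entirely. Split the phrase on them, segment only the text in between, and stitch the pieces back together. The helper `segment_around_placeholders` and the `SPECIAL_TOKEN_<n>` pattern are hypothetical. The segmenter is passed in as a callable so the sketch does not depend on how `word_segmentation` handles `ignore_token`.

```python
import re
from typing import Callable

# Hypothetical placeholder format used in this issue: SPECIAL_TOKEN_<digits>.
# The capturing group makes re.split keep the placeholders in its output list.
PLACEHOLDER = re.compile(r"(SPECIAL_TOKEN_\d+)")


def segment_around_placeholders(text: str, segment: Callable[[str], str]) -> str:
    """Run `segment` on the text between placeholders; copy placeholders verbatim."""
    out = []
    for piece in PLACEHOLDER.split(text):
        if PLACEHOLDER.fullmatch(piece):
            out.append(piece)  # placeholder: never handed to the segmenter
            continue
        core = piece.strip()
        if not core:
            out.append(piece)  # empty or whitespace-only gap between placeholders
            continue
        # Re-attach the surrounding whitespace, since a segmenter may not preserve it.
        lead = piece[: len(piece) - len(piece.lstrip())]
        trail = piece[len(piece.rstrip()):]
        out.append(lead + segment(core) + trail)
    return "".join(out)
```

With symspellpy, the callable would presumably be `lambda s: sym_spell.word_segmentation(s).segmented_string`, using the `segmented_string` field of the returned `Composition`. One trade-off of this design is that the segmenter never sees context across a placeholder. That seems acceptable for named entities, which usually sit at word boundaries anyway.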
