entities must span whole tokens. Wrong entity end. #122

xyiiinexg3 · 2022-08-18T10:37:49Z

Symptom
After running the following command:
rasa train -c config/config.yml --data data/training_dataset_1660793545.json data/stories.md --out models/movie --domain config/domain.yml --num-threads 5 --augmentation 100 -vv
warnings like the following are printed:
C:\Users\26282\miniconda3\envs\rasa2formovieQA\lib\site-packages\rasa\shared\utils\io.py:93: UserWarning: Failed to use example '郭富城表演过哪些喜剧电影' to train MITIE entity extractor. Example will be skipped.Error: Invalid entity {'end': 10, 'entity': 'genre', 'start': 8, 'value': '喜剧'} in example '郭富城表演过哪些喜剧电影': entities must span whole tokens. Wrong entity end.
As a result, the trained model later fails to recognize entities of type genre (喜剧 "comedy", 动画 "animation", and so on).
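The warning says the annotated span for 喜剧 (characters 8 to 10) starts on a token boundary but does not end on one. Below is a minimal sketch of that kind of boundary check. It is not Rasa's own code, and the two tokenizations are hypothetical. They show that a segmenter keeping 喜剧电影 as a single token would produce exactly this "Wrong entity end" result.

```python
from typing import Dict, List, Tuple


def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each token to its (start, end) character offsets in text.

    Assumes the tokens appear in order in the text, as a segmenter like
    jieba produces them. Raises ValueError if a token cannot be found.
    """
    spans = []
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def check_entity(text: str, tokens: List[str], entity: Dict) -> str:
    """Report whether an entity's start/end fall on token boundaries."""
    spans = token_spans(text, tokens)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    if entity["start"] not in starts:
        return "wrong entity start"
    if entity["end"] not in ends:
        return "wrong entity end"
    return "ok"


text = "郭富城表演过哪些喜剧电影"
entity = {"start": 8, "end": 10, "entity": "genre", "value": "喜剧"}

# Hypothetical segmentation in which 喜剧电影 stays one token (offsets 8-12)
merged = ["郭富城", "表演", "过", "哪些", "喜剧电影"]
# Hypothetical segmentation in which the genre word is split out (offsets 8-10)
split = ["郭富城", "表演", "过", "哪些", "喜剧", "电影"]

print(check_entity(text, merged, entity))  # wrong entity end
print(check_entity(text, split, entity))   # ok
```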

Training data
{"text":"方中信表演动画电影有哪些","intent":"search_person_genre_movie","entities":[{"end":3,"entity":"person","start":0,"value":"方中信"},{"end":7,"entity":"genre","start":5,"value":"动画"}]}
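Below is a quick check that the annotated offsets agree with the entity values, that is, that text[start:end] == value holds for every entity. The helper name offset_mismatches is hypothetical. For this line the offsets are correct, which suggests the warning comes from tokenization rather than from the annotation itself.

```python
import json
from typing import Dict, List


def offset_mismatches(example: Dict) -> List[Dict]:
    """Return the entities whose text[start:end] differs from their value."""
    text = example["text"]
    return [
        ent for ent in example["entities"]
        if text[ent["start"]:ent["end"]] != ent["value"]
    ]


raw = (
    '{"text":"方中信表演动画电影有哪些","intent":"search_person_genre_movie",'
    '"entities":[{"end":3,"entity":"person","start":0,"value":"方中信"},'
    '{"end":7,"entity":"genre","start":5,"value":"动画"}]}'
)
example = json.loads(raw)
print(offset_mismatches(example))  # []
```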

config.yml
A user dictionary for jieba word segmentation is configured:
pipeline:
  - name: "MitieNLP"
    model: "data/total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
    dictionary_path: "jieba_userdict"
  - name: "MitieEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "RegexFeaturizer"
  - name: "MitieFeaturizer"
  - name: "SklearnIntentClassifier"
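One direction worth trying, sketched here under several assumptions, is to give the genre words explicit frequencies in the user dictionary. jieba's user-dictionary format puts one entry per line as `word [freq] [tag]`, separated by spaces. As we understand jieba, an entry without a frequency gets one just large enough to keep that word whole, and that may not be enough to beat a longer compound such as 喜剧电影. We also assume Rasa's JiebaTokenizer loads every file in the directory named by dictionary_path. The helper write_userdict, the file name genre.txt and the frequency 100000 are hypothetical, not tuned values. Inside a single Python process, jieba.suggest_freq(('喜剧', '电影'), True) is jieba's own way to force such a split, but a file is what the Rasa config can point at.

```python
import os
import tempfile
from typing import Iterable

# The four genre words reported as unrecognized in this issue
GENRES = ["动画", "恐怖", "喜剧", "科幻"]


def write_userdict(words: Iterable[str], directory: str,
                   freq: int = 100000, filename: str = "genre.txt") -> str:
    """Write one 'word freq' line per word into directory/filename.

    The default frequency is an arbitrary assumption and may need raising
    until the segmenter actually splits the genre word from 电影.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(f"{word} {freq}\n")
    return path


# Demo in a throwaway directory; for the real pipeline, target "jieba_userdict"
with tempfile.TemporaryDirectory() as tmp:
    path = write_userdict(GENRES, tmp)
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
print(lines)  # ['动画 100000', '恐怖 100000', '喜剧 100000', '科幻 100000']
```

After retraining, rerun the boundary check on the new segmentation to confirm the warning is gone.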

xyiiinexg3 · 2022-08-18T11:20:00Z

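I went through the genre dictionary: only these four words, 动画 (animation), 恐怖 (horror), 喜剧 (comedy) and 科幻 (sci-fi), cannot be recognized. Why is that?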
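One plausible explanation, not verified against jieba's bundled dictionary, is that these four words merge with 电影 into compounds the segmenter keeps as single tokens. The warning above fits this for 喜剧: the entity start matches a token start but the end does not. The sketch below lists genre words that never appear as stand-alone tokens but do appear inside longer ones. The helper absorbed_words and the token lists are hypothetical, and 动作 (action) is only an example of a genre that segments cleanly. Feeding in real segmenter output, e.g. jieba.lcut(text) over data/training_dataset_1660793545.json with the same user dictionary loaded, would confirm or rule this out.

```python
from typing import Dict, Iterable, List


def absorbed_words(words: Iterable[str],
                   tokenized: Iterable[List[str]]) -> Dict[str, List[str]]:
    """Map each word that never appears as its own token to the longer
    tokens that contain it. Words seen stand-alone at least once are skipped."""
    sentences = list(tokenized)
    all_tokens = [tok for toks in sentences for tok in toks]
    result = {}
    for word in words:
        standalone = word in all_tokens
        containing = sorted({t for t in all_tokens if word in t and t != word})
        if not standalone and containing:
            result[word] = containing
    return result


# Hypothetical tokenizer output; the real output depends on jieba's dictionary
sentences = [
    ["郭富城", "表演", "过", "哪些", "喜剧电影"],
    ["方中信", "表演", "动作", "电影", "有", "哪些"],
]
print(absorbed_words(["喜剧", "动作"], sentences))  # {'喜剧': ['喜剧电影']}
```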
