T5 base Japanese NER

Named entity recognition model created by fine-tuning sonoisa/t5-base-japanese.

Input

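A text file. The default text is "伊藤左千夫は1893年から知人から学んだ短歌を詠むようになったが、当初は古今和歌集の流れをくむ月並調の伝統的な短歌を詠んでいた。" (roughly: "From 1893, Itō Sachio began composing tanka, which he had learned from an acquaintance, but at first he composed traditional tanka in the conventional tsukinami style that followed the lineage of the Kokin Wakashū.")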

Output

A list of dictionaries, one per recognized named entity. 'span' gives the start and end character offsets of the entity in the original sentence (start inclusive, end exclusive), 'type' gives its category, and 'text' contains the entity string itself. The model was trained to classify each entity into one of the following categories: {人名 (person), 法人名 (corporation), 政治的組織名 (political organization), その他の組織名 (other organization), 地名 (place), 施設名 (facility), 製品名 (product), イベント名 (event)}.

[{'span': [0, 5], 'type': '人名', 'text': '伊藤左千夫'}, {'span': [36, 41], 'type': '製品名', 'text': '古今和歌集'}]
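The spans in the output above behave like Python slice bounds: in the default text, characters 0 to 4 are 伊藤左千夫 and characters 36 to 40 are 古今和歌集. The short sketch below checks each returned dictionary against the input text under that reading. The function name `check_spans` is illustrative and is not part of the script.

```python
# Default input text and the output shown above.
TEXT = "伊藤左千夫は1893年から知人から学んだ短歌を詠むようになったが、当初は古今和歌集の流れをくむ月並調の伝統的な短歌を詠んでいた。"
ENTITIES = [
    {'span': [0, 5], 'type': '人名', 'text': '伊藤左千夫'},
    {'span': [36, 41], 'type': '製品名', 'text': '古今和歌集'},
]


def check_spans(text, entities):
    """Return True if every entity's span slices out exactly its 'text'.

    Assumes spans are [start, end) character offsets into the original
    string, i.e. the same convention as Python slicing.
    """
    for ent in entities:
        start, end = ent['span']
        if text[start:end] != ent['text']:
            return False
    return True


if __name__ == "__main__":
    for ent in ENTITIES:
        start, end = ent['span']
        print(ent['type'], TEXT[start:end])
    print(check_spans(TEXT, ENTITIES))
```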

Usage

An Internet connection is required when running the script for the first time, as the model files will be downloaded automatically.

Run the script below to predict the named entities in the input text file.

Running this script in an FP16 environment results in an error because some values exceed the range that FP16 floating-point numbers can represent. If necessary, switch to the CPU by adding the argument -e 0 to the commands below.
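IEEE 754 half precision (FP16) can hold finite values only up to 65504; anything larger overflows to infinity. Assuming the error described above comes from such an overflow, the sketch below uses the standard library's `struct` half-precision format code `'e'` to show where that limit lies. It illustrates the numeric range only and is not taken from the model or the script; the helper name `fits_in_fp16` is hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(x):
    """Return True if x rounds to a finite FP16 value."""
    if not math.isfinite(x):
        # inf and NaN can be packed by struct, but they are not usable results.
        return False
    try:
        struct.pack('<e', x)
    except OverflowError:
        # struct raises OverflowError when x is too large for half precision.
        return False
    return True


if __name__ == "__main__":
    for value in (1000.0, FP16_MAX, 70000.0):
        print(value, fits_in_fp16(value))
```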

$ python3 t5_base_japanese_ner.py -f input.txt

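To pass the text directly on the command line instead of reading it from a file, use the -i (or --input) argument. The sentence in this example means "On October 5, 2008, he scored his first Primera División goal in an away match against Recreativo Huelva."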

$ python3 t5_base_japanese_ner.py -i 2008年10月5日、アウェーでのレクレアティーボ・ウェルバ戦でプリメーラ・ディビシオンでの初得点を決めた。
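If you want to call the script from another Python program, a hypothetical helper like the one below assembles the command line from the options shown on this page (-f, -i, -s, -e) and nothing else. It assumes you pass either a text or a file, not both, and it runs the script with the current interpreter (`sys.executable`) rather than a literal `python3`.

```python
import subprocess
import sys

SCRIPT = "t5_base_japanese_ner.py"


def build_command(text=None, file=None, savepath=None, env_id=None):
    """Assemble an argument list using only the options shown in this README.

    Exactly one of `text` (passed with -i) or `file` (passed with -f)
    must be given; this restriction is an assumption of the helper.
    """
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of text or file")
    cmd = [sys.executable, SCRIPT]
    if text is not None:
        cmd += ["-i", text]
    else:
        cmd += ["-f", file]
    if savepath is not None:
        cmd += ["-s", savepath]
    if env_id is not None:
        cmd += ["-e", str(env_id)]
    return cmd


if __name__ == "__main__":
    # Example: run on the CPU (-e 0) and save the result as a pickle.
    cmd = build_command(file="input.txt", savepath="result.pickle", env_id=0)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string keeps a sentence containing spaces or punctuation as one -i argument, with no shell quoting needed.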

Use the -s (or --savepath) option to save the resulting list of entity dictionaries as a pickle file at the specified path.

$ python3 t5_base_japanese_ner.py -f input.txt -s result.pickle
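A minimal sketch for reading the saved file back, assuming the pickle holds the same list of dictionaries shown under Output. The helper names `load_entities` and `group_by_type` are illustrative, not part of the script.

```python
import pickle
from collections import defaultdict


def load_entities(path):
    """Load the list of entity dictionaries saved with -s/--savepath."""
    with open(path, "rb") as f:
        return pickle.load(f)


def group_by_type(entities):
    """Map each entity type (e.g. '人名') to the list of entity texts."""
    groups = defaultdict(list)
    for ent in entities:
        groups[ent["type"]].append(ent["text"])
    return dict(groups)


if __name__ == "__main__":
    entities = load_entities("result.pickle")
    for etype, texts in group_by_type(entities).items():
        print(etype, texts)
```

Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.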

Reference

Framework

PyTorch

Model Format

ONNX opset=12

Netron

encoder

t5_base_japanese_ner_enc.onnx.prototxt

decoder

t5_base_japanese_ner_dec.onnx.prototxt