
Can't get the Chinese models to work #24

Open

Description

@victoryhb

Hi! Has anyone used the wrapper to parse Chinese text before?
I have the following code:

from stanford_corenlp_pywrapper import sockwrap

parser_path = "/Users/hbyan2/Downloads/stanford-corenlp-full-2015-04-20/*"
cn_model_path = "/Users/hbyan2/Downloads/stanford-corenlp-full-2015-04-20/stanford-chinese-corenlp-2015-04-20-models.jar"

p = sockwrap.SockWrap(
    configdict={
        'annotators': "segment, ssplit, pos, parse",
        'customAnnotatorClass.segment': 'edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator',
        'segment.model': 'edu/stanford/nlp/models/segmenter/chinese/ctb.gz',
        'segment.sighanCorporaDict': 'edu/stanford/nlp/models/segmenter/chinese',
        'segment.serDictionary': 'edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz',
        'segment.sighanPostProcessing': True,
        'ssplit.boundaryTokenRegex': '[.]|[!?]+|[。]|[!?]+',
        "parse.model": "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz",
        "pos.model": "edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger",
    },
    corenlp_jars=[parser_path, cn_model_path],
)

p.parse_doc(u"你爱我吗?")

The config values are taken from the default CoreNLP properties file for Chinese: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties
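
For reference, here is a rough sketch of pulling those same key/value pairs out of a local copy of that properties file into a Python dict, assuming plain "key = value" lines with "#" comments; the load_properties helper and the local filename are only illustrative, not part of the wrapper:

def load_properties(path):
    # Parse simple "key = value" lines from a CoreNLP .properties file,
    # skipping blank lines and "#" comments.
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# e.g., with a local copy of the linked properties file:
# configdict = load_properties("StanfordCoreNLP-chinese.properties")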

When running the wrapper, I got the following error:

[Server] Started socket server on port 12340
INFO:StanfordSocketWrap:Successful ping. The server has started.
INFO:StanfordSocketWrap:Subprocess is ready.
Adding Segmentation annotation ... INFO: TagAffixDetector: useChPos=false | useCTBChar2=true | usePKChar2=false
INFO: TagAffixDetector: building TagAffixDetector from edu/stanford/nlp/models/segmenter/chinese/dict/character_list and edu/stanford/nlp/models/segmenter/chinese/dict/in.ctb
Loading character dictionary file from edu/stanford/nlp/models/segmenter/chinese/dict/character_list
Loading affix dictionary from edu/stanford/nlp/models/segmenter/chinese/dict/in.ctb
你爱我吗?
--->
[你, 爱, 我, 吗, ?]
java.lang.RuntimeException: don't know how to handle annotator segment
at corenlp.JsonPipeline.addAnnoToSentenceObject(JsonPipeline.java:282)
at corenlp.JsonPipeline.processTextDocument(JsonPipeline.java:312)
at corenlp.SocketServer.runCommand(SocketServer.java:140)
at corenlp.SocketServer.socketServerLoop(SocketServer.java:194)
at corenlp.SocketServer.main(SocketServer.java:107)

Any idea why this is happening? Many thanks in advance!
