The wrapper doesn't support it. You'd have to modify the Java code where the error is happening, to add the segmentation information to the JSON output.
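To illustrate the failure mode, here is a rough Python model of what the stack trace suggests: the segmenter itself runs (the log prints the tokens), but the step that converts annotations to JSON looks up each annotator name and throws on one it does not recognise. This is not the wrapper's code. The handler table, the function names, the JSON field names and the POS tags are all hypothetical and only there to show the shape of the problem and of the fix.

```python
def add_tokens(sentence, out):
    # Copy the token list into the JSON-like output dict.
    out["tokens"] = list(sentence["tokens"])


def add_pos(sentence, out):
    # Copy the POS tags into the JSON-like output dict.
    out["pos"] = list(sentence["pos"])


# Hypothetical handler table: annotator name -> function that serialises it.
# The real wrapper's list of supported annotators is NOT reproduced here.
HANDLERS = {"ssplit": add_tokens, "pos": add_pos}


def sentence_to_json(sentence, annotators, handlers=HANDLERS):
    """Build the output for one sentence, failing on unknown annotator names."""
    out = {}
    for name in annotators:
        handler = handlers.get(name)
        if handler is None:
            raise RuntimeError("don't know how to handle annotator " + name)
        handler(sentence, out)
    return out


# An already-annotated sentence (tags are illustrative values only).
sentence = {
    "tokens": ["你", "爱", "我", "吗", "?"],
    "pos": ["PN", "VV", "PN", "SP", "PU"],
}
annotators = ["segment", "ssplit", "pos"]

# Without a handler for "segment", serialisation fails even though the
# annotation itself succeeded.
try:
    sentence_to_json(sentence, annotators)
except RuntimeError as err:
    error_message = str(err)

# The kind of change being suggested: register a handler for the custom name.
patched = dict(HANDLERS, segment=add_tokens)
result = sentence_to_json(sentence, annotators, patched)
```

In the real wrapper the equivalent change would go in the Java method named in the trace, `JsonPipeline.addAnnoToSentenceObject`. How the segmenter's output should be represented in the JSON is a design decision for that code and is only assumed here.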
Hi! I wonder if anyone has used the Wrapper to parse Chinese texts before?
I have the following code:
    from stanford_corenlp_pywrapper import sockwrap

    parser_path = "/Users/hbyan2/Downloads/stanford-corenlp-full-2015-04-20/*"
    cn_model_path = "/Users/hbyan2/Downloads/stanford-corenlp-full-2015-04-20/stanford-chinese-corenlp-2015-04-20-models.jar"

    p = sockwrap.SockWrap(
        configdict={
            'annotators': "segment, ssplit, pos, parse",
            'customAnnotatorClass.segment': 'edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator',
            'segment.model': 'edu/stanford/nlp/models/segmenter/chinese/ctb.gz',
            'segment.sighanCorporaDict': 'edu/stanford/nlp/models/segmenter/chinese',
            'segment.serDictionary': 'edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz',
            'segment.sighanPostProcessing': True,
            'ssplit.boundaryTokenRegex': '[.]|[!?]+|[。]|[!?]+',
            "parse.model": "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz",
            "pos.model": "edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger"
        },
        corenlp_jars=[parser_path, cn_model_path]
    )

    p.parse_doc(u"你爱我吗?")
The configs are taken from the default CoreNLP properties for parsing Chinese: https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/StanfordCoreNLP-chinese.properties
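For anyone who would rather not copy those settings by hand, a minimal sketch of turning `key = value` lines from a properties file into a dict of the kind passed as `configdict`. The helper name `properties_to_configdict` is made up for this example, and it only handles simple one-line entries and comment lines, not the full Java properties syntax (no line continuations or escapes).

```python
def properties_to_configdict(text):
    """Parse simple `key = value` lines from .properties text into a dict."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (properties files use '#' or '!').
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        config[key.strip()] = value.strip()
    return config


# Excerpt mirroring a few of the settings used in the call above.
sample = """
# Pipeline options
annotators = segment, ssplit, pos, parse
segment.model = edu/stanford/nlp/models/segmenter/chinese/ctb.gz
segment.sighanPostProcessing = true
"""

configdict = properties_to_configdict(sample)
print(configdict["annotators"])
```

Note that every value comes back as a string, so `segment.sighanPostProcessing` is `"true"` here rather than the Python `True` used above. Whether the wrapper treats the two the same way is not something this sketch verifies.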
When running the Wrapper, I got the following error:
[Server] Started socket server on port 12340
INFO:StanfordSocketWrap:Successful ping. The server has started.
INFO:StanfordSocketWrap:Subprocess is ready.
Adding Segmentation annotation ... INFO: TagAffixDetector: useChPos=false | useCTBChar2=true | usePKChar2=false
INFO: TagAffixDetector: building TagAffixDetector from edu/stanford/nlp/models/segmenter/chinese/dict/character_list and edu/stanford/nlp/models/segmenter/chinese/dict/in.ctb
Loading character dictionary file from edu/stanford/nlp/models/segmenter/chinese/dict/character_list
Loading affix dictionary from edu/stanford/nlp/models/segmenter/chinese/dict/in.ctb
你爱我吗?
--->
[你, 爱, 我, 吗, ?]
java.lang.RuntimeException: don't know how to handle annotator segment
    at corenlp.JsonPipeline.addAnnoToSentenceObject(JsonPipeline.java:282)
    at corenlp.JsonPipeline.processTextDocument(JsonPipeline.java:312)
    at corenlp.SocketServer.runCommand(SocketServer.java:140)
    at corenlp.SocketServer.socketServerLoop(SocketServer.java:194)
    at corenlp.SocketServer.main(SocketServer.java:107)
Any idea why this is happening? Many thanks in advance!