`*.prp` files contain `^M` artifacts which break `model.setup()` (#14)

Observed on LDC2014T12 data instances:
 - train_380
 - train_961
 - train_995
 - train_1442

Preprocessing produces a `*.prp` file containing annotations from the Stanford CoreNLP tool. In every case above, there is a stray `^M` (a carriage return, `\r`) in the middle of the CoreNLP output, like so:

```
[Text=currently CharacterOffsetBegin=0 ... ]^M
[Text=america CharacterOffsetBegin=10 ... ]^M
^M
[Text=is CharacterOffsetBegin=18 ... ]^M
...
```
I am not sure why this happens; perhaps CoreNLP does not handle multi-sentence instances correctly. In any case, I am reporting it for anyone else wondering what is going on.
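
To check whether other instances are affected, a minimal Python sketch like the one below scans `*.prp` files for carriage returns. It assumes the files are otherwise plain text with `\n` line endings; the script name and paths in the usage comment are placeholders, not part of the project.

```python
import sys
from pathlib import Path


def cr_line_numbers(data: bytes) -> list:
    """Return 1-based numbers of lines that contain a carriage return (^M)."""
    # Split on b"\n" only, so any b"\r" remains visible inside its line.
    return [i for i, line in enumerate(data.split(b"\n"), start=1) if b"\r" in line]


if __name__ == "__main__":
    # Hypothetical usage: python find_cr.py path/to/*.prp
    for name in sys.argv[1:]:
        hits = cr_line_numbers(Path(name).read_bytes())
        if hits:
            print(f"{name}: ^M on {len(hits)} line(s), first at line {hits[0]}")
```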

I worked around it by manually deleting the dangling `^M` from the `*.prp` file.
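
The manual fix can be automated with a short Python sketch. It assumes two things not confirmed above: that a line consisting only of `^M` is spurious and can be dropped, and that every other `\r` can simply be deleted so that lines end in plain `\n`. The function names and script name are hypothetical; back up the `*.prp` files before running it.

```python
import sys
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Remove ^M artifacts from CoreNLP text output.

    Assumption: a line holding nothing but a carriage return is dropped;
    any other carriage return is deleted in place.
    """
    kept = []
    for line in data.split(b"\n"):
        if line == b"\r":
            continue  # dangling ^M line: assumed spurious, so drop it
        kept.append(line.replace(b"\r", b""))
    return b"\n".join(kept)


def clean_file(path: Path) -> bool:
    """Rewrite `path` in place without ^M artifacts; return True if it changed."""
    original = path.read_bytes()
    cleaned = strip_carriage_returns(original)
    if cleaned != original:
        path.write_bytes(cleaned)
        return True
    return False


if __name__ == "__main__":
    # Hypothetical usage: python strip_cr.py path/to/*.prp
    for name in sys.argv[1:]:
        if clean_file(Path(name)):
            print(f"cleaned {name}")
```

Working on bytes rather than text avoids Python's universal-newline handling, which would otherwise silently translate `\r` on read and hide exactly the characters being removed.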
