Hi,
I'm losing tokens when collating the following JSON data (juxtaposition_6.json).
juxtaposition_6.json
The token rmtBBTT is lost during collation, and I don't know exactly why. The number of witnesses seems to influence the result.
juxtaposition_6_coll.json
The data is of poor quality (it's HTR output, and I want to use collation for correction purposes), but no token should disappear during the collation process.
The test code used to collate this data is the following:
import json

import collatex

# Load the witnesses (HTR output) to collate.
with open("juxtaposition_6.json") as input_json:
    collatex_dict = json.load(input_json)

# Near matching is only available with segmentation turned off.
collation_table = collatex.collate(collation=collatex_dict, output="json", segmentation=False, near_match=True)

# With output="json", the result is a JSON string that can be written as-is.
with open("juxtaposition_6_coll.json", "w") as output_json:
    output_json.write(collation_table)

I'll try to investigate when I have some time. Any clue on where to look?
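As a first diagnostic, a sketch like the one below could list which tokens go missing for each witness by comparing token counts before and after collation. It rests on assumptions that should be checked against the real files: that the input is pre-tokenized (each witness has an "id" and a "tokens" list of objects with a "t" key), and that the JSON output holds a "witnesses" list of sigla plus a "table" with one row per witness, where each cell is either null or a list of token objects. The function names are hypothetical helpers, not part of collatex.

```python
from collections import Counter


def count_input_tokens(collatex_dict):
    """Count tokens per witness in the input (assumes pre-tokenized witnesses)."""
    return {
        witness["id"]: Counter(token["t"].strip() for token in witness["tokens"])
        for witness in collatex_dict["witnesses"]
    }


def count_output_tokens(collation_json):
    """Count tokens per witness in the collation output.

    Assumes one table row per witness, in the same order as the
    "witnesses" list, with each cell being None or a list of tokens.
    """
    counts = {}
    for sigil, row in zip(collation_json["witnesses"], collation_json["table"]):
        witness_counts = Counter()
        for cell in row:
            for token in cell or []:  # empty cells are None
                witness_counts[token["t"].strip()] += 1
        counts[sigil] = witness_counts
    return counts


def find_lost_tokens(collatex_dict, collation_json):
    """Return {sigil: {token: missing_count}} for tokens absent from the output."""
    before = count_input_tokens(collatex_dict)
    after = count_output_tokens(collation_json)
    lost = {}
    for sigil, token_counts in before.items():
        # Counter subtraction keeps only tokens with a positive deficit.
        missing = token_counts - after.get(sigil, Counter())
        if missing:
            lost[sigil] = dict(missing)
    return lost
```

With the script above, this would be called as find_lost_tokens(collatex_dict, json.loads(collation_table)), since collation_table is a JSON string. Running it on subsets of the witnesses might also help confirm whether the number of witnesses affects which tokens are dropped.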
Best,
Matthias