
Collate bug: loss of tokens during collation #94

@matgille

Hi,

I'm losing tokens after collating the following JSON data (juxtaposition_2.json):
juxtaposition_6.json

The token rmtBBTT is lost during the process, and I don't know exactly why. The number of witnesses seems to influence the result.
juxtaposition_6_coll.json
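Since the number of witnesses seems to matter, one way to narrow the problem down might be to re-collate subsets of witnesses, always keeping the witness that contains the lost token, and to stop at the smallest subset where that token disappears. This is a minimal sketch that assumes CollateX's JSON output has the shape `{"witnesses": [...], "table": [...]}`, with one row per witness and each cell either `null` or a list of token objects carrying a `"t"` key. The helper names `token_in_output` and `smallest_failing_subset` are hypothetical, and the collation call is passed in as `collate_fn` so that the search itself does not depend on CollateX internals.

```python
import itertools
import json
from typing import Callable, Optional


def token_in_output(output_json: str, token: str, sigil: str) -> bool:
    """Return True if `token` occurs at least once in the row of witness `sigil`.

    Assumes (not confirmed) CollateX-style JSON output:
    {"witnesses": [sigil, ...], "table": [row, ...]}, one row per witness,
    each cell either null or a list of token dicts with a "t" key.
    """
    data = json.loads(output_json)
    row = data["table"][data["witnesses"].index(sigil)]
    return any(tok["t"] == token for cell in row if cell for tok in cell)


def smallest_failing_subset(
    collation: dict,
    token: str,
    sigil: str,
    collate_fn: Callable[[dict], str],
) -> Optional[list]:
    """Collate subsets of increasing size (always including `sigil`) and
    return the witness ids of the first subset whose output drops `token`,
    or None if the token survives in every subset."""
    witnesses = collation["witnesses"]
    target = next(w for w in witnesses if w["id"] == sigil)
    others = [w for w in witnesses if w["id"] != sigil]
    for size in range(1, len(others) + 1):
        # Exhaustive search: the number of runs grows exponentially with the witness count
        for combo in itertools.combinations(others, size):
            subset = {"witnesses": [target, *combo]}
            if not token_in_output(collate_fn(subset), token, sigil):
                return [w["id"] for w in subset["witnesses"]]
    return None
```

For example, assuming (hypothetically) that rmtBBTT belongs to witness "A" in the dictionary loaded from juxtaposition_6.json, the collation function could be `lambda c: collatex.collate(collation=c, output="json", segmentation=False, near_match=True)`. The check only tests whether the token text occurs at least once in that witness's row, and the exhaustive search is only practical for a handful of witnesses.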

The data is of poor quality (it is HTR output, and I want to use collation for correction purposes), but no token should disappear during collation.

Here is the test code I used to collate this data:

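```python
import json

import collatex

# Load the witnesses from the CollateX JSON input file
with open("juxtaposition_6.json", encoding="utf-8") as input_json:
    collatex_dict = json.load(input_json)

# Collate without merging aligned tokens into segments, with near matching enabled;
# output="json" returns the alignment table as a JSON string
collation_table = collatex.collate(collation=collatex_dict, output="json", segmentation=False, near_match=True)

with open("juxtaposition_6_coll.json", "w", encoding="utf-8") as output_json:
    output_json.write(collation_table)
```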
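To check the "no token should disappear" invariant mechanically, the input and output can be compared witness by witness as multisets of token strings. This is a minimal sketch under two assumptions not confirmed here: the input uses pre-tokenized witnesses (`{"witnesses": [{"id": ..., "tokens": [{"t": ...}, ...]}]}`, not the `"content"` string form), and the JSON output has one row per witness in `"table"`, aligned with the sigla in `"witnesses"`, each cell being `null` or a list of token objects with a `"t"` key. The function name `lost_tokens` is hypothetical.

```python
import json
from collections import Counter


def lost_tokens(input_collation: dict, output_json: str) -> dict:
    """Return {sigil: Counter of token strings missing from the output}.

    Witnesses whose tokens all survive are omitted, so an empty dict
    means no token was lost.
    """
    output = json.loads(output_json)
    # Map each witness sigil to its row of cells (assumed output layout)
    rows = dict(zip(output["witnesses"], output["table"]))
    missing = {}
    for witness in input_collation["witnesses"]:
        expected = Counter(tok["t"] for tok in witness["tokens"])
        found = Counter(
            tok["t"] for cell in rows.get(witness["id"], []) if cell for tok in cell
        )
        # Counter subtraction keeps only positive counts: tokens present in
        # the input more often than in the output
        diff = expected - found
        if diff:
            missing[witness["id"]] = diff
    return missing
```

With the script above, `print(lost_tokens(collatex_dict, collation_table))` would list, per witness, which tokens (and how many occurrences) did not make it into the alignment table. Counting occurrences rather than testing membership also catches the case where a token that appears twice in a witness survives only once.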

I'll try to investigate when I have some time. Any clue where to look?

Best,

Matthias
