Wrong average embedding during inference due to a small bug in neuralcoref.pyx #336

The average embeddings of spans can be miscalculated during inference because of a small bug in neuralcoref.pyx:

https://github.com/huggingface/neuralcoref/blob/60338df6f9b0a44a6728b442193b7c66653b0731/neuralcoref/neuralcoref.pyx#L896

`PUNCTS` is a list of strings, while `token.lower` is an integer hash, so comparing the two can never produce a match.
This means punctuation tokens are not filtered out and their embeddings are added to the average embeddings of spans, causing a potential mismatch between training and inference features.
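
To illustrate the mismatch, here is a minimal pure-Python sketch. It is not the neuralcoref code: the `Token` class, the `average` helper, the vectors, and the contents of `PUNCTS` are all hypothetical stand-ins, and Python's built-in `hash()` stands in for the integer hash. It only assumes the check has the shape "skip the token if it is in `PUNCTS`".

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for the real list; the actual contents may differ.
PUNCTS = [".", ",", "!", "?", ";", ":"]


@dataclass
class Token:
    """Toy token mimicking the int/str attribute pair described above."""
    text: str
    vector: List[float]

    @property
    def lower_(self) -> str:
        # String form of the lowercased token.
        return self.text.lower()

    @property
    def lower(self) -> int:
        # Integer hash of the lowercased token (hash() is only a stand-in).
        return hash(self.text.lower())


def average(tokens: List[Token], key: Callable[[Token], object]) -> List[float]:
    """Average the vectors of all tokens whose key is not in PUNCTS."""
    kept = [t.vector for t in tokens if key(t) not in PUNCTS]
    if not kept:
        return []
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


span = [
    Token("Hello", [1.0, 1.0]),
    Token(",", [9.0, 9.0]),
    Token("world", [3.0, 3.0]),
]

# An int is never equal to a str, so nothing is filtered: "," is averaged in.
buggy = average(span, key=lambda t: t.lower)

# Comparing str with str filters "," as intended.
fixed = average(span, key=lambda t: t.lower_)

print(buggy)
print(fixed)
```

In this toy example the comma's vector shifts the buggy average away from the filtered one. One possible fix is to compare strings with strings (spaCy `Token` objects expose the string form as `lower_`); whether that applies directly at the linked line depends on the type of `token` there.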
