
how to deal with malformed data #381

Open
@ppanero

Description


Migration runs have surfaced the following records, whose content fails the transformation step because of malformed data (e.g. invalid identifiers). We need to decide how to proceed:

  • Fix the malformed data?
  • If the record is a draft, continue with the wrong data as-is? Would it then fail on re-indexing?

Note that this issue is blocked until resources are clarified. One option is to fix these records after the migration.
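If these records are to be fixed post-migration, the transform step has to keep going past a bad record instead of aborting the run. A minimal sketch of that pattern, assuming a hypothetical `transform` callable and entries that carry their UUID under an `id` key (neither is the actual invenio-rdm-migrator API): each failure is recorded with its UUID and error so it can be reviewed and fixed afterwards.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# (record UUID, error repr) for each entry that failed to transform.
Failure = Tuple[str, str]


def transform_stream(
    entries: Iterable[Dict[str, Any]],
    transform: Callable[[Dict[str, Any]], Dict[str, Any]],
    failures: List[Failure],
) -> Iterator[Dict[str, Any]]:
    """Yield transformed entries; record failures instead of aborting the run.

    `transform` and the "id" key are hypothetical, for illustration only.
    """
    for entry in entries:
        try:
            yield transform(entry)
        except Exception as exc:  # deliberately broad: this is a triage pass
            failures.append((entry.get("id", "<unknown>"), repr(exc)))


if __name__ == "__main__":
    def fake_transform(entry):
        # Stand-in for the real transform: requires entry["json"]["relation"].
        return {"id": entry["id"], "relation": entry["json"]["relation"].lower()}

    failures: List[Failure] = []
    entries = [
        {"id": "a", "json": {"relation": "isSupplementTo"}},
        {"id": "b", "json": {}},  # no "relation" key
    ]
    print(list(transform_stream(entries, fake_transform, failures)))
    print(failures)
```

Because `transform_stream` is a generator, `failures` is only complete once the stream has been fully consumed.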


Malformed identifiers in related_identifiers or alternate_identifiers:

UUID: ae12c346-7780-48df-b84f-558f08c315a8

zenodo_rdm.migrator.errors.InvalidIdentifier: {'relation': 'isSupplementTo', 'identifier': 'ISSN'}

UUID: 2ecdb513-41cc-4b96-9281-bb4fd4c04dd3

zenodo_rdm.migrator.errors.InvalidIdentifier: {'identifier': '10.21105.joss.00018'}

UUID: 6af87bcd-7b65-495c-99f7-cf8cdd063964

zenodo_rdm.migrator.errors.InvalidIdentifier: {'relation': 'isSupplementTo', 'identifier': ' arXiv:1601.08082'}

UUID: 96a504c2-aa6a-4590-a33d-8997c6e5c867

zenodo_rdm.migrator.errors.InvalidIdentifier: {'identifier': 'local:'}
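The four failures above fall into a few patterns: stray whitespace (`' arXiv:1601.08082'`), what looks like a DOI whose `/` separator was typed as `.` (`'10.21105.joss.00018'`), and placeholders with no actual identifier (`'ISSN'`, `'local:'`). If we decide to fix them, a triage helper along these lines could separate mechanically fixable values from ones that need a human decision. This is a hypothetical heuristic (`suggest_identifier_fix` is not part of zenodo_rdm), and every suggestion should still be reviewed; the DOI rule in particular only guesses where the separator belongs.

```python
import re
from typing import Optional

# Heuristic only: a DOI prefix ("10." + registrant code) followed by "."
# where the "/" separator should be, e.g. "10.21105.joss.00018".
_DOI_MISSING_SLASH = re.compile(r"(10\.\d{4,9})\.(\S+)")

# A bare scheme name or empty scheme prefix, e.g. "ISSN" or "local:".
# There is no identifier value to recover from these.
_PLACEHOLDER = re.compile(r"[A-Za-z]+:?")


def suggest_identifier_fix(raw: str) -> Optional[str]:
    """Suggest a corrected identifier, or return None if no automatic fix is known.

    None means the record needs a manual decision (fix, drop, or keep as-is).
    """
    value = raw.strip()
    if _PLACEHOLDER.fullmatch(value):
        return None
    if "/" not in value:
        match = _DOI_MISSING_SLASH.fullmatch(value)
        if match:
            return f"{match.group(1)}/{match.group(2)}"
    if value != raw:
        # Only stray whitespace was wrong, e.g. " arXiv:1601.08082".
        return value
    return None


if __name__ == "__main__":
    for raw in ["ISSN", "10.21105.joss.00018", " arXiv:1601.08082", "local:"]:
        print(repr(raw), "->", suggest_identifier_fix(raw))
```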

Missing relation in related_identifiers:

UUID: aab56dde-ce52-4443-bc69-6140a096c416

Traceback (most recent call last):
  File "/root/zenodo-rdm-src-deps/invenio-rdm-migrator/invenio_rdm_migrator/streams/records/transform.py", line 62, in run
    yield self._transform(entry)
  File "/root/zenodo-rdm-instance/site/zenodo_rdm/migrator/transform/records.py", line 82, in _transform
    "draft": self._draft(entry),
  File "/root/zenodo-rdm-instance/site/zenodo_rdm/migrator/transform/records.py", line 55, in _draft
    return ZenodoDraftEntry().transform(entry)
  File "/root/zenodo-rdm-instance/site/zenodo_rdm/migrator/transform/entries/records/records.py", line 172, in transform
    transformed = super().transform(entry)
  File "/root/zenodo-rdm-src-deps/invenio-rdm-migrator/invenio_rdm_migrator/streams/records/transform.py", line 153, in transform
    "metadata": self._metadata(entry),
  File "/root/zenodo-rdm-instance/site/zenodo_rdm/migrator/transform/entries/records/records.py", line 168, in _metadata
    return ZenodoDraftMetadataEntry.transform(entry["json"])
  File "/root/zenodo-rdm-instance/site/zenodo_rdm/migrator/transform/entries/records/metadata.py", line 383, in transform
    "related_identifiers": cls._related_identifiers(
  File "/root/zenodo-rdm-instance/site/zenodo_rdm/migrator/transform/entries/records/metadata.py", line 190, in _related_identifiers
    "id": legacy_identifier["relation"].lower(),
KeyError: 'relation'
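The traceback shows `_related_identifiers` indexing `legacy_identifier["relation"]` directly, so a missing key surfaces as a bare `KeyError` without saying which identifier was at fault. A minimal sketch of a more defensive mapping, assuming a hypothetical `MissingRelation` error and an output shape (`identifier` plus `relation_type: {"id": ...}`) inferred only from the `"id": ...["relation"].lower()` line; it is not the actual zenodo_rdm code. It reports the offending entry, in the same spirit as the `InvalidIdentifier` messages above, and lets the caller opt into a fallback relation if that is what gets decided.

```python
from typing import Any, Dict, Optional


class MissingRelation(ValueError):
    """Hypothetical error: a legacy related identifier has no relation."""


def transform_related_identifier(
    legacy_identifier: Dict[str, Any],
    default_relation: Optional[str] = None,
) -> Dict[str, Any]:
    """Map one legacy related identifier, failing with context if it has no relation.

    The output shape is an assumption inferred from the traceback.
    """
    relation = legacy_identifier.get("relation") or default_relation
    if not relation:
        # Report the whole offending entry instead of KeyError: 'relation'.
        raise MissingRelation(legacy_identifier)
    return {
        "identifier": legacy_identifier["identifier"],
        "relation_type": {"id": relation.lower()},
    }
```

Whether a fallback relation is acceptable at all is part of the decision this issue asks for; by default the sketch refuses to guess.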

Metadata

Assignees: No one assigned
Labels: blocked (This issue is blocked due to external dependencies), migration (Issue related to migration e.g. zenodo, cds etc.)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests