feat: Handle extra mdoc fields and support fieldname aliases by sumslogs · Pull Request #33 · teamtomo/mdocfile

sumslogs · 2026-01-27T19:14:49Z

Thermo's software (Tomo 5) writes some fields into its mdoc files that mdocfile was dropping.

In particular, I observed this for "CountsPerElectron" and "FrameDosesAndNumber".
The SerialEM/IMOD documentation doesn't mention CountsPerElectron, but it's used elsewhere.

"FrameDosesAndNumber" is stranger: it appears in the SerialEM documentation (and in this package) as "FrameDosesAndNumbers" (plural).
However, see:

SerialEM Source File: "FrameDosesAndNumber"
Pyserialem: "FrameDosesAndNumbers"
Eman2: "FrameDosesAndNumber"
(...)

To address those issues, this PR does a couple of things:

Adds a mapping dictionary, "fieldname_aliases", in data_models.py that treats "FrameDosesAndNumber" as "FrameDosesAndNumbers" (and possibly other aliases in the future).
Stops dropping unknown fields; instead, passes them through and issues a warning at parse time that they were observed.
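The two changes above can be sketched in plain Python as follows. This is a minimal illustration, not the PR's code: FIELDNAME_ALIASES echoes the dictionary name mentioned above, but KNOWN_FIELDS and normalize_fields are hypothetical, and the known-field set is a small illustrative subset rather than mdocfile's real field list.

```python
import logging

# Hypothetical sketch; these names are illustrative, not mdocfile's actual API.
log = logging.getLogger("mdocfile")

# Map alternative spellings seen in the wild onto the canonical field name.
FIELDNAME_ALIASES = {
    "FrameDosesAndNumber": "FrameDosesAndNumbers",
}

# A tiny subset of known fields, for illustration only.
KNOWN_FIELDS = {"TiltAngle", "ExposureDose", "FrameDosesAndNumbers"}


def normalize_fields(raw: dict[str, str]) -> dict[str, str]:
    """Rename aliased keys and pass unknown keys through with a warning."""
    normalized = {}
    for key, value in raw.items():
        canonical = FIELDNAME_ALIASES.get(key, key)
        if canonical not in KNOWN_FIELDS:
            # Keep the data instead of dropping it, but tell the user.
            log.warning("Unknown mdoc field %r passed through", key)
        normalized[canonical] = value
    return normalized
```

Here "CountsPerElectron" would survive parsing (with a warning) and "FrameDosesAndNumber" would be stored under the plural name.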

I also noticed that the FrameDosesAndNumbers parser wasn't correctly handling the sequence-of-tuples data structure (nor correctly serializing it back to a string), so I fixed that as well.
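A minimal sketch of parsing FrameDosesAndNumbers into a sequence of tuples and serializing it back. It assumes the value is a whitespace-separated run of alternating dose and frame-count tokens (e.g. "0.5 10 0.25 20"); parse_frame_doses and format_frame_doses are hypothetical names, not mdocfile functions.

```python
# Hypothetical helpers; assumes the value alternates "dose count dose count ...".
def parse_frame_doses(value: str) -> list[tuple[float, int]]:
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected dose/count pairs, got {value!r}")
    # Pair each dose with the frame count that follows it.
    return [(float(dose), int(count)) for dose, count in zip(tokens[0::2], tokens[1::2])]


def format_frame_doses(pairs: list[tuple[float, int]]) -> str:
    # Flatten the pairs back into one space-separated string.
    return " ".join(f"{dose:g} {count}" for dose, count in pairs)
```

Raising on an odd token count surfaces a malformed value instead of silently dropping the trailing dose.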

alisterburt

@sumslogs thank you for taking the time to put together a thoughtfully constructed PR with links to relevant documentation! Lots of great improvements here

I agree the FrameDosesAndNumbers thing is weird - the SerialEM file formats documentation differs from the SerialEM source you linked...

I had one minor question about a change you made in the .to_string() method, but otherwise this looks good to go!

src/mdocfile/data_models.py

alisterburt · 2026-01-27T19:46:38Z

as an aside, would you be interested in joining our regular developer meeting? It happens once a month; the next one is tomorrow at 8am PST. If you're interested, could you introduce yourself and your interests in our Zulip channel and DM me your email so I can add you to the calendar invite?

alisterburt · 2026-01-27T20:28:12Z

src/mdocfile/data_models.py


 from mdocfile.utils import find_section_entries, find_title_entries

+log = logging.getLogger('mdocfile.parser')


Could you switch this to just 'mdocfile' if you're iterating? Thanks!
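Naming the logger after the package means a user can control every mdocfile message with a single call, and any child logger such as 'mdocfile.parser' inherits that setting anyway. A small stdlib illustration; warn_unknown_field is a hypothetical helper, not part of mdocfile.

```python
import logging

# Library side: one module-level logger named after the package.
log = logging.getLogger("mdocfile")


def warn_unknown_field(name: str) -> None:
    # Hypothetical helper showing where a parser would emit the warning.
    log.warning("Unknown mdoc field %r passed through", name)


# User side: silence the package's warnings without touching other libraries.
# getLogger returns the same object for the same name, so this affects `log`.
logging.getLogger("mdocfile").setLevel(logging.ERROR)
```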

sumslogs · 2026-01-27T20:30:06Z

I noticed that pydantic actually has built-in alias support, so I refactored it to use that rather than making a custom dictionary mapping.
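For reference, pydantic v2's built-in aliasing can accept several input spellings for one field via validation_alias with AliasChoices, and ConfigDict(extra='allow') keeps undeclared fields in model_extra. The model below (FrameSection) is a hypothetical, trimmed-down stand-in, not mdocfile's actual data model, and it assumes pydantic 2.x is installed.

```python
from typing import Optional

from pydantic import AliasChoices, BaseModel, ConfigDict, Field


class FrameSection(BaseModel):
    # Hypothetical, trimmed-down model; not mdocfile's actual class.
    # extra="allow" keeps undeclared fields (e.g. CountsPerElectron) in model_extra.
    model_config = ConfigDict(extra="allow")

    TiltAngle: Optional[float] = None
    # Accept either spelling on input; the attribute is always the plural form.
    FrameDosesAndNumbers: Optional[str] = Field(
        default=None,
        validation_alias=AliasChoices("FrameDosesAndNumbers", "FrameDosesAndNumber"),
    )


section = FrameSection.model_validate(
    {"TiltAngle": "-3.0", "FrameDosesAndNumber": "0.5 10", "CountsPerElectron": "30"}
)
print(section.FrameDosesAndNumbers)  # 0.5 10
print(section.model_extra)           # {'CountsPerElectron': '30'}
```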

alisterburt · 2026-01-27T20:32:36Z

@sumslogs unsurprising, pydantic seems to try to cater to all 😂

Let me know when you're done iterating and we can merge

I'm going to give you maintainer rights on this repo as you seem responsible and thoughtful - please still go through the PR flow so people have an opportunity to review contributions

Great work here

sumslogs · 2026-01-27T21:41:31Z

src/mdocfile/data_models.py

    """
+    model_config = ConfigDict(extra='allow', # keep extra field data
+                              validate_by_name=True) # use our validations for aliased fields
+                              # serialize_by_alias=True) # use the version of the fieldname the file arrived as


@alisterburt What to do here isn't entirely clear. The options are either to force the field to a single name (i.e., always output the plural form, since that is how the package defines it) or to serialize it under whichever name the file arrived with.

Leaving field names as they were found (serialize_by_alias=True) makes sense when someone uses the package to validate, modify, and output a file, so that whatever originally created it can load it again. But one could just as easily argue for standardizing it. It might make sense for the constructor methods to pass along pydantic config settings.

@sumslogs yeah, no obvious right answer. My intuition is that we don't want things to change magically too much, so probably use whatever name the file came in with, if that's relatively easy to implement.

sumslogs · 2026-01-28T01:32:57Z

@alisterburt Alright, after some iterating I think this is ready to review again. It preserves the original field names on string and dataframe export, and I tried to keep it straightforward to add new aliases if they arise.
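One way to preserve original field names on export is to remember, per canonical field, the spelling the file used and write that spelling back. The stdlib-only sketch below assumes "Key = Value" lines and uses hypothetical names (ALIASES, parse_section, to_lines); it illustrates the idea and is not necessarily how the PR implements it.

```python
# Hypothetical sketch of round-tripping aliased names; not the PR's code.
ALIASES = {"FrameDosesAndNumber": "FrameDosesAndNumbers"}


def parse_section(lines: list[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (values keyed by canonical name, canonical -> original name)."""
    values: dict[str, str] = {}
    original_names: dict[str, str] = {}
    for line in lines:
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        canonical = ALIASES.get(key, key)
        values[canonical] = value
        # Remember the spelling the file used so it can be written back.
        original_names[canonical] = key
    return values, original_names


def to_lines(values: dict[str, str], original_names: dict[str, str]) -> list[str]:
    # Fall back to the canonical name for fields added after parsing.
    return [f"{original_names.get(k, k)} = {v}" for k, v in values.items()]
```

Keeping the name map separate from the values means code working with the parsed data only ever sees the canonical (plural) name, while the output reuses the spelling from the input.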

alisterburt

Wonderful, thanks for all of the effort here; the tests make the intended behavior nice and clear. I reactivated CI, as it had been automatically deactivated, and made a few no-op changes in the tests to trigger it.

tests/test_data_models.py

tests/test_functions.py

alisterburt · 2026-01-28T06:41:35Z

deploying v0.2.3
https://github.com/teamtomo/mdocfile/actions/runs/21428002927

alisterburt reviewed Jan 27, 2026

src/mdocfile/data_models.py (outdated, resolved)

sumslogs force-pushed the unknown_or_aliased_fields branch from 09046f3 to 56c011f (January 27, 2026 20:26)

alisterburt reviewed Jan 27, 2026

sumslogs force-pushed the unknown_or_aliased_fields branch from 56c011f to 0db8ec6 (January 27, 2026 20:30)

sumslogs force-pushed the unknown_or_aliased_fields branch 2 times, most recently from 927a243 to 611d373 (January 27, 2026 21:38)

sumslogs commented Jan 27, 2026

feat: Handle extra mdoc fields and use pydantic alias for FrameDosesAndNumbers (774cf29)

sumslogs force-pushed the unknown_or_aliased_fields branch from 611d373 to 774cf29 (January 27, 2026 21:44)

fix: original fieldnames are preserved on output when aliased (0eb3486)

sumslogs force-pushed the unknown_or_aliased_fields branch from 2c4a7d6 to 0eb3486 (January 28, 2026 01:12)

alisterburt approved these changes Jan 28, 2026

alisterburt added 5 commits January 27, 2026 22:34

Update tests/test_data_models.py (e091430)
Update tests/test_functions.py (7515623)
Update tests/test_functions.py (ece5511)
Update tests/test_functions.py (7453490)
Update tests/test_functions.py (c74c67b)

alisterburt reviewed Jan 28, 2026

tests/test_functions.py (outdated, resolved)

Update tests/test_functions.py (2c8a5d1)

alisterburt merged commit ee4d304 into teamtomo:main Jan 28, 2026
5 checks passed

