Problem
pydata.egress converts DataFrame columns to namedtuple field names during the to_list_of_dataclasses() path. If a column name isn't a valid Python identifier (e.g., foo-bar, the reserved word class, or a name containing spaces or special characters), the namedtuple construction fails.
Reproduction
import polars as pl
from mountainash.pydata import PydataEgress
df = pl.DataFrame({"valid_col": [1], "foo-bar": [2], "class": [3]})
# Attempt to convert to dataclasses or named tuples → crash
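The underlying failure can be shown without polars or mountainash at all. This is a minimal sketch, assuming egress builds its row type with the standard library's collections.namedtuple (the issue only says "namedtuple construction"); try_build is a hypothetical helper used for illustration, not part of pydata.

```python
from collections import namedtuple

# Same column names as the DataFrame in the reproduction above.
columns = ["valid_col", "foo-bar", "class"]


def try_build(field_names):
    """Return None if namedtuple accepts the names, else the error message.

    Hypothetical helper: it only exercises collections.namedtuple and does
    not call any mountainash code.
    """
    try:
        namedtuple("Row", field_names)
    except ValueError as exc:
        return str(exc)
    return None


for name in columns:
    # valid_col is accepted; foo-bar and class are rejected with ValueError.
    print(f"{name!r}: {try_build([name])}")
```

A single bad column is enough to reject the whole field list, which is why one unmapped provider column can break the egress of an otherwise valid frame.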
Context
Discovered while designing mountainash-wearables' data querying layer. When conforming API responses with keep_only_mapped=False, unmapped provider columns (which can have arbitrary names like average_heartrate_bpm, foo-bar, or even reserved words) flow into egress and crash it.
Workaround: Use keep_only_mapped=True in TypeSpecs to avoid the issue entirely. Raw data is preserved via a sidecar list, not through unmapped columns.
Suggestion
Egress should do one of the following:
- Sanitize column names before namedtuple construction (replace invalid chars with _, prefix digits)
- Skip/warn on non-identifier columns rather than crashing
- Use a dict-based intermediate instead of namedtuple when column names aren't all valid identifiers
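The sanitizing option could look roughly like the sketch below. It is an illustration under assumptions, not the pydata implementation: sanitize_field_names and the col_ prefix are hypothetical. One detail worth noting is that namedtuple also rejects field names that start with an underscore, so a bare _ prefix for leading digits would not be enough.

```python
import keyword
import re
from collections import namedtuple


def sanitize_field_names(columns):
    """Map arbitrary column names to unique names that namedtuple accepts.

    Hypothetical helper for illustration only.
    """
    seen = set()
    result = []
    for col in columns:
        # Replace anything outside [A-Za-z0-9_] with "_" (non-ASCII letters
        # are replaced too, to keep the rule simple).
        name = re.sub(r"\W", "_", str(col), flags=re.ASCII)
        # namedtuple rejects names that are empty or start with a digit or
        # an underscore, so prefix those with an assumed "col_" marker.
        if not name or not name[0].isalpha():
            name = "col_" + name
        # Reserved words such as "class" get a trailing underscore.
        if keyword.iskeyword(name):
            name += "_"
        # Two columns may collapse to the same name; add a numeric suffix.
        candidate, suffix = name, 1
        while candidate in seen:
            candidate = f"{name}_{suffix}"
            suffix += 1
        seen.add(candidate)
        result.append(candidate)
    return result


columns = ["valid_col", "foo-bar", "class", "1st", "foo_bar"]
Row = namedtuple("Row", sanitize_field_names(columns))
row = Row(1, 2, 3, 4, 5)
print(row)
```

The standard library also offers namedtuple(..., rename=True), which replaces each invalid field name with a positional name such as _1. That avoids the crash with no extra code, at the cost of losing the original column name.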
This is low priority since keep_only_mapped=True avoids it, but it's a surprising failure mode for users who don't expect column names to matter.
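For comparison, the dict-based option from the list above might be sketched as follows. The names is_safe_field and rows_to_records are hypothetical, and the sketch assumes rows arrive as plain tuples in column order. It warns and returns dicts whenever any column is unsafe, and otherwise keeps the namedtuple path.

```python
import keyword
import warnings
from collections import namedtuple


def is_safe_field(name):
    """True if collections.namedtuple would accept this field name."""
    return (
        isinstance(name, str)
        and name.isidentifier()
        and not keyword.iskeyword(name)
        and not name.startswith("_")
    )


def rows_to_records(columns, rows):
    """Return namedtuples when every column is safe, otherwise dicts.

    Hypothetical helper for illustration; not part of pydata.egress.
    """
    unsafe = [c for c in columns if not is_safe_field(c)]
    if unsafe:
        warnings.warn(f"Falling back to dicts; non-identifier columns: {unsafe}")
        return [dict(zip(columns, row)) for row in rows]
    Row = namedtuple("Row", columns)
    return [Row(*row) for row in rows]


records = rows_to_records(["valid_col", "foo-bar", "class"], [(1, 2, 3)])
print(records)
```

The trade-off is that callers see two possible record types, so this variant preserves the original column names but makes the return type depend on the data.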