Add sparse_waveform_mean and sd column to Units table

Address https://github.com/NeurodataWithoutBorders/pynwb/issues/1693

This issue tracks the revamp of https://github.com/NeurodataWithoutBorders/nwb-schema/pull/576 which was merged and then pulled (https://github.com/NeurodataWithoutBorders/nwb-schema/pull/595) after issues were raised when implementing this in pynwb.

The NWB TAB discussed this issue and proposed solutions on Tuesday. 

The original proposal in https://github.com/NeurodataWithoutBorders/pynwb/issues/1693 is to change "waveform_mean" and "waveform_sd" to be singly ragged arrays that allow a different number of electrodes per unit.
<img width="1202" alt="Image" src="https://github.com/user-attachments/assets/602ddc9c-48b0-4c39-9ac4-fa20d5dc33a6" />

**Option 1**: Support both the old full array form and the new ragged array form under the same column name "waveform mean". API could detect if "waveform_mean_index" column exists, and if so, parse the slices assuming the new format.
Pro: Both forms represent the same data, just structured differently, so it makes sense to use the same column name.
Con: Ryan thinks this is likely to be confusing to users and challenging to implement. Read/slicing pattern is different, documentation is different, and it will be hard for pynwb to detect during file creation which form should be used. 

**Option 2**: Support both forms but make the ragged array form doubly ragged – over both electrodes and samples. Then the read/slicing pattern is the same. Making the table doubly ragged might be less efficient though. And it is quite rare to have a different number of samples, though it has happened, e.g., where one session includes units acquired from two different acquisition systems with different sampling rates, and there are a different number of samples for spikes from different systems (to keep the length of the window in seconds the same). However, this use case may be better served by storing units under two different Units tables. 

**Option 3**: Introduce a new column named "sparse_waveform_mean" (or see other options below) that is singly ragged under a different name. There may still be a performance benefit to having the old full array form. If so, it may be better to keep both forms and maybe rename "waveform_mean" to "full_waveform_mean". Alternatively, we could discourage and deprecate "waveform_mean" in favor of "sparse_waveform_mean" which can handle the full case.
Pro: Clean implementation
Con: Having two columns that represent the same data and are accessed differently will probably be confusing to users. 

Possible new column names:
- sparse_waveform_mean (consistent with spikeinterface, preferred by Alessio)
- varying_waveform_mean
- selected_waveform_mean
- electrodes_waveform_means

We are leaning toward Option 3 and gradual deprecation of the existing "waveform_mean" column but I would like to run some quick performance tests first.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add sparse_waveform_mean and sd column to Units table #615

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add sparse_waveform_mean and sd column to Units table #615

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions