Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better API for specifying return data of signals #376

Open
cx1111 opened this issue May 23, 2022 · 12 comments
Open

Better API for specifying return data of signals #376

cx1111 opened this issue May 23, 2022 · 12 comments

Comments

@cx1111
Copy link
Member

cx1111 commented May 23, 2022

The current rdrecord API is rather confusing when dealing with more complex records/signals. Namely: multi-frequency signals.

We should create a better API to allow users to:

  • Specify/know the exact return types/dimensions of the signals being read
  • Deal with single and multi-frequency records consistently.

In addition, the current API is also limiting when reading single-frequency records, as there is no way for users to explicitly specify whether they want 1d or 2d numpy arrays.

We current have:

def rdrecord(
where the only parameter that influences the return type is: smooth_frames

We've chatted about the issue here, where I suggested a new set of params:
#313 (comment)

What do people think of the suggestion of having two params: return_dims and smooth_frames? return_dims will apply for both single and multi-frequency signals, and smooth_frames will apply only to multi-frequency signals. Given the combination of the params, the return value of the signals will be:

(1, True): List of 1d arrays. Guaranteed to be same length.
(1, False): List of 1d arrays. Not guaranteed to be same length for multi-frequency signals.
(2, True): Single 2d array.
(2, False): Illegal combination.

@cx1111
Copy link
Member Author

cx1111 commented May 23, 2022

@tompollard @bemoody @alistairewj thoughts?

@bemoody
Copy link
Collaborator

bemoody commented May 24, 2022 via email

@cx1111
Copy link
Member Author

cx1111 commented May 24, 2022

Thanks. Some more thoughts:

  • Many users will want the 2d array to pass into convolutional neural networks or general multi-channel filter functions.
  • For multi-frequency records, the only way to avoid losing information is to return an array of 1d arrays.

I think it's worth it for our library to support both dimensions, despite the additional complexity. Regarding efficiency/performance, would either form perform a lot better?

I imagine for the ease of writing the functions, but at a large performance cost, we do have the option of always processing the signals as a list of 1d arrays, and then only reshaping into 2d before returning the value.

The function argument values can be required, and have defaults, so that we don't have to deal with the unspecified cases.

@bemoody
Copy link
Collaborator

bemoody commented May 24, 2022 via email

@cx1111
Copy link
Member Author

cx1111 commented May 25, 2022

Let's not worry about backwards compatibility here. The current API behavior is not ideal, and does not allow the user to get a predictable output.

I didn't think about the high-res option. I'm not sure how useful it would be, but we could provide it. We would then have combinations of return_dims and smooth_frames (or better name):

  • (1, None): List of 1d arrays. Not guaranteed to be same length for multi-frequency signals.
  • (1, 'average'): List of 1d arrays. Guaranteed to be same length.
  • (1, 'upsample'): List of 1d arrays. Guaranteed to be the same length. (For Record objects, what happens to fs?)
  • (2, None): Illegal combination.
  • (2, 'average'): Single 2d array.
  • (2, 'upsample'): Single 2d array.

We could also add an INFO logging statement that occurs when reading a multi-frequency record, so that it's more immediately obvious.

I realize that all of this only considers the final output array(s) (such as when calling the current rdsamp), not the intermediate steps and the object attributes used to represent the signals. Let me think about that.

@bemoody
Copy link
Collaborator

bemoody commented May 25, 2022 via email

@bemoody
Copy link
Collaborator

bemoody commented May 27, 2022

An idea I mentioned in the meeting today was to allow the p_signal (and e_p_signal, etc.) attributes to be lazily loaded, perhaps if you call rdrecord with stream=True or something like that.

A very rough sketch might look like this:

def rdrecord(record_name, sampfrom=0, sampto=None, stream=False):
    record = rdheader(record_name)
    if sampto is None:
        sampto = record.sig_len
    if stream:
        record.p_signal = Lazy2DSignalArray(record, sampfrom, sampto)
    else:
        record.p_signal = _read_2d_signals(record, sampfrom, sampto)
    return record

class Lazy2DSignalArray:
    def __init__(self, record, sampfrom, sampto):
        self._record = record
        self._base = sampfrom
        self._len = sampto - sampfrom

    def __len__(self):
        return self._len

    def __getitem__(self, key):
        if isinstance(key, slice):
            (start, stop, step) = key.indices(self._len)
            data = _read_2d_signals(self._record,
                                    start + self._base,
                                    stop + self._base)
            return data[::step]
        elif isinstance(key, tuple):
            return self[key[0]][key[1:]]
        elif hasattr(key, '__index__'):
            return self[key:key+1][0]
        else:
            raise TypeError

@tompollard
Copy link
Member

Just catching up on this!

Let's not worry about backwards compatibility here. The current API behavior is not ideal, and does not allow the user to get a predictable output.

What about deprecating rdrecord and introducing wfdb.read_record() or maybe just wfdb.read()?

What do people think of the suggestion of having two params: return_dims and smooth_frames? return_dims will apply for both single and multi-frequency signals, and smooth_frames will apply only to multi-frequency signals.

Is return_dims necessary at all? The fact that it leads to situations like (2, False): Illegal combination. isn't nice. Could we treat everything as 2d and maybe have some kind of binary flatten flag if necessary?

@cx1111
Copy link
Member Author

cx1111 commented Jun 7, 2022

What about deprecating rdrecord and introducing wfdb.read_record() or maybe just wfdb.read()?

That's the long term plan. We want a major version where old and new APIs are available, and then another one that removes the old ones.

Is return_dims necessary at all? The fact that it leads to situations like (2, False): Illegal combination. isn't nice. Could we treat everything as 2d and maybe have some kind of binary flatten flag if necessary?

flatten essentially does the same thing as return_dims and the illegal combination issue would still hold. I'd argue that it's a worse keyword to use, since flattening an array may mean different things, thus making the keyword less explicit.

@cx1111
Copy link
Member Author

cx1111 commented Jun 7, 2022

Hot off the press: #388

See the top level functions at the bottom of the module demonstrating a non-streaming and streaming (needs a lot of work) API. My biggest takeaway while playing around with this: Why does any class/object need to have the array as one of its attribute? Does it benefit the library, or the user? If we just separate the full data array from the class/object, we can remove the p_signal/d_signal/e_p_signal naming garbage.

@cx1111
Copy link
Member Author

cx1111 commented Jun 23, 2022

^^ Thinking about it more, separating the signal data from the object may bring more confusion than benefits. Imagine reading a JPEG file into an Image object, and not having the bytes as an object field. That would be very strange.

@wm5775
Copy link

wm5775 commented Mar 16, 2025

I agree with cx1111's opinion.
At the end, I will show an example of our header, but the current WFDB library exhibits inappropriate behavior with multi-frequency records.

Record Header

record_name : 07954_20220806000757
number_of_signals : 552
sampling_frequency : 0.976562 -> Set the basic sampling period to 1024 msec
number_of_samples_per_signal : 3512 -> Basic sampling period is 3512 times

Signal Header -> 3rd signal: ECG II

filename : 07954_20220806000757_WAV_0101.dat 
format : 61 
sample_per_frame : 256 -> 256 times the basic sampling period of 1024 msec -> 250 samples/sec

In this case, ECG II should return 3512 x 256 = 899072 samples as signal_length, but in reality, it only returns 3512 samples. The signal length for signals after the 11th signal is also 3512.
Thus, even if the signal lengths differ by signal, returning them in an MxN array is unnecessary. I believe the library's specifications should clearly state that there is a possibility of differing signal lengths by signal.
Currently, there is no interface to obtain the signal length for each signal.

Example of Our Header

07954_20220806000757 552 0.976562 3517 00:09:39 06/08/2022
#488 msec
07954_20220806000757_QRS.dat 61x256 1/undef 16 0 0 4650 0 QRS_SYNC 
07954_20220806000757_PCG.dat 61x256 1/undef 16 0 0 0 0 PACING 
07954_20220806000757_WAV_0101.dat 61x256 500/mV 16 0 0 -27205 0 0101,II 
07954_20220806000757_WAV_0700.dat 61x64 1638/% 16 0 0 -23932 0 0700,SpO2 
07954_20220806000757_WAV_0500.dat 61x128 400/OHM 16 0 0 18289 0 0500,RESP(IMP) 
07954_20220806000757_WAV_4900.dat 61x128 10/mmHg 16 0 0 -12891 0 4900,CO2 
07954_20220806000757_WAV_5800.dat 61x128 0/undef 16 0 0 -20637 0 5800,CO2(VENT) 
07954_20220806000757_WAV_5801.dat 61x128 1/mL 16 0 0 -25316 0 5801,FLOW(VENT) 
07954_20220806000757_WAV_5802.dat 61x128 100/cmH₂O 16 0 0 -27772 0 5802,Paw(VENT) 
07954_20220806000757_WAV_2C00.dat 61x128 800/mmHg 16 0 0 9668 0 2C00,ART 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 17597 0 STATUS_MESSAGE[0] 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 6656 0 STATUS_MESSAGE[1] 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 0 0 STATUS_MESSAGE[2] 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 0 0 STATUS_MESSAGE[3] 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 0 0 STATUS_MESSAGE[4] 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 0 0 STATUS_MESSAGE[5] 
07954_20220806000757_STS.dat 61x1 1/undef 16 0 0 0 0 STATUS_MESSAGE[6] 
... snip ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants