Find more efficient resampler #405

Open
MarionBWeinzierl opened this issue Jan 30, 2025 · 7 comments
Assignees
Labels
enhancement New feature or request

Comments

@MarionBWeinzierl
Collaborator

MarionBWeinzierl commented Jan 30, 2025

Working on #348, resampling is necessary between half-hourly, hourly, and daily data. As @davidorme notes here: https://github.com/ImperialCollegeLondon/pyrealm/blob/88ed9a14b18f0e4ddf3302baa88ec7164c479a86/pyrealm_build_data/LAI_in_pyrealm/fapar_limitation_example.py#L157C1-L160C52 , the standard resampler is very slow.

An overview of alternatives can be found here: http://signalsprocessed.blogspot.com/2016/08/audio-resampling-in-python.html , and https://pypi.org/project/samplerate/ seems to be a good candidate.

@MarionBWeinzierl MarionBWeinzierl converted this from a draft issue Jan 30, 2025
@MarionBWeinzierl MarionBWeinzierl self-assigned this Jan 30, 2025
@MarionBWeinzierl MarionBWeinzierl added the enhancement New feature or request label Jan 30, 2025
@davidorme
Collaborator

Looks like samplerate requires a C++ library. That might be on our development trajectory anyway, but it's a significant step 😄

@MarionBWeinzierl
Collaborator Author

It's a simple pip install, I just did it by adding it to the .toml file and doing a poetry update. However, I am not sure whether it's more than we need?

@davidorme
Collaborator

Looks like it might need a compiler on Linux?

Anyway - in general, resampling in pyrealm is either going to be:

  • Averaging a bunch of values to go from fast time scales to slow ones. That is being used in that build data example to go from half-hourly to daily. We also do the same thing in the subdaily model to get the average conditions in the acclimation window, but IIRC we do that in numpy with some simple reshaping along axes, which is fast.
  • Interpolating a bunch of values to go from slow scales to faster ones. That's what is done in the SubdailyScaler to go from daily to subdaily values, and that uses scipy.interpolate.

We aren't currently doing anything fancy with filters or smoothing and the like. I think in the build data example I was using something with a clean API, rather than a load of numpy shenanigans that would probably be far faster.
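
For reference, the first case (averaging from fast to slow time scales) can be sketched with plain numpy reshaping. This is a minimal illustration, not the pyrealm implementation: the function name `daily_means` and the `obs_per_day` argument are hypothetical, and it assumes the time axis is axis 0, the observations are regularly spaced, and the data cover a whole number of days.

```python
import numpy as np


def daily_means(values: np.ndarray, obs_per_day: int) -> np.ndarray:
    """Average regularly spaced subdaily values into daily means.

    Hypothetical helper for illustration only; not part of pyrealm.
    Assumes time is axis 0 and the data cover complete days.
    """
    n_obs = values.shape[0]
    if n_obs % obs_per_day != 0:
        raise ValueError("Data do not cover a whole number of days")

    n_days = n_obs // obs_per_day
    # Split the time axis into (day, observation within day), keeping any
    # trailing axes (e.g. sites) unchanged, then average within each day.
    by_day = values.reshape((n_days, obs_per_day) + values.shape[1:])
    return by_day.mean(axis=1)


# Two days of half-hourly values: 0, 1, ..., 95
half_hourly = np.arange(96, dtype=float)
print(daily_means(half_hourly, obs_per_day=48))  # [23.5 71.5]
```

Because the reshape only changes how the existing array is indexed, the cost is dominated by a single vectorised mean, with no per-group Python overhead.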

@MarionBWeinzierl
Collaborator Author

MarionBWeinzierl commented Jan 30, 2025

get_window_values is doing the reshaping you mean, I believe. Is it worthwhile to extract this, and other relevant functions (like get_daily_means) out of scaler.py and make them available as core functions?

@davidorme
Collaborator

It probably is, yes. I think we are always likely to be working with subdaily data at regular intervals that map neatly onto whole days (SubdailyPModel insists on this). So the reshaping and sampling there ought to be a generally useful and efficient tool?
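
A rough sketch of what such a general tool could look like is given here, assuming only numpy. The names `observations_per_day` and `window_means` are hypothetical and are not the existing `get_window_values` or `get_daily_means` functions in scaler.py. The first function checks that the datetimes are regularly spaced and map onto whole days, and the second averages a fixed window of observations within each day.

```python
import numpy as np

SECONDS_PER_DAY = 86400


def observations_per_day(datetimes: np.ndarray) -> int:
    """Check regular spacing over whole days and return observations per day.

    Hypothetical helper for illustration only; not part of pyrealm.
    """
    # Spacing between consecutive observations, as integer seconds
    spacing = np.diff(datetimes).astype("timedelta64[s]").astype(np.int64)
    if spacing.size == 0 or spacing[0] <= 0 or np.any(spacing != spacing[0]):
        raise ValueError("Datetimes are not regularly spaced")
    if SECONDS_PER_DAY % spacing[0] != 0:
        raise ValueError("Spacing does not divide evenly into a day")

    per_day = int(SECONDS_PER_DAY // spacing[0])
    if datetimes.size % per_day != 0:
        raise ValueError("Datetimes do not cover a whole number of days")
    return per_day


def window_means(
    values: np.ndarray, per_day: int, start: int, stop: int
) -> np.ndarray:
    """Average observations with within-day index in [start, stop)."""
    # Reshape time (axis 0) into (day, observation within day)
    by_day = values.reshape((-1, per_day) + values.shape[1:])
    return by_day[:, start:stop].mean(axis=1)


# Three days of hourly data where each value is the hour of the day
datetimes = np.arange(
    np.datetime64("2025-01-01T00"),
    np.datetime64("2025-01-04T00"),
    np.timedelta64(1, "h"),
)
values = np.tile(np.arange(24.0), 3)

per_day = observations_per_day(datetimes)  # 24
print(window_means(values, per_day, start=11, stop=14))  # [12. 12. 12.]
```

Doing the validation once up front means the reshaping functions themselves can stay simple and fast.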

@MarionBWeinzierl
Collaborator Author

I created issue #406 for tackling this.

@MarionBWeinzierl
Collaborator Author

For the second case, that is just using interp1d in fill_daily_to_subdaily, I assume. But that's flagged as legacy here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
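
One possible replacement for the simple linear case is `numpy.interp`, sketched below. This is only an illustration under stated assumptions: the function name `interpolate_daily_to_subdaily` is hypothetical, the daily values are assumed to be one-dimensional and representative of noon on each day, and it is not a drop-in equivalent of what fill_daily_to_subdaily currently does. Note that `numpy.interp` holds the first and last values constant outside the range of the daily times.

```python
import numpy as np


def interpolate_daily_to_subdaily(
    daily_times: np.ndarray,
    daily_values: np.ndarray,
    subdaily_times: np.ndarray,
) -> np.ndarray:
    """Linearly interpolate 1-D daily values onto subdaily datetimes.

    Hypothetical helper for illustration only; not part of pyrealm.
    """
    # numpy.interp needs numeric coordinates, so convert the datetimes
    # to float seconds relative to the first daily time.
    epoch = daily_times[0]
    one_second = np.timedelta64(1, "s")
    x_daily = (daily_times - epoch) / one_second
    x_subdaily = (subdaily_times - epoch) / one_second
    return np.interp(x_subdaily, x_daily, daily_values)


# Assumption: daily values are representative of noon on each day
daily_times = np.array(["2025-01-01T12", "2025-01-02T12"], dtype="datetime64[h]")
daily_values = np.array([10.0, 20.0])

# Six-hourly target times spanning both days
subdaily_times = np.arange(
    np.datetime64("2025-01-01T00"),
    np.datetime64("2025-01-03T00"),
    np.timedelta64(6, "h"),
)

print(interpolate_daily_to_subdaily(daily_times, daily_values, subdaily_times))
# [10.  10.  10.  12.5 15.  17.5 20.  20. ]
```

`numpy.interp` only handles one-dimensional data, so multi-dimensional inputs or higher-order interpolation would need one of the other interpolators in scipy.interpolate.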

@MarionBWeinzierl MarionBWeinzierl moved this from In Progress to Todo in ICCS development board Feb 6, 2025
Labels
enhancement New feature or request
Projects
Status: Todo
Development

No branches or pull requests

2 participants