can Nexradlevel2 reader read streamed data? #265

Open
aladinor opened this issue Feb 4, 2025 · 10 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@aladinor
Member

aladinor commented Feb 4, 2025

Tracked also in #258 and #264

xradar should support compressed files as well as data streaming.

The Py-ART library can open files in several formats, including bz2- and gz-compressed files, and can stream files stored in an S3 bucket. I think we might want to implement a similar solution in the xradar reader.

  • Create/adapt prepare_for_read
  • Adapt xradar backends to support data streaming as in Py-ART
@aladinor aladinor changed the title Nexrad reader does not support .gz files neigther data streaming Nexradlevel2 reader does not compressed files nor data streaming Feb 4, 2025
@aladinor aladinor changed the title Nexradlevel2 reader does not compressed files nor data streaming Nexradlevel2 reader does not support compressed files nor data streaming Feb 4, 2025
@aladinor aladinor changed the title Nexradlevel2 reader does not support compressed files nor data streaming Nexradlevel2 reader does not support data streaming Feb 4, 2025
@aladinor aladinor added enhancement New feature or request help wanted Extra attention is needed labels Feb 4, 2025
@kmuehlbauer
Collaborator

@aladinor Thanks for splitting this out. Thanks @ghiggi for raising this.

I'd strongly vote against doing something like this. Instead, we should show users how to do this with boilerplate code.

Things work for Py-ART because it imports the whole file into memory. This is something we avoid with the current implementation, which tries to read in only the necessary data.
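
A minimal sketch of the kind of boilerplate meant here, assuming a gzip-compressed file on local disk: the user decompresses it to a temporary file and hands the resulting path to the reader. The helper name `gunzip_to_tempfile` is hypothetical and is not part of xradar or Py-ART; only the Python standard library is used.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_to_tempfile(gz_path):
    """Decompress a .gz file into a temporary file and return its path.

    Hypothetical helper: the caller is responsible for deleting the file.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so a reader can open (and memory-map) it afterwards.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        with gzip.open(gz_path, "rb") as src:
            shutil.copyfileobj(src, tmp)  # chunked copy, not one big read
    return Path(tmp.name)
```

The returned path can then be passed to whichever reader function is in use. Decompressing to disk instead of into memory leaves the reader free to access only the parts of the file it needs.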

@aladinor
Member Author

aladinor commented Feb 9, 2025

Hi @kmuehlbauer,

Thanks for pointing that out. I totally agree that we should avoid loading the full file into memory and rely on numpy.memmap to access only the necessary data. However, I’m curious—how large do these files typically get for this to be a significant issue?

It seems like using numpy.memmap is a great approach for local files, but when dealing with remote data (e.g., NEXRAD files stored on AWS S3), it requires downloading the entire file first. Streaming might be a useful alternative here, even if it comes with some trade-offs. For instance, the Sigmet backend supports streaming, and it works well in similar cases.

Since most NEXRAD data is stored in S3, it might be worth considering an optional streaming implementation.

Please let me know your thoughts
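
The local-file case discussed above can be sketched as follows. This is an illustration, not xradar code: the standard library's `mmap` stands in for `numpy.memmap`, the function name `read_volume_header` is hypothetical, and the 24-byte header layout is an assumption based on the NEXRAD Level II volume header as commonly documented. The point is that only the bytes that are sliced get read, which is not possible for a remote object without either downloading it or issuing range requests.

```python
import mmap
import struct
import tempfile
from pathlib import Path

# Assumed layout of the 24-byte volume header at the start of a NEXRAD
# Level II file: 9-byte name, 3-byte extension, date, time in ms, 4-byte
# ICAO identifier, all big-endian.
VOLUME_HEADER = struct.Struct(">9s3sII4s")


def read_volume_header(path):
    """Parse only the first 24 bytes of a local file through a memory map."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; pages are loaded only when touched.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            fields = VOLUME_HEADER.unpack(mm[: VOLUME_HEADER.size])
    return dict(zip(("tape", "extension", "date", "time_ms", "icao"), fields))


# Usage with a synthetic file: a header followed by 1 MiB of padding.
with tempfile.TemporaryDirectory() as tmpdir:
    fake = Path(tmpdir) / "fake_volume"
    fake.write_bytes(
        VOLUME_HEADER.pack(b"AR2V0006.", b"001", 20124, 0, b"KTLX") + bytes(2**20)
    )
    header = read_volume_header(fake)
```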

@kmuehlbauer
Collaborator

So "support streaming" should just mean "can read streamed data"? Then we might just adapt this feature from IRIS to NEXRAD to make this happen.

What I'm voting against is using gz/s3fs/fsspec/etc. inside our implementation.
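
One way "can read streamed data" could look in practice is sketched below. This is not the actual IRIS or NEXRAD reader code: `as_buffer` is a hypothetical helper, and the standard library's `mmap` stands in for `numpy.memmap`. The reader itself imports no compression or remote-filesystem library; it only accepts whatever the user has already opened.

```python
import io
import mmap
import tempfile
from pathlib import Path


def as_buffer(source):
    """Return a sliceable bytes-like object for a path, bytes, or file-like input."""
    if isinstance(source, (str, Path)):
        # Local file: memory-map it so only the touched parts are read.
        with open(source, "rb") as fh:
            return mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
    if isinstance(source, (bytes, bytearray, memoryview)):
        # Data already in memory, e.g. downloaded by the user.
        return source
    # Anything else is assumed to be a binary file-like object (stream).
    return source.read()


# Usage: the same slicing code consumes all three kinds of input.
payload = b"AR2V0006.001" + bytes(12)
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "fake_volume"
    path.write_bytes(payload)
    buf = as_buffer(path)
    from_path = bytes(buf[:9])
    buf.close()  # release the map before the directory is removed
from_bytes = bytes(as_buffer(payload)[:9])
from_stream = bytes(as_buffer(io.BytesIO(payload))[:9])
```

With this split, opening a gzip file, an S3 object, or a cached download stays in user code, and the parsing code only ever sees a buffer.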

@aladinor
Member Author

aladinor commented Feb 9, 2025

Hi @kmuehlbauer,

Thanks for the clarification, and my apologies for the confusion—that was entirely my misunderstanding! You're absolutely right. By "support streaming," we’re aiming for the ability to read streamed data, not necessarily to integrate gz, s3fs, or fsspec directly into the core implementation.

Given that, adapting the streaming capabilities from the IRIS reader for NEXRAD sounds like a great idea. I’ll take a closer look at how we can make that happen without adding unnecessary dependencies.

Thanks again for your patience and guidance on this!

@aladinor aladinor changed the title Nexradlevel2 reader does not support data streaming cand Nexradlevel2 reader read streamed data? Feb 9, 2025
@aladinor aladinor changed the title cand Nexradlevel2 reader read streamed data? can Nexradlevel2 reader read streamed data? Feb 9, 2025
@kmuehlbauer
Collaborator

@aladinor Great! These additions to the first IRIS reader were made when it was still part of wradlib. I can look up the commits and link them here, if necessary.

@aladinor
Member Author

aladinor commented Feb 9, 2025

Hi @kmuehlbauer,

That sounds awesome! I'd be happy to work on adapting the IRIS streaming feature for NEXRAD. If you could link those commits, it would definitely be helpful as a reference!

@aladinor aladinor self-assigned this Feb 9, 2025
@kmuehlbauer
Collaborator

@aladinor These are the relevant wradlib PRs:

@ghiggi

ghiggi commented Feb 10, 2025

Hi guys!

I just wanted to mention that we could eventually add the capability to directly read compressed formats and open files from S3 paths to the radar_api wrappers.

Right now, when the radar_api detects an S3 path, it uses fsspec’s simplecache to temporarily download the file to disk and then opens it as if it were a local file.
I have noticed that if there are too many chunks or too many simultaneous connections to S3, the latency involved in connecting to the bucket can make reading or streaming even a small portion of the file slower than if we were to download the entire file first and then open it from local disk.
So keep in mind the amount of overhead caused by S3 connection latency in your design choices.
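
The trade-off described above can be illustrated with a rough cost model. It is a sketch under stated assumptions, not a measurement: requests are issued one after another, every request pays the same fixed latency, and all numbers (latency, bandwidth, file size, request count) are hypothetical.

```python
def ranged_read_time(n_requests, bytes_per_request, latency_s, bandwidth_bps):
    """Estimated time for n sequential range requests of equal size."""
    return n_requests * (latency_s + bytes_per_request / bandwidth_bps)


def full_download_time(file_bytes, latency_s, bandwidth_bps):
    """Estimated time to fetch the whole object with a single request."""
    return latency_s + file_bytes / bandwidth_bps


# Hypothetical numbers: 10 MiB file, 100 ms latency, 50 MiB/s bandwidth.
LATENCY_S = 0.1
BANDWIDTH_BPS = 50 * 1024**2
FILE_BYTES = 10 * 1024**2

full = full_download_time(FILE_BYTES, LATENCY_S, BANDWIDTH_BPS)
# Many small chunks: latency is paid 200 times and dominates the total.
many_chunks = ranged_read_time(200, 16 * 1024, LATENCY_S, BANDWIDTH_BPS)
# A couple of targeted reads: partial access is cheaper than the full download.
few_chunks = ranged_read_time(2, 16 * 1024, LATENCY_S, BANDWIDTH_BPS)
```

Under these assumptions, partial reads only pay off when the number of requests stays small; parallel requests would change the numbers but not the basic effect of per-request latency.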

@kmuehlbauer
Collaborator

Thanks @ghiggi for letting us know. That behaviour is another indication that xradar should not put too much magic into its file readers, but should let the user decide how to obtain the data. xradar should just be capable of consuming any of these files/streams.

@ghiggi

ghiggi commented Feb 11, 2025

I just discovered obstore, an fsspec reimplemented in Rust that seems to allow much faster data streaming. ObStore is also being merged into Zarr (zarr-developers/zarr-python#1661). It might be worth supporting this type of file object as well.

(Maybe @aladinor might also want to check the speed improvement for IRIS/Sigmet files 😄)

Development

No branches or pull requests

3 participants