Update data and data_tools folders #56
Pull request overview
This PR expands lompe’s data tooling by adding a new multi-source data downloader, extending existing loaders to support alternative SSUSI file sources/patterns, and adding supporting station/URL inventory data files.
Changes:
- Added a large `datadownloader.py` module with download/processing helpers for several datasets (SSUSI, SuperMAG, SuperDARN, AMPERE/Iridium, CHAMP, Swarm, DMSP SSIES).
- Updated `dataloader.py` SSUSI loading to support different filename patterns via a new `source` parameter; adjusted SSIES download filtering logic.
- Added SuperMAG API helper module and new CSV inventory data for SuperMAG stations and SuperDARN Zenodo records.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| lompe/data_tools/supermag_api.py | Adds a SuperMAG API wrapper used by the new downloader utilities. |
| lompe/data_tools/dataloader.py | Adds SSUSI source support + modifies SSIES file selection logic. |
| lompe/data_tools/datadownloader.py | New downloader/processor module for multiple external data sources. |
| lompe/data_tools/README | Updates dataset support notes and lists additional supported datasets. |
| lompe/data/supermag_stations.csv | Adds station metadata used for hemisphere filtering in SuperMAG downloads. |
| lompe/data/sdarn_2010_to_2021.csv | Adds Zenodo record index used to locate SuperDARN grid files by month/year. |
  filenames = [filename for filename in filenames if date_str in filename]
  if len(filenames) == 0:
      continue
  no_data_found = False
  # ssies = str([s for s in filenames if '_' + str(sat) + 's1' in s][0])
  ssies = [s for s in filenames if '_' + str(sat) + 's1.' in s]
  # temp_dens = str(
  #     [s for s in filenames if '_' + str(sat) + 's4.' in s][0])
  temp_dens = [s for s in filenames if '_' + str(sat) + 's4.' in s]

  datafile = basepath + 'ssies_temp_' + event + '.hdf5'
  result = testData.downloadFile(
-     ssies, datafile, **madrigal_kwargs, format="hdf5")
+     ssies[0], datafile, **madrigal_kwargs, format="hdf5")
  f = pd.read_hdf(datafile, mode='r', key='Data/Table Layout')

  tempdensfile = basepath + 'ssies_tempdens_data_' + event + '.hdf5'
  result = testData.downloadFile(
-     temp_dens, tempdensfile, **madrigal_kwargs, format="hdf5")
+     temp_dens[0], tempdensfile, **madrigal_kwargs, format="hdf5")
ssies and temp_dens are lists and can be empty even when filenames is non-empty; ssies[0] / temp_dens[0] will then raise IndexError. Add checks that both lists have at least one match (and ideally handle multiple matches deterministically) before calling downloadFile.
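One way to add such a guard, as a minimal sketch: `pick_match` is a hypothetical helper (not part of lompe), and the filenames below are made up for illustration. Sorting the matches is just one possible way to make the choice deterministic when several files match.

```python
def pick_match(filenames, sat, suffix):
    """Return one filename containing '_<sat><suffix>', or None if none match.

    Hypothetical helper for illustration; suffix is e.g. 's1.' or 's4.'.
    """
    tag = '_' + str(sat) + suffix
    # sorted() makes the pick deterministic if several files match
    matches = sorted(s for s in filenames if tag in s)
    return matches[0] if matches else None


# made-up filenames, only to exercise the helper
filenames = ['dms_20140820_17s4.001.hdf5', 'dms_20140820_17s1.001.hdf5']
ssies = pick_match(filenames, 17, 's1.')
temp_dens = pick_match(filenames, 17, 's4.')
missing = pick_match(filenames, 18, 's1.')  # no file for this satellite
```

The caller could then skip the satellite (or raise a descriptive error) when either value is `None`, instead of indexing into an empty list.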
# date conversion and cleaning the DataFrame
df_combined['tval'] = pd.to_datetime(
    df_combined['tval'], unit='s', origin='unix')
df_combined[['N', 'E', 'Z']] = df_combined[['N', 'E', 'Z']].map(
Same pandas compatibility issue as above: DataFrame.map(...) is not available in pandas 1.5, so this block will fail at runtime. Use applymap or per-column Series.map instead.
- df_combined[['N', 'E', 'Z']] = df_combined[['N', 'E', 'Z']].map(
+ df_combined[['N', 'E', 'Z']] = df_combined[['N', 'E', 'Z']].applymap(
# Check if the request was successful
if response.status_code == 200:
    # Save the downloaded data to a file
    with open(savefile, 'wb') as file:
        file.write(response.content)
else:
    print(f"Unsupported source: {source}")
    continue
# content of the webpage
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
print(f"Failed to retrieve data: {response.status_code}")
return savefile
On non-200 responses, this function prints an error but still returns savefile, which may not exist or may be a zero-byte/partial file. Prefer raising an exception or returning None on failure, and consider deleting any incomplete output file.
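A minimal sketch of the "return `None` and clean up" variant, assuming a hypothetical helper `save_response` (not in the PR) that accepts any object with `status_code` and `content` attributes. The stand-in objects below replace real HTTP responses so the example runs offline.

```python
import os
import tempfile
from types import SimpleNamespace


def save_response(response, savefile):
    """Write response.content to savefile; return the path, or None on failure."""
    if response.status_code != 200:
        # remove any stale or partial file so callers cannot pick it up by mistake
        if os.path.exists(savefile):
            os.remove(savefile)
        return None
    with open(savefile, 'wb') as file:
        file.write(response.content)
    return savefile


# stand-in objects instead of real HTTP responses
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'data.bin')
ok_result = save_response(SimpleNamespace(status_code=200, content=b'abc'), path)
bad_result = save_response(SimpleNamespace(status_code=404, content=b''), path)
```

Whether a failed download should delete an existing file at the same path is a design choice; raising an exception instead would make the failure harder to ignore.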
elif sunkey == mykey:
    indices += mykey+'s,' # alias, so base key + 's'
elif darkkey == mykey:
    indices += mykey+'d,' # alias, so base key + 'd'
elif regkey1 == mykey or regkey2 == mykey:
Alias handling in sm_keycheck_indices appears broken: elif sunkey == mykey / darkkey == mykey / etc will never be true because sunkey includes the prefix (e.g. sun...) and mykey does not. This prevents supported aliases like sunSME/darkSME/regSME from working. Compare chk against sunkey/darkkey/regkey* instead.
- elif sunkey == mykey:
-     indices += mykey+'s,' # alias, so base key + 's'
- elif darkkey == mykey:
-     indices += mykey+'d,' # alias, so base key + 'd'
- elif regkey1 == mykey or regkey2 == mykey:
+ elif chk == sunkey:
+     indices += mykey+'s,' # alias, so base key + 's'
+ elif chk == darkkey:
+     indices += mykey+'d,' # alias, so base key + 'd'
+ elif chk == regkey1 or chk == regkey2:
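To illustrate why the comparison has to use the requested key, here is a simplified, hypothetical stand-in for the alias branch. The function name `index_token`, the base-key list, the case-insensitive matching, and the handling of a plain base key are all assumptions; only the `'s'` and `'d'` suffixes come from the diff. The `reg` aliases are left out because the hunk does not show what they map to.

```python
def index_token(chk, basekeys=('sme', 'sml', 'smu')):
    """Map a requested key such as 'sunsme' to its index token, or None.

    Hypothetical, simplified illustration; not the real sm_keycheck_indices.
    """
    chk = chk.lower()  # assumption: aliases are matched case-insensitively
    for mykey in basekeys:
        sunkey = 'sun' + mykey
        darkkey = 'dark' + mykey
        if chk == mykey:
            return mykey
        elif chk == sunkey:   # compare the request, not mykey, to the alias
            return mykey + 's'
        elif chk == darkkey:
            return mykey + 'd'
    return None
```

With `sunkey == mykey` in place of `chk == sunkey`, the alias branches could never fire, since `'sun' + mykey` is never equal to `mykey`.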
def read_ssusi(event, hemi='north', basepath='./', tempfile_path='./', source='jhuapl'):
    """
The README now states that SSUSI data is not supported from the jhuapl source, but this function still defaults source='jhuapl'. Consider switching the default to 'cdaweb' (or updating the README) so the documented supported path is the default behavior.
There was a problem hiding this comment.
I'm assuming that using this will require some manual work anyway, so might as well keep the old default
import glob
import shutil
from lompe.utils.time import date2doy
import random
from lompe.utils.time import date2doy will fail because lompe/utils/time.py does not define date2doy. Either implement date2doy in lompe.utils.time or change this module to use the existing date_to_doy API (after parsing the date string).
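If the first option is chosen, a day-of-year helper can be written with the standard library alone. This is a sketch under assumptions: the signature and the `'%Y-%m-%d'` date format are guesses, since the review does not show how `date2doy` is called in `datadownloader.py`.

```python
from datetime import datetime


def date2doy(date_str, fmt='%Y-%m-%d'):
    """Return the day of year (1-366) for a date string.

    Hypothetical implementation; the expected input format is an assumption.
    """
    return datetime.strptime(date_str, fmt).timetuple().tm_yday


doy = date2doy('2014-08-20')
leap_doy = date2doy('2012-03-01')  # 2012 is a leap year
```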
if line.startswith("token ="):
    token_value = line.split('=', 1)[1].strip()
    if token_value:
        print("Swarm token is present:", token_value)
This prints the Swarm access token value to stdout, which is sensitive and can leak via logs/CI output. Avoid printing the token (at most log that a token is present, without the value).
- print("Swarm token is present:", token_value)
+ print("Swarm token is present.")
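The check can also be factored into a small function that only ever reports presence. This is a sketch, assuming the config is available as a list of lines; `swarm_token_present` is a hypothetical name, and the sample lines are made up.

```python
def swarm_token_present(lines):
    """Return True if any line of the form 'token = <value>' has a non-empty value."""
    for line in lines:
        if line.startswith("token ="):
            if line.split('=', 1)[1].strip():
                return True
    return False


# made-up config content for illustration
config = ["[default]", "token = abc123"]
if swarm_token_present(config):
    print("Swarm token is present.")  # the value itself is never printed
```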
;supermag-api.py
; ================
; Author S. Antunes, based on supermag-api.pro by R.J.Barnes

; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
;LLC. All Rights Reserved.

;This material may be only be used, modified, or reproduced by or for
;the U.S. Government pursuant to the license rights granted under the
;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
;permission,
;please contact the Office of Technology Transfer at JHU/APL.

; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.
The header comment asserts "All Rights Reserved" and restricts use to U.S. Government permissions, which conflicts with this repository’s MIT license. Please confirm redistribution rights for this code and replace with a license-compatible implementation or include it as an optional external dependency rather than vendoring this file verbatim.
- (the full header quoted above, from ";supermag-api.py" through "ANY DAMAGES FOR LOST PROFITS.")
+ Utilities for querying the SuperMAG web API.
+ This module contains helper functions for constructing requests and
+ parsing responses from the public SuperMAG service.
import requests
from bs4 import BeautifulSoup
This module imports requests and bs4 at import time, but these packages are not declared in pyproject.toml dependencies/extras. Either add them to project dependencies (or an appropriate extra) or move these imports inside the functions that use them so importing lompe.data_tools.datadownloader doesn’t fail in minimal installs.
- import requests
- from bs4 import BeautifulSoup
+ try:
+     import requests
+ except ImportError:
+     requests = None
+ try:
+     from bs4 import BeautifulSoup
+ except ImportError:
+     BeautifulSoup = None
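With the `None` fallback, a missing package only surfaces later as an `AttributeError` on `None`. An alternative sketch is to import inside the functions that need the package and raise a readable error there. The helper name `require` and the function `fetch` are hypothetical and not part of the PR.

```python
import importlib


def require(module_name):
    """Import an optional dependency on first use, with a readable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this downloader function; "
            "install it to use this feature"
        ) from exc


def fetch(url):
    # requests is imported only when fetch() is actually called,
    # so importing the module itself works in minimal installs
    requests = require('requests')
    return requests.get(url)
```

Declaring the packages in an optional-dependency extra and pointing the error message at that extra would make the fix more discoverable for users.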
97902e1 to 80ca26d
data downloader script is added (see readme)