
Update data and data_tools folders #56

Closed
FasilGibdaw wants to merge 3 commits into klaundal:main from FasilGibdaw:update-data-folders

Conversation

@FasilGibdaw
Contributor

A data downloader script is added (see the README).

Copilot AI review requested due to automatic review settings April 24, 2026 16:45

Copilot AI left a comment


Pull request overview

This PR expands lompe’s data tooling by adding a new multi-source data downloader, extending existing loaders to support alternative SSUSI file sources/patterns, and adding supporting station/URL inventory data files.

Changes:

  • Added a large datadownloader.py module with download/processing helpers for several datasets (SSUSI, SuperMAG, SuperDARN, AMPERE/Iridium, CHAMP, Swarm, DMSP SSIES).
  • Updated dataloader.py SSUSI loading to support different filename patterns via a new source parameter; adjusted SSIES download filtering logic.
  • Added SuperMAG API helper module and new CSV inventory data for SuperMAG stations and SuperDARN Zenodo records.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| lompe/data_tools/supermag_api.py | Adds a SuperMAG API wrapper used by the new downloader utilities. |
| lompe/data_tools/dataloader.py | Adds SSUSI source support + modifies SSIES file selection logic. |
| lompe/data_tools/datadownloader.py | New downloader/processor module for multiple external data sources. |
| lompe/data_tools/README | Updates dataset support notes and lists additional supported datasets. |
| lompe/data/supermag_stations.csv | Adds station metadata used for hemisphere filtering in SuperMAG downloads. |
| lompe/data/sdarn_2010_to_2021.csv | Adds Zenodo record index used to locate SuperDARN grid files by month/year. |


Comment thread lompe/data_tools/dataloader.py Outdated
Comment on lines +324 to +341
  filenames = [filename for filename in filenames if date_str in filename]
  if len(filenames) == 0:
      continue
  no_data_found = False
  # ssies = str([s for s in filenames if '_' + str(sat) + 's1' in s][0])
  ssies = [s for s in filenames if '_' + str(sat) + 's1.' in s]
  # temp_dens = str(
  #     [s for s in filenames if '_' + str(sat) + 's4.' in s][0])
  temp_dens = [s for s in filenames if '_' + str(sat) + 's4.' in s]

  datafile = basepath + 'ssies_temp_' + event + '.hdf5'
  result = testData.downloadFile(
-     ssies, datafile, **madrigal_kwargs, format="hdf5")
+     ssies[0], datafile, **madrigal_kwargs, format="hdf5")
  f = pd.read_hdf(datafile, mode='r', key='Data/Table Layout')

  tempdensfile = basepath + 'ssies_tempdens_data_' + event + '.hdf5'
  result = testData.downloadFile(
-     temp_dens, tempdensfile, **madrigal_kwargs, format="hdf5")
+     temp_dens[0], tempdensfile, **madrigal_kwargs, format="hdf5")

Copilot AI Apr 24, 2026

ssies and temp_dens are lists and can be empty even when filenames is non-empty; ssies[0] / temp_dens[0] will then raise IndexError. Add checks that both lists have at least one match (and ideally handle multiple matches deterministically) before calling downloadFile.
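
A minimal sketch of the guard described in this comment. The helper name `pick_single_match` and the example filenames are hypothetical and not part of lompe; the idea is only to return `None` when nothing matches and to sort the matches so the choice is deterministic when several files qualify.

```python
def pick_single_match(filenames, sat, suffix):
    """Return one filename containing '_<sat><suffix>', or None if none match.

    Matches are sorted so the same file is chosen on every run when
    several files qualify.
    """
    tag = "_" + str(sat) + suffix
    matches = sorted(name for name in filenames if tag in name)
    if not matches:
        return None
    return matches[0]


# Hypothetical filenames, for illustration only
filenames = ["dms_20140820_17s1.001.hdf5", "dms_20140820_17s4.001.hdf5"]
ssies = pick_single_match(filenames, 17, "s1.")      # a match exists
temp_dens = pick_single_match(filenames, 18, "s4.")  # no file for satellite 18
```

A caller could then skip the satellite (or raise a clear error) when either result is `None`, before calling `downloadFile`.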

Copilot uses AI. Check for mistakes.
Comment thread lompe/data_tools/datadownloader.py Outdated
# date conversion and cleaning the DataFrame
df_combined['tval'] = pd.to_datetime(
df_combined['tval'], unit='s', origin='unix')
df_combined[['N', 'E', 'Z']] = df_combined[['N', 'E', 'Z']].map(

Copilot AI Apr 24, 2026

Same pandas compatibility issue as above: DataFrame.map(...) is not available in pandas 1.5, so this block will fail at runtime. Use applymap or per-column Series.map instead.

Suggested change
- df_combined[['N', 'E', 'Z']] = df_combined[['N', 'E', 'Z']].map(
+ df_combined[['N', 'E', 'Z']] = df_combined[['N', 'E', 'Z']].applymap(
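
The per-column `Series.map` alternative mentioned in this comment could look like the sketch below. The cleaning function and the fill value 999999 are assumptions for illustration, since the function actually passed to `.map(` is not shown in the excerpt. `Series.map` is available in both older and newer pandas releases, so this form does not depend on `DataFrame.map` or `DataFrame.applymap`.

```python
import pandas as pd


def clean_value(value):
    """Hypothetical cleaner: treat the assumed fill value 999999 as missing."""
    return float("nan") if value == 999999 else value


# Made-up data standing in for the real SuperMAG frame
df_combined = pd.DataFrame(
    {"N": [1.0, 999999.0], "E": [2.0, 3.0], "Z": [999999.0, 4.0]}
)

# Apply Series.map column by column instead of calling DataFrame.map
df_combined[["N", "E", "Z"]] = df_combined[["N", "E", "Z"]].apply(
    lambda column: column.map(clean_value)
)
```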

Comment thread lompe/data_tools/datadownloader.py Outdated
Comment on lines +641 to +648
  # Check if the request was successful
  if response.status_code == 200:
      # Save the downloaded data to a file
      with open(savefile, 'wb') as file:
          file.write(response.content)
  else:
-     print(f"Unsupported source: {source}")
-     continue
- # content of the webpage
- response = requests.get(url)
- soup = BeautifulSoup(response.content, 'html.parser')
      print(f"Failed to retrieve data: {response.status_code}")
  return savefile

Copilot AI Apr 24, 2026

On non-200 responses, this function prints an error but still returns savefile, which may not exist or may be a zero-byte/partial file. Prefer raising an exception or returning None on failure, and consider deleting any incomplete output file.
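
One way to implement this, sketched under assumptions: the helper `save_download` is hypothetical and takes the status code and body as plain arguments so it can be shown without a network call. It raises on failure and writes through a temporary file, so no partial file is left at the final path.

```python
import os
import tempfile


def save_download(status_code, content, savefile):
    """Write content to savefile only when the request succeeded.

    Raises RuntimeError on a non-200 status, so callers never receive a
    path to a file that was not written.
    """
    if status_code != 200:
        raise RuntimeError(f"Failed to retrieve data: HTTP {status_code}")

    # Write to a temporary file in the target directory, then rename it,
    # so an interrupted write does not leave a partial file behind.
    directory = os.path.dirname(os.path.abspath(savefile))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        os.replace(tmp_path, savefile)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return savefile
```

With `requests`, the call site would pass `response.status_code` and `response.content` to this helper.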

Comment on lines +206 to +210
elif sunkey == mykey:
    indices += mykey+'s,'  # alias, so base key + 's'
elif darkkey == mykey:
    indices += mykey+'d,'  # alias, so base key + 'd'
elif regkey1 == mykey or regkey2 == mykey:

Copilot AI Apr 24, 2026

Alias handling in sm_keycheck_indices appears broken: elif sunkey == mykey / darkkey == mykey / etc will never be true because sunkey includes the prefix (e.g. sun...) and mykey does not. This prevents supported aliases like sunSME/darkSME/regSME from working. Compare chk against sunkey/darkkey/regkey* instead.

Suggested change
- elif sunkey == mykey:
-     indices += mykey+'s,'  # alias, so base key + 's'
- elif darkkey == mykey:
-     indices += mykey+'d,'  # alias, so base key + 'd'
- elif regkey1 == mykey or regkey2 == mykey:
+ elif chk == sunkey:
+     indices += mykey+'s,'  # alias, so base key + 's'
+ elif chk == darkkey:
+     indices += mykey+'d,'  # alias, so base key + 'd'
+ elif chk == regkey1 or chk == regkey2:
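
To make the comparison bug concrete, here is a simplified, hypothetical stand-in for the alias lookup. It is not the actual `sm_keycheck_indices` code; it assumes only what the comment states, namely that the `sun` and `dark` aliases are the base key with a prefix. The regional aliases are left out because their construction is not shown in the excerpt.

```python
def resolve_index_key(chk, basekeys):
    """Map a user flag to the key string sent to the API, or None.

    Hypothetical simplification: a plain key maps to itself, 'sun' + key
    maps to key + 's', and 'dark' + key maps to key + 'd'.
    """
    chk = chk.lower()
    for mykey in basekeys:
        sunkey = "sun" + mykey
        darkkey = "dark" + mykey
        if chk == mykey:
            return mykey
        # Compare the user's flag (chk) with the alias. Comparing mykey
        # with sunkey can never match, because sunkey always carries the
        # extra prefix.
        if chk == sunkey:
            return mykey + "s"
        if chk == darkkey:
            return mykey + "d"
    return None
```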

Comment on lines +31 to 33
def read_ssusi(event, hemi='north', basepath='./', tempfile_path='./', source='jhuapl'):
"""


Copilot AI Apr 24, 2026

The README now states that SSUSI data is not supported from the jhuapl source, but this function still defaults source='jhuapl'. Consider switching the default to 'cdaweb' (or updating the README) so the documented supported path is the default behavior.

Owner

I'm assuming that using this will require some manual work anyway, so might as well keep the old default

Comment thread lompe/data_tools/dataloader.py
Comment thread lompe/data_tools/datadownloader.py Outdated
Comment on lines +9 to +12
import glob
import shutil
from lompe.utils.time import date2doy
import random

Copilot AI Apr 24, 2026

from lompe.utils.time import date2doy will fail because lompe/utils/time.py does not define date2doy. Either implement date2doy in lompe.utils.time or change this module to use the existing date_to_doy API (after parsing the date string).
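
For reference, the "parse the date string, then compute the day of year" step can be done with the standard library alone. The function below is a hypothetical stand-in, not lompe's `date_to_doy` (whose signature is not shown here), and the `YYYY-MM-DD` input format is an assumption.

```python
from datetime import datetime


def day_of_year(date_str, fmt="%Y-%m-%d"):
    """Return the 1-based day of year for a date string (format assumed)."""
    return datetime.strptime(date_str, fmt).timetuple().tm_yday
```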

Comment thread lompe/data_tools/datadownloader.py Outdated
if line.startswith("token ="):
    token_value = line.split('=', 1)[1].strip()
    if token_value:
        print("Swarm token is present:", token_value)

Copilot AI Apr 24, 2026

This prints the Swarm access token value to stdout, which is sensitive and can leak via logs/CI output. Avoid printing the token (at most log that a token is present, without the value).

Suggested change
- print("Swarm token is present:", token_value)
+ print("Swarm token is present.")
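
A small sketch of the same idea as a reusable check: the hypothetical helper `token_is_configured` returns only a boolean, so the token value never reaches stdout or a log. It assumes the same `token =` line format used in the snippet above.

```python
def token_is_configured(lines):
    """Return True if a non-empty 'token =' entry exists; never expose its value."""
    for line in lines:
        if line.startswith("token ="):
            return bool(line.split("=", 1)[1].strip())
    return False


# Made-up configuration lines, for illustration only
if token_is_configured(["[section]", "token = example-value"]):
    print("Swarm token is present.")
```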

Comment on lines +18 to +44
;supermag-api.py
; ================
; Author S. Antunes, based on supermag-api.pro by R.J.Barnes


; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
;LLC. All Rights Reserved.

;This material may be only be used, modified, or reproduced by or for
;the U.S. Government pursuant to the license rights granted under the
;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
;permission,
;please contact the Office of Technology Transfer at JHU/APL.

; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.

Copilot AI Apr 24, 2026

The header comment asserts "All Rights Reserved" and restricts use to U.S. Government permissions, which conflicts with this repository’s MIT license. Please confirm redistribution rights for this code and replace with a license-compatible implementation or include it as an optional external dependency rather than vendoring this file verbatim.

Suggested change
- ;supermag-api.py
- ; ================
- ; Author S. Antunes, based on supermag-api.pro by R.J.Barnes
- ; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
- ;LLC. All Rights Reserved.
- ;This material may be only be used, modified, or reproduced by or for
- ;the U.S. Government pursuant to the license rights granted under the
- ;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
- ;permission,
- ;please contact the Office of Technology Transfer at JHU/APL.
- ; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
- ; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
- ; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
- ; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
- ; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
- ; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
- ; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
- ; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
- ; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
- ; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
- ; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
- ; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
- ; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.
+ Utilities for querying the SuperMAG web API.
+ This module contains helper functions for constructing requests and
+ parsing responses from the public SuperMAG service.

Comment on lines 5 to 6
import requests
from bs4 import BeautifulSoup

Copilot AI Apr 24, 2026

This module imports requests and bs4 at import time, but these packages are not declared in pyproject.toml dependencies/extras. Either add them to project dependencies (or an appropriate extra) or move these imports inside the functions that use them so importing lompe.data_tools.datadownloader doesn’t fail in minimal installs.

Suggested change
- import requests
- from bs4 import BeautifulSoup
+ try:
+     import requests
+ except ImportError:
+     requests = None
+ try:
+     from bs4 import BeautifulSoup
+ except ImportError:
+     BeautifulSoup = None
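
The other option named in this comment, importing inside the functions that need the package, might be sketched as follows. The helper `require` is hypothetical; it defers the import until a downloader function is actually called and turns a missing package into an error that names the feature.

```python
import importlib


def require(module_name, purpose):
    """Import a module on demand, with a clearer error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is required for {purpose}; "
            "install it to use this feature."
        ) from err
```

A download function would then start with something like `requests = require("requests", "downloading data")`, so importing the module itself succeeds in a minimal install.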

3 participants