Skip to content

Stereo LET metadata/LET upgrade function#89

Open
ZigongXu wants to merge 3 commits intoserpentine-h2020:mainfrom
ZigongXu:STEREO_LET
Open

Stereo LET metadata/LET upgrade function#89
ZigongXu wants to merge 3 commits intoserpentine-h2020:mainfrom
ZigongXu:STEREO_LET

Conversation

@ZigongXu
Copy link

@ZigongXu ZigongXu commented Sep 1, 2025

The metadata of LET elements are added, including all unsec elements and sec data of protons, since in most cases only protons are used. The unsec data product is the standard data product. summed data products have been implemented.
Also, since LET has multiple elements, I added a stereo.let() function to return the data of different elements. usage see the help header of the function. These functions have been tested on my side, and should work well

@jgieseler
Copy link
Member

jgieseler commented Sep 3, 2025

Hi @ZigongXu 👋, thanks a lot for your contribution! I'm separating my comments to the metadata part and the functions let() & het().

metadata

  • I think there is a typo in the manually defined sectored energy bins ({'H_Hi_sec_flux_Energy_Bins': [6,12]} should be [6,10] based on its CATDESC). To circumvent this, I'd suggest to not hard-code the energy values. So replace
for var in sec_name:
    metadata.update({var: cdf.varattsget(var)['CATDESC']})
    metadata.update({var+'_UNITS': cdf.varattsget(var)['UNITS']})
    metadata.update({var+'_FILLVAL': cdf.varattsget(var)['FILLVAL']})
metadata.update({'H_Lo_sec_flux_Energy_Bins': [4,6]})
metadata.update({'H_Hi_sec_flux_Energy_Bins': [6,12]})
metadata.update({'H_VLo_sec_flux_Energy_Bins': [1.8,3.6]})
metadata.update({'H_Lo_sec_cnts_Energy_Bins': [4,6]})
metadata.update({'H_Hi_sec_cnts_Energy_Bins': [6,12]})
metadata.update({'H_VLo_sec_cnts_Energy_Bins': [1.8,3.6]})
metadata.update({'H_sec_flux_Energy_Bins':[[1.8,3.6],[4,6],[6,12]]})

with

for var in sec_name:
    metadata.update({var: cdf.varattsget(var)['CATDESC']})
    metadata.update({var+'_UNITS': cdf.varattsget(var)['UNITS']})
    metadata.update({var+'_FILLVAL': cdf.varattsget(var)['FILLVAL']})
    metadata.update({var+'_Energy_Bins': [float(cdf.varattsget(var)['CATDESC'].split(' ')[-4].replace(',', '.')), 
                                          float(cdf.varattsget(var)['CATDESC'].split(' ')[-2].replace(',', '.')) ]})
H_sec_flux_Energy_Bins = []
for flux_var in ['H_VLo_sec_flux', 'H_Lo_sec_flux', 'H_Hi_sec_flux']:
    H_sec_flux_Energy_Bins = H_sec_flux_Energy_Bins + [metadata[flux_var+'_Energy_Bins']]
metadata.update({'H_sec_flux_Energy_Bins': H_sec_flux_Energy_Bins})

(The .replace(',', '.') part is necessary because some of the LET metadata has erroneous decimal commas in it.)

  • You're producing metadata['H_unsec_flux_Bins_Text'] as an array of strings, but metadata['H_sec_flux_Energy_Bins'] as a list of floats. I think they should have uniform names and types.

  • I'm not sure if we want to restrict the used data at

#sec data, only load proton data, others are not very commonly used
sec_name = ['H_Lo_sec_flux', 'H_Hi_sec_flux', 'H_VLo_sec_flux', 'H_Lo_sec_cnts', 'H_Hi_sec_cnts', 'H_VLo_sec_cnts']

I think it would be worthwhile to just put in the info in for all the data.

let() and het()

I'm a bit hesitant about these functions because they are only filtering the already available DataFrame and metadata, and we don't have this kind of filtering functions elsewhere in seppy. On the other hand, it's no harm to have them. But if we do want to add them, there are some (partly critical) things that would need to be fixed (commented below in the review). For example, for me the code is not working at all because the df.attrs part is crashing for me. I'm not sure if it's in the end worth the effort to polish these functions.

@serpentine-h2020 serpentine-h2020 deleted a comment from codecov bot Sep 3, 2025
@ZigongXu
Copy link
Author

ZigongXu commented Sep 3, 2025

Thanks for your time reviewing the code. That is my first time working on github with others. So apologize for extra trouble I caused. Here are my reply to your comment.,

  1. The energy range of the Hi_sec_flux: that is very interesting. I didn't check the CDF header for its energy. Based on the https://izw1.caltech.edu/STEREO/DATA/Level1/Public/ahead/1Minute/2025/Sectored/He4_hi/, and the file header of those file, the energy is 6-12 MeV. So I will ask Christina for another double check and confirm the energy level for this data product. Obviously, the CDF file and the Caltech published file have a conflict. Will let you know later.
  2. The reason that I only list the proton sector data, as I explained, the proton might be the only useful sector data ( I will double confirm this). We have no reason to load useless data at all.
  3. The LET and HET are just my type of function, which might be convenient for data usage. The current data product has too many columns and their names also hard to connect to the energy. If you think there is no need to add, I am fine with this. Regarding the df.attrs, I am not sure why your code crashed. It seems like a common feature of pandas. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.attrs.html

@jgieseler
Copy link
Member

jgieseler commented Sep 4, 2025

No problem, and no need to apologize! As I said, I'm really glad to get some participation. :-)

  1. Good that you can check the energy values at the source!
  2. Yes, if the other data is not really useful, that makes sense. I just made the mistake when initially writing some loader for SolO/EPD that I only manually defined a selection of columns to load. Afterwards I regretted it, and now I try not to make the same mistake twice. ;-)
  3. As I said, when they are working, we can also include them. But the .attrs thing is a problem, especially when it's not working with the latest Pandas version. I also do not see immediately what the problem is. Because the function is marked as experimental, I'm not sure if it makes sense to spend too much effort on it at the moment (it might break again with every new release of Pandas). So maybe for now it would be better to return a DataFrame and a separate dictionary.

@jgieseler
Copy link
Member

jgieseler: I think there is a typo in the manually defined sectored energy bins ({'H_Hi_sec_flux_Energy_Bins': [6,12]} should be [6,10] based on its CATDESC). To circumvent this, I'd suggest to not hard-code the energy values.

ZigongXu: The energy range of the Hi_sec_flux: that is very interesting. I didn't check the CDF header for its energy. Based on the izw1.caltech.edu/STEREO/DATA/Level1/Public/ahead/1Minute/2025/Sectored/He4_hi, and the file header of those file, the energy is 6-12 MeV.

Note that the file you linked is the one for He4; the one for H (e.g. H_hi_sectored_ahead_2025_001_level1_11.txt) says "16 sectored proton intensities, 6 - 10 MeV (1/(cm^2 s sr MeV/nuc))"

@ZigongXu
Copy link
Author

jgieseler: I think there is a typo in the manually defined sectored energy bins ({'H_Hi_sec_flux_Energy_Bins': [6,12]} should be [6,10] based on its CATDESC). To circumvent this, I'd suggest to not hard-code the energy values.

ZigongXu: The energy range of the Hi_sec_flux: that is very interesting. I didn't check the CDF header for its energy. Based on the izw1.caltech.edu/STEREO/DATA/Level1/Public/ahead/1Minute/2025/Sectored/He4_hi, and the file header of those file, the energy is 6-12 MeV.

Note that the file you linked is the one for He4; the one for H (e.g. H_hi_sectored_ahead_2025_001_level1_11.txt) says "16 sectored proton intensities, 6 - 10 MeV (1/(cm^2 s sr MeV/nuc))"

Yes! You are correct!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants