Bug in solar_flare_forcasting/dataset.py: incorrect label source and unintended index merging #40

@JinsuHongg


File: Surya/downstream_examples/solar_flare_forcasting/dataset.py

Location: Method _get_index_data

Issue 1 — Incorrect label lookup

In _get_index_data, the label is currently retrieved using:
data["label"] = self.index.loc[reference_timestamp, "label_max"]

This line should be corrected to:
data["label"] = self.flare_index.loc[reference_timestamp, "label_max"]
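A minimal sketch of the difference, using toy data rather than the real Surya indices. The frame contents, the "path" column, and the timestamps are assumptions made for illustration; only the "label_max" column name and the two lookups come from the code above. It assumes the base index carries no label column of its own, which is the case once the join described in Issue 2 is removed.

```python
import pandas as pd

# Toy stand-ins for self.index and self.flare_index (hypothetical data).
times = pd.date_range("2014-01-01 00:00", periods=4, freq="60min")
index = pd.DataFrame({"path": [f"sdo_{i}.nc" for i in range(4)]}, index=times)
flare_index = pd.DataFrame({"label_max": [0, 1]}, index=times[[1, 3]])

reference_timestamp = times[1]

# Proposed lookup: read the label from the flare index directly.
label = flare_index.loc[reference_timestamp, "label_max"]

# Current lookup: the un-merged base index has no "label_max" column,
# so pandas raises KeyError.
try:
    index.loc[reference_timestamp, "label_max"]
    base_index_has_label = True
except KeyError:
    base_index_has_label = False
```

In this sketch the lookup on the base index only succeeds if the two frames have been joined first, which is what ties this issue to Issue 2.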

Issue 2 — Unintended modification of self.index

In the constructor, line ~63, the code merges the flare index into the base index:
self.index = self.index.join(self.flare_index, how="inner", validate="one_to_one")

This operation should not modify self.index. With how="inner", the join removes every timestamp that is missing from self.flare_index.

For sequential SDO input timestamps (e.g., [-60, 0, 60] minutes relative to the reference timestamp), this breaks the dataset loader.
When a timestamp required for an input sequence is absent from self.flare_index, the merged index drops it, leading to missing files and invalid sample sequences.

Therefore, this line should be removed entirely.
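The effect can be reproduced with a small sketch on toy data. The frames, the "path" column, and the hourly cadence are assumptions for illustration, and validate="one_to_one" is left out for brevity since it does not change which rows survive. Only the inner join and the [-60, 0, 60] minute offsets come from the description above.

```python
import pandas as pd

# Toy stand-ins for self.index and self.flare_index (hypothetical data).
times = pd.date_range("2014-01-01 00:00", periods=4, freq="60min")
index = pd.DataFrame({"path": [f"sdo_{i}.nc" for i in range(4)]}, index=times)
flare_index = pd.DataFrame({"label_max": [0, 1]}, index=times[[1, 3]])

# The inner join keeps only timestamps present in both frames.
merged = index.join(flare_index, how="inner")

# Input sequence for one labelled reference timestamp.
reference = times[1]
offsets = [-60, 0, 60]  # minutes relative to the reference
needed = [reference + pd.Timedelta(minutes=m) for m in offsets]

# Timestamps the loader needs but cannot find in each index.
missing_after_merge = [t for t in needed if t not in merged.index]
missing_before_merge = [t for t in needed if t not in index.index]
```

Here the un-merged index still contains all three input timestamps, while the merged index has lost the two neighbours of the reference because they carry no flare label.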
