Skip to content

Conversation

@wmwv
Copy link
Contributor

@wmwv wmwv commented Dec 18, 2025

Now Uses snappl for image searching, finding, and saving of DB candidates.

Saving of database candidates is a separate script as an afterburner to the pipeline.

Closes #25

@wmwv wmwv requested a review from a team as a code owner December 18, 2025 22:37
@github-actions github-actions bot added documentation Improvements or additions to documentation installation testing labels Dec 18, 2025
diaobject_provenance_tag: "nov2025_test3"
diaobject_process: "sidecar"

paths:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, paths should not be under photometry. If you look at the nov2025_container_config.yaml, it already has system.paths.images and system.paths.temp_dir ; ideally your code should just use that, and have no paths here.

Is dia_out_dir something that we're going to want to save to the database? Is it just temporary stuff? Or something in between?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something in between. In production, dia_out_dir is a tempdir. In development, it's helpful to have a persistence temporary directory.

Right, dia_out_dir is not something that we want to save to the database.

try:
diaobj.save_object()
except RuntimeError as e:
SNLogger.info(e)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to just give an info (not even an error) here?

Note that DiaObject.save_object will not save a new object if there exists an object of the given provenance within 1" (by default).

WAIT. Maybe it doesn't. Maybe it returns an error right now. Is that why you're just moving on when you get exceptions here? If so, we should put an issue into snappl to add an exists_ok flag or some such to DiaOjbect.save_object().

(What I'm worried about is some exception that comes from something other than the object already existing.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when a candidate that already exists to the database, you get a generic RuntimeError:

RuntimeError: Error response from server: Error, there already exists a provenance for tag nov2025_test4 and process sidecar

so I have to catch that broad error.

I share your worry about hiding some other error and would love to have a more specific error class returned. I have created a snappl issue:
Roman-Supernova-PIT/snappl#168

SIMS_DIR
+ "/RomanTDS/images/simple_model/{band}/{pointing}/Roman_TDS_simple_model_{band}_{pointing}_{sca}.fits.gz"
)
BASE_PATH = "/global/cfs/cdirs/lsst/shared/external/roman-desc-sims/Roman_data/" "RomanTDS/images/simple_model/"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is BASE_PATH used for? Obviously this is something that will need to go away and be replaced either by something that comes out of the config, or replaced by nothing because the database interface handles it in the background.

Same comment for INPUT_TRUTH_PATTERN below. Truth handling is a bit thorny.

Also, above, the environment variable SIMS_DIR -- what is that used for? That's also something that should go away, and should be handled by snappl in the background.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BASE_PATH is unused.

I would very much like to remove all of this separate truth-file handling in here. Both to make it more clear which parts are the fundamental subtraction and candidate identification vs which parts are really helpful right now for calculating efficiencies and purity.

The capability that I don't know how to do in snappl is to read the specific realized flux truth catalogs for each image for any transient or variable. These are the ".txt" files in the "RomanTDS/truth/..." hierarchy.

My understanding is that snappl will currently return the SNANA ideal truth lightcurve, but not the realized fluxes at the individual image catalog level. If I am wrong, could you point me to the relevant snappl function?

Another confusing thing I'm doing with the truth information is using the truth information for stars to reject known stars. We will have a star catalog available, so this is conceptually fair, but using the simulation truth files is not quite the representative thing to do. So we need a way to get a star catalog (in snappl?). This is actually easier for real data because we could always just send API call to online Gaia.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an issue to snappl to add a star catalog interface, and implement Gaia? I've done that before, so that's fast for me to implement.

Of course, Gaia doesn't go very deep, so it'll only get the brighter stars. Do you know of existing star catalogs out there that we might use? Legacy survey? I've got a thing set up for LS4 that I could put in pretty quickly as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and a good idea.
Roman-Supernova-PIT/snappl#169

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still (a) a need to use simulated star catalogs for OpenUniverse 2024 and other simulated surveys; and (b) wanting to have both nominal infinite SNR flux and realized flux for testing with simulations and source injection. So some of this awkwardness here will remain for a bit.

Although, maybe we should put this in snappl too. Getting the realized flux image-per-image is useful for all of the photometry routines for checking.

Roman-Supernova-PIT/snappl#170

args = parser.parse_args()
parser.add_argument("-t", "--temp-dir", type=str, default=None, help="Temporary directory.")
parser.add_argument("-o", "--output-dir", type=str, default=None, help="Output path")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you pass in the provenance tag and process for the saved DiaObjects? Is that hardcoded somewhere? I don't see a command-line option for it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

....I see the arguments below in save_candidates_to_database.py.

What is the pattern for running sidecar? It looks like you have moire than one thing with a main. Does it read its own output files or some such?

(Coming into this, knowing nothing about sidecar, I would have guessed that there was just one executable, and saving candidates to the database was just another step that would be done after image subtraction and residual detection.)

Copy link
Contributor Author

@wmwv wmwv Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of how I ran save_candidates_to_database.py once all of the subtractions were done.

python sidecar/save_candidates_to_database.py \
    --image-collection snpitdb --image-provenance-tag ou2024 --image-process load_ou2024_image \
    --diaobject-provenance-tag nov2025_test4 --diaobject-process sidecar \
    --threshold 200 \
    --threshold-column peak_value \
    --data-records run_scripts/nov_R_images.csv \
    --output-dir /dia_out_dir

I just made up a provenance tag. I think this diaobject-provenance-tag connects to the other question you asked about the orchestration for what triggers sidecar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, yes, I agree now. I will keep save_candidates_to_database.py as a (clearly documented) alternative to load a bunch of candidates for testing.

But the standard call to loading to database will be in sidecar/pipeline.py with an option to control.

if "template_pointing" not in data_records.columns:
data_records = find_templates_for_pointings(
image_collection=image_collection,
science_pointing=data_records["science_pointing"],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking ahead -- in the database, pointing is being replaced with observation_id, and will be a string instead of an int.

It would probably be worth at some point thinking about the tools that decide what to subtract. Do you maintain the things-to-subtract data records by hand right now?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About this whole section -- I haven't paid close enough attention to how sidecar works in general. I would have to figure out how all of this works in the "manually given images" vs. "images from the database" case.

help="Specify directory for output products."
)
parser.add_argument(
"--threshold",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things like threshold ought to be something that goes into the provenance. Consider (eventually) putting this in the config.

psf = config.getPSF_Image(size, x, y, **kwargs)
psf.write(str(psf_path))
def get_imsim_psf(x, y, pointing, sca, band, psf_type="ou24PSF", **kwargs):
"""Return PSF for image as a 2D numpy array. Will be of size of PSF model."""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought is going to be required both here and in snappl to figure out how to get the right PSFs for images.

Ideally, each image subclass should have a way of telling you what kind of PSF to get. We might still want to be able to override that, at least for tests, but there should be defaults in there.

I don't think snappl supports this kind of thing yet.

In any event, I would rename this function, because it's not always going to get an imsim PSF any more.

)

image_list = image_collection.find_images(ra=ra, dec=dec, dbclient=dbclient)
entries = [(im.pointing, im.band, im.sca, im.exptime, im.mjd) for im in image_list]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that as of snappl 0.38, pointing is gone and replaced by the string observation_id

(You need snappl 0.39 for the database with ASDF files in it that I put up.)

@wmwv
Copy link
Contributor Author

wmwv commented Feb 5, 2026

Thanks for taking a look. Half the comment I can address easily and will do so.

@wmwv
Copy link
Contributor Author

wmwv commented Feb 5, 2026

The other half touch upon some development vs. production tension here.

I agree with the long-term in-production goals. I'm afraid that making running require going through all the steps, including saving to database will be a real hassle and impediment to fast-turnaround development. E.g., as we test DIASource saving, I don't want to redo pixel analysis every time, so I want to make it easy to just run the last "save-to-database" step, even though that does make it easy to get the provenance wrong. I would very much welcome suggestions for how to make it easier to just run a set of steps, or the last step in some easy and consistent way.

Right now it's a two-step process because all the things before the database can be run, re-run, tested without changing long-term state. Re-running overwrites previous images and catalogs, which is what I expect to happen. Once we start interacting with the database, though, a second run will not overwrite, it will fail.

Things being a two-step process is one part of why it's helpful for dia_out_dir to be defined and persistent.

We're honestly still at the point where we need to do debugging and looking at pixels to check performance of subtraction and real-bogus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation installation testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-factor detect_supernova and adopt snappl

2 participants