Uneven detection threshold in HAP catalogs #1927

Open
stscijgbot-hstdp opened this issue Jan 6, 2025 · 16 comments
Comments

@stscijgbot-hstdp
Collaborator

Issue HLA-1407 was created on JIRA by Rick White:

When there are shifts between images that result in uneven exposure times across the image, the HAP segment and point catalogs both look incorrect.

The HAP segment catalogs appear to use a deeper detection threshold on the edges than in the center of the images, where there are more exposures. The opposite ought to be true: since the exposure time is greater in the center, there should be more sources detected there.

On the other hand, the HAP point catalogs have no sources at all in the edge regions. The point catalogs only include sources in the region where all the exposures overlap.

A test confirms that the problem still exists in the current version of the HAP pipeline.

There must be one or more bugs in the calculation of the noise in different image regions. And apparently the point catalog only finds sources in a restricted region.

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

A more detailed discussion of the issue is on an innerspace page. Some information from that page is included here.

A good example visit is hst_12939_3j_wfc3_ir_iby03j. This WFC3/IR visit has 2 exposures each in the f110w and f160w filters. There are significant offsets among the 4 exposures (presumably to help with removing bad pixels in the detector). Here is a link to the interactive display using the total image (which is the sum of the two filters).

Note that there are no cosmic rays (CRs) in this image because this is wfc3/ir. That makes this a good example because there is no question about additional CRs in the edge regions. (I do see this problem for other instruments too.)

The other reason this is a good example is that the shifts between the exposures are quite large and are easily visible in the image and catalogs (see figures below).

The 3 images below show the total image, the point catalog, and the segment catalog.
!hst_12939_3j_wfc3_ir_total_iby03j.png|width=31%! !hst_12939_3j_wfc3_ir_total_iby03j-pt.png|width=31%! !hst_12939_3j_wfc3_ir_total_iby03j-seg.png|width=31%!

The large shifts between the exposures produce the stair-steps at the corners. The point catalog distribution avoids the edges completely. Some gaps near the edges are expected since the catalog software flags sources that are close to the edge. But the edge region is clearly much wider than necessary.

The segment catalog on the other hand is very sparse in the center (it finds many fewer sources than the point catalog in the same region), and instead finds many more sources on the edges. That is certainly incorrect.

Michele De La Pena processed this visit with the current version of the HAP software and found that the resulting catalogs look pretty similar to these. More info will be posted when it is available.

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

Here are images of the newly processed catalogs from Michele De La Pena using the latest release, CALDP-2024.10.2-4+enchilada.

!hst_12939_3j_wfc3_ir_total_iby03j-pt-new.png|width=48%! !hst_12939_3j_wfc3_ir_total_iby03j-seg-new.png|width=48%!

The number of sources has increased in the new catalog (due to overall changes in the detection threshold). But the patterns for both the point and segment catalogs are the same as the older software version.

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

Michele De La Pena
I just made updates to my innerspace page. I have figured out what causes this problem and included code to fix it. There are actually two bugs.

I think this should make big improvements in both the point and segment catalogs. As a bonus, I believe this will lead to much better segment catalogs for ACS/SBC images. I had noticed some issues with those, and think they trace back to one of these bugs.

Let me know if you have any questions!

@stscijgbot-hstdp
Collaborator Author

Comment by Michele De La Pena on JIRA:

Rick White I regenerated my results for iby03j (hst_12939_3j, wfc3/ir) to use as a comparison for the improvements coded in this ticket. I have modified the make_wht_masks and compute_threshold functions in my working copy as described here as an initial test. I am finding that make_wht_masks is taking a very long time to run (> 15 minutes). According to your innerspace documentation page, it seems you have run this routine on a number of datasets. I am curious if you have statistics on the execution time for the routine.

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

Michele De La Pena
Hmm, that seems wrong. I ran a test on image hst_12939_3j_wfc3_ir_total_iby03j_drz.fits just now and it took 0.2 secs to generate the list of masks.

Is your code some place where I can see it? Are you passing some different parameters to the make_wht_masks function?

I did make one improvement in my version of make_wht_masks that speeds things up a little. My current version is below. When the number of remaining pixels outside the mask is small, the elif block at the bottom of the loop immediately reduces the limit threshold to zero to avoid some unnecessary iterations. But that did not have a noticeable effect on the execution speed.

import numpy as np
from scipy import ndimage

def make_wht_masks(
    whtarr: np.ndarray,  # image weight array (dtype=float32), zeros outside of footprint
    maskarr: np.ndarray,  # mask array (dtype=bool), typically inverse of footprint
    scale: float = 1.5,
    sensitivity: float = 0.95,
    kernel: tuple = (11, 11)  # kernel size for maximum filter window
) -> list:
    """Uses scipy's maximum_filter to create image weight masks of floats between 0 and 1.

    Returns a list of dictionaries, each with the scale (float), wht_limit (float),
    mask (np.ndarray, dtype=uint16), and relative weight (np.ndarray, dtype=float32).
    """

    # the maximum filter selects the largest value within an ordered window
    # of pixel values and replaces the central pixel with that value
    maxwht = ndimage.maximum_filter(whtarr, size=kernel)

    # normalized weight array
    rel_wht = maxwht / maxwht.max()

    # initialize values
    # master_mask indicates pixels remaining to be included;
    # initially it includes all unmasked pixels
    master_mask = maskarr == 0
    limit = 1 / scale
    masks = []
    # minimum threshold on the number of pixels in a segment
    # (this is weird, trying to preserve the peculiar definition of similarity)
    osum = master_mask.sum()
    pthresh = (1 - sensitivity) * osum

    # loop through scale values until all pixels are used
    while master_mask.any():
        # get pixels above the new limit that have not been used yet
        mask = (rel_wht > limit) & master_mask
        if mask.any():
            msum = mask.sum()
            mmaster = master_mask.sum()
            # keep this segment if it is sufficiently big and the remaining pixels are not too few
            if msum == mmaster or (msum > pthresh and mmaster - msum > pthresh):
                masks.append(
                    dict(
                        scale=limit,
                        wht_limit=limit * maxwht.max(),
                        mask=mask.astype(np.uint16),
                        rel_weight=rel_wht * mask,
                    )
                )
                master_mask &= ~mask
            elif mmaster - msum <= pthresh:
                # few pixels remain; drop the threshold to zero so they are found in the next iteration
                limit = 0.0
        limit /= scale

    return masks
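As a standalone illustration of the first two steps of the routine (the maximum filter followed by normalization), here is a sketch on a small synthetic weight array; the 3x3 kernel is smaller than the (11, 11) default, and the array is not pipeline data:

```python
import numpy as np
from scipy import ndimage

# synthetic 8x8 weight array: a high-weight (2.0) block where all exposures
# overlap, surrounded by weight 1.0
wht = np.ones((8, 8), dtype=np.float32)
wht[2:6, 2:6] = 2.0

# the maximum filter replaces each pixel with the largest value in a 3x3
# window, smearing the high-weight region outward by one pixel
maxwht = ndimage.maximum_filter(wht, size=(3, 3))
rel_wht = maxwht / maxwht.max()

print(rel_wht[0, 0])  # corner, far from the overlap region: 0.5
print(rel_wht[4, 4])  # inside the overlap region: 1.0
```

The loop in `make_wht_masks` then peels off successive `rel_wht` intervals (1/scale, 1/scale^2, ...) as separate masks.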

@stscijgbot-hstdp
Collaborator Author

stscijgbot-hstdp commented Jan 23, 2025

Comment by Michele De La Pena on JIRA:

Rick White I pulled your exact new routine and inserted it into the drizzlepac code. I am not sure how long it would have run, as I only let it go for a minute. I then made a standalone routine from your same new code and used the hst_12939_3j_wfc3_ir_total_iby03j_drz.fits image to extract the WHT and make an inverse mask. This routine took no time at all to execute. BTW, I only got one mask in the output mask array. Clearly, the inputs in drizzlepac are not quite the same as for the manual standalone execution, but now I have something to compare against.

@stscijgbot-hstdp
Collaborator Author

stscijgbot-hstdp commented Jan 23, 2025

Comment by Rick White on JIRA:

Michele De La Pena I'll bet there is some inconsistency between the whtarr and maskarr arguments that is breaking the code. There is probably some place in my code where I assume a similarity between them that does not hold.

If possible, could you modify the code to save the values of those parameters? Then I could run the code with those exact parameters and look at what is happening.
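One way to capture the arguments would be a small debugging helper along these lines (a sketch, not pipeline code; the `dump_inputs` name is hypothetical, and the file names match the ones later posted for comparison):

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

def dump_inputs(whtarr: np.ndarray, maskarr: np.ndarray, outdir: str = ".") -> None:
    """Write the make_wht_masks inputs to FITS files for offline debugging."""
    # FITS images have no boolean pixel type, so cast the mask to uint8
    fits.PrimaryHDU(whtarr.astype(np.float32)).writeto(
        os.path.join(outdir, "driz_whtarr.fits"), overwrite=True)
    fits.PrimaryHDU(maskarr.astype(np.uint8)).writeto(
        os.path.join(outdir, "driz_maskarr.fits"), overwrite=True)

# round-trip check on small synthetic arrays
tmp = tempfile.mkdtemp()
wht = np.random.rand(4, 4).astype(np.float32)
msk = wht < 0.5
dump_inputs(wht, msk, tmp)
back = fits.getdata(os.path.join(tmp, "driz_whtarr.fits"))
```

Calling this just before `make_wht_masks` runs inside drizzlepac would preserve the exact arrays for a standalone rerun.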

@stscijgbot-hstdp
Collaborator Author

Comment by Michele De La Pena on JIRA:

Rick White I am currently in the debugger. Once I am done I will save the Drizzlepac version of the whtarr and maskarr input images to FITS files.

@stscijgbot-hstdp
Collaborator Author

Comment by Michele De La Pena on JIRA:

Rick White I have put the images of interest on the Linux systems (dlhlalab#) in /home/mdelapena/ForRick/WHT_MASKS. There is a README which just says:

These are the images used in Drizzlepac as the input to make_wht_masks
where the make_wht_masks is your routine embedded in Drizzlepac.  When using
these images as the input in the standalone routine, the make_wht_masks routine did NOT return in a timely manner.
driz_maskarr.fits
driz_whtarr.fits

Just FYI
These are the images I created from the hst_12939_3j_wfc3_ir_total_iby03j_drz.fits
file to run your make_wht_masks routine in a standalone manner.  The inverse mask
image from the drizzlepac file and the one from the standalone creation are not
quite the same along the edges.
sa_invfm.fits
sa_wht_image.fits

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

Michele De La Pena Got it! The initialization of the master_mask array (which indicates the pixels remaining to be included in masks) needs to exclude pixels that have wht == 0. That was not true for these parameters (there are some pixels around the edge that are zero). Only one line needs to be changed:

    # initially it includes all unmasked pixels
    master_mask = (maskarr == 0) & (rel_wht > 0)

Here is an updated version of the complete function, including all the changes:

import numpy as np
from scipy import ndimage

def make_wht_masks(
    whtarr: np.ndarray,  # image weight array (dtype=float32), zeros outside of footprint
    maskarr: np.ndarray,  # mask array (dtype=bool), typically inverse of footprint
    scale: float = 1.5,
    sensitivity: float = 0.95,
    kernel: tuple = (11, 11)  # kernel size for maximum filter window
) -> list:
    """Uses scipy's maximum_filter to create image weight masks of floats between 0 and 1.

    Returns a list of dictionaries, each with the scale (float), wht_limit (float),
    mask (np.ndarray, dtype=uint16), and relative weight (np.ndarray, dtype=float32).
    """

    # the maximum filter selects the largest value within an ordered window
    # of pixel values and replaces the central pixel with that value
    maxwht = ndimage.maximum_filter(whtarr, size=kernel)

    # normalized weight array
    rel_wht = maxwht / maxwht.max()

    # initialize values
    # master_mask indicates pixels remaining to be included;
    # initially it includes all unmasked pixels that have nonzero weight
    master_mask = (maskarr == 0) & (rel_wht > 0)
    limit = 1 / scale
    masks = []
    # minimum threshold on the number of pixels in a segment
    # (this is weird, trying to preserve the peculiar definition of similarity)
    osum = master_mask.sum()
    pthresh = (1 - sensitivity) * osum

    # loop through scale values until all pixels are used
    while master_mask.any():
        # get pixels above the new limit that have not been used yet
        mask = (rel_wht > limit) & master_mask
        if mask.any():
            msum = mask.sum()
            mmaster = master_mask.sum()
            # keep this segment if it is sufficiently big and the remaining pixels are not too few
            if msum == mmaster or (msum > pthresh and mmaster - msum > pthresh):
                masks.append(
                    dict(
                        scale=limit,
                        wht_limit=limit * maxwht.max(),
                        mask=mask.astype(np.uint16),
                        rel_weight=rel_wht * mask,
                    )
                )
                master_mask &= ~mask
            elif mmaster - msum <= pthresh:
                # few pixels remain; drop the threshold to zero so they are found in the next iteration
                limit = 0.0
        limit /= scale

    return masks
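To see why the original initialization could loop forever: a pixel with `rel_wht == 0` can never satisfy `rel_wht > limit`, even after `limit` is dropped to zero, so it is never removed from `master_mask` and the `while` loop never terminates. A minimal sketch with synthetic arrays (not pipeline data):

```python
import numpy as np

# one zero-weight pixel that is NOT flagged in maskarr
rel_wht = np.array([[0.0, 1.0],
                    [1.0, 0.5]])
maskarr = np.zeros((2, 2), dtype=bool)  # nothing masked

# old initialization: the zero-weight pixel is included, but no
# threshold (rel_wht > limit) can ever remove it from master_mask
old_master = (maskarr == 0)

# fixed initialization: zero-weight pixels are excluded up front
new_master = (maskarr == 0) & (rel_wht > 0)

print(old_master.sum(), new_master.sum())  # 4 3
```

This matches the observed symptom: the standalone test (where the mask happened to cover all zero-weight pixels) finished instantly, while the drizzlepac inputs (with unmasked zero-weight edge pixels) hung.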

@stscijgbot-hstdp
Collaborator Author

stscijgbot-hstdp commented Jan 24, 2025

Comment by Michele De La Pena on JIRA:

Rick White I have implemented Rick's improved make_wht_masks() function in the catalog_utils.py module with some light editing of variable names. I have also applied the bug fix noted by Rick in the compute_threshold() method of the HAPSegmentCatalog class, which removes the use of a scale variable in the computation of the threshold_rms variable. The results of this implementation can be seen here for the segmentation catalog (left side shows the old results and right side the results discussed in this ticket)

!hst_12939_3j Segment Catalog OLD vs IMPROVED.png|thumbnail!

and the point catalog (left side shows the old results and right side the results discussed in this ticket).

!hst_12939_3j Point Catalog OLD vs IMPROVED.png|thumbnail!

Even after these bug fixes, the number of source detections in the Segment catalog is much lower than that of the Point catalog in the central portion of the drizzled image. Further investigation/tuning is warranted.

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

Michele De La Pena That looks promising! We probably need to reduce the detection threshold parameter for all the segment catalogs as a result of this change. Are we using a threshold of 5-sigma now? We might try lowering that to 4 or 3. I see a parameter nsigma in the json configuration file, but I'm not sure whether that's the right parameter.

We have made a bunch of adjustments in the noise and threshold calculations, so it is not too surprising that we'll need to adjust the configuration parameters as a result.
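Schematically (illustrative numbers only; the actual computation lives in compute_threshold), a sigma-based detection threshold scales linearly with nsigma, so dropping from 5 to 3 sigma lowers the per-pixel threshold by 40%:

```python
# illustrative only: per-pixel detection threshold modeled as
# background + nsigma * rms, the usual convention in source detection
background, rms = 0.0, 0.02
thresholds = {n: background + n * rms for n in (5, 4, 3)}
print(thresholds)
```

Any change to the rms calculation (as in these fixes) therefore shifts the effective depth of the catalogs even with nsigma held fixed.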

@stscijgbot-hstdp
Collaborator Author

stscijgbot-hstdp commented Jan 24, 2025

Comment by Michele De La Pena on JIRA:

Rick White There are two "nsigma" values in the catalog configuration file, one for the custom/Gaussian PSF and one for the RickerWavelet case.  They are both currently set to "3".  I already did the experiment and changed these values to "1", which is the default for the Point catalog.  The number of sources for this one dataset tested did improve by a factor of 3, but it is still far below that of the Point catalog.  This is without filtering on the flags.

I thought I would wait until the "Photutils" changes and this ticket are all checked in to go back and address the sigma value for the detection threshold, particularly as this ticket is destined for the software build AFTER the "Photutils" build.

@stscijgbot-hstdp
Collaborator Author

Comment by Rick White on JIRA:

Michele De La Pena I agree that getting the photutils changes working first sounds like a good plan. Before we start fiddling with detection parameters, we should have essentially all of the changes in the code.

One question: Does the version of the code you are using have the changes from the minimum_rms code (HLA-1362) already included? I assume it does. Before that fix, the minimum rms was typically too large, which would raise the threshold. But I think that would affect both the point and segment catalogs, so it would not be likely to cause the segment catalog to be too shallow. (I'm not 100% certain of that though.)

I am currently looking into one additional change that would be more minor. It would change the configuration parameters that determine when the segment catalog switches from the Gaussian kernel to the RickerWavelet2D kernel. That will be a relatively minor change that will affect fewer fields though (and will be another ticket). Otherwise I don't know of any other changes that are needed.

@stscijgbot-hstdp
Collaborator Author

Comment by Michele De La Pena on JIRA:

Rick White Yes, the software being tested for this ticket contains the fix done in HLA-1362 (minimum RMS).

@stscijgbot-hstdp
Collaborator Author

Comment by Michele De La Pena on JIRA:

This ticket is PR #1939, which was tested and merged on 04 February 2025.
