Uneven detection threshold in HAP catalogs #1927
Comments
Comment by Rick White on JIRA: A more detailed discussion of the issue is on an innerspace page; some information from that page is included here.

Note that there are no cosmic rays (CRs) in this example image because it is WFC3/IR, so there is no question about additional CRs in the edge regions. (I do see this problem for other instruments too.) The other reason this is a good example visit is that the shifts between the exposures are quite large and are easily visible in the image and catalogs (see figures below).

The 3 images below show the total image, the point catalog, and the segment catalog. The large shifts between the exposures produce the stair-steps at the corners. The point catalog distribution avoids the edges completely. Some gaps near the edges are expected, since the catalog software flags sources that are close to the edge, but the edge region is clearly much wider than necessary. The segment catalog, on the other hand, is very sparse in the center (it finds many fewer sources than the point catalog in the same region) and instead finds many more sources on the edges. That is certainly incorrect.

Michele De La Pena processed this visit with the current version of the HAP software and found that the resulting catalogs look pretty similar to these. More info will be posted when it is available.
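The stair-step weight pattern described above can be sketched with a tiny hypothetical two-exposure mosaic (not the HAP code; the footprints and shift are made up for illustration): full weight accumulates only where the shifted exposures overlap, leaving lower-weight steps at the edges.

```python
import numpy as np

# Hypothetical sketch: two exposure footprints with a large relative shift.
canvas = np.zeros((20, 20), dtype=np.float32)
canvas[0:14, 0:14] += 1.0   # footprint of exposure 1
canvas[6:20, 6:20] += 1.0   # footprint of exposure 2, shifted by (6, 6)

# The overlap region gets weight 2; the non-overlapping edge strips get
# weight 1; the untouched corners get weight 0 (the "stair-steps").
print(canvas[10, 10], canvas[2, 2], canvas[19, 0])
```

A correct catalog should be deepest in the weight-2 overlap region and shallowest (but not empty) in the weight-1 strips, which is the opposite of what both HAP catalogs show here.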
Comment by Rick White on JIRA: Here are images of the newly processed catalogs from Michele De La Pena using the latest release, CALDP-2024.10.2-4+enchilada. !hst_12939_3j_wfc3_ir_total_iby03j-pt-new.png|width=48%! !hst_12939_3j_wfc3_ir_total_iby03j-seg-new.png|width=48%! The number of sources has increased in the new catalogs (due to overall changes in the detection threshold), but the patterns for both the point and segment catalogs are the same as in the older software version.
Comment by Rick White on JIRA: Michele De La Pena I think this should make big improvements in both the point and segment catalogs. As a bonus, I believe this will lead to much better segment catalogs for ACS/SBC images. I had noticed some issues with those, and think they trace back to one of these bugs. Let me know if you have any questions!
Comment by Michele De La Pena on JIRA: Rick White I regenerated my results for iby03j (hst_12939_3j, wfc3/ir) to use as a comparison for the improvements coded in this ticket. I have modified the make_wht_masks and compute_threshold functions in my working copy as described here as an initial test. I am finding that make_wht_masks is taking a very long time to run (> 15 minutes). According to your innerspace documentation page, it seems you have run this routine on a number of datasets. I am curious if you have statistics on the execution time for the routine.
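One way to check where the time goes is a small timing harness (hypothetical, not part of drizzlepac): the `timed` helper and the 1024x1024 random weight array below are made up for illustration, to see whether the maximum-filter step alone accounts for a meaningful fraction of the runtime.

```python
import time

import numpy as np
from scipy import ndimage


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn(*args, **kwargs)
    return out, time.perf_counter() - t0


# Synthetic weight array roughly the scale of a drizzled HST image tile.
wht = np.random.default_rng(0).random((1024, 1024)).astype(np.float32)
filtered, dt = timed(ndimage.maximum_filter, wht, size=(11, 11))
print(f"maximum_filter on a 1024x1024 array took {dt:.3f} s")
```

On typical hardware the filter itself takes well under a second at this size, which would point to the mask-building loop rather than the filter as the source of a >15 minute runtime.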
Comment by Rick White on JIRA: Michele De La Pena Is your code some place where I can see it? Are you passing some different parameters to the make_wht_masks function? I did make one improvement in my version of the function:

```python
import numpy as np
from scipy import ndimage


def make_wht_masks(
    whtarr: np.ndarray,   # image weight array (dtype=float32), zeros outside of footprint
    maskarr: np.ndarray,  # mask array (dtype=bool), typically inverse of footprint
    scale: float = 1.5,
    sensitivity: float = 0.95,
    kernel: tuple = (11, 11),  # kernel size for maximum filter window
) -> list:
    """Use scipy's maximum_filter to create image weight masks of floats between 0 and 1.

    Returns a list of dictionaries, each with the scale (float), wht_limit (float),
    mask (np.ndarray, dtype=uint16), and relative weight (np.ndarray, dtype=float32).
    """
    # The maximum filter selects the largest value within an ordered window of
    # pixel values and replaces the central pixel with that largest value.
    maxwht = ndimage.maximum_filter(whtarr, size=kernel)

    # normalized weight array
    rel_wht = maxwht / maxwht.max()

    # master_mask indicates pixels remaining to be included;
    # initially it includes all unmasked pixels
    master_mask = maskarr == 0

    limit = 1 / scale
    masks = []

    # minimum threshold on number of pixels in a segment
    # (this is weird, trying to preserve the peculiar definition of similarity)
    osum = master_mask.sum()
    pthresh = (1 - sensitivity) * osum

    # loop through scale values until all pixels are used
    while master_mask.any():
        # get pixels above the new limit that have not been used yet
        mask = (rel_wht > limit) & master_mask
        if mask.any():
            msum = mask.sum()
            mmaster = master_mask.sum()
            # keep this segment if it is sufficiently big and the remaining pixels are not too few
            if msum == mmaster or (msum > pthresh and mmaster - msum > pthresh):
                masks.append(
                    dict(
                        scale=limit,
                        wht_limit=limit * maxwht.max(),
                        mask=mask.astype(np.uint16),
                        rel_weight=rel_wht * mask,
                    )
                )
                master_mask &= ~mask
            elif mmaster - msum <= pthresh:
                # few pixels remain; drop the threshold to zero so they are found in the next iteration
                limit = 0.0
        limit /= scale
    return masks
```
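The maximum-filter step in the function above can be illustrated on a tiny array. This is a minimal sketch (the 5x5 array and 3x3 window are made up for illustration) of the behavior the in-code comment describes: each output pixel takes the largest value in its neighborhood, so bright values spread outward.

```python
import numpy as np
from scipy import ndimage

# A single bright pixel in an otherwise empty array.
a = np.zeros((5, 5), dtype=np.float32)
a[2, 2] = 7.0

# Each output pixel becomes the maximum over its 3x3 neighborhood,
# so the bright value spreads by one pixel in every direction.
m = ndimage.maximum_filter(a, size=3)
print(m[1, 1], m[0, 0])
```

Applied to a weight image, this dilation smooths the weight boundaries by the kernel size before the masks are thresholded.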
Comment by Michele De La Pena on JIRA: Rick White I pulled your exact new routine and inserted it into the drizzlepac code. I am not sure how long it would have run as I only let it go for a minute. I then made a standalone routine from your same new code, used the hst_12939_3j_wfc3_ir_total_iby03j_drz.fits image to extract the WHT and make an inverse mask. This routine took no time at all to execute. BTW, I only got one mask in the output mask array. Clearly, the inputs in drizzlepac are not quite the same as for the manual standalone execution, but now I have something to compare against.
Comment by Rick White on JIRA: Michele De La Pena I'll bet that there is some inconsistency between the input parameters. If possible, could you modify the code to save the values of those parameters? Then I could run the code with those exact parameters and look at what is happening.
Comment by Michele De La Pena on JIRA: Rick White I am currently in the debugger. Once I am done I will save the Drizzlepac version of the whtarr and maskarr input images to FITS files.
Comment by Michele De La Pena on JIRA: Rick White I have put the images of interest on the Linux systems (dlhlalab#) in /home/mdelapena/ForRick/WHT_MASKS. There is a README with a short description of the files.
Comment by Rick White on JIRA: Michele De La Pena Got it! The problem is the initialization of the master_mask: it needs to exclude pixels that have zero weight. The fixed line is:

```python
# initially it includes all unmasked pixels with nonzero weight
master_mask = (maskarr == 0) & (rel_wht > 0)
```

Here is an updated version of the complete function, including all the changes:

```python
import numpy as np
from scipy import ndimage


def make_wht_masks(
    whtarr: np.ndarray,   # image weight array (dtype=float32), zeros outside of footprint
    maskarr: np.ndarray,  # mask array (dtype=bool), typically inverse of footprint
    scale: float = 1.5,
    sensitivity: float = 0.95,
    kernel: tuple = (11, 11),  # kernel size for maximum filter window
) -> list:
    """Use scipy's maximum_filter to create image weight masks of floats between 0 and 1.

    Returns a list of dictionaries, each with the scale (float), wht_limit (float),
    mask (np.ndarray, dtype=uint16), and relative weight (np.ndarray, dtype=float32).
    """
    # The maximum filter selects the largest value within an ordered window of
    # pixel values and replaces the central pixel with that largest value.
    maxwht = ndimage.maximum_filter(whtarr, size=kernel)

    # normalized weight array
    rel_wht = maxwht / maxwht.max()

    # master_mask indicates pixels remaining to be included;
    # initially it includes all unmasked pixels with nonzero weight
    master_mask = (maskarr == 0) & (rel_wht > 0)

    limit = 1 / scale
    masks = []

    # minimum threshold on number of pixels in a segment
    # (this is weird, trying to preserve the peculiar definition of similarity)
    osum = master_mask.sum()
    pthresh = (1 - sensitivity) * osum

    # loop through scale values until all pixels are used
    while master_mask.any():
        # get pixels above the new limit that have not been used yet
        mask = (rel_wht > limit) & master_mask
        if mask.any():
            msum = mask.sum()
            mmaster = master_mask.sum()
            # keep this segment if it is sufficiently big and the remaining pixels are not too few
            if msum == mmaster or (msum > pthresh and mmaster - msum > pthresh):
                masks.append(
                    dict(
                        scale=limit,
                        wht_limit=limit * maxwht.max(),
                        mask=mask.astype(np.uint16),
                        rel_weight=rel_wht * mask,
                    )
                )
                master_mask &= ~mask
            elif mmaster - msum <= pthresh:
                # few pixels remain; drop the threshold to zero so they are found in the next iteration
                limit = 0.0
        limit /= scale
    return masks
```
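The effect of the changed initialization can be shown on a tiny example. This is a minimal sketch (the 3x3 arrays are made up, and it uses `rel_wht` from the raw weights rather than the max-filtered array, for simplicity): under the old initialization, unmasked pixels with zero weight stay in `master_mask` and can never satisfy `rel_wht > limit`, which appears to be why the loop never terminated.

```python
import numpy as np

# Weight array with zero-weight pixels outside the footprint.
whtarr = np.array([[0.0, 0.0, 1.0],
                   [0.0, 2.0, 2.0],
                   [1.0, 2.0, 2.0]], dtype=np.float32)
rel_wht = whtarr / whtarr.max()
maskarr = np.zeros(whtarr.shape, dtype=bool)  # nothing flagged

old_init = (maskarr == 0)                   # includes the zero-weight pixels
new_init = (maskarr == 0) & (rel_wht > 0)   # footprint pixels only
print(int(old_init.sum()), int(new_init.sum()))
```

With the old initialization the three zero-weight pixels would remain "to be included" forever; the new initialization drops them up front so the while loop can drain `master_mask` completely.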
Comment by Michele De La Pena on JIRA: Rick White I have implemented the improved make_wht_masks() function in the catalog_utils.py module with some light editing of variable names. I have also applied the bug fix noted by Rick in the compute_threshold() method of the HAPSegmentCatalog class, which removes the use of a scale variable in the computation of the threshold_rms variable. The results of this implementation can be seen here for the segmentation catalog (left side are the old results and right side are the results discussed in this ticket) !hst_12939_3j Segment Catalog OLD vs IMPROVED.png|thumbnail! and the point catalog (left side are the old results and right side are the results discussed in this ticket). !hst_12939_3j Point Catalog OLD vs IMPROVED.png|thumbnail! Even after these bug fixes, the number of source detections in the Segment catalog is much lower than that of the Point catalog in the central portion of the drizzled image. Further investigation/tuning is warranted.
Comment by Rick White on JIRA: Michele De La Pena That looks promising! Probably we need to reduce the detection threshold parameter for all the segment catalogs as a result of this change. Are we using a threshold of 5-sigma now? We might try lowering that to 4 or 3. I see an nsigma parameter in the catalog configuration that controls this. We have made a bunch of adjustments in the noise and threshold calculations, so it is not too surprising that we'll need to adjust the configuration parameters as a result.
Comment by Michele De La Pena on JIRA: Rick White There are two "nsigma" values in the catalog configuration file, one for the custom/Gaussian PSF and one for the RickerWavelet case. They are both currently set to "3". I already did the experiment and changed these values to "1", which is the default for the Point catalog. The number of sources for this one dataset improved by a factor of 3, but it is still far below that of the Point catalog. This is without filtering on the flags. I thought I would wait until the "Photutils" changes and this ticket are all checked in before going back to address the sigma value for the detection threshold, particularly as this ticket is destined for the software build AFTER the "Photutils" build.
Comment by Rick White on JIRA: Michele De La Pena I agree that it makes sense to wait before tuning the detection threshold. One question: does the version of the code you are using have the changes from the HLA-1362 ticket (minimum RMS)? I am currently looking into one additional change that would be more minor. It would change the configuration parameters that determine when the segment catalog switches from the Gaussian kernel to the RickerWavelet2D kernel. That will be a relatively minor change that will affect fewer fields (and will be another ticket). Otherwise I don't know of any other changes that are needed.
Comment by Michele De La Pena on JIRA: Rick White Yes, the software being tested for this ticket contains the fix done in HLA-1362 (minimum RMS).
Comment by Michele De La Pena on JIRA: This ticket was addressed by PR #1939, which was tested and merged on 04 February 2025.
Issue HLA-1407 was created on JIRA by Rick White:
When there are shifts between images that result in uneven exposure times across the image, the HAP segment and point catalogs both look incorrect.
The HAP segment catalogs appear to use a deeper detection threshold on the edges than in the center of the images, where there are more exposures. The opposite ought to be true: since the exposure time is greater in the center, there should be more sources detected there.
On the other hand, the HAP point catalogs have no sources at all in the edge regions. The point catalogs only include sources in the region where all the exposures overlap.
A test confirms that the problem still exists in the current version of the HAP pipeline.
There must be one or more bugs in the calculation of the noise in different image regions, and apparently the point catalog only finds sources in a restricted region.
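The expected depth pattern can be sketched quantitatively. This is a schematic only (the nsigma value, single-exposure rms, and exposure counts are made up for illustration), under the standard assumption that the background rms scales as 1/sqrt(relative weight): a fixed nsigma threshold should then be shallower (higher) on the edges, where fewer exposures overlap, and deeper (lower) in the fully overlapped center.

```python
import numpy as np

nsigma = 3.0
rms_single = 1.0  # hypothetical rms for a single exposure

rel_weight_center = 4.0 / 4.0  # all 4 exposures overlap in the center
rel_weight_edge = 1.0 / 4.0    # only 1 exposure covers the edge strip

# threshold = nsigma * rms, with rms ~ rms_single / sqrt(relative weight)
thr_center = nsigma * rms_single / np.sqrt(rel_weight_center)
thr_edge = nsigma * rms_single / np.sqrt(rel_weight_edge)
print(thr_center, thr_edge)
```

Under these assumptions the edge threshold is twice the center threshold, so a correct catalog should detect more sources per unit area in the center, the opposite of what the HAP segment catalog shows.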