lambda is biased low when trying to reproduce the DES Y1 redmapper run #78
Comments
Hi Jacob, just to add a bit more information: I was trying to do the same thing using SDSS DR8 data and saw exactly the same behaviour when comparing with the official catalog. After carefully comparing my result with the official catalog, I think it is due to the current algorithm producing a narrower calibrated red sequence. I found that low lambda is connected with generally low I also compared the catalog produced by the current redmapper with some spectroscopic data and found that the current version generally reduces the outlier fraction. Hope this helps. Best,
Hi @Hoptune, thanks for your comment. I think that your experience with the SDSS catalogue might be slightly different, though: when I look at the membership probabilities for the members of the matched clusters, they look similar; however, it appears that there are simply fewer members per cluster (which is consistent with the magnitude distributions above): Iter2 Iter3 Cheers,
@jacobic Thank you for this comprehensive comparison report. A lot more than usually shows up in a GitHub issue! Anyway, @Hoptune is right that there are systematic differences here between the code that was run on DES Y1 (and SDSS DR8) and what is in this repo. All the fully published redmapper catalogs, actually, are from the old non-public IDL code, which is finicky to use (aside from being based on a proprietary non-free language, etc.). There are a number of differences and (I believe) improvements from the old IDL code to the new Python code in this repo. But the one change that is most relevant to the systematic richness differences is in some assumptions about the red-sequence model. In the IDL code (and earlier versions of the Python code) the correlation coefficient in (intrinsic) residuals between different colors in the red-sequence model was a free parameter. It turns out this was not very stable. Furthermore, tests on sims (and even looking at spec data) show that a galaxy that is intrinsically redder than the red sequence in one color is also going to be redder in other colors. That is, if the There are other changes to the code as improvements have been made. So the IDL results are not going to be completely reproducible with the Python code, unfortunately. I hope this helps!
Hi Eli,
I am trying to reproduce the DES Y1 redmapper catalogue as a way of validating the performance of my pipeline:
http://desdr-server.ncsa.illinois.edu/despublic/y1a1_files/redmapper/redmapper_y1a1_public_v6.4_catalog.fits.gz
http://desdr-server.ncsa.illinois.edu/despublic/y1a1_files/redmapper/redmapper_y1a1_public_v6.4_members.fits.gz
However, I am finding that the richness of the optically selected clusters that I generate is consistently low relative to the public optically selected catalogues. On average lambda is about 20% smaller than the public values, although the redshifts are in good agreement. The magnitude distribution of the members of the matched clusters suggests that a significantly lower fraction of member galaxies is selected at all magnitudes in all bands. I have repeated the comparisons after each iteration of the calibration (1, 2 and 3), as well as for scanning mode runs on the optical centres of the public catalogues, to see if that would make a difference.
This could be for a number of reasons e.g.
I have tried to eliminate as many sources of discrepancy as possible; however, I am still finding differences in the richness values, which suggests to me that either there is a small (but fundamental) misunderstanding on my side, or that my pipeline has some room for improvement. I have created this issue in the hope that you can point me in the right direction.
Photometry
Below are the details of the calibration and a summary of the tests that I have carried out:
I am using the DES Y1 MOF catalogue
http://desdr-server.ncsa.illinois.edu/despublic/y1a1_files/mof_catalogs/y1a1-gold-mof-badregion.fits
As well as the latest version of redmapper (which passes all unit tests)
Here is a summary of the way I pixelise the photometry and map the data to the redmapper dtype. I also clean the catalogue before it is pixelised, in a way which resembles what is described in the papers, using 'and '.join(exprs_clean). I also remove any sources with nan mags or errs, or with very unphysical values for galaxies (mag > 27 or mag_err > 1). Perhaps I am over-cleaning the data?
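As a minimal sketch of the magnitude cuts described above (not the pipeline's actual code): the column names `mag_<band>` and `mag_err_<band>` and the band list are assumptions made for illustration, while the thresholds are the ones quoted in the text.

```python
import math

# Hypothetical column naming and band list; the real MOF catalogue columns may differ.
BANDS = ("g", "r", "i", "z")
MAG_MAX = 27.0      # sources with mag > 27 are rejected (threshold from the text)
MAG_ERR_MAX = 1.0   # sources with mag_err > 1 are rejected (threshold from the text)


def is_clean(row):
    """Return True if every band has a finite magnitude and error within the cuts."""
    for band in BANDS:
        mag = row[f"mag_{band}"]
        err = row[f"mag_err_{band}"]
        # Reject NaN or infinite values first, since comparisons with NaN are always False.
        if not (math.isfinite(mag) and math.isfinite(err)):
            return False
        if mag > MAG_MAX or err > MAG_ERR_MAX:
            return False
    return True


def clean_catalogue(rows):
    """Keep only the rows that pass the cuts in every band."""
    return [row for row in rows if is_clean(row)]
```

Counting how many sources each cut removes on its own would be one way to check whether the cleaning is too aggressive.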
HealSparse Depth maps
I created depth maps at nside 4096 using your model (taken from the code with redmapper) and I expand to larger pixels when the fit fails.
Comparing these depth maps with the public depth maps shows good agreement, with some scatter due to the fact that I have not recovered the depth maps using the recommended procedure in Rykoff+15; I have simply used the error model and expanded to larger pixels as necessary. In the notation here, 'rm' corresponds to the public version and 'em' corresponds to the reproduced version.
Do you think this scatter, or the way the depth map is created, could cause lambda to be biased 20% lower than expected? Perhaps I should not attempt to use a map whose resolution is too high, since at nside=2048 the maps are much smoother and have almost enough galaxies per pixel.
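The "expand to larger pixels when the fit fails" step can be sketched as follows. This is an illustration only, assuming HEALPix NESTED ordering (where the enclosing pixel one resolution step coarser is the index shifted right by two bits) and assuming that a failed fit is represented by a missing entry. The function names and the `depth_by_level` layout are hypothetical.

```python
def parent_pixel(pix_nest, levels=1):
    """NESTED index of the enclosing pixel `levels` steps coarser (nside halves per step)."""
    return pix_nest >> (2 * levels)


def depth_with_fallback(pix_nest, depth_by_level):
    """Return (depth, level) from the finest resolution that has a successful fit.

    depth_by_level[k] maps a NESTED pixel index at nside / 2**k to a fitted
    limiting magnitude; a missing key stands for a failed fit (assumption).
    Returns (None, None) if no level has a fit for this pixel.
    """
    for level, table in enumerate(depth_by_level):
        value = table.get(parent_pixel(pix_nest, level))
        if value is not None:
            return value, level
    return None, None
```

Recording the level that was actually used for each pixel makes it easy to map where the depth comes from coarser fits, which may help when comparing against the public maps.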
HealSparse Mask
The healsparse mask I create is from a combination of the footprint and badmask files, and I just use fracgood = 1 or 0, so some information is lost.
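To illustrate what a binary fracgood drops, here is a hedged sketch of how a fractional coverage value could be derived from a finer binary mask. It is not what the run described here does. It assumes NESTED pixel indices, and the function name is hypothetical.

```python
from collections import Counter


def fractional_fracgood(good_fine_pixels, levels):
    """Fraction of good sub-pixels inside each coarse NESTED pixel.

    good_fine_pixels: NESTED indices of fine pixels that are inside the
    footprint and not in the bad-region mask.
    levels: number of resolution steps between the fine and coarse maps
    (each step halves nside, so a coarse pixel holds 4**levels fine pixels).
    Coarse pixels with no good sub-pixel are absent from the result.
    """
    n_sub = 4 ** levels
    counts = Counter(pix >> (2 * levels) for pix in set(good_fine_pixels))
    return {coarse: n / n_sub for coarse, n in counts.items()}
```

With a binary mask, a coarse pixel that is only partly covered would be assigned either 0 or 1.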
Calibration
With the pixelised galaxy catalogues in hand I then run the calibration over the entire footprint
bkg_deepmode: True
- I assumed this would make it as accurate as possible
specfile_train: /u/jacobic/redpipes/data/interim/training/spiders_dr16_archive.fits
- which corresponds to all the confirmed spectroscopic members available in the literature (e.g. for the members in the redmapper catalogue, if p > 0.9 and they have a spectroscopic redshift, they are added to the training set). Could this cause overfitting and therefore low richness values?
The full cal.yml file:
All other check plots that redmapper creates are contained in the zip file below for iterations 1-3
I did not spot anything unusual in the plots; the redshift bias and scatter + zred look ok to me, but perhaps you are able to spot something fishy?
Tests
I did a separate optical cluster finding run based on each of iterations 1-3 and repeated each of the runs in scanning mode centred on the public optical catalogue as a catfile. I then crossmatched the optical centres of the public and reproduced catalogues with a search radius of 0.05 deg.
The config file for each iteration is below:
run_iter1-3.yml.zip
Number of matched clusters within 0.05 deg
The number of matches between the reproduced and public catalogues differs as a function of the number of iterations and is sensitive to the run mode. Here is a quick summary of the number of matches within a search radius of 0.05 deg.
| Run mode | Iter 1 | Iter 2 | Iter 3 |
| --- | --- | --- | --- |
| Optical only (plots below are based on these clusters) | 4188 | 4364 | 4352 |
| Scanning mode on public centres | 5601 | 5630 | 5631 |
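A minimal sketch of a positional crossmatch like the one counted above, assuming each catalogue is a list of `(ra, dec)` tuples in degrees. It uses a brute-force nearest-neighbour search with the haversine formula, which is a stand-in for illustration rather than the matching code actually used for these numbers. The function names are hypothetical.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid asin range.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(reproduced, public, radius_deg=0.05):
    """Match each reproduced cluster to its nearest public cluster within radius_deg.

    Returns a list of (index_in_reproduced, index_in_public, separation_deg).
    Brute force, O(N * M); fine for a sketch, slow for full catalogues.
    """
    matches = []
    for i, (ra, dec) in enumerate(reproduced):
        best = min(
            ((angular_sep_deg(ra, dec, pra, pdec), j)
             for j, (pra, pdec) in enumerate(public)),
            default=None,
        )
        if best is not None and best[0] <= radius_deg:
            matches.append((i, best[1], best[0]))
    return matches
```

This version allows several reproduced clusters to match the same public cluster. Whether matches should be unique is a choice that affects the counts.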
Photometric redshift discrepancy
There is a photometric redshift discrepancy which seems to get less smooth with more iterations (please note these are separate plots, so there is not a common normalisation).
iter1
![test_cat_iter1_z_lambda_bias](https://user-images.githubusercontent.com/9905226/121744217-de9aeb80-cb02-11eb-9b93-fed8f1483dcf.png)
iter2
![test_cat_iter2_z_lambda_bias](https://user-images.githubusercontent.com/9905226/121744211-de025500-cb02-11eb-8a2e-7b34f2d74347.png)
iter3
![test_cat_iter3_z_lambda_bias](https://user-images.githubusercontent.com/9905226/121744203-db9ffb00-cb02-11eb-9e2a-9c26832c8b24.png)
Richness discrepancy
For the matched clusters, the richness differs more significantly than the redshift.
iter1
![test_cat_iter1_lambda_ratio](https://user-images.githubusercontent.com/9905226/121743776-2b31f700-cb02-11eb-851d-2ae5f157c6d6.png)
iter2
![test_cat_iter2_lambda_ratio](https://user-images.githubusercontent.com/9905226/121743775-2a996080-cb02-11eb-9b98-5ef200307cec.png)
iter3
![test_cat_iter3_lambda_ratio](https://user-images.githubusercontent.com/9905226/121743770-29683380-cb02-11eb-90bd-ad2358789ae8.png)
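For reference, a richness ratio like the one plotted above can be summarised with a short sketch such as the following. It assumes two equal-length lists of lambda values for the matched pairs (reproduced and public, in the same order); the function name is hypothetical.

```python
import statistics


def richness_bias(lambda_reproduced, lambda_public):
    """Median ratio of reproduced to public richness, and the fractional bias.

    Pairs with a non-positive public richness are skipped to avoid dividing by zero.
    Returns (median_ratio, median_ratio - 1); a negative bias means the
    reproduced richness is lower than the public one.
    """
    ratios = [r / p for r, p in zip(lambda_reproduced, lambda_public) if p > 0]
    median_ratio = statistics.median(ratios)
    return median_ratio, median_ratio - 1.0
```

Using the median rather than the mean keeps a few badly matched clusters from dominating the summary.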
This can be explained by the magnitude distribution of the members of the matched clusters, where we see fewer members at all magnitudes.
iter1
![test_iter1_mem_mag_auto](https://user-images.githubusercontent.com/9905226/121743578-d8583f80-cb01-11eb-80ea-b0c4665130f4.png)
iter2
![test_iter2_mem_mag_auto](https://user-images.githubusercontent.com/9905226/121743577-d7bfa900-cb01-11eb-9754-76b54c3913c5.png)
iter3
![test_iter3_mem_mag_auto](https://user-images.githubusercontent.com/9905226/121743575-d7271280-cb01-11eb-8542-0e727a9d50e9.png)
I thought this could perhaps be due to differences in the red sequence, so I matched both the public and reproduced member catalogues to the same specfile (all publicly available spectra), and the distribution in the redshift-colour plane is very similar (note that each figure has a common normalisation between its subfigures, but not with the figures corresponding to other iterations). The left column is the reproduced catalogue and the right column is the public catalogue.
iter1
![test_iter1_mem_redsequence](https://user-images.githubusercontent.com/9905226/121743398-95966780-cb01-11eb-8c42-40684acf885e.png)
iter2
![test_iter2_mem_redsequence](https://user-images.githubusercontent.com/9905226/121743401-96c79480-cb01-11eb-9cce-ed2ea34f3559.png)
iter3
![test_iter3_mem_redsequence](https://user-images.githubusercontent.com/9905226/121743404-97f8c180-cb01-11eb-9627-ce4e29bbf909.png)
The zscan versions of all these plots look very similar to the optically selected ones shown above, so I have not uploaded them; however, I do get a lot more matches within 0.05 deg of the public clusters in zscan mode (presumably because they are already centred on probable cg's).
Summary
Please let me know if you have any advice about how to optimise this redmapper run on DES Y1. Of the following which do you think is the most likely to cause the discrepancy shown in the tests above?
I also compared DES Y1 / SPT clusters to Legacy Imaging DR8 redmapper clusters and they were biased low in richness by about the same amount. I also did the same thing for the reproduced DES SVA vs the public DES SVA redmapper catalogues, and the low richness was also apparent. This makes me think it is not related to the treatment of the photometry or the masks, since these things differ between surveys; however, the depth map creation, configuration settings and training set are all very similar between the different runs I've tried, so perhaps they are more likely to be a cause of the issue.
Once this catalogue is in close agreement with your public version, I think the catalogues made via my pipeline will hopefully be more complete, with lower scatter in richness and more members per cluster, which is essential for spectroscopic targeting.
Thanks for your amazing work on redmapper and sorry for such a long issue report
Have a nice weekend!
Jacob