SSO index table #774

Merged on Dec 21, 2023 (5 commits)

@JulienPeloton JulienPeloton commented Dec 18, 2023

IMPORTANT: Please create an issue first before opening a Pull Request.
Linked to issue(s): Closes #773

What changes were proposed in this pull request?

This PR constructs a new table in HBase, ztf.sso_resolver, with Solar System numbers and names. We start from all ssnamenr values in Fink. For each ssnamenr, we query quaero to get the corresponding name and number (if they exist), and we build an index that is the concatenation of names, numbers, and ssnamenr. The table has two other columns: i:source, which specifies the provenance of the index entry (name, number, or ssnamenr), and i:ssnamenr, which gives the ssnamenr corresponding to the index entry. Below, I give examples for typical use cases.
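To make the index layout concrete, here is a minimal sketch of how such rows could be assembled. It is not the code from this PR: the function name `build_index_rows`, the shape of the `resolved` input, and the per-key running counter used for the `_<n>` suffix are all assumptions made for illustration, and the actual suffix scheme is not described here.

```python
from collections import defaultdict


def build_index_rows(resolved):
    """Return (row_key, source, ssnamenr) tuples for a resolver-like index.

    `resolved` is a hypothetical input mapping each ssnamenr to a
    (name, number) pair; either element may be None when no match exists.
    """
    counters = defaultdict(int)  # assumed scheme: one running counter per index value
    rows = []

    def add(index, source, ssnamenr):
        # Append a numeric suffix so identical index values get distinct row keys
        rows.append((f"{index}_{counters[index]}", source, ssnamenr))
        counters[index] += 1

    # Concatenate names, then numbers, then the ssnamenr values themselves
    for source, position in (("name", 0), ("number", 1)):
        for ssnamenr, pair in resolved.items():
            if pair[position] is not None:
                add(pair[position], source, ssnamenr)
    for ssnamenr in resolved:
        add(ssnamenr, "ssnamenr", ssnamenr)
    return rows


# Illustrative input only, modelled on the object used in the examples below
resolved = {
    "2002MA06": ("2002 MA6", "624188"),
    "2002MA6": ("2002 MA6", "624188"),
    "624188": ("2002 MA6", "624188"),
}
rows = build_index_rows(resolved)
for row in rows:
    print(row)
```

Each row carries its provenance (`source`) and the ssnamenr it points to, which is what makes both lookup directions possible.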

MPC -> ZTF ssnamenr

A user provides an MPC number or name, and wants to know all existing counterparts:

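import pandas as pd

# connect_to_hbase_table is assumed to be in scope (its import is not shown in this PR)
client = connect_to_hbase_table('ztf.sso_resolver')
MPC_number = "624188"
# match row keys of the form "<MPC number>_<n>" and return the two index columns
result = client.scan(
    "",
    "key:key:{}_".format(MPC_number),
    "i:ssnamenr,i:source",
    0, False, False
)
client.close()

pdf = pd.DataFrame.from_dict(result, orient='index')
print(pdf)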
          i:source i:ssnamenr
624188_0    number   2002MA06
624188_1    number    2002MA6
624188_2    number     624188
624188_3  ssnamenr     624188

So one sees that the MPC number 624188 points to 3 distinct ssnamenr (2002MA06, 2002MA6, and 624188). Note that the row keys carry a _<n> suffix, which is used internally to deduplicate entries. The suffix should be removed before using a key as an MPC designation (MPC does not include any _ in names).
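Removing the suffix can be done with a one-line helper. This is a sketch with a hypothetical function name, `strip_suffix`, and it assumes every row key ends with exactly one `_<n>` suffix, as in the output above.

```python
def strip_suffix(row_key):
    """Drop the trailing `_<n>` deduplication suffix from a resolver row key."""
    # Split on the last underscore only, so spaces in names such as "2002 MA6" are kept
    return row_key.rsplit("_", 1)[0]


keys = ["624188_0", "624188_1", "624188_2", "624188_3", "2002 MA6_2"]
designations = sorted({strip_suffix(key) for key in keys})
print(designations)
```

Collecting the results in a set also collapses the duplicated entries that the suffix was introduced to keep apart.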

MPC -> ZTF ssnamenr -> ZTF alerts

Building on the previous query, one then only has to query the ZTF SSO alert table (ztf.ssnamenr in the code below):

import numpy as np
import pandas as pd

# get all ssnamenr corresponding to an MPC number
client = connect_to_hbase_table('ztf.sso_resolver')
MPC_number = "624188"
result = client.scan(
    "",
    "key:key:{}_".format(MPC_number),
    "i:ssnamenr,i:source",
    0, False, False
)
client.close()

pdf = pd.DataFrame.from_dict(result, orient='index')
ssnamenrs = np.unique(pdf['i:ssnamenr'])

# get alerts
client = connect_to_hbase_table('ztf.ssnamenr')
results = {}
for ssnamenr in ssnamenrs:
    to_evaluate = "key:key:{}_".format(ssnamenr)
    result = client.scan(
        "",
        to_evaluate,
        "i:objectId",
        0, False, False
    )
    results.update(result)
pdf = pd.DataFrame.from_dict(results, orient='index')
print(pdf)

client.close()

                           i:objectId
2002MA06_2458798.7735995  ZTF19acrplsa
2002MA06_2458836.9314815  ZTF19adagirr
2002MA6_2460181.9716319   ZTF23aazubcu
2002MA6_2460196.8626968   ZTF23abcakzn
2002MA6_2460203.9041204   ZTF23abecthp
2002MA6_2460203.9782407   ZTF23abeeoen
2002MA6_2460207.8895833   ZTF23abfvtyr
2002MA6_2460207.9490278   ZTF23abfxptv
2002MA6_2460213.9683681   ZTF23abhahgo
624188_2460222.9076042    ZTF23abidfxt
624188_2460224.7790162    ZTF23abilvaz
624188_2460224.888669     ZTF23abiotst
...
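The combined result mixes alerts from several ssnamenr. The sketch below groups them per ssnamenr. It assumes the scan result behaves like a dict of dicts (as the `pd.DataFrame.from_dict` call suggests) and that the part of the row key after the last `_` is a Julian date; both are assumptions, and `group_by_ssnamenr` is a hypothetical helper, not part of Fink.

```python
from collections import defaultdict


def group_by_ssnamenr(results):
    """Group alert rows by ssnamenr, sorted by the numeric part of the row key."""
    grouped = defaultdict(list)
    for row_key, columns in results.items():
        # Row keys look like "<ssnamenr>_<jd>"; split on the last underscore
        ssnamenr, jd = row_key.rsplit("_", 1)
        grouped[ssnamenr].append((float(jd), columns["i:objectId"]))
    return {name: sorted(alerts) for name, alerts in grouped.items()}


# A few rows copied from the output above
results = {
    "2002MA06_2458798.7735995": {"i:objectId": "ZTF19acrplsa"},
    "2002MA6_2460181.9716319": {"i:objectId": "ZTF23aazubcu"},
    "624188_2460224.7790162": {"i:objectId": "ZTF23abilvaz"},
    "624188_2460222.9076042": {"i:objectId": "ZTF23abidfxt"},
}
grouped = group_by_ssnamenr(results)
for name, alerts in grouped.items():
    print(name, [object_id for _, object_id in alerts])
```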

ZTF ssnamenr -> MPC

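import pandas as pd

# connect_to_hbase_table is assumed to be in scope (its import is not shown in this PR)
client = connect_to_hbase_table('ztf.sso_resolver')
# filter on the value of the i:ssnamenr column rather than on the row key
result = client.scan(
    "",
    "i:ssnamenr:624188",
    "i:source,i:ssnamenr",
    0, False, False
)
client.close()

pdf = pd.DataFrame.from_dict(result, orient='index')
print(pdf)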

            i:source i:ssnamenr
2002 MA6_2      name     624188
624188_2      number     624188
624188_3    ssnamenr     624188

This query is slower than the others because it requires a full scan of a non-indexed column (about 4 seconds for a single ssnamenr).
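The difference between the two access patterns can be illustrated with a small in-memory analogue. This is only a sketch in plain Python, not HBase code: `prefix_scan` and `column_scan` are hypothetical names, and the sorted list stands in for the fact that HBase stores rows ordered by key.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Key lookup: jump to the first candidate, stop at the first non-match."""
    matches = []
    for i in range(bisect_left(sorted_keys, prefix), len(sorted_keys)):
        if not sorted_keys[i].startswith(prefix):
            break
        matches.append(sorted_keys[i])
    return matches


def column_scan(table, column, value):
    """Column lookup: every row has to be inspected."""
    return [key for key, row in table.items() if row[column] == value]


# Rows copied from the outputs above
table = {
    "2002 MA6_2": {"i:source": "name", "i:ssnamenr": "624188"},
    "624188_0": {"i:source": "number", "i:ssnamenr": "2002MA06"},
    "624188_1": {"i:source": "number", "i:ssnamenr": "2002MA6"},
    "624188_2": {"i:source": "number", "i:ssnamenr": "624188"},
    "624188_3": {"i:source": "ssnamenr", "i:ssnamenr": "624188"},
}
sorted_keys = sorted(table)

print(prefix_scan(sorted_keys, "624188_"))
print(column_scan(table, "i:ssnamenr", "624188"))
```

The key lookup touches only the matching rows, while the column lookup grows with the size of the whole table.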

What is next?

Update the SSO resolver in the fink-science-portal.

How was this patch tested?

Cloud.

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@JulienPeloton JulienPeloton merged commit 655447f into master Dec 21, 2023
15 of 18 checks passed
Successfully merging this pull request may close these issues.

Build SSO index table