Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more columns for uppervalid alerts #785

Merged
merged 7 commits into from
Jan 30, 2024
Merged

Conversation

JulienPeloton
Copy link
Member

IMPORTANT: Please create an issue first before opening a Pull Request.
Linked to issue(s): Closes #778

What changes were proposed in this pull request?

In this PR, we add necessary columns to reconstruct apparent magnitude for uppervalid alerts.

We also optimise the way we push data to HBase by reducing the number of elements to push (require fink-utils>=0.13.11). Basically, we were pushing before all the history, no matter the history was unchanged from the point of view of uppervalid or upper. This was leading to redundant write night after night, and it was putting a lot of load in the database when writing million of elements unnecessarily.

Now the strategy is different. We first look in the history if the last element is uppervalid or upper, and only in this case, we push the history in the database. We could further optimise by looking at the corresponding history, and selecting only latest elements (by opposition to all elements), but this would complexify too much the analysis for a very moderate gain.

Here are some numbers for one night (2024/01/04, 237,104 alerts):

uppervalid

old strategy new strategy
# alert history to explode 237,104 7,980
# rows to push to HBase 1,196,247 20,965

upper

old strategy new strategy
# alert history to explode 237,104 165,810
# rows to push to HBase 2,804,673 2,168,893

Conclusion

The number of elements to push for uppervalid is drastically smaller! This is why, we can afford more columns now. For upper, this number does not decrease much, because (1) the stream contains a lot of new objects each night (history full of upper limits), and (2) there are a lot of upper limits in between two valid measurements (this is probably where a more complex analysis would reduce the number of write -- but that's for later, maybe).

How was this patch tested?

Cloud & CI

@karpov-sv
Copy link
Collaborator

It seems you forgot to add distnr to the fields to keep for bad quality measurements - it is also important for reconstructing the lightcurve (to decide whether to apply magnr or not).

Also - do we need magpsf and sigmapsf for pure upper limits? I don't remember whether they actually contain anything or not - but they should be empty if I do not miss anything obvious

@JulienPeloton
Copy link
Member Author

It seems you forgot to add distnr to the fields to keep for bad quality measurements - it is also important for reconstructing the lightcurve (to decide whether to apply magnr or not).

right, thanks! Added.

Also - do we need magpsf and sigmapsf for pure upper limits? I don't remember whether they actually contain anything or not - but they should be empty if I do not miss anything obvious

No we don't need as they are NaN -- but I do not include them:

# line 240
df_index = df_index.drop(*['magpsf', 'sigmapsf'])

Copy link

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

1 New issue
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@JulienPeloton JulienPeloton merged commit 5392714 into master Jan 30, 2024
19 of 22 checks passed
@JulienPeloton JulienPeloton added this to the 3.2 milestone Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep more fields for uppervalid alerts
2 participants