Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add management command for updating Data Dog metric #35473

Conversation

Charl1996
Copy link
Contributor

@Charl1996 Charl1996 commented Dec 4, 2024

Related to this PR.

Technical Summary

This PR adds a management command for updating a counter metric on Data Dog. The idea is that this command will only run once. As such, it would probably be run as part of a limited release (I only thought of running it as part of a limited release after I've completed most of this PR).

What does the command do?

The command counts for how many applications the app dependencies feature was enabled and disabled again at some point.

The command will iterate through all application builds on all domains and check if a certain feature was disabled at any point after it was enabled for a particular application. If it was, a datadog counter will be increased by 1 and the particular application builds iteration will move on to the next one.

Knowing this information will give us a sense of feature dropout.

How is the command implemented?

The management command is built on top of the AppMigrationCommandBase. Well, technically only certain parts of the AppMigrationCommandBase, which was pulled out into a super class (see this comment; maybe because I'm planning to run it as a limited release this wasn't strictly necessary, I could simply have copied and modified the class). The super class manages the processed domains and the progress of the command by persisting it to the disk (and cleans up the temp files afterwards).

Please note: this command will probably put some load on the system, so I'll be testing it first on staging to see how it performs before moving forward.

Feature Flag

No specific feature flag

Safety Assurance

Safety story

To test on staging. Tested locally

Automated test coverage

No automated testing

QA Plan

No QA required

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@dimagimon dimagimon added the reindex/migration Reindex or migration will be required during or before deploy label Dec 4, 2024
@Charl1996 Charl1996 added the product/invisible Change has no end-user visible impact label Dec 4, 2024
@@ -24,6 +24,7 @@ function(doc){
vellum_case_management: !!doc.vellum_case_management,
target_commcare_flavor: doc.target_commcare_flavor,
family_id: doc.family_id,
profile: doc.profile,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is changed for the limited release, I presume this will actually make the change to the couch view? I need to quickly find out what will happen when the release expires.

@Charl1996 Charl1996 added the Open for review: do not merge A work in progress label Dec 5, 2024
@Charl1996 Charl1996 force-pushed the cs/SC-4053-datadog-metric-for-feature-historic-usage-management-cmd branch from b17ae91 to ff6f63e Compare December 5, 2024 06:14
@Charl1996 Charl1996 force-pushed the cs/SC-4053-datadog-metric-for-feature-historic-usage-management-cmd branch from ff6f63e to 8d44508 Compare December 5, 2024 06:15
@dimagimon dimagimon removed the reindex/migration Reindex or migration will be required during or before deploy label Dec 5, 2024
@kaapstorm
Copy link
Contributor

kaapstorm commented Dec 5, 2024

I haven't gone through this in detail, but I think there is a way to speed this up significantly, considering that we are looking for the dropoff count for a feature flag:

Instead of iterating all domains, we can just iterate the revisions of that one toggle. A toggle is a Couch document, and CouchDB stores the revisions of a document. Fetch all the revisions of the document, loop through the revisions, and get the domains from those.

You can find out how to get revisions by looking at get_deleted_doc() here. Maybe something like ...

def iter_revisions(toggle):
    db = Toggle.get_db()
    result = db.get(toggle._id, revs=True)
    for rev_id in result['_revisions']['ids']:
        res = db.get(toggle._id, rev=rev_id)
        yield Toggle.wrap(res)

@Charl1996
Copy link
Contributor Author

@kaapstorm

I haven't gone through this in detail, but I think there is a way to speed this up significantly, considering that we are looking for the dropoff count for a feature flag:

The FF has been deprecated 2 months back and the feature is now generally available. I see two options here:

  1. To get the most accurate number we need to iterate over all domains (actually, non-test domains <<-- need to add this)
  2. Assume the uptake wasn't significant in the last 2 months and just iterate over the domains that had the FF enabled.

Any thoughts?

@Charl1996
Copy link
Contributor Author

The FF has been deprecated 2 months back and the feature is now generally available. I see two options here:

Confirmed that it's OK to only consider the domains that had the feature flag enabled. Will update

@Charl1996
Copy link
Contributor Author

If folks don't have any concerns I'm happy to start testing this on staging.


def run(self, domains, domain_list_position):
for domain in domains:
app_ids = self.get_app_ids(domain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would get delete app ids as well? do we need them? or those are ignored later based on properties set in the class like include_deleted_apps?

Confirming that this here will just get the app but not the builds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or those are ignored later based on properties set in the class like include_deleted_apps

Correct.

Confirming that this here will just get the app but not the builds?

Yes, the builds are also being excluded with the include_builds = False flag as implemented in the get_all_app_ids method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I look forward to your test results from staging.

Copy link
Contributor

@mkangia mkangia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to test this on staging.

We won't be merging this PR right?

Copy link
Contributor

@gherceg gherceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pair reviewed with @jingcheng16 and we just have a question about the use of datadog for this.


for build_doc in app_builds[1:]:
if self._has_app_dependencies(build_doc['value']):
metrics_counter('commcare.app_build.dependencies_removed')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is only going to be run once, and the output is basically 2 numbers (dependencies_added and dependencies_removed), is it easier to just output this in the command? It seems unnecessary to send this one time metric to datadog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @gherceg
this is meant to bump up the counter to current usage,
the feature uptake and dropout will continue to be tracked further with this other PR.

fyi @jingcheng16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense! Thank you for explanation!

@Charl1996
Copy link
Contributor Author

We won't be merging this PR right?

No, I'd rather do it through limited release, which I think is now much safer now.

@Charl1996
Copy link
Contributor Author

Update:
I've checked out of curiosity how many domains this command will run over, and it turns out none now that we only iterate previously feature flagged domains that are not test domains.

I've checked with the necessary people and have concluded that we can close this PR as this step have effectively become redundant.

@Charl1996 Charl1996 closed this Dec 11, 2024
@mkangia
Copy link
Contributor

mkangia commented Dec 13, 2024

Nice, good move @Charl1996

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Open for review: do not merge A work in progress product/invisible Change has no end-user visible impact
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants