Add pupa clean CLI command #344

Merged
merged 9 commits into from
Mar 16, 2023

Collaborator

@antidipyramid antidipyramid commented Feb 20, 2023

Overview

This PR adds a new pupa CLI command, pupa clean:

usage: pupa clean [-h] [--window WINDOW] [--report] [--noinput]

Removes database objects that haven't been seen in recent scrapes

optional arguments:
  -h, --help       show this help message and exit
  --window WINDOW  objects not seen in this many days will be deleted from the database
  --report         only generate a report of what objects this command would delete without making any changes to the database
  --noinput        delete objects without getting user confirmation
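
For readers who want to see how the flags above map onto a parser, here is a minimal standalone sketch using Python's argparse. It is not the code from this PR: pupa registers its commands through its own CLI machinery, and the `build_parser` helper and the default of 7 days for `--window` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a standalone parser that mirrors the `pupa clean` help text."""
    parser = argparse.ArgumentParser(
        prog="pupa clean",
        description="Removes database objects that haven't been seen in recent scrapes",
    )
    parser.add_argument(
        "--window",
        type=int,
        default=7,  # assumed default; the help text above does not state one
        help="objects not seen in this many days will be deleted from the database",
    )
    parser.add_argument(
        "--report",
        action="store_true",
        help="only generate a report of what objects this command would delete",
    )
    parser.add_argument(
        "--noinput",
        action="store_true",
        help="delete objects without getting user confirmation",
    )
    return parser


# Parse an example command line: report mode with a 30-day window.
args = build_parser().parse_args(["--window", "30", "--report"])
```

Because `--report` and `--noinput` are `store_true` flags, they default to False when omitted, so a bare invocation would still prompt before deleting.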

Testing Instructions

  • Make sure tests pass

To test with a live database, you should use a local instance of opencivicdata/scrapers-us-municipal:

  • Make sure your database has been populated
  • Rebuild your scrapers container with git installed by adding sudo apt-get install git to the Dockerfile
  • Install this branch of pupa locally by copying this directory into the root directory with cp -r /path/to/pupa . and running docker-compose run --rm scrapers pip install -e pupa
  • Make sure the new command works with docker-compose run --rm app pupa clean

coveralls commented Feb 20, 2023

Coverage Status

Coverage: 94.806%. Remained the same when pulling 9f51bbf on clean-cli into a9c9f80 on master.

Contributor

@hancush hancush left a comment

This is looking great, @antidipyramid! Excited to bring this over the line. Left a few things inline.

Comment on lines 32 to 39
    results = []
    for model in models:
        # Jurisdictions are protected from deletion
        if "Jurisdiction" not in model.__name__:
            cutoff_date = datetime.now(tz=timezone.utc) - timedelta(days=window)
            results.append(model.objects.filter(last_seen__lte=cutoff_date))

    return itertools.chain(*results)
Contributor

What do you think about making this a generator (yielding from results) instead of building up a list?

Contributor

I tend to go generator when I know I just need to iterate over the results and a list could grow to be quite large, hogging a lot of memory.
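
A rough sketch of what the generator version could look like, assuming the same per-model `last_seen__lte` filter as the snippet under review. `FakeManager` and `make_model` are hypothetical stand-ins for Django models so the example runs without a database, and passing `models` as a parameter is also an assumption.

```python
from datetime import datetime, timedelta, timezone


class FakeManager:
    """Hypothetical stand-in for a Django manager; filters an in-memory list."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, last_seen__lte):
        return [row for row in self.rows if row["last_seen"] <= last_seen__lte]


def make_model(name, rows):
    """Build a throwaway class exposing `.objects`, loosely like a Django model."""
    return type(name, (), {"objects": FakeManager(rows)})


def get_stale_objects(models, window):
    """Lazily yield objects that have not been seen in the last `window` days."""
    cutoff_date = datetime.now(tz=timezone.utc) - timedelta(days=window)
    for model in models:
        # Jurisdictions are protected from deletion
        if "Jurisdiction" in model.__name__:
            continue
        # Hand results to the caller one model at a time instead of
        # collecting every result set in a list first.
        yield from model.objects.filter(last_seen__lte=cutoff_date)


now = datetime.now(tz=timezone.utc)
Bill = make_model(
    "Bill",
    [
        {"id": 1, "last_seen": now - timedelta(days=30)},
        {"id": 2, "last_seen": now},
    ],
)
Jurisdiction = make_model(
    "Jurisdiction",
    [{"id": 3, "last_seen": now - timedelta(days=30)}],
)

stale = list(get_stale_objects([Bill, Jurisdiction], window=7))
```

The cutoff is computed once, outside the loop, since it does not depend on the model. With real querysets, `yield from` would iterate each queryset only as the caller consumes it.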

pupa/cli/commands/clean.py Outdated
Comment on lines 42 to 48
def remove_stale_objects(window):
    """
    Remove all database objects that haven't been seen in {window} days.
    """
    for obj in get_stale_objects(window):
        print(f"Deleting {obj}...")
        obj.delete()
Contributor

How about making this and reporting functionality helper methods on your command class, since they're directly related to command behavior?

    )
    resp = input()
    if resp != "Y":
        return
Contributor

I like to use sys.exit() so it's explicit what's going on. You can pass a 0 exit code if you want a No to still be considered a successful run.
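
To make the suggestion concrete, here is a small sketch of the explicit-exit pattern. The `confirm_or_exit` helper and its `read_input` parameter are hypothetical names introduced so the behavior can be exercised without a terminal; they are not part of the PR.

```python
import sys


def confirm_or_exit(prompt, read_input=input):
    """Continue only if the user answers "Y"; otherwise exit explicitly."""
    print(prompt)
    resp = read_input()
    if resp != "Y":
        # Exit code 0: a "No" is treated as a successful, intentional no-op.
        sys.exit(0)


# Simulate a user who declines, and capture the resulting exit code.
try:
    confirm_or_exit("Are you sure? (Y/N)", read_input=lambda: "N")
    declined_code = None
except SystemExit as exc:
    declined_code = exc.code
```

`sys.exit()` raises `SystemExit`, so the process stops at the prompt regardless of how deeply the check is nested, whereas a bare `return` only leaves the current function.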

Comment on lines 90 to 94
print(
    "This will permanently delete all objects from your database"
    f" that have not been scraped within the last {args.window}"
    " days. Are you sure? (Y/N)"
)
Contributor

Would it be possible to report the specific objects (or a summary?) in this prompt?

Collaborator Author

Since we already have a report option, would a count of objects to be deleted be enough?
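
One way the count could be folded into the prompt, sketched under the assumption that the number of stale objects is computed before asking for confirmation. `build_warning` is a hypothetical helper, and the wording is adapted from the prompt quoted above rather than taken from the merged code.

```python
def build_warning(num_stale_objects, window):
    """Compose the confirmation prompt, including how many objects are at risk."""
    return (
        f"This will permanently delete {num_stale_objects} objects from your database"
        f" that have not been scraped within the last {window}"
        " days. Are you sure? (Y/N)"
    )


message = build_warning(num_stale_objects=42, window=7)
```

This keeps the prompt short while leaving the full per-object listing to the existing --report option.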

    return result


def get_stale_objects(window):
Contributor

How about adding this to the command class, as well?

@@ -0,0 +1,51 @@
import pytest
Contributor

Thoughts on adding an integration test that sets up the data and runs the pupa clean command, testing for the desired outcomes?

Collaborator Author

That sounds like a good idea. Do you mean just calling Command.handle() and testing the output?

Contributor

I'd actually test that the command removes the data I expect and doesn't touch anything else.
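
The shape of such a test might look like the sketch below. It uses an in-memory SQLite table as a stand-in for the real database and a hypothetical `clean` function in place of the actual command, so it only illustrates the pattern: set up one stale row and one fresh row, run the cleanup, then assert on what remains.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def clean(conn, window, now):
    """Stand-in for the command: delete rows not seen in the last `window` days."""
    cutoff = (now - timedelta(days=window)).isoformat()
    conn.execute("DELETE FROM bill WHERE last_seen <= ?", (cutoff,))


def run_clean_scenario():
    """Set up one stale and one fresh row, run clean, and return surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bill (id INTEGER PRIMARY KEY, last_seen TEXT)")
    now = datetime(2023, 3, 14, tzinfo=timezone.utc)
    rows = [
        (1, (now - timedelta(days=30)).isoformat()),  # stale: should be removed
        (2, (now - timedelta(days=1)).isoformat()),   # fresh: must be untouched
    ]
    conn.executemany("INSERT INTO bill VALUES (?, ?)", rows)
    clean(conn, window=7, now=now)
    return [row[0] for row in conn.execute("SELECT id FROM bill ORDER BY id")]


def test_clean_removes_only_stale_rows():
    # The stale row is gone and the fresh row is still present.
    assert run_clean_scenario() == [2]


remaining_ids = run_clean_scenario()
```

Asserting on the surviving rows, rather than on printed output, checks both halves of the requirement: the stale data is removed and nothing else is touched.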

@antidipyramid antidipyramid requested a review from hancush March 14, 2023 20:39
Contributor

@hancush hancush left a comment

One small thing inline, then this can come in. This is really cool!

pupa/cli/commands/clean.py Outdated
Fix clean warning prompt

Co-authored-by: hannah cushman garland <[email protected]>
@antidipyramid antidipyramid merged commit 212a218 into master Mar 16, 2023
Contributor

hancush commented Mar 16, 2023

3 participants