remove old data #10

Open
jfilter opened this issue Jul 7, 2018 · 1 comment

Comments

@jfilter
Contributor

jfilter commented Jul 7, 2018

My database is already growing quite large. Since I expect to run the bot for decades, I would like a way to clean out old articles (maybe only those without a match).

Options:

  1. add a CLI argument to only clean the DB
  2. add a CLI argument to run the bot but also clean the DB
  3. add a CLI argument to run the bot and, with probability p (e.g. 0.01), also clean the DB (my favorite)

To find old articles, check the date of the oldest entry still in the feed and remove everything older than that.
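
A rough sketch of option 3 might look like the following. It assumes an SQLite database with a hypothetical `articles` table that has a `published` column (ISO 8601 date string) and a `matched` flag. The `--clean-probability` flag and all function names are made up for illustration and are not the bot's actual schema or CLI. It reads "last date in the feed" as the date of the oldest entry still in the feed.

```python
import argparse
import random
import sqlite3


def clean_old_articles(conn, oldest_feed_date, only_unmatched=True):
    """Delete articles published before the oldest entry still in the feed.

    Assumes a hypothetical table `articles(published TEXT, matched INTEGER)`
    where `published` is an ISO 8601 string, so string comparison orders dates.
    """
    query = "DELETE FROM articles WHERE published < ?"
    if only_unmatched:
        query += " AND matched = 0"
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(query, (oldest_feed_date,))
    return cur.rowcount


def maybe_clean(conn, oldest_feed_date, p, rng=random.random):
    """Option 3: on each run, clean the DB with probability p."""
    if rng() < p:
        return clean_old_articles(conn, oldest_feed_date)
    return 0


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; 0.0 means "never clean".
    parser.add_argument("--clean-probability", type=float, default=0.0)
    return parser.parse_args(argv)
```

With `p = 0.01` and one run per hour, a cleanup would happen roughly every four days on average, without needing a separate cron job. Setting `p = 1.0` would give option 2.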

@thisisparker
Contributor

Oooh, that's fun. You're matching against a much larger list of sources than I am, so I'd put this off for a bit because my database isn't growing so fast. Let me think on these approaches. My initial hunch is that you could probably just dump a set number of old (and maybe unmatched) articles once the database grows past a certain size, but I'd want that to be at least sort of transparent, and maybe configurable.

(My colleague has suggested a use case where we produce some kind of metrics out of the database, so I want to be able to retain stuff if necessary.)
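
The size-threshold idea could be sketched roughly like this. It assumes an SQLite database with a hypothetical `articles` table that has `published` and `matched` columns. The function name, parameters, and default values are placeholders for illustration, not anything that exists in the project today.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def prune_if_too_large(conn, max_rows=100_000, batch_size=10_000, only_unmatched=True):
    """Drop the oldest `batch_size` articles once the table exceeds `max_rows`.

    Assumes a hypothetical table `articles(published TEXT, matched INTEGER)`.
    Both thresholds are meant to be configurable by the user.
    """
    total = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    if total <= max_rows:
        return 0

    # Keep matched articles by default so they stay available for metrics.
    where = "WHERE matched = 0" if only_unmatched else ""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM articles WHERE rowid IN ("
            f"SELECT rowid FROM articles {where} ORDER BY published ASC LIMIT ?)",
            (batch_size,),
        )
    # Log what was removed so the pruning is at least somewhat transparent.
    logger.info("Pruned %d of %d articles", cur.rowcount, total)
    return cur.rowcount
```

Keeping matched rows by default would leave the data needed for metrics in place, and setting `max_rows` very high would effectively turn pruning off.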
