Archive old posts to reduce disk usage #5016

Open
5 tasks done
Nutomic opened this issue Sep 12, 2024 · 5 comments
Labels
area: database enhancement New feature or request

Comments

@Nutomic
Member

Nutomic commented Sep 12, 2024

Requirements

  • Is this a feature request? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a feature request? Do not put multiple feature requests in one issue.
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.
  • Do you agree to follow the rules in our Code of Conduct?

Is your proposal related to a problem?

Votes make up the largest part of Lemmy's database size. In the case of lemmy.ml, the database is 40.4 GB, of which 15 GB are votes. Of those votes, 63% are more than 6 months old, and this proportion will only go up with time.

Describe the solution you'd like.

There is really no reason to keep all these old votes around, because old posts are ignored by the ranking algorithms. We could save about 9.5 GB, or 24% of disk space, on lemmy.ml by deleting votes for posts older than 6 months. Vote counts displayed to users would still be correct, as they are stored separately in the post_aggregates table. We only need to ensure that the ranking algorithms never recalculate scores for archived posts. Additionally, it makes sense to lock commenting and other actions on posts after the same interval.
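To make the proposal concrete, here is a minimal sketch of the deletion step. It uses an in-memory SQLite database rather than Lemmy's PostgreSQL, and the simplified `post`, `post_like`, and `post_aggregates` schemas, as well as the function names, are assumptions made for illustration, not Lemmy's actual schema or code.

```python
import sqlite3
from datetime import datetime, timedelta


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny stand-in schema (hypothetical, heavily simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE post (id INTEGER PRIMARY KEY, published TEXT NOT NULL);
        CREATE TABLE post_like (
            post_id INTEGER NOT NULL,
            person_id INTEGER NOT NULL,
            score INTEGER NOT NULL
        );
        CREATE TABLE post_aggregates (
            post_id INTEGER PRIMARY KEY,
            score INTEGER NOT NULL
        );
        """
    )
    return conn


def delete_old_votes(conn: sqlite3.Connection, now: datetime, max_age: timedelta) -> int:
    """Delete individual votes on posts published before now - max_age.

    post_aggregates is deliberately left untouched, so the displayed
    score survives. Returns the number of vote rows removed.
    """
    # ISO 8601 timestamps in one format compare correctly as plain strings.
    cutoff = (now - max_age).isoformat()
    cur = conn.execute(
        "DELETE FROM post_like "
        "WHERE post_id IN (SELECT id FROM post WHERE published < ?)",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Calling `delete_old_votes(conn, datetime(2024, 9, 12), timedelta(days=182))` would drop the vote rows for every post published before mid-March 2024 while leaving the stored aggregate scores as they were.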

Describe alternatives you've considered.

Keep the current behaviour, but it will lead to very large database sizes in a few years.

Additional context

No response

@poVoq

poVoq commented Sep 12, 2024

I am actually more concerned about the amount of writes that votes cause on SSD storage. It really chews through NVMe drives, and consumer-grade SSDs with a low TBW (terabytes written) rating are gone in about a year of Lemmy usage.

@dessalines
Member

If we do something like this, I'd like to add an archived or archive_votes boolean column to the comment and post tables, so that any updates we make to the aggregate tables can easily ignore those ones, and make sure not to update them.

We could update that archived column as part of a periodic / startup job.
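A rough sketch of how such a flag and job might fit together, again using SQLite as a stand-in for PostgreSQL. The `archived` column comes from the comment above; the schema, the `mark_archived` and `recount_scores` names, and the age-based rule are assumptions for illustration only.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified schema with the proposed boolean column.
SCHEMA = """
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    published TEXT NOT NULL,
    archived INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE post_like (
    post_id INTEGER NOT NULL,
    person_id INTEGER NOT NULL,
    score INTEGER NOT NULL
);
CREATE TABLE post_aggregates (
    post_id INTEGER PRIMARY KEY,
    score INTEGER NOT NULL
);
"""


def mark_archived(conn: sqlite3.Connection, now: datetime, max_age: timedelta) -> int:
    """Periodic / startup job: flag posts older than max_age as archived.

    Returns how many posts were newly flagged.
    """
    cutoff = (now - max_age).isoformat()
    cur = conn.execute(
        "UPDATE post SET archived = 1 WHERE archived = 0 AND published < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def recount_scores(conn: sqlite3.Connection) -> None:
    """Rebuild aggregate scores from vote rows, ignoring archived posts."""
    conn.execute(
        """
        UPDATE post_aggregates
        SET score = (
            SELECT COALESCE(SUM(score), 0)
            FROM post_like
            WHERE post_like.post_id = post_aggregates.post_id
        )
        WHERE post_id IN (SELECT id FROM post WHERE archived = 0)
        """
    )
    conn.commit()
```

With the flag in place, any query that touches the aggregate tables only needs the extra `archived = 0` filter to leave archived rows alone.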

@MrKaplan-lw

Speaking from the Lemmy.World perspective, we do not want to drop old votes.
If this is implemented, it should be configurable, perhaps through an option that specifies the minimum age and can be set to never.
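One way such an option could look, as a small sketch. The `ArchiveSettings` type and its `min_age` field are hypothetical names, not an existing Lemmy setting; `None` stands in for "never".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ArchiveSettings:
    """Hypothetical per-instance setting; None means votes are never dropped."""

    min_age: Optional[timedelta] = None


def archive_cutoff(settings: ArchiveSettings, now: datetime) -> Optional[datetime]:
    """Return the timestamp before which posts qualify, or None if disabled."""
    if settings.min_age is None:
        return None
    return now - settings.min_age


def should_archive(published: datetime, settings: ArchiveSettings, now: datetime) -> bool:
    """True only when archiving is enabled and the post is old enough."""
    cutoff = archive_cutoff(settings, now)
    return cutoff is not None and published < cutoff
```

Defaulting to `None` keeps the current behaviour unless an admin opts in, which matches the request that instances be able to retain all votes.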

@Die4Ever

Die4Ever commented Sep 18, 2024

Wouldn't this affect people who sort by New Comments or Active? Locking voting would be a shame, but acceptable given the disk space cost of storing every individual vote.

Disabling comments would be bad though I think. It doesn't really save disk space, does it? I always thought being able to comment on and continue with old posts was a big strength of Lemmy over Reddit. Because Lemmy has the New Comments and Active sort methods it means you can actually have long term discussions like old forums could do. And it's really helpful with stickied/pinned posts too.

@dessalines
Member

dessalines commented Sep 18, 2024

It's possible to clear out old votes in a way that doesn't lock the old content. We'd just need to make sure that the scores for those items are never recalculated from scratch.
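The idea can be sketched as applying each new vote as a delta to the stored score, and refusing to recount once the vote rows are gone. The `PostAggregate` type and both function names below are hypothetical and only illustrate the approach.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PostAggregate:
    """Hypothetical stand-in for one row of an aggregates table."""

    score: int
    archived: bool = False


def apply_vote(agg: PostAggregate, old_score: int, new_score: int) -> None:
    """Adjust the stored score by the change in one user's vote.

    old_score is 0 when no previous vote row is known for that user.
    This works for archived items too, because it never reads old votes.
    """
    agg.score += new_score - old_score


def recalculate_from_votes(agg: PostAggregate, votes: Iterable[int]) -> None:
    """Recount from individual vote rows, but never for archived items.

    Their old vote rows were deleted, so a recount would lose history.
    """
    if agg.archived:
        return
    agg.score = sum(votes)
```

One caveat under this sketch: once a user's old vote row has been deleted, a repeat vote from the same user can no longer be detected and would be counted a second time.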
