Create a reindexing daemon. #142

Open
1 of 4 tasks
msm-code opened this issue May 3, 2020 · 1 comment
Labels: needs more design (Non-trivial design issues involved. Ask maintainers before working on), zone:backend (Backend oriented tasks)

Comments

@msm-code
Contributor

msm-code commented May 3, 2020

Feature Category

  • Correctness
  • User Interface / User Experience
  • Performance
  • Other (please explain)

Describe the problem

Right now users have to create their own cron job to keep the database and the samples folder in sync.

Describe the solution you'd like

Create a new service that automatically watches the /mnt/samples directory and reindexes samples on a schedule (for example, every night).

This service should take over the functionality of the utils/reindex_local.py script: first reindex all samples, then compact datasets for as long as compaction is still possible. Useful code snippet:

#!/usr/bin/env python

import logging
from lib.ursadb import UrsaDb
import time


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    ursa = UrsaDb('tcp://localhost:9281')
    stage = 0
    last_datasets = None
    while True:
        # Snapshot the current dataset IDs so consecutive stages can be compared.
        datasets = set(ursa.execute_command("topology;")["result"]["datasets"].keys())
        if last_datasets is not None:
            # Log which datasets the previous compaction removed and which it created.
            removed = list(last_datasets - datasets)
            created = list(datasets - last_datasets)
            logging.info("%s => %s", removed, created)
        logging.info("Stage %s: %s datasets left.", stage, len(datasets))
        # Compare against None (not truthiness) so an empty database also terminates.
        if last_datasets is not None and datasets == last_datasets:
            logging.info("Finally, a fixed point! Returning...")
            return

        start = time.time()
        ursa.execute_command("compact all;")
        end = time.time()
        logging.info("Compacting took %s seconds...", (end-start))
        stage += 1
        last_datasets = datasets


if __name__ == "__main__":
    main()
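
If the fixed-point loop above were reused inside a daemon, it might help to pass the two UrsaDB interactions in as plain callables, so the loop can be exercised without a running database. The following is only a sketch under that assumption: the names compact_until_stable, list_datasets, compact and max_stages are hypothetical and not part of the project, and the stage cap is an added safety measure that the snippet above does not have.

```python
import logging
import time
from typing import Callable, Optional, Set


def compact_until_stable(
    list_datasets: Callable[[], Set[str]],
    compact: Callable[[], None],
    max_stages: int = 100,  # hypothetical safety cap, not in the original snippet
) -> int:
    """Compact until the set of datasets stops changing.

    Returns the number of compactions that were run.
    """
    last: Optional[Set[str]] = None
    for stage in range(max_stages):
        datasets = list_datasets()
        logging.info("Stage %s: %s datasets left.", stage, len(datasets))
        if last is not None and datasets == last:
            # The previous compaction changed nothing: a fixed point.
            return stage
        start = time.time()
        compact()
        logging.info("Compacting took %s seconds...", time.time() - start)
        last = datasets
    raise RuntimeError("no fixed point reached after %d stages" % max_stages)
```

With the UrsaDb client from the snippet above, list_datasets could be `lambda: set(ursa.execute_command("topology;")["result"]["datasets"].keys())` and compact could be `lambda: ursa.execute_command("compact all;")`.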

Add an easy way to trigger it manually using docker-compose (probably something like docker-compose run reindex --now).
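
One possible shape for the entry point is sketched below: a --now flag runs a single cycle and exits, and otherwise the process sleeps until the next nightly slot. Everything here is an assumption made for illustration: the function names, the 02:00 default, and the cycle callable (standing in for "reindex, then compact") are all hypothetical. It also assumes the script is the container's entrypoint, so that arguments given after the service name reach it.

```python
import argparse
import time
from datetime import datetime, time as dtime, timedelta
from typing import Callable, List, Optional


def seconds_until_next_run(now: datetime, run_at: dtime) -> float:
    """Seconds from `now` (naive local time) until the next `run_at` slot."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return (candidate - now).total_seconds()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Reindexing daemon (sketch)")
    parser.add_argument(
        "--now",
        action="store_true",
        help="run one reindex cycle immediately and exit",
    )
    return parser


def run(
    argv: Optional[List[str]],
    cycle: Callable[[], None],  # hypothetical: reindex all samples, then compact
    run_at: dtime = dtime(hour=2),  # assumed nightly slot
) -> None:
    args = build_parser().parse_args(argv)
    if args.now:
        cycle()
        return
    while True:
        # Sleep until the next nightly slot, then run one cycle.
        time.sleep(seconds_until_next_run(datetime.now(), run_at))
        cycle()
```

Computing the delay from the wall clock on each iteration, instead of sleeping a fixed 24 hours, keeps the schedule from drifting when a cycle takes a long time.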

@msm-code msm-code added this to the v1.2.0 milestone May 3, 2020
@msm-code msm-code self-assigned this May 10, 2020
@msm-code
Contributor Author

Not implemented currently. Reindexing was made friendly enough that this is not needed as much as it used to be.

@msm-code msm-code added priority:low Priority: low and removed priority:medium labels May 15, 2020
@msm-code msm-code removed this from the v1.2.0 milestone May 15, 2020
@msm-code msm-code removed their assignment May 15, 2020
@msm-cert msm-cert added this to the v1.5.0 milestone Sep 29, 2024
@msm-cert msm-cert added the needs more design Non-trivial design issues involved. Ask maintainers before working on label Sep 29, 2024
@msm-cert msm-cert self-assigned this Sep 29, 2024
@msm-cert msm-cert added the zone:backend Backend oriented tasks label Sep 29, 2024