Grouping of many small files #169

@ramain

The raw data (stored in ~16 MB files) and the pipeline products add up to too many small files. It would be useful to have some additional logic to, e.g., tar the raw dat files, and ideally to be able to find and read the necessary files directly from the tar archives. We don't want to (and won't be able to) store more than a few million files long-term.
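As a rough illustration of the "tar, then read straight from the tar" idea, here is a minimal sketch using Python's standard `tarfile` module. The function names (`tar_files`, `list_members`, `read_member`) and the file names in the usage note are hypothetical and not part of the pipeline; the archive is left uncompressed on the assumption that cheap access to individual members matters more than size.

```python
import tarfile
from pathlib import Path


def tar_files(paths, archive_path):
    """Pack the given files into one uncompressed tar archive.

    Members are stored under their bare file names, so the names
    must be unique within one archive (an assumption of this sketch).
    """
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            tar.add(path, arcname=Path(path).name)


def list_members(archive_path):
    """Return the member names stored in the archive, in archive order."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()


def read_member(archive_path, member_name):
    """Return the bytes of one member without unpacking the whole archive."""
    with tarfile.open(archive_path, "r") as tar:
        # extractfile raises KeyError for an unknown name and returns
        # None for members that are not regular files (e.g. directories).
        fileobj = tar.extractfile(member_name)
        if fileobj is None:
            raise ValueError(f"{member_name} is not a regular file")
        return fileobj.read()
```

Usage would look something like `tar_files(sorted(Path("raw").glob("*.dat")), "raw_day.tar")` followed by `read_member("raw_day.tar", "some_chunk.dat")`. Opening the tar scans its headers to locate a member, so for archives holding many thousands of files a separate index of member names would probably be worth keeping.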

Some estimates of the number of files (specced for 256 beams = 1/4 sky):

  • Raw files: ~4000 / beam / day = 1000000 / day
  • Candidates: ~2x number of pointings, ~80000 / day
  • Logs: number of pointings, ~40000 / day
  • Stacks: number of pointings, ~160000 perpetual
  • Folding products: very roughly number of pointings / 10
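The arithmetic behind these estimates can be sketched as below. The per-day numbers come from the list above; `FILE_BUDGET` is a hypothetical stand-in for "a few million files", and the stacks are left out because their count is a fixed total rather than a daily rate.

```python
N_BEAMS = 256
RAW_PER_BEAM_PER_DAY = 4000
POINTINGS_PER_DAY = 40_000
FILE_BUDGET = 3_000_000  # hypothetical value for "a few million files"


def files_per_day():
    """Estimated number of new files per day, by product type."""
    return {
        "raw": N_BEAMS * RAW_PER_BEAM_PER_DAY,  # 1,024,000
        "candidates": 2 * POINTINGS_PER_DAY,
        "logs": POINTINGS_PER_DAY,
        "folds": POINTINGS_PER_DAY // 10,
    }


def days_to_reach(budget, per_day):
    """Number of days of output that fit into a given file budget."""
    return budget / per_day


if __name__ == "__main__":
    counts = files_per_day()
    total = sum(counts.values())
    without_raw = total - counts["raw"]
    print(f"all products: {days_to_reach(FILE_BUDGET, total):.1f} days")
    print(f"without raw:  {days_to_reach(FILE_BUDGET, without_raw):.1f} days")
```

With these inputs the raw files dominate the daily total, which is why they look like a separate problem from the pipeline products.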

I guess the raw files and pipeline products are two separate problems. The raw files are too cumbersome to keep in large quantities, but we may want to, e.g., keep a few days or weeks of them in long-term storage without that taking tens of millions of files.

The candidates, logs, and folds are small enough to store long-term, perhaps by tarring them every day.

The stacks are borderline problematic; perhaps splitting them based on RA or Dec ranges would clean the structure up.
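One way the RA/Dec split could look is sketched below. The function name `stack_subdir`, the 10-degree bin widths, and the `ra_XXX/dec_+YY` directory naming are all assumptions for illustration, not an existing layout.

```python
import math
from pathlib import Path


def stack_subdir(ra_deg, dec_deg, ra_bin_deg=10, dec_bin_deg=10):
    """Map a pointing to a subdirectory named after its RA/Dec bin.

    Bin widths are integer degrees; each directory is labelled by the
    lower edge of its bin, e.g. ra_120/dec_+40.
    """
    if not -90 <= dec_deg <= 90:
        raise ValueError(f"declination out of range: {dec_deg}")
    ra = ra_deg % 360.0  # wrap RA into [0, 360)
    ra_lo = int(ra // ra_bin_deg) * ra_bin_deg
    dec_lo = math.floor(dec_deg / dec_bin_deg) * dec_bin_deg
    if dec_lo >= 90:
        # Put dec = +90 exactly into the last bin instead of a new one.
        dec_lo = 90 - dec_bin_deg
    return Path(f"ra_{ra_lo:03d}") / f"dec_{dec_lo:+03d}"
```

For example, `stack_subdir(123.4, 45.6)` gives `ra_120/dec_+40`. With 10-degree bins the full sky maps to at most 36 x 18 = 648 directories; equal-angle bins cover less sky area near the poles, so the directories would not be evenly filled.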
