The raw data (in ~16 MB files) and the pipeline products produce too many small files. It would be useful to have some additional logic to, e.g., tar the raw .dat files, and ideally to be able to find and read individual files from within the tar archives. We don't want to (and won't be able to) store more than a few million files long-term.
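As a rough sketch of the kind of logic meant here (the paths and file names below are hypothetical), Python's standard tarfile module can bundle one beam/day's raw .dat files into a single archive and read individual members back without unpacking the rest:

```python
import tarfile
from pathlib import Path

def tar_raw_files(dat_dir: Path, archive: Path) -> None:
    """Bundle one beam/day's raw .dat files into a single tar archive."""
    with tarfile.open(archive, "w") as tar:  # uncompressed; the raw data is already binary
        for dat in sorted(dat_dir.glob("*.dat")):
            tar.add(str(dat), arcname=dat.name)

def read_member(archive: Path, name: str) -> bytes:
    """Read one raw file back out of the archive without extracting everything."""
    with tarfile.open(archive, "r") as tar:
        member = tar.extractfile(name)  # raises KeyError if `name` is not in the archive
        if member is None:              # non-regular members have no file object
            raise ValueError(f"{name} is not a regular file in {archive}")
        with member:
            return member.read()

# Hypothetical usage:
# tar_raw_files(Path("raw/beam012/2024-01-01"), Path("archives/beam012_2024-01-01.tar"))
# data = read_member(Path("archives/beam012_2024-01-01.tar"), "pointing_000123.dat")
```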
Some estimates of file counts (specced for 256 beams = 1/4 of the sky); a rough accumulation estimate is sketched after the list:
- Raw files: ~4000 / beam / day ≈ 1,000,000 / day
- Candidates: ~2x number of pointings, ~80,000 / day
- Logs: ~number of pointings, ~40,000 / day
- Stacks: ~number of pointings, ~160,000 perpetual
- Folding products: very roughly ~number of pointings / 10
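To set these rates against a "few million files" budget, a quick back-of-the-envelope calculation (rates taken from the list above; the exact budget value is an assumption for illustration):

```python
# Rough daily file counts from the estimates above (256 beams = 1/4 sky).
RAW_PER_DAY = 4_000 * 256        # ~1,000,000 raw files / day
CANDIDATES_PER_DAY = 80_000      # ~2x number of pointings
LOGS_PER_DAY = 40_000            # ~1 per pointing
FOLDS_PER_DAY = 40_000 // 10     # ~number of pointings / 10
STACKS_TOTAL = 160_000           # perpetual, not accumulating per day

FILE_BUDGET = 3_000_000          # assumed "few million files" long-term limit

daily_small_products = CANDIDATES_PER_DAY + LOGS_PER_DAY + FOLDS_PER_DAY
print(f"Raw files alone hit the budget in ~{FILE_BUDGET / RAW_PER_DAY:.1f} days")
print(f"Small pipeline products hit it in ~{FILE_BUDGET / daily_small_products:.0f} days")
```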
I guess the raw files and the pipeline products are two separate problems. The raw files are too cumbersome to keep in large quantities, but we may want to keep, e.g., a few days / weeks of them in long-term storage without that taking tens of millions of files.
The candidates, logs, and folds are small enough to store long-term, perhaps tarred up once per day.
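A minimal sketch of such daily tarring, assuming the products land in per-day directories like candidates/2024-01-01/ (the layout is an assumption):

```python
import tarfile
from pathlib import Path

def tar_daily_products(root: Path, date: str) -> None:
    """Roll one day's candidates / logs / folds into one tar per product type."""
    for product in ("candidates", "logs", "folds"):
        day_dir = root / product / date
        if not day_dir.is_dir():
            continue
        with tarfile.open(root / product / f"{date}.tar", "w") as tar:
            tar.add(str(day_dir), arcname=date)  # keep the per-day directory name inside the tar

# e.g. tar_daily_products(Path("/archive"), "2024-01-01")
```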
The stacks are borderline problematic; splitting them based on RA or Dec ranges would perhaps clean the structure up.
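One possible way to do that splitting (the bucket size and path layout are assumptions, just to illustrate):

```python
from pathlib import Path

def stack_dir(root: Path, ra_deg: float, dec_deg: float,
              ra_step: float = 15.0, dec_step: float = 15.0) -> Path:
    """Map a pointing's coordinates to a bucketed directory, e.g. stacks/ra_030/dec_-15."""
    ra_bucket = int((ra_deg // ra_step) * ra_step)
    dec_bucket = int((dec_deg // dec_step) * dec_step)
    return root / f"ra_{ra_bucket:03d}" / f"dec_{dec_bucket:+03d}"

# e.g. stack_dir(Path("stacks"), 42.7, -8.3) -> stacks/ra_030/dec_-15
```

With 15° buckets this gives 24 x 12 = 288 directories, i.e. roughly 500-600 stacks per directory for the ~160,000 total.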