Potential duplication of shared backup files with ingestion of DB-generated SST files #12979

pdillinger · 2024-08-28T15:58:22Z

The thread of work including #12750 and #12959 is good, but it introduces a potential efficiency issue in backups. In the simplest case, suppose we ingest SSTs from one CF into another CF in the same DB. The ingested SST files get new file numbers, and the file number is a critical part of the key for de-duplicating SST files in backups. Thus, a backup of the DB taken after ingestion cannot share the backed-up SST file with a backup containing the same file in the original CF, because the file numbers are different. See ShareFilesNaming for more background.
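To make the de-duplication problem concrete, here is a minimal sketch assuming a shared-file name of the form `<file_number>_s<db_session_id>.sst`, loosely modeled on one of the ShareFilesNaming options. The function `shared_name` and the exact format are hypothetical illustrations, not BackupEngine code. Because ingestion preserves the SST contents (and so its session id) but assigns a new file number, the two copies end up with different shared names.

```python
# Hypothetical sketch: a shared-file name keyed on the DB file number plus the
# SST's DB session id. The exact format is an assumption for illustration,
# loosely modeled on ShareFilesNaming; it is not BackupEngine's code.

def shared_name(file_number: int, db_session_id: str) -> str:
    """Name under which a backup would store an SST in its shared directory."""
    return f"{file_number:06d}_s{db_session_id}.sst"


# The same SST contents (same db_session_id) before and after ingestion into
# another CF of the same DB, which assigns the file a new number.
original = shared_name(12, "ABC123")  # 000012.sst in the source CF
ingested = shared_name(57, "ABC123")  # same contents, now 000057.sst

print(original)              # 000012_sABC123.sst
print(ingested)              # 000057_sABC123.sst
print(original == ingested)  # False: the backup stores the contents twice
```

Since the file number is part of the name, no amount of content matching lets the backup treat these as the same shared file.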

At first glance, it might seem like the best solution is to add a new naming scheme based only on SST unique ids. But then there's the problem that, on restore, we get the destination file name from the shared file name (by removing the parts added for uniqueness). If we don't have the file number in the shared file name, we don't know what file name to restore to. If we do have the file number in the file name, we can't maximize sharing.
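The restore-side constraint can be sketched as follows, assuming the same illustrative `<file_number>_s<db_session_id>.sst` format: the destination name is recovered by stripping the uniqueness suffix. The function `restore_name` and the `uid_...` name are hypothetical; they only show that a name built from a unique id alone leaves nothing from which to derive the file number.

```python
# Hypothetical sketch of recovering the destination DB file name from a shared
# file name by stripping the uniqueness suffix. Both naming formats below are
# illustrative assumptions, not BackupEngine's actual code.
import re
from typing import Optional

# "<file_number>_s<db_session_id>.sst": the file number survives in the name.
_NUMBERED = re.compile(r"^(\d+)_s[0-9A-Za-z]+\.sst$")


def restore_name(shared: str) -> Optional[str]:
    """Return the DB file name to restore to, or None if it can't be derived."""
    m = _NUMBERED.match(shared)
    if m is None:
        # E.g. a name built only from an SST unique id carries no file number.
        return None
    return f"{m.group(1)}.sst"


print(restore_name("000057_sABC123.sst"))  # 000057.sst
print(restore_name("uid_5F3A9C1B.sst"))    # None
```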

I propose that we name the file in the backup shared directory based on the orig_file_number in the table properties (when != 0), and, if the file number in the DB doesn't match that, add a field (perhaps just "num") to the backup manifest entry indicating what file number it should be restored to. This is a major schema change, so it would be backup schema_version=3, because ignoring the new field on restore would result in a corrupt DB. Importantly, this allows a graceful upgrade path to schema_version=3, because we only change the shared file name of an SST file if it was ingested from a DB-generated file. There is no hiccup where incrementality of backups is broken.
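The proposal can be sketched in a few lines. This is a hypothetical model, not a design for BackupEngine: only `orig_file_number` and the suggested "num" field come from the proposal, while `ManifestEntry`, `make_entry`, `restore_file_number`, `needs_schema_version_3`, and the name format are illustrative assumptions. A natively created file (where orig_file_number equals the DB file number) keeps the same name and gets no "num" field, which is what makes the upgrade graceful.

```python
# Hypothetical sketch of the proposed scheme: name the shared file from the
# table property orig_file_number (when != 0), and record a "num" field in the
# manifest entry only when the DB's file number differs. Everything besides
# orig_file_number and "num" is an illustrative assumption.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManifestEntry:
    shared_name: str
    num: Optional[int] = None  # restore-to file number, if not in the name


def make_entry(db_file_number: int, orig_file_number: int,
               db_session_id: str) -> ManifestEntry:
    # Fall back to the DB file number when orig_file_number is unknown (0).
    name_number = orig_file_number if orig_file_number != 0 else db_file_number
    shared = f"{name_number:06d}_s{db_session_id}.sst"
    num = None if name_number == db_file_number else db_file_number
    return ManifestEntry(shared, num)


def restore_file_number(entry: ManifestEntry) -> int:
    # The "num" field, when present, overrides the number in the shared name.
    if entry.num is not None:
        return entry.num
    return int(entry.shared_name.split("_", 1)[0])


def needs_schema_version_3(entries: List[ManifestEntry]) -> bool:
    # A reader that ignored "num" would restore to the wrong file number,
    # so any entry using it requires the new schema version.
    return any(e.num is not None for e in entries)


# Native SST: orig_file_number matches, so the name is unchanged and no "num".
native = make_entry(12, 12, "ABC123")
# The same file ingested into another CF as file 57 shares the same name.
ingested = make_entry(57, 12, "ABC123")
print(native)    # ManifestEntry(shared_name='000012_sABC123.sst', num=None)
print(ingested)  # ManifestEntry(shared_name='000012_sABC123.sst', num=57)
print(restore_file_number(ingested))  # 57
```

Under these assumptions, both CFs reference one shared file, and only backups that actually contain an ingested DB-generated file would need the new schema version.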

pdillinger added the performance label (Issues related to performance that may or may not be bugs) on Aug 28, 2024
