The thread of work including #12750 and #12959 is good but introduces a potential efficiency issue in backups. In the simplest case, suppose we ingest SSTs from one CF to another CF in the same DB. The ingested SST files get new file numbers, and the file number is a critical part of the key for de-duplicating SST files in backups. Thus, a backup of the DB taken after ingestion cannot share the backed-up SST file with a backup that contains the same file in a different CF, because the file numbers are different. See ShareFilesNaming for more background.
At first glance, it might seem like the best solution is to add a new naming scheme based on just SST unique ids, but then there's the problem that we get the destination file name on restore from the shared file name (removing the parts added for uniqueness). If we don't have the file number, we don't know what file name to restore to. If we do have the file number in the file name, we can't maximize sharing.
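To make the tension concrete, here is a toy model of the current behavior. The name format `<file_number>_s<session_id>.sst` is a simplification assumed for illustration, not the exact backup engine format, and `shared_name`, `restore_name`, and the session id value are hypothetical. It also assumes the ingested file keeps the session id of the file it came from while getting a new file number.

```python
def shared_name(file_number: int, session_id: str) -> str:
    """Toy shared-directory name: the DB file number plus a uniqueness suffix."""
    return f"{file_number:06d}_s{session_id}.sst"


def restore_name(shared: str) -> str:
    """Recover the DB file name by dropping the parts added for uniqueness."""
    return shared.split("_", 1)[0] + ".sst"


# Same SST content: once as originally written, once after ingestion into
# another CF, where it was assigned a new file number.
original = shared_name(12, "ABC123")
ingested = shared_name(57, "ABC123")

# The names differ, so the two backups cannot share one stored file, yet the
# file number embedded in each name is what restore relies on.
print(original, "->", restore_name(original))
print(ingested, "->", restore_name(ingested))
```

Dropping the file number from the name would make the two names equal, but `restore_name` would then have nothing to derive the destination file name from.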
I propose that we name the file in the backup shared directory based on the orig_file_number in the table properties (when != 0), and if the file number in the DB doesn't match that, we add a field (perhaps just "num") to the backup manifest entry indicating what file number it should be restored to. This is a major schema change, so it would require backup schema_version=3, because ignoring the new field on restore would result in a corrupt DB. Importantly, this makes for a graceful upgrade path to schema_version=3, because we only change the shared file name of an SST file if it was ingested from a DB file. There is no hiccup where we break the incrementality of backups.
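A minimal sketch of the proposed rule, under stated assumptions: the `ManifestEntry` type, the helper names, and the toy name format `<number>_s<session_id>.sst` are all hypothetical, and `num` stands in for the manifest field suggested above. It is meant only to show how sharing and restore could both work, not how the backup engine would implement it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManifestEntry:
    shared_name: str
    # Hypothetical "num" field: set only when the file must be restored
    # under a file number different from the one in the shared name.
    num: Optional[int] = None


def backup_entry(db_file_number: int, orig_file_number: int,
                 session_id: str) -> ManifestEntry:
    # Name by orig_file_number when the table properties carry one (!= 0),
    # otherwise fall back to the file number in the DB.
    name_number = orig_file_number if orig_file_number != 0 else db_file_number
    shared = f"{name_number:06d}_s{session_id}.sst"
    num = db_file_number if name_number != db_file_number else None
    return ManifestEntry(shared, num)


def restore_file_name(entry: ManifestEntry) -> str:
    # A schema_version=3 reader must honor "num"; ignoring it would restore
    # the file under the wrong number.
    if entry.num is not None:
        return f"{entry.num:06d}.sst"
    return entry.shared_name.split("_", 1)[0] + ".sst"


in_place = backup_entry(12, 12, "ABC123")   # file still at its original number
ingested = backup_entry(57, 12, "ABC123")   # same file after ingestion
no_orig = backup_entry(8, 0, "XYZ789")      # no orig_file_number recorded
```

In this model, `in_place` and `ingested` map to the same shared name, so the stored file is shared, while `num` lets the ingested copy restore to its own file number. Files whose DB number already matches keep the name they would have had before, which is what preserves incrementality.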