fix: make storage migrations idempotent #805
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What kind of change does this PR introduce?
The
storage.migrationstable can sometimes fall out of sync, requiring migrations to be replayed on databases that already applied them. This PR makes all migrations fully idempotent, allowing you to clearstorage.migrationsand safely rerun the entire migration chain without errors, ensuring the final database state matches a clean first-run.Current behavior
drop function if existswithoutcascadecausing issues when it has dependencies like RLS policiesmigrations/tenant/0038-iceberg-catalog-flag-on-buckets.sqlcreates an index onbucket_idwithout verifying that the column exists. This column gets renamed later on so it causes issues on the second run.catalog_idcolumn doesn't exist, so if you run the migration again after clearing out the migrations table it's going to drop the fk constraint but it wont recreate it because the column is still there.New behavior
My initial logic for creating the functions in an idempotent way is tricky because
create or replaceis already idempotent but it doesn't work if the return type of the function changes like it does withstorage.get_size_by_bucket()so this needs to be dropped first. Dropping first introduces a problem though if there something else dependent on that function. If there is we can't drop it withoutcascadebut that means we would remove anything dependent on that function including RLS policies. In general I think we should avoidcascadewhere we can and rely oncreate or replaceif we make updates to a function such as changing return type we might want to add a suffix with_v2insteadAdded existence checks when creating roles (same approach as super_user)
Updated
0038-iceberg-catalog-flag-on-buckets.sqlto check forbucket_idbefore creating the index, since it’s renamed tobucket_namein migration0048Added step in CI to drop the migrations table and rerun migrations to ensure we don't generate any new migrations that are not idempotent. I run pg_dump after the initial migration, then delete the migrations table and rerun the migrations taking another pg_dump comparing the two. I ignore the
storage.migrationstable as the times the migrations took place are always going to differ.Separate the
alter table add column catalog_idfor bothiceberg_namespacesandiceberg_tablesso that the fk creation happens in a separate statement to ensure it gets created no matter how many times you run the migrations in a row by first checking if doesn't exist.Additional context