A different idea for managing schema.sql
#433
Replies: 8 comments 5 replies
-
You might be interested in diff-based migration tools:
These support generating the migration from maintaining the schema file. |
Beta Was this translation helpful? Give feedback.
-
Are you checkpointing this process? That is many old migrations can be rolled up into a single first migration. |
Beta Was this translation helpful? Give feedback.
-
The shell script does this now? Or it is it lacking in some way? Or you want a single go binary? |
Beta Was this translation helpful? Give feedback.
-
I like what you are doing here- I have been thinking about similar things. To add to what this enables- it becomes possible to checkout any commit and migrate the database to what is required at that checkout. Checking in the schema-only file get close, but as stated here the "schema data" is missing. I like the pattern of using a foreign key to a small separate table as an enum- if that key is a string one gets readability and some flexibility to be able to change the enum data with DML instead of DDL. |
Beta Was this translation helpful? Give feedback.
-
I am wondering if this can be solved by having a separate schema file for the extensions? As a project grows, even without extensions the schema can get quite large- there could be some benefit to spreading the schema across multiple files. However, that gets into issue with dependencies. |
Beta Was this translation helpful? Give feedback.
-
I think it would be pretty easy to add a command line option to not do a schema-only dump and implement it in the Postgres driver and at least return an unimplemented error elsewhere. There is already a |
Beta Was this translation helpful? Give feedback.
-
Sorry, I think I am not understanding what you are saying here. It seems like either using a dump that is not schema-only or using |
Beta Was this translation helpful? Give feedback.
-
I think you can get all the functionality you need out of dbmate by adding to it in a backwards-compatible way. However, you want to have users use We have Not sure if that would work for you, at least in the short-term. In the long-term |
Beta Was this translation helpful? Give feedback.
-
Hello! 👋
First off, I want to say that I have really liked using
dbmate
and think it is the top tool I have used in this space. I appreciate all the work that has gone into it and the fact that it is has been available for the benefit of the community. So thank you. 🙏After using it on a certain project, I did end up with an idea about a different way to manage
schema.sql
, which I wanted to share and get feedback on. First, let me start by outlining some pain points I've encountered with the currentdbmate
behavior of managing theschema.sql
as an auto-generated schema-only dump:Extensions bloat the schema
Sometimes this effect can be significant, and essentially hides what a project's migrations directly manage in the database vs state external to the project managed by extension dependencies. Said another way, I don't want to add 2000 lines to the schema from an extension when the migrations I manage to create the database total 200 lines, only one of which is
CREATE EXTENSION <some_extension>;
. Those 2000 schema lines I would consider outside the concern of my project, even if my project depends on them, as my project is not maintaining them.Time-based schema changes can occur in the absence of changes to the migrations
Think things like partitioned tables. If someone comes and runs
dbmate up
in a dev database to run some tests for unrelated application code, theschema.sql
can have changes despite the fact that nothing actually changed with regard to the migrations that were run."Schema data" is omitted from schema-only dumps
A not-uncommon pattern when needing a fixed set of values to refer to in other tables is to create a table and populate it with a fixed set of rows in a migration file. Sure, enums can sometimes be used for this purpose instead, and would be part of a schema-only dump. But enums have significant limitations and downsides, and are not always adequate for all use cases. So it is not uncommon to need such a table where the data in the table is effectively part of the schema.
Unfortunately, schema-only dumps would never contain these rows, and therefore the
schema.sql
in its current form is useless for auditing such "schema data."I think the point of
schema.sql
is to provide some usable reference as to what the sum total of the migrations would produce when applied together, which is quite desirable. But for the reasons above, I am finding it difficult to useschema.sql
for this purpose.To make
schema.sql
more usable for my team, we came to the conclusion that it needs to be a manually-managed file which is, effectively, a single migration that condenses all the current migrations into a single file. Then, when making changes, we can editschema.sql
to declare what the desired state of the database should be, and then we can create a migration that performs whatever steps need to happen to move from the previous migration state to the new desired state.[0]At the end, with both the
schema.sql
changes and the new migration file in place, we can:schema.sql
schema_migrations
table to make it look like they have been appliedIf we get no differences, then the result of all the migrations matches the desired schema. If we do have differences, then we get some feedback on what is out-of-sync between the migrations and the reference schema.
By making
schema.sql
this manually-managed file, we address all of the above concerns of that file being generated as a schema-only dump, and end up with a single reference point for all state our project is concerned with enforcing in the database.We are actually using this workflow now via a hacky solution I implemented by wrapping
dbmate
in a bash script: https://gist.github.com/jkeifer/f75c65213c6a327229cf85ffa47e1efe. Effectively, the changes todbmate
ended up as follows:DBMATE_NO_DUMP_SCHEMA=true
dbmate create
is overridden to applying the referenceschema.sql
file and adding the extant migration versions after creating the database (I suppose these actions could be split into separate commands, but for convenience with this workflow in mind I kept them together)verify
command does the verification steps outlined above to diff a database from the schema with a database from the migrationsdbmate dump
is overridden to perform full dumps (as used by theverify
command) and to not by default write toschema.sql
Really, I think these changes would be rather minor to implement within
dbmate
. That said, I recognize they would likely require a non-backwards-compatible changes. And these changes could be significant for anyone usingdbmate
in a way that is: 1) dependent on its current behavior aroundschema.sql
, and 2) not running into any of the issues I mentioned. I also recognize that this idea revolves around a perhaps opinionated way of howschema.sql
should be managed, and that opinion may not be shared by all.So, after saying all that, I am looking for feedback on the above idea both generally (do people like it), and if it has a place in
dbmate
or not.I would be willing to put together a PR if the backward-compatibility concern was not a barrier or if any way to implement this behavior in a backward-compatible way were to be offered. In the event that
dbmate
turns out not to be the right place for such changes, I am also considering creating a parallel project to implement a cli for this workflow on top ofdbmate
as the backing library.Thanks for the consideration and feedback on this idea! 😄
[0] This is, in fact, not our idea. I have to credit @mwblakley as the source.
Beta Was this translation helpful? Give feedback.
All reactions