Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squash migrations #1532

Open
harrislapiroff opened this issue Nov 14, 2022 · 6 comments
Open

Squash migrations #1532

harrislapiroff opened this issue Nov 14, 2022 · 6 comments
Assignees

Comments

@harrislapiroff
Copy link
Contributor

I'm noticing that spinning up a db from scratch is pretty slow. Let's create squash migrations for our apps.

While doing so, double-check if there's any RunPython/RunSQL commands that haven't been elided that can be.

@chigby chigby self-assigned this Jan 12, 2023
@chigby
Copy link
Contributor

chigby commented Jan 17, 2023

So, I've given this a try and it's looking like the actual squashmigrations command is either not going to work for this purpose, or going to be very difficult. squashmigrations operates on a single "app" at a time, but our situation is we have multiple apps that depend on migrations having occurred in other apps, particularly in incident, common, and blog. This can happen when there are foreign keys or many-to-many relationships between models in the apps.

For example: the squashmigrations blog command succeeds, and replaces all the migrations in the blog app with a single one. This single migration has dependencies on the common app's migrations, since the relationships of the blog models require the existence of the common app's models. However, the same is true of the common app: some of its various migrations depend on some from the blog app, which has now been replaced by a single migration. If you try to start Django with this situation, it will raise a CircularDependencyError.

The migration docs say:

Note that model interdependencies in Django can get very complex, and squashing may result in migrations that do not run; either mis-optimized (in which case you can try again with --no-optimize, though you should also report an issue), or with a CircularDependencyError, in which case you can manually resolve it.

To manually resolve a CircularDependencyError, break out one of the ForeignKeys in the circular dependency loop into a separate migration, and move the dependency on the other app with it. If you’re unsure, see how makemigrations deals with the problem when asked to create brand new migrations from your models. In a future release of Django, squashmigrations will be updated to attempt to resolve these errors itself.

I did try to "break out one of the foreignkeys in the circular dependency loop" but (a) the docs don't really say what this means specifically, so I sort of assumed they mean to delete the foreign key, then squash the migration, then re-add it, this actually seemed to create more problems. Maybe I'm doing it wrong, though?

Another technique I found on stack overflow gave a more advanced version of this involving replacing any foreign keys with IntegerFields, sqashing, and then reinstating the foreign keys. I tried this on the "blog" app, which appeared to have only 3 relationships to "common". This succeeded in creating the single squashed migration but did not seem to resolve the circular dependency error. I suspect this is because "common" has foreign keys that point in the other direction.

At this point, I sort of decided it might be easier to remove all migrations and re-run makemigrations and see what happens. This seemed to go all right, and all our migrations were reduced down to 1 or 2 per app, but I did notice two things:

  1. This only worked because I was able to manually alter the django_migrations table in the DB, which we'd have to do on all the environments our site is running if we wanted to go this route. Probably somewhat tricky!
  2. The actual DB spin-up-from-scratch time was still fairly long, since this did not squash any of the wagtail or django migrations, of which there are many.

So, in conclusion, I don't really know how much time we want to devote to this, but also some of this might be my lack of familiarity with the best practices.

@harrislapiroff
Copy link
Contributor Author

Ugh, maybe it's best not to deal with this mess and just let it be as is, but the cruft bugs me as does the startup time (I want it to be cheap to throw away a dev db and start over) so let's spin this out a little further.

Let's say we go with the simpler option you reached at the end:

  1. How much did it improve startup time? I would be interested in specific times. If it really wasn't significant, it's probably not worth it for us to pursue much further. I'll swallow my feelings about cruft.

  2. I think the "proper" way to drop those database rows (insofar as anything about resetting migrations is proper) would be to run

    ./manage.py migrate --fake <app> 001

    for every app. Idk if we'd consider this a "best practice."

@chigby
Copy link
Contributor

chigby commented Jan 23, 2023

I'll collect some data on startup times.

@chigby
Copy link
Contributor

chigby commented Jan 24, 2023

So, I timed the migrate command using the following system:

  1. Start everything with docker compose up on the develop branch.

  2. Clear all tables etc. from the database using this SQL.

    DROP SCHEMA public CASCADE;
    CREATE SCHEMA public;
    
    GRANT ALL ON SCHEMA public TO public;
  3. Record how long it takes to run the migrate command from this empty state:

     $ time docker compose exec django python manage.py migrate
    

    I re-ran steps 2 and 3 a few times and it took about 33 seconds each time.

  4. Remove all migrations. I was getting some errors using the "proper" way mentioned above and I couldn't figure out how to proceed on that front. To remove all migrations I used the command:

    $ find . -path "*migrations*" -iname "0*.py" -delete
    
  5. Re-create migations: docker compose exec django python manage.py makemigrations

  6. Repeat steps 2 and 3. I did this a few times and the average runtime was 17 seconds or so.

In conclusion, it does seem like we can gain ~50% on db-initialization time using a method something like this.

@chigby
Copy link
Contributor

chigby commented Jan 31, 2023

It looks like we can simplify this method down to:

  1. Delete all migrations (see above)
  2. Create new migrations: manage.py makemigrations --name squashed
  3. "Fake" the migrations on the prod/staging DBs once the code is deployed:
    manage.py migrate --fake
    

This seems to work for me, locally, which brings the new migration state into line with the re-created migration files without having to edit the django_migrations table.

@harrislapiroff
Copy link
Contributor Author

This is not urgent, but let's kick this to the backlog and keep the ticket around so we don't lose the research.

@soleilera soleilera added someday and removed someday labels Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants