Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add live migration from postgres #2759

Merged
merged 5 commits into from
Oct 30, 2023
Merged

Add live migration from postgres #2759

merged 5 commits into from
Oct 30, 2023

Conversation

alejandrodnm
Copy link
Contributor

Description

[Short summary of why you created this PR]

Links

Fixes #[insert issue link, if any]

Writing help

For information about style and word usage, see the style guide

Review checklists

Reviewers: use this section to ensure you have checked everything before approving this PR:

Subject matter expert (SME) review checklist

  • Is the content technically accurate?
  • Is the content complete?
  • Is the content presented in a logical order?
  • Does the content use appropriate names for features and products?
  • Does the content provide relevant links to further information?

Documentation team review checklist

  • Is the content free from typos?
  • Does the content use plain English?
  • Does the content contain clear sections for concepts, tasks, and references?
  • Have any images been uploaded to the correct location, and are resolvable?
  • If the page index was updated, are redirects required
    and have they been implemented?
  • Have you checked the built version of this content?

@github-actions
Copy link

Allow 10 minutes from last push for the staging site to build. If the link doesn't work, try using incognito mode instead. For internal reviewers, check web-documentation repo actions for staging build status. Link to build for this PR: http://docs-dev.timescale.com/docs-adn-logical-decoding

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
--source "$SOURCE" \
--target "$TARGET" \
--fail-fast \
--plugin wal2json
Copy link
Member

@JamesGuthrie JamesGuthrie Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Question for my own interest): Is wal2json available in all Postgres versions? What is the minimum Postgres version that we need for live migration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a quick read and it's an external extension, I think it's installed with the postgres-dev package. From the pgcopydb's docs I get that we can use test_decoding and that one I think comes bundle with postgres and it's the default for pgcopydb.

@arajkumar is there a reason why we are using wal2json and not the other? WDYT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug in pgcopydb while using with test_decoding which prevents migration of a table which has reserved Postgres keywords as column name.(e.g. time). I fixed it in upstream, but it is not yet released to the public.

We have to stick to wal2json until the new pgcopydb release is made available.

wal2json is usually distributed with most of the postgres distributions. Most of the cloud providers(AWS, GCP, Azure) has enabled wal2json in their managed offerings.

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
Comment on lines 133 to 167
is optional. It can even be performed after all the data has been migrated,
since hypertables can be created from tables with existing data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be cautious with the statement. It is correct, but it misses the fact that if you convert a plain table to a hypertable after-the-fact, the table can't accept writes through the duration of the conversion process, which is basically the equivalent to having application downtime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you have a bit more detailed description below. I would rework this section to say something along the lines of "while it is optional to set up hypertables before moving data, if you do it later it's going to be a pain in the ass and you're going to regret it".

--source "$SOURCE" \
--target "$TARGET" \
--snapshot $(cat /tmp/pgcopydb/snapshot) \
--split-tables-larger-than <same-table concurrency size threshold> \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this be configured?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arajkumar can you check up on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--split-tables-larger-than

Allow Same-table Concurrency when processing the source database. This environment variable value is expected to be a byte size, and bytes units B, kB, MB, GB, TB, PB, and EB are known.

Reference: https://pgcopydb.readthedocs.io/en/latest/ref/pgcopydb_clone.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should set this to a default like 10GB and add a note about it. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And table-jobs to 8?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that sounds good


Before proceeding to the next step, you need to wait until the replication lag
is minimized as much as possible. For instance, you can move forward when the
`replay_lag` is close to just a few seconds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the replay lag never gets close to just a few seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say you either run out of space, or remain on a state of constant lag, or you have to stop ingest to burn down the lag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move this part to the next section so we can tackle all the lag nuances there.

database.

Since no new data is being ingested, `pgcopydb` can be instructed to stop the
replication process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need to be cautious here. If the replication lag doesn't go down to a few seconds, then the user must wait for it to go down here. We don't mention this, and it could lead to data loss, which we don't want.

@alejandrodnm alejandrodnm force-pushed the adn/logical-decoding branch 2 times, most recently from 7be1e36 to 5ad3056 Compare October 25, 2023 10:51
@alejandrodnm alejandrodnm force-pushed the adn/logical-decoding branch 3 times, most recently from 16f0f0b to 60dc15c Compare October 25, 2023 15:41
Copy link
Member

@JamesGuthrie JamesGuthrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there! I think the main feedback is the missing context/parent page which explains what Live migration actually is.

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
EOF
```

- `replay_lag`: Time elapsed between flushing recent WAL locally and receiving
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the unit? Seconds? Milliseconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is interval so it's based on the intervalstyle variable, by default it's pretty printed and depending on how big it's there's more or less information:

https://www.postgresql.org/docs/16/datatype-datetime.html#DATATYPE-INTERVAL-OUTPUT

adn@/tmp:adn> select interval '1 second'
╒══════════╕
│ interval │
╞══════════╡
│ 0:00:01  │
╘══════════╛
SELECT 1
Time: 0.004s
adn@/tmp:adn> select interval '1 month'
╒══════════════════╕
│ interval         │
╞══════════════════╡
│ 30 days, 0:00:00 │
╘══════════════════╛

- `replay_lag`: Time elapsed between flushing recent WAL locally and receiving
notification that this standby server has written, flushed and applied it.

## 7. Promote target database as new primary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think that we're missing giving the user enough context about how the whole migration process works. IMO that would be the function of the parent page (Low-downtime: Live migration), which is currently absent.

Aside from the missing context, I think this section has all of the information that it needs, but I think that it needs to be reworked a little for clarity. I think that it would only make sense to do so once we've got the content for the context. Maybe some things will become clearer then.

Copy link
Contributor

@VineethReddy02 VineethReddy02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we have to update /migrate (https://docs.timescale.com/migrate/latest/) parent page with "Live migration" availability alongside dump. restore and dual writes and backfill.

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
1. Migrate roles and schema from source to target.
1. Convert hypertables and enable Timescale features.
1. Migrate initial data from source to target.
1. Apply replication changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change this to "Apply change data capture (CDC) on target"

Copy link
Contributor Author

@alejandrodnm alejandrodnm Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never mention CDC or what does it mean. If you want to put it there as a title we'd have to explain what it is. Maybe we can change it to Apply stream of changes from source

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"stream of changes", SGTM!

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
Copy link
Member

@arajkumar arajkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, if possible adding a note about REPLICATION IDENTITY would be great!

@alejandrodnm alejandrodnm force-pushed the adn/logical-decoding branch 2 times, most recently from 7cc2867 to 3ebf575 Compare October 30, 2023 09:06
@@ -35,8 +35,9 @@ migrating from PostgreSQL.

If you're using PostgreSQL, you may also have heard of logical replication
being the recommended strategy for migrations with low downtime. Currently,
TimescaleDB doesn't work with logical replication, so this is not a viable
option, but we are actively working on making this possible.
TimescaleDB supports logical replication only from sources that are using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't true but this PR should address this.

Copy link
Member

@arajkumar arajkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@JamesGuthrie JamesGuthrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed most of my own comments myself in my most recent commit.

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
should be actively monitored to prevent any issues that might result due to
lack of space.

Before starting, there's an important caveat. To replicate `DELETE` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the REPLICA IDENTITY instructions be part of "Prepare the source database"?

lack of space.

Before starting, there's an important caveat. To replicate `DELETE` and
`UPDATE` operations, the source table must either have a primary key or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible extension: would be nice to have a query which returns user tables which don't have a PK, or REPLICA IDENTITY set, so that the user can easily audit their tables.

migrate/live-migration/live-migration-from-postgres.md Outdated Show resolved Hide resolved
@JamesGuthrie JamesGuthrie merged commit 8537fbe into latest Oct 30, 2023
2 checks passed
@JamesGuthrie JamesGuthrie deleted the adn/logical-decoding branch October 30, 2023 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants