diff --git a/src/current/_includes/molt/fetch-data-load-output.md b/src/current/_includes/molt/fetch-data-load-output.md index 78a8af7a07c..eb5f4319391 100644 --- a/src/current/_includes/molt/fetch-data-load-output.md +++ b/src/current/_includes/molt/fetch-data-load-output.md @@ -1,16 +1,15 @@ 1. Check the output to observe `fetch` progress. {% if page.name == "migrate-load-replicate.md" %} -
- The following message shows the appropriate values for the `--backfillFromSCN` and `--scn` replication flags to use when [starting Replicator](#start-replicator): +
+ If you included the `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` flags, a publication named `molt_fetch` is automatically created: - {% include_cached copy-clipboard.html %} - ~~~ - replication-only mode should include the following replicator flags: --backfillFromSCN 26685444 --scn 26685786 + ~~~ json + {"level":"info","time":"2025-02-10T14:28:11-05:00","message":"dropping and recreating publication molt_fetch"} ~~~
{% endif %} - + A `starting fetch` message indicates that the task has started:
@@ -21,18 +20,19 @@
~~~ json - {"level":"info","type":"summary","num_tables":3,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-28","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"} + {"level":"info","type":"summary","num_tables":3,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"} ~~~
~~~ json - {"level":"info","type":"summary","num_tables":3,"cdc_cursor":"26685786","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"} + {"level":"info","type":"summary","num_tables":3,"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"} ~~~
`data extraction` messages are written for each table that is exported to the location in `--bucket-path`: +
~~~ json {"level":"info","table":"migration_schema.employees","time":"2025-02-10T14:28:11-05:00","message":"data extraction phase starting"} ~~~ @@ -40,14 +40,25 @@ ~~~ json {"level":"info","table":"migration_schema.employees","type":"summary","num_rows":200000,"export_duration_ms":1000,"export_duration":"000h 00m 01s","time":"2025-02-10T14:28:12-05:00","message":"data extraction from source complete"} ~~~ +
+ +
+ ~~~ json + {"level":"info","table":"public.employees","time":"2025-02-10T14:28:11-05:00","message":"data extraction phase starting"} + ~~~ + + ~~~ json + {"level":"info","table":"public.employees","type":"summary","num_rows":200000,"export_duration_ms":1000,"export_duration":"000h 00m 01s","time":"2025-02-10T14:28:12-05:00","message":"data extraction from source complete"} + ~~~ +
`data import` messages are written for each table that is loaded into CockroachDB: +
~~~ json {"level":"info","table":"migration_schema.employees","time":"2025-02-10T14:28:12-05:00","message":"starting data import on target"} ~~~ -
~~~ json {"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"} ~~~ @@ -55,13 +66,21 @@
~~~ json - {"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"} + {"level":"info","table":"public.employees","time":"2025-02-10T14:28:12-05:00","message":"starting data import on target"} + ~~~ + + ~~~ json + {"level":"info","table":"public.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"} ~~~
~~~ json - {"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"2358840","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"} + {"level":"info","table":"migration_schema.employees","time":"2025-02-10T14:28:12-05:00","message":"starting data import on target"} + ~~~ + + ~~~ json + {"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"} ~~~
@@ -75,7 +94,7 @@
~~~ json - {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.payments"],"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"} + {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["public.employees","public.payments","public.orders"],"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"} ~~~ {% if page.name != "migrate-bulk-load.md" %} @@ -90,6 +109,16 @@
~~~ json - {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.payments"],"cdc_cursor":"2358840","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"} + {"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.orders"],"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"} ~~~ + + {% if page.name != "migrate-bulk-load.md" %} + This message shows the appropriate values for the `--backfillFromSCN` and `--scn` flags to use when [starting Replicator](#start-replicator): + + {% include_cached copy-clipboard.html %} + ~~~ + --backfillFromSCN 26685444 + --scn 26685786 + ~~~ + {% endif %}
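The Oracle `cdc_cursor` value packs both SCNs into a single `key=value,key=value` string. As an illustrative sketch (the helper below is not part of MOLT, and the abbreviated log line is an assumption based on the example output above), the two flag values can be extracted programmatically:

```typescript
// Hypothetical helper: split the Oracle cdc_cursor value from a
// fetch-complete log line into the two Replicator SCN flags.
function scnFlags(logLine: string): string {
  // e.g. "backfillFromSCN=26685444,scn=26685786"
  const cursor: string = JSON.parse(logLine)["cdc_cursor"];
  const pairs: Record<string, string> = {};
  for (const part of cursor.split(",")) {
    const [key, value] = part.split("=");
    pairs[key] = value;
  }
  return `--backfillFromSCN ${pairs["backfillFromSCN"]} --scn ${pairs["scn"]}`;
}

const line = '{"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","message":"fetch complete"}';
console.log(scnFlags(line)); // --backfillFromSCN 26685444 --scn 26685786
```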
\ No newline at end of file diff --git a/src/current/_includes/molt/fetch-metrics.md b/src/current/_includes/molt/fetch-metrics.md index b4c37bb4cd9..471319ad533 100644 --- a/src/current/_includes/molt/fetch-metrics.md +++ b/src/current/_includes/molt/fetch-metrics.md @@ -17,7 +17,7 @@ Cockroach Labs recommends monitoring the following metrics during data load: You can also use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view the preceding metrics. {% if page.name != "migrate-bulk-load.md" %} -{{site.data.alerts.callout_info}} -Metrics from the `replicator` process are enabled by setting the `--metricsAddr` [replication flag](#replication-flags), and are served at `http://{host}:{port}/_/varz`.
To view Oracle-specific metrics from `replicator`, import [this Grafana dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json).
+{{site.data.alerts.callout_success}} +For details on Replicator metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). {{site.data.alerts.end}} {% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/fetch-schema-table-filtering.md b/src/current/_includes/molt/fetch-schema-table-filtering.md index 1f44ad31248..108fbfeab4a 100644 --- a/src/current/_includes/molt/fetch-schema-table-filtering.md +++ b/src/current/_includes/molt/fetch-schema-table-filtering.md @@ -1,14 +1,23 @@ -MOLT Fetch can restrict which schemas (or users) and tables are migrated by using the following filter flags: +Use the following flags to filter the data to be migrated: +
+| Filter type | Flag | Description | +|------------------------|----------------------------|--------------------------------------------------------------------------| +| Table filter | `--table-filter` | POSIX regex matching table names to include across all selected schemas. | +| Table exclusion filter | `--table-exclusion-filter` | POSIX regex matching table names to exclude across all selected schemas. | + +{{site.data.alerts.callout_info}} +`--schema-filter` does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. +{{site.data.alerts.end}} +
+ +
| Filter type | Flag | Description | |------------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------| | Schema filter | `--schema-filter` | [POSIX regex](https://wikipedia.org/wiki/Regular_expression) matching schema names to include; all matching schemas and their tables are moved. | | Table filter | `--table-filter` | POSIX regex matching table names to include across all selected schemas. | | Table exclusion filter | `--table-exclusion-filter` | POSIX regex matching table names to exclude across all selected schemas. | - -{{site.data.alerts.callout_success}} -Use `--schema-filter` to migrate only the specified schemas, and refine which tables are moved using `--table-filter` or `--table-exclusion-filter`. -{{site.data.alerts.end}} +
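The include and exclude flags combine as "include first, then drop exclusions." A standalone sketch of that interaction (an illustration only: full-name matching is an assumption, and JavaScript `RegExp` stands in for POSIX regex):

```typescript
// Sketch of how --table-filter and --table-exclusion-filter interact.
function selectTables(tables: string[], tableFilter = ".*", exclusionFilter?: string): string[] {
  // Anchor the patterns so they must match the whole table name (assumption).
  const include = new RegExp(`^(?:${tableFilter})$`);
  let selected = tables.filter((t) => include.test(t));
  if (exclusionFilter) {
    const exclude = new RegExp(`^(?:${exclusionFilter})$`);
    selected = selected.filter((t) => !exclude.test(t));
  }
  return selected;
}

const tables = ["employees", "payments", "payments_archive", "orders"];
console.log(selectTables(tables, "payments.*"));       // [ 'payments', 'payments_archive' ]
console.log(selectTables(tables, ".*", ".*_archive")); // [ 'employees', 'payments', 'orders' ]
```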
When migrating from Oracle, you **must** include `--schema-filter` to name an Oracle schema to migrate. This prevents Fetch from attempting to load tables owned by other users. For example: @@ -19,11 +28,91 @@ When migrating from Oracle, you **must** include `--schema-filter` to name an Or
{% if page.name != "migrate-bulk-load.md" %} -
-{% include molt/fetch-table-filter-userscript.md %} +
+#### Table filter userscript + +When loading a subset of tables using `--table-filter`, you **must** provide a TypeScript userscript to specify which tables to replicate. + +For example, the following `table_filter.ts` userscript filters change events to the specified source tables: + +~~~ ts +import * as api from "replicator@v1"; + +// List the source tables (matching source names and casing) to include in replication +const allowedTables = ["EMPLOYEES", "PAYMENTS", "ORDERS"]; + +// Update this to your target CockroachDB database and schema name +api.configureSource("molt.migration_schema", { + dispatch: (doc: Document, meta: Document): Record | null => { + // Replicate only if the table matches one of the allowed tables + if (allowedTables.includes(meta.table)) { + let ret: Record = {}; + ret[meta.table] = [doc]; + return ret; + } + // Ignore all other tables + return null; + }, + deletesTo: (doc: Document, meta: Document): Record | null => { + // Optionally filter deletes the same way + if (allowedTables.includes(meta.table)) { + let ret: Record = {}; + ret[meta.table] = [doc]; + return ret; + } + return null; + }, +}); +~~~ + +Pass the userscript to MOLT Replicator with the `--userscript` [flag](#replicator-flags): + +~~~ +--userscript table_filter.ts +~~~
-
-{% include molt/fetch-table-filter-userscript.md %} +
+#### Table filter userscript + +When loading a subset of tables using `--table-filter`, you **must** provide a TypeScript userscript to specify which tables to replicate. + +For example, the following `table_filter.ts` userscript filters change events to the specified source tables: + +~~~ ts +import * as api from "replicator@v1"; + +// List the source tables (matching source names and casing) to include in replication +const allowedTables = ["EMPLOYEES", "PAYMENTS", "ORDERS"]; + +// Update this to your target CockroachDB database and schema name +api.configureSource("molt.public", { + dispatch: (doc: Document, meta: Document): Record | null => { + // Replicate only if the table matches one of the allowed tables + if (allowedTables.includes(meta.table)) { + let ret: Record = {}; + ret[meta.table] = [doc]; + return ret; + } + // Ignore all other tables + return null; + }, + deletesTo: (doc: Document, meta: Document): Record | null => { + // Optionally filter deletes the same way + if (allowedTables.includes(meta.table)) { + let ret: Record = {}; + ret[meta.table] = [doc]; + return ret; + } + return null; + }, +}); +~~~ + +Pass the userscript to MOLT Replicator with the `--userscript` [flag](#replicator-flags): + +~~~ +--userscript table_filter.ts +~~~
{% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/fetch-table-filter-userscript.md b/src/current/_includes/molt/fetch-table-filter-userscript.md deleted file mode 100644 index 5a2b5abc0ff..00000000000 --- a/src/current/_includes/molt/fetch-table-filter-userscript.md +++ /dev/null @@ -1,41 +0,0 @@ -#### Table filter userscript - -When loading a subset of tables using `--table-filter`, you **must** provide a TypeScript userscript to specify which tables to replicate. - -For example, the following `table_filter.ts` userscript filters change events to the specified source tables: - -~~~ ts -import * as api from "replicator@v1"; - -// List the source tables (matching source names and casing) to include in replication -const allowedTables = ["EMPLOYEES", "PAYMENTS", "ORDERS"]; - -// Update this to your target CockroachDB database and schema name -api.configureSource("defaultdb.migration_schema", { - dispatch: (doc: Document, meta: Document): Record | null => { - // Replicate only if the table matches one of the allowed tables - if (allowedTables.includes(meta.table)) { - let ret: Record = {}; - ret[meta.table] = [doc]; - return ret; - } - // Ignore all other tables - return null; - }, - deletesTo: (doc: Document, meta: Document): Record | null => { - // Optionally filter deletes the same way - if (allowedTables.includes(meta.table)) { - let ret: Record = {}; - ret[meta.table] = [doc]; - return ret; - } - return null; - }, -}); -~~~ - -Pass the userscript to MOLT Replicator with the `--userscript` [flag](#replication-flags): - -~~~ ---userscript table_filter.ts -~~~ \ No newline at end of file diff --git a/src/current/_includes/molt/migration-create-sql-user.md b/src/current/_includes/molt/migration-create-sql-user.md index 5062113e8ed..dd2e078e3a4 100644 --- a/src/current/_includes/molt/migration-create-sql-user.md +++ b/src/current/_includes/molt/migration-create-sql-user.md @@ -14,7 +14,7 @@ Grant database-level privileges for schema 
creation within the target database: GRANT ALL ON DATABASE defaultdb TO crdb_user; ~~~ -Grant user privileges to create internal MOLT tables like `_molt_fetch_exceptions` in the public schema: +Grant user privileges to create internal MOLT tables like `_molt_fetch_exceptions` in the CockroachDB `public` schema: {{site.data.alerts.callout_info}} Ensure that you are connected to the target database. @@ -25,8 +25,9 @@ GRANT CREATE ON SCHEMA public TO crdb_user; ~~~ -If you manually created the target schema (i.e., [`drop-on-target-and-recreate`](#table-handling-mode) will not be used), grant the following privileges on the schema: +If you manually defined the target tables (which means that [`drop-on-target-and-recreate`](#table-handling-mode) will not be used), grant the following privileges on the schema: +
{% include_cached copy-clipboard.html %} ~~~ sql GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA migration_schema TO crdb_user; @@ -34,7 +35,8 @@ ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO crdb_user; ~~~ -Grant the same privileges for internal MOLT tables: +Grant the same privileges for internal MOLT tables in the CockroachDB `public` schema: +
{% include_cached copy-clipboard.html %} ~~~ sql @@ -47,18 +49,27 @@ Depending on the MOLT Fetch [data load mode](#data-load-mode) you will use, gran #### `IMPORT INTO` privileges -Grant `SELECT`, `INSERT`, and `DROP` (required because the table is taken offline during the `IMPORT INTO`) privileges on all tables in the [target schema](#create-the-target-schema): +Grant `SELECT`, `INSERT`, and `DROP` (required because the table is taken offline during the `IMPORT INTO`) privileges on all tables being migrated: +
{% include_cached copy-clipboard.html %} ~~~ sql GRANT SELECT, INSERT, DROP ON ALL TABLES IN SCHEMA migration_schema TO crdb_user; ~~~ +
+ +
+{% include_cached copy-clipboard.html %} +~~~ sql +GRANT SELECT, INSERT, DROP ON ALL TABLES IN SCHEMA public TO crdb_user; +~~~ +
If you plan to use [cloud storage with implicit authentication](#cloud-storage-authentication) for data load, grant the `EXTERNALIOIMPLICITACCESS` [system-level privilege]({% link {{site.current_cloud_version}}/security-reference/authorization.md %}#supported-privileges): {% include_cached copy-clipboard.html %} ~~~ sql -GRANT EXTERNALIOIMPLICITACCESS TO crdb_user; +GRANT SYSTEM EXTERNALIOIMPLICITACCESS TO crdb_user; ~~~ #### `COPY FROM` privileges diff --git a/src/current/_includes/molt/migration-prepare-database.md b/src/current/_includes/molt/migration-prepare-database.md index e3bbb27c183..8b502f03e03 100644 --- a/src/current/_includes/molt/migration-prepare-database.md +++ b/src/current/_includes/molt/migration-prepare-database.md @@ -12,7 +12,7 @@ Grant the user privileges to connect, view schema objects, and select the tables {% include_cached copy-clipboard.html %} ~~~ sql -GRANT CONNECT ON DATABASE source_database TO migration_user; +GRANT CONNECT ON DATABASE migration_db TO migration_user; GRANT USAGE ON SCHEMA migration_schema TO migration_user; GRANT SELECT ON ALL TABLES IN SCHEMA migration_schema TO migration_user; ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT SELECT ON TABLES TO migration_user; @@ -31,7 +31,7 @@ Alternatively, grant the following permissions to create replication slots, acce {% include_cached copy-clipboard.html %} ~~~ sql ALTER USER migration_user WITH LOGIN REPLICATION; -GRANT CREATE ON DATABASE source_database TO migration_user; +GRANT CREATE ON DATABASE migration_db TO migration_user; ALTER TABLE migration_schema.table_name OWNER TO migration_user; ~~~ @@ -45,11 +45,12 @@ Run the `ALTER TABLE` command for each table to replicate. 
CREATE USER 'migration_user'@'%' IDENTIFIED BY 'password'; ~~~ -Grant the user privileges to select only the tables you migrate: +Grant the user privileges to select the tables you migrate and access GTID information for snapshot consistency: {% include_cached copy-clipboard.html %} ~~~ sql -GRANT SELECT ON source_database.* TO 'migration_user'@'%'; +GRANT SELECT ON migration_db.* TO 'migration_user'@'%'; +GRANT SELECT ON mysql.gtid_executed TO 'migration_user'@'%'; FLUSH PRIVILEGES; ~~~ @@ -74,7 +75,7 @@ CREATE USER MIGRATION_USER IDENTIFIED BY 'password'; When migrating from Oracle Multitenant (PDB/CDB), this should be a [common user](https://docs.oracle.com/database/121/ADMQS/GUID-DA54EBE5-43EF-4B09-B8CC-FAABA335FBB8.htm). Prefix the username with `C##` (e.g., `C##MIGRATION_USER`). {{site.data.alerts.end}} -Grant the user privileges to connect, read metadata, and `SELECT` and `FLASHBACK` the tables you plan to migrate. The tables should all reside in a single schema (e.g., `migration_schema`). For details, refer to [Schema and table filtering](#schema-and-table-filtering). +Grant the user privileges to connect, read metadata, and `SELECT` and `FLASHBACK` the tables you plan to migrate. The tables should all reside in a single schema (for example, `migration_schema`). For details, refer to [Schema and table filtering](#schema-and-table-filtering). ##### Oracle Multitenant (PDB/CDB) user privileges @@ -164,7 +165,20 @@ Connect to the primary instance (PostgreSQL primary, MySQL primary/master, or Or {{site.data.alerts.end}}
-Verify that you are connected to the primary server by running `SELECT pg_is_in_recovery();` and getting a `false` result. +Verify that you are connected to the primary server: + +{% include_cached copy-clipboard.html %} +~~~ sql +SELECT pg_is_in_recovery(); +~~~ + +You should get a false result: + +~~~ + pg_is_in_recovery +------------------- + f +~~~ Enable logical replication by setting `wal_level` to `logical` in `postgresql.conf` or in the SQL shell. For example: @@ -243,11 +257,11 @@ SELECT force_logging FROM v$database; -- Expected: YES ##### Create source sentinel table -Create a checkpoint table called `_replicator_sentinel` in the Oracle schema you will migrate: +Create a checkpoint table called `REPLICATOR_SENTINEL` in the Oracle schema you will migrate: {% include_cached copy-clipboard.html %} ~~~ sql -CREATE TABLE migration_schema."_replicator_sentinel" ( +CREATE TABLE migration_schema."REPLICATOR_SENTINEL" ( keycol NUMBER PRIMARY KEY, lastSCN NUMBER ); @@ -257,7 +271,7 @@ Grant privileges to modify the checkpoint table. In Oracle Multitenant, grant th {% include_cached copy-clipboard.html %} ~~~ sql -GRANT SELECT, INSERT, UPDATE ON migration_schema."_replicator_sentinel" TO C##MIGRATION_USER; +GRANT SELECT, INSERT, UPDATE ON migration_schema."REPLICATOR_SENTINEL" TO C##MIGRATION_USER; ~~~ ##### Grant LogMiner privileges @@ -288,7 +302,7 @@ The user must: - Query [redo logs from LogMiner](#verify-logminer-privileges). - Retrieve active transaction information to determine the starting point for ongoing replication. -- Update the internal [`_replicator_sentinel` table](#create-source-sentinel-table) created on the Oracle source schema by the DBA. +- Update the internal [`REPLICATOR_SENTINEL` table](#create-source-sentinel-table) created on the Oracle source schema by the DBA. 
##### Verify LogMiner privileges diff --git a/src/current/_includes/molt/migration-prepare-schema.md b/src/current/_includes/molt/migration-prepare-schema.md index bf84bfb4955..591550b02ba 100644 --- a/src/current/_includes/molt/migration-prepare-schema.md +++ b/src/current/_includes/molt/migration-prepare-schema.md @@ -2,13 +2,13 @@ #### Schema Conversion Tool -The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) (SCT) automates target schema creation. It requires a free [CockroachDB {{ site.data.products.cloud }} account]({% link cockroachcloud/create-an-account.md %}). +The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) (SCT) converts source table definitions to CockroachDB-compatible syntax. It requires a free [CockroachDB {{ site.data.products.cloud }} account]({% link cockroachcloud/create-an-account.md %}). -1. Upload a source `.sql` file to convert the syntax and identify [unimplemented features and syntax incompatibilities]({% link molt/migration-strategy.md %}#unimplemented-features-and-syntax-incompatibilities) in the schema. +1. Upload a source `.sql` file to convert the syntax and identify [unimplemented features and syntax incompatibilities]({% link molt/migration-strategy.md %}#unimplemented-features-and-syntax-incompatibilities) in the table definitions. -1. Import the converted schema to a CockroachDB cluster: - - When migrating to CockroachDB {{ site.data.products.cloud }}, the Schema Conversion Tool automatically [applies the converted schema to a new {{ site.data.products.cloud }} database]({% link cockroachcloud/migrations-page.md %}#migrate-the-schema). 
- - When migrating to a {{ site.data.products.core }} CockroachDB cluster, [export a converted schema file]({% link cockroachcloud/migrations-page.md %}#export-the-schema) and pipe the [data definition language (DDL)]({% link {{ site.current_cloud_version }}/sql-statements.md %}#data-definition-statements) directly into [`cockroach sql`]({% link {{ site.current_cloud_version }}/cockroach-sql.md %}). +1. Import the converted table definitions to a CockroachDB cluster: + - When migrating to CockroachDB {{ site.data.products.cloud }}, the Schema Conversion Tool automatically [applies the converted table definitions to a new {{ site.data.products.cloud }} database]({% link cockroachcloud/migrations-page.md %}#migrate-the-schema). + - When migrating to a {{ site.data.products.core }} CockroachDB cluster, [export a converted DDL file]({% link cockroachcloud/migrations-page.md %}#export-the-schema) and pipe the [data definition language (DDL)]({% link {{ site.current_cloud_version }}/sql-statements.md %}#data-definition-statements) directly into [`cockroach sql`]({% link {{ site.current_cloud_version }}/cockroach-sql.md %}).
Syntax that cannot automatically be converted will be displayed in the [**Summary Report**]({% link cockroachcloud/migrations-page.md %}?filters=mysql#summary-report). These may include the following: @@ -28,7 +28,7 @@ Identifiers are case-sensitive in MySQL and [case-insensitive in CockroachDB]({% The MySQL [`AUTO_INCREMENT`](https://dev.mysql.com/doc/refman/8.0/en/example-auto-increment.html) attribute, which creates sequential column values, is not supported in CockroachDB. When [using the Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}?filters=mysql#convert-a-schema), columns with `AUTO_INCREMENT` can be converted to use [sequences]({% link {{ site.current_cloud_version }}/create-sequence.md %}), `UUID` values with [`gen_random_uuid()`]({% link {{ site.current_cloud_version }}/functions-and-operators.md %}#id-generation-functions), or unique `INT8` values using [`unique_rowid()`]({% link {{ site.current_cloud_version }}/functions-and-operators.md %}#id-generation-functions). Cockroach Labs does not recommend using a sequence to define a primary key column. For more information, refer to [Unique ID best practices]({% link {{ site.current_cloud_version }}/performance-best-practices-overview.md %}#unique-id-best-practices). {{site.data.alerts.callout_info}} -Changing a column type during schema conversion will cause [MOLT Verify]({% link molt/molt-verify.md %}) to identify a type mismatch during data validation. This is expected behavior. +Changing a column type during table definition conversion will cause [MOLT Verify]({% link molt/molt-verify.md %}) to identify a type mismatch during data validation. This is expected behavior. 
{{site.data.alerts.end}} ##### `ENUM` type diff --git a/src/current/_includes/molt/migration-schema-design-practices.md b/src/current/_includes/molt/migration-schema-design-practices.md index be799393f6f..644fad6de81 100644 --- a/src/current/_includes/molt/migration-schema-design-practices.md +++ b/src/current/_includes/molt/migration-schema-design-practices.md @@ -1,18 +1,39 @@ -Convert the source schema into a CockroachDB-compatible schema. CockroachDB supports the PostgreSQL wire protocol and is largely [compatible with PostgreSQL syntax]({% link {{ site.current_cloud_version }}/postgresql-compatibility.md %}#features-that-differ-from-postgresql). +Convert the source table definitions into CockroachDB-compatible equivalents. CockroachDB supports the PostgreSQL wire protocol and is largely [compatible with PostgreSQL syntax]({% link {{ site.current_cloud_version }}/postgresql-compatibility.md %}#features-that-differ-from-postgresql). -- The source and target schemas must **match**. Review [Type mapping]({% link molt/molt-fetch.md %}#type-mapping) to understand which source types can be mapped to CockroachDB types. +- The source and target table definitions must **match**. Review [Type mapping]({% link molt/molt-fetch.md %}#type-mapping) to understand which source types can be mapped to CockroachDB types. - For example, a source table defined as `CREATE TABLE migration_schema.tbl (pk INT PRIMARY KEY);` must have a corresponding schema and table in CockroachDB: +
+ For example, a PostgreSQL source table defined as `CREATE TABLE migration_schema.tbl (pk INT PRIMARY KEY);` must have a corresponding schema and table in CockroachDB: {% include_cached copy-clipboard.html %} ~~~ sql CREATE SCHEMA migration_schema; CREATE TABLE migration_schema.tbl (pk INT PRIMARY KEY); ~~~ +
+ +
+ MySQL tables belong directly to the database specified in the connection string. A MySQL source table defined as `CREATE TABLE tbl (id INT PRIMARY KEY);` should map to CockroachDB's default `public` schema: + + {% include_cached copy-clipboard.html %} + ~~~ sql + CREATE TABLE tbl (id INT PRIMARY KEY); + ~~~ +
+ +
+ For example, an Oracle source table defined as `CREATE TABLE migration_schema.tbl (pk INT PRIMARY KEY);` must have a corresponding schema and table in CockroachDB: + + {% include_cached copy-clipboard.html %} + ~~~ sql + CREATE SCHEMA migration_schema; + CREATE TABLE migration_schema.tbl (pk INT PRIMARY KEY); + ~~~ +
- - MOLT Fetch can automatically create a matching CockroachDB schema using the {% if page.name != "migration-strategy.md" %}[`drop-on-target-and-recreate`](#table-handling-mode){% else %}[`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#target-table-handling){% endif %} option. + - MOLT Fetch can automatically define matching CockroachDB tables using the {% if page.name != "migration-strategy.md" %}[`drop-on-target-and-recreate`](#table-handling-mode){% else %}[`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#target-table-handling){% endif %} option. - - If you create the target schema manually, review how MOLT Fetch handles [type mismatches]({% link molt/molt-fetch.md %}#mismatch-handling). You can use the {% if page.name != "migration-strategy.md" %}[MOLT Schema Conversion Tool](#schema-conversion-tool){% else %}[MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}){% endif %} to create a matching schema. + - If you define the target tables manually, review how MOLT Fetch handles [type mismatches]({% link molt/molt-fetch.md %}#mismatch-handling). You can use the {% if page.name != "migration-strategy.md" %}[MOLT Schema Conversion Tool](#schema-conversion-tool){% else %}[MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}){% endif %} to create matching table definitions.
- By default, table and column names are case-insensitive in MOLT Fetch. If using the [`--case-sensitive`]({% link molt/molt-fetch.md %}#global-flags) flag, schema, table, and column names must match Oracle's default uppercase identifiers. Use quoted names on the target to preserve case. For example, the following CockroachDB SQL statement will error: diff --git a/src/current/_includes/molt/migration-stop-replication.md b/src/current/_includes/molt/migration-stop-replication.md index feb97edff1d..613805d6473 100644 --- a/src/current/_includes/molt/migration-stop-replication.md +++ b/src/current/_includes/molt/migration-stop-replication.md @@ -4,7 +4,7 @@ 1. Wait for replication to drain, which means that all transactions that occurred on the source database have been fully processed and replicated to CockroachDB. There are two ways to determine that replication has fully drained: - When replication is caught up, you will not see new `upserted rows` logs. - - If you set up the replication metrics endpoint with `--metricsAddr` in the preceding steps, metrics are available at: + - If you set up the replication metrics endpoint with [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) in the preceding steps, metrics are available at: ~~~ http://{host}:{port}/_/varz diff --git a/src/current/_includes/molt/molt-connection-strings.md b/src/current/_includes/molt/molt-connection-strings.md index a180ab210b2..c865d976b4d 100644 --- a/src/current/_includes/molt/molt-connection-strings.md +++ b/src/current/_includes/molt/molt-connection-strings.md @@ -18,7 +18,7 @@ The source connection **must** point to the primary instance (PostgreSQL primary For example: ~~~ ---source 'postgres://migration_user:password@localhost:5432/molt?sslmode=verify-full' +--source 'postgres://migration_user:password@localhost:5432/migration_db?sslmode=verify-full' ~~~
@@ -30,7 +30,7 @@ For example: For example: ~~~ ---source 'mysql://migration_user:password@localhost/molt?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' +--source 'mysql://migration_user:password@localhost/migration_db?sslcert=.%2fsource_certs%2fclient.root.crt&sslkey=.%2fsource_certs%2fclient.root.key&sslmode=verify-full&sslrootcert=.%2fsource_certs%2fca.crt' ~~~
diff --git a/src/current/_includes/molt/molt-docker.md b/src/current/_includes/molt/molt-docker.md index 0e02705974b..96b84904c64 100644 --- a/src/current/_includes/molt/molt-docker.md +++ b/src/current/_includes/molt/molt-docker.md @@ -67,13 +67,13 @@ When testing locally, specify the host as follows: {% if page.name == "molt-replicator.md" %} ~~~ - --sourceConn 'postgres://postgres:postgres@host.docker.internal:5432/molt?sslmode=disable' - --targetConn "postgres://root@host.docker.internal:26257/molt?sslmode=disable" + --sourceConn 'postgres://postgres:postgres@host.docker.internal:5432/migration_db?sslmode=disable' + --targetConn "postgres://root@host.docker.internal:26257/defaultdb?sslmode=disable" ~~~ {% else %} ~~~ - --source 'postgres://postgres:postgres@host.docker.internal:5432/molt?sslmode=disable' - --target "postgres://root@host.docker.internal:26257/molt?sslmode=disable" + --source 'postgres://postgres:postgres@host.docker.internal:5432/migration_db?sslmode=disable' + --target "postgres://root@host.docker.internal:26257/defaultdb?sslmode=disable" ~~~ {% endif %} @@ -81,12 +81,12 @@ When testing locally, specify the host as follows: {% if page.name == "molt-replicator.md" %} ~~~ - --sourceConn 'postgres://postgres:postgres@172.17.0.1:5432/molt?sslmode=disable' - --targetConn "postgres://root@172.17.0.1:26257/molt?sslmode=disable" + --sourceConn 'postgres://postgres:postgres@172.17.0.1:5432/migration_db?sslmode=disable' + --targetConn "postgres://root@172.17.0.1:26257/defaultdb?sslmode=disable" ~~~ {% else %} ~~~ - --source 'postgres://postgres:postgres@172.17.0.1:5432/molt?sslmode=disable' - --target "postgres://root@172.17.0.1:26257/molt?sslmode=disable" + --source 'postgres://postgres:postgres@172.17.0.1:5432/migration_db?sslmode=disable' + --target "postgres://root@172.17.0.1:26257/defaultdb?sslmode=disable" ~~~ {% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-drop-constraints-indexes.md 
b/src/current/_includes/molt/molt-drop-constraints-indexes.md index a6456d779fa..c360991eff5 100644 --- a/src/current/_includes/molt/molt-drop-constraints-indexes.md +++ b/src/current/_includes/molt/molt-drop-constraints-indexes.md @@ -24,5 +24,5 @@ To optimize data load performance, drop all non-`PRIMARY KEY` [constraints]({% l Do **not** drop [`PRIMARY KEY`]({% link {{ site.current_cloud_version }}/primary-key.md %}) constraints. {{site.data.alerts.end}} -You can [recreate the constraints and indexes after loading the data](#modify-the-cockroachdb-schema). +You can [recreate the constraints and indexes after loading the data](#add-constraints-and-indexes). {% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-install.md b/src/current/_includes/molt/molt-install.md index 86b224ba26f..8bfe092832a 100644 --- a/src/current/_includes/molt/molt-install.md +++ b/src/current/_includes/molt/molt-install.md @@ -20,6 +20,10 @@ For ease of use, keep both `molt` and `replicator` in your current working direc To display the current version of each binary, run `molt --version` and `replicator --version`. +{{site.data.alerts.callout_info}} +`molt` is bundled with the latest `replicator` version available at the time of the MOLT release. This means that the MOLT download always contains the latest released version of [MOLT Replicator]({% link molt/molt-replicator.md %}). To verify that the `molt` and `replicator` versions match, run `molt --version` and `replicator --version`. +{{site.data.alerts.end}} + For previous binaries, refer to the [MOLT version manifest](https://molt.cockroachdb.com/molt/cli/versions.html). 
{% if page.name != "molt.md" %}For release details, refer to the [MOLT changelog]({% link releases/molt.md %}).{% endif %} {% if page.name == "molt-fetch.md" or page.name == "molt.md" %} diff --git a/src/current/_includes/molt/molt-secure-connection-strings.md b/src/current/_includes/molt/molt-secure-connection-strings.md index fb9e82a01b5..554e759d9a5 100644 --- a/src/current/_includes/molt/molt-secure-connection-strings.md +++ b/src/current/_includes/molt/molt-secure-connection-strings.md @@ -5,8 +5,8 @@ - Provide your connection strings as environment variables. For example: ~~~ shell - export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/molt?sslmode=verify-full" - export TARGET="postgres://root@localhost:26257/molt?sslmode=verify-full" + export SOURCE="postgres://migration_user:a%2452%26@localhost:5432/migration_db?sslmode=verify-full" + export TARGET="postgres://root@localhost:26257/defaultdb?sslmode=verify-full" ~~~ Afterward, reference the environment variables in MOLT commands: @@ -31,7 +31,7 @@ {% include_cached copy-clipboard.html %} ~~~ - postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/molt/certs/ca.pem&sslcert=/etc/molt/certs/client.crt&sslkey=/etc/molt/certs/client.key + postgresql://migration_user@db.example.com:5432/appdb?sslmode=verify-full&sslrootcert=/etc/migration_db/certs/ca.pem&sslcert=/etc/migration_db/certs/client.crt&sslkey=/etc/migration_db/certs/client.key ~~~ - URL-encode connection strings for the source database and [CockroachDB]({% link {{site.current_cloud_version}}/connect-to-the-database.md %}) so special characters in passwords are handled correctly. @@ -46,7 +46,7 @@ Use the encoded password in your connection string. For example: ~~~ - postgres://migration_user:a%2452%26@localhost:5432/replicationload + postgres://migration_user:a%2452%26@localhost:5432/migration_db ~~~ - Remove `sslmode=disable` from production connection strings. 
\ No newline at end of file diff --git a/src/current/_includes/molt/molt-setup.md b/src/current/_includes/molt/molt-setup.md index 00f9f642a3b..47b72132284 100644 --- a/src/current/_includes/molt/molt-setup.md +++ b/src/current/_includes/molt/molt-setup.md @@ -29,7 +29,7 @@ ## Prepare the target database -### Create the target schema +### Define the target tables {% include molt/migration-prepare-schema.md %} @@ -74,6 +74,12 @@ When you run `molt fetch`, you can configure the following options for data load - [Data load mode](#data-load-mode): Choose between `IMPORT INTO` and `COPY FROM`. - [Fetch metrics](#fetch-metrics): Configure metrics collection during initial data load. +
+ + + +
+ ### Connection strings {% include molt/molt-connection-strings.md %} diff --git a/src/current/_includes/molt/molt-troubleshooting-failback.md b/src/current/_includes/molt/molt-troubleshooting-failback.md index 1667cc4c690..26d68f40da9 100644 --- a/src/current/_includes/molt/molt-troubleshooting-failback.md +++ b/src/current/_includes/molt/molt-troubleshooting-failback.md @@ -5,7 +5,7 @@ If the changefeed shows connection errors in `SHOW CHANGEFEED JOB`: ##### Connection refused ~~~ -transient error: Post "https://replicator-host:30004/molt/public": dial tcp [::1]:30004: connect: connection refused +transient error: Post "https://replicator-host:30004/migration_db/migration_schema": dial tcp [::1]:30004: connect: connection refused ~~~ This indicates that Replicator is down, the webhook URL is incorrect, or the port is misconfigured. @@ -24,9 +24,17 @@ The webhook URL path is specified in the `INTO` clause when you [create the chan **Resolution:** Verify the webhook path format matches your target database type: -- PostgreSQL or CockroachDB targets: Use `/database/schema` format. For example, `webhook-https://replicator-host:30004/migration_schema/public`. -- MySQL targets: Use `/database` format (schema is implicit). For example, `webhook-https://replicator-host:30004/migration_schema`. -- Oracle targets: Use `/DATABASE` format in uppercase. For example, `webhook-https://replicator-host:30004/MIGRATION_SCHEMA`. +
+- PostgreSQL targets should use `/database/schema` format. For example, `webhook-https://replicator-host:30004/migration_db/migration_schema`. +
+ +
+- MySQL targets should use `/database` format. For example, `webhook-https://replicator-host:30004/migration_db`. +
+ +
+- Oracle targets should use `/SCHEMA` format in uppercase. For example, `webhook-https://replicator-host:30004/MIGRATION_SCHEMA`. +
For details on configuring the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink). diff --git a/src/current/_includes/molt/molt-troubleshooting-fetch.md b/src/current/_includes/molt/molt-troubleshooting-fetch.md index 8c9d26031d4..31c20563098 100644 --- a/src/current/_includes/molt/molt-troubleshooting-fetch.md +++ b/src/current/_includes/molt/molt-troubleshooting-fetch.md @@ -21,6 +21,34 @@ When run in `none` or `truncate-if-exists` mode, `molt fetch` exits early in the - A source column has a `NOT NULL` constraint, and the corresponding target column is nullable (i.e., the constraint is less strict on the target). - A [`DEFAULT`]({% link {{site.current_cloud_version}}/default-value.md %}), [`CHECK`]({% link {{site.current_cloud_version}}/check.md %}), [`FOREIGN KEY`]({% link {{site.current_cloud_version}}/foreign-key.md %}), or [`UNIQUE`]({% link {{site.current_cloud_version}}/unique.md %}) constraint is specified on a target column and not on the source column. +
+##### Failed to export snapshot: no rows in result set + +~~~ +failed to export snapshot: please ensure that you have GTID-based replication enabled: sql: no rows in result set +~~~ + +This typically occurs on a new MySQL cluster that has not had any writes committed. The GTID set will not appear in `SHOW MASTER STATUS` until at least one transaction has been committed on the database. + +**Resolution:** Execute a minimal transaction to initialize the GTID set: + +{% include_cached copy-clipboard.html %} +~~~ sql +START TRANSACTION; +SELECT 1; +COMMIT; +~~~ + +Verify that the GTID set now appears: + +{% include_cached copy-clipboard.html %} +~~~ sql +SHOW MASTER STATUS; +~~~ + +This should return a valid GTID value instead of an empty result. +
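A GTID set returned by `SHOW MASTER STATUS` pairs the source server's UUID with a range of committed transaction IDs (for example, `4c658ae6-e8ad-11ef-8449-0242ac140006:1-29`). As an illustration, a small Python sketch of a hypothetical helper (not part of MOLT) that splits a single-interval GTID set:

```python
# Hypothetical helper (not part of MOLT): split a simple GTID set such as
# "4c658ae6-e8ad-11ef-8449-0242ac140006:1-29" into its server UUID and
# committed-transaction interval. Real GTID sets may contain multiple
# UUIDs and intervals; this handles the single-interval case only.
def parse_gtid_set(gtid_set: str) -> tuple[str, int, int]:
    uuid, _, interval = gtid_set.partition(":")
    lo, _, hi = interval.partition("-")
    return uuid, int(lo), int(hi or lo)

uuid, first, last = parse_gtid_set("4c658ae6-e8ad-11ef-8449-0242ac140006:1-29")
print(uuid)         # 4c658ae6-e8ad-11ef-8449-0242ac140006
print(first, last)  # 1 29
```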
+
##### ORA-01950: no privileges on tablespace diff --git a/src/current/_includes/molt/molt-troubleshooting-replication.md b/src/current/_includes/molt/molt-troubleshooting-replication.md index d7d8bd2fe34..b9ec1fd3a29 100644 --- a/src/current/_includes/molt/molt-troubleshooting-replication.md +++ b/src/current/_includes/molt/molt-troubleshooting-replication.md @@ -6,7 +6,7 @@ If MOLT Replicator appears hung or performs poorly: 1. Enable trace logging with `-vv` to get more visibility into the replicator's state and behavior. -1. If MOLT Replicator is in an unknown, hung, or erroneous state, collect performance profiles to include with support tickets. Replace `{host}` and `{metrics-port}` with your Replicator host and the port specified by `--metricsAddr`: +1. If MOLT Replicator is in an unknown, hung, or erroneous state, collect performance profiles to include with support tickets. Replace `{host}` and `{metrics-port}` with your Replicator host and the port specified by [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr): {% include_cached copy-clipboard.html %} ~~~shell @@ -73,7 +73,7 @@ SELECT pg_create_logical_replication_slot('molt_slot', 'pgoutput'); ##### Could not connect to PostgreSQL ~~~ -could not connect to source database: failed to connect to `user=migration_user database=source_database` +could not connect to source database: failed to connect to `user=migration_user database=migration_db` ~~~ **Resolution:** Verify the connection details including user, host, port, and database name. Ensure the database name in your `--sourceConn` connection string matches exactly where you created the publication and slot. Verify you're connecting to the same host and port where you ran the `CREATE PUBLICATION` and `SELECT pg_create_logical_replication_slot()` commands. Check if TLS certificates need to be included in the connection URI. 
diff --git a/src/current/_includes/molt/optimize-replicator-performance.md b/src/current/_includes/molt/optimize-replicator-performance.md index e9a6b27dce7..3c6ce6f4aad 100644 --- a/src/current/_includes/molt/optimize-replicator-performance.md +++ b/src/current/_includes/molt/optimize-replicator-performance.md @@ -1,17 +1,17 @@ -Configure the following [`replicator` flags]({% link molt/molt-replicator.md %}#flags) to optimize replication throughput and resource usage. Test different combinations in a pre-production environment to find the optimal balance of stability and performance for your workload. +Configure the following [`replicator` flags]({% link molt/replicator-flags.md %}) to optimize replication throughput and resource usage. Test different combinations in a pre-production environment to find the optimal balance of stability and performance for your workload. {{site.data.alerts.callout_info}} The following parameters apply to PostgreSQL, Oracle, and CockroachDB (failback) sources. {{site.data.alerts.end}} -| Flag | Description | -|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--parallelism` | Control the maximum number of concurrent target transactions. Higher values increase throughput but require more target connections. Start with a conservative value and increase based on target database capacity. | -| `--flushSize` | Balance throughput and latency. Controls how many mutations are batched into each query to the target. Increase for higher throughput at the cost of higher latency. | -| `--targetApplyQueueSize` | Control memory usage during operation. 
Increase to allow higher throughput at the expense of memory; decrease to apply backpressure and limit memory consumption. | -| `--targetMaxPoolSize` | Set larger than `--parallelism` by a safety factor to avoid exhausting target pool connections. Replicator enforces setting parallelism to 80% of this value. | -| `--collapseMutations` | Reduce the number of queries to the target by combining multiple mutations on the same primary key within each batch. Disable only if exact mutation order matters more than end state. | -| `--enableParallelApplies` | Improve apply throughput for independent tables and table groups that share foreign key dependencies. Increases memory and target connection usage, so ensure you increase `--targetMaxPoolSize` or reduce `--parallelism`. | -| `--flushPeriod` | Set to the maximum allowable time between flushes (for example, `10s` if data must be applied within 10 seconds). Works with `--flushSize` to control when buffered mutations are committed to the target. | -| `--quiescentPeriod` | Lower this value if constraint violations resolve quickly on your workload to make retries more frequent and reduce latency. Do not lower if constraint violations take time to resolve. | -| `--scanSize` | Applies to {% if page.name != "migrate-failback".md" %}[failback]({% link molt/migrate-failback.md %}){% else %}failback{% endif %} (`replicator start`) scenarios **only**. Balance memory usage and throughput. Increase to read more rows at once from the CockroachDB staging cluster for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability. 
| \ No newline at end of file +| Flag | Description | +|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism) | Control the maximum number of concurrent target transactions. Higher values increase throughput but require more target connections. Start with a conservative value and increase based on target database capacity. | +| [`--flushSize`]({% link molt/replicator-flags.md %}#flush-size) | Balance throughput and latency. Controls how many mutations are batched into each query to the target. Increase for higher throughput at the cost of higher latency. | +| [`--targetApplyQueueSize`]({% link molt/replicator-flags.md %}#target-apply-queue-size) | Control memory usage during operation. Increase to allow higher throughput at the expense of memory; decrease to apply backpressure and limit memory consumption. | +| [`--targetMaxPoolSize`]({% link molt/replicator-flags.md %}#target-max-pool-size) | Set larger than [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism) by a safety factor to avoid exhausting target pool connections. Replicator enforces setting parallelism to 80% of this value. | +| [`--collapseMutations`]({% link molt/replicator-flags.md %}#collapse-mutations) | Reduce the number of queries to the target by combining multiple mutations on the same primary key within each batch. Disable only if exact mutation order matters more than end state. 
| +| [`--enableParallelApplies`]({% link molt/replicator-flags.md %}#enable-parallel-applies) | Improve apply throughput for independent tables and table groups that share foreign key dependencies. Increases memory and target connection usage, so ensure you increase [`--targetMaxPoolSize`]({% link molt/replicator-flags.md %}#target-max-pool-size) or reduce [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism). | +| [`--flushPeriod`]({% link molt/replicator-flags.md %}#flush-period) | Set to the maximum allowable time between flushes (for example, `10s` if data must be applied within 10 seconds). Works with [`--flushSize`]({% link molt/replicator-flags.md %}#flush-size) to control when buffered mutations are committed to the target. | +| [`--quiescentPeriod`]({% link molt/replicator-flags.md %}#quiescent-period) | Lower this value if constraint violations resolve quickly on your workload to make retries more frequent and reduce latency. Do not lower if constraint violations take time to resolve. | +| [`--scanSize`]({% link molt/replicator-flags.md %}#scan-size) | Applies to {% if page.name != "migrate-failback.md" %}[failback]({% link molt/migrate-failback.md %}){% else %}failback{% endif %} (`replicator start`) scenarios **only**. Balance memory usage and throughput. Increase to read more rows at once from the CockroachDB staging cluster for higher throughput, at the cost of memory pressure. Decrease to reduce memory pressure and increase stability. | \ No newline at end of file diff --git a/src/current/_includes/molt/replicator-flags-usage.md b/src/current/_includes/molt/replicator-flags-usage.md index 99f93b4ba0b..69322f4e1ca 100644 --- a/src/current/_includes/molt/replicator-flags-usage.md +++ b/src/current/_includes/molt/replicator-flags-usage.md @@ -1,74 +1,77 @@ -The following [MOLT Replicator]({% link molt/molt-replicator.md %}) flags are **required** for continuous replication.
For details on all available flags, refer to the [MOLT Replicator documentation]({% link molt/molt-replicator.md %}#flags). +Configure the following [MOLT Replicator]({% link molt/molt-replicator.md %}) flags for continuous replication. For details on all available flags, refer to [Replicator Flags]({% link molt/replicator-flags.md %}). {% if page.name == "migrate-load-replicate.md" %}
-| Flag | Description | -|-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--slotName` | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command](#start-fetch). | -| `--targetSchema` | **Required.** Target schema name on CockroachDB where tables will be replicated. | -| `--stagingSchema` | **Required.** Staging schema name for replication metadata and checkpoints. | -| `--stagingCreateSchema` | **Required.** Automatically create the staging schema if it does not exist. Include this flag when starting replication for the first time. | -| `--metricsAddr` | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | +| Flag | Description | +|--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command](#start-fetch). | +| [`--publicationName`]({% link molt/replicator-flags.md %}#publication-name) | **Required.** PostgreSQL publication name. Must match the publication name created either manually or automatically with `--pglogical-publication-and-slot-drop-and-recreate` in the [MOLT Fetch command](#start-fetch). | +| [`--targetSchema`]({% link molt/replicator-flags.md %}#target-schema) | **Required.** Target schema name on CockroachDB where tables will be replicated. 
Schema name must be fully qualified in the format `database.schema`. | +| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for replication metadata and checkpoints. Schema name must be fully qualified in the format `database.schema`. | +| [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) | Automatically create the staging schema if it does not exist. Include this flag when starting replication for the first time. | +| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. |
| Flag | Description | |-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--targetSchema` | **Required.** Target schema name on CockroachDB where tables will be replicated. | -| `--defaultGTIDSet` | **Required.** Default GTID set for changefeed. | -| `--stagingSchema` | **Required.** Staging schema name for replication metadata and checkpoints. | -| `--stagingCreateSchema` | **Required.** Automatically create the staging schema if it does not exist. Include this flag when starting replication for the first time. | -| `--fetchMetadata` | Explicitly fetch column metadata for MySQL versions that do not support `binlog_row_metadata`. Requires `SELECT` permissions on the source database or `PROCESS` privileges. | -| `--metricsAddr` | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | -| `--userscript` | Path to a userscript that enables table filtering from MySQL sources. Refer to [Table filter userscript](#table-filter-userscript). | +| [`--targetSchema`]({% link molt/replicator-flags.md %}#target-schema) | **Required.** Target schema name on CockroachDB where tables will be replicated. Schema name must be fully qualified in the format `database.schema`. | +| [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) | **Required.** Default GTID set for changefeed. | +| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for replication metadata and checkpoints. Schema name must be fully qualified in the format `database.schema`. | +| [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) | Automatically create the staging schema if it does not exist. Include this flag when starting replication for the first time. 
| +| [`--fetchMetadata`]({% link molt/replicator-flags.md %}#fetch-metadata) | Explicitly fetch column metadata for MySQL versions that do not support `binlog_row_metadata`. Requires `SELECT` permissions on the source database or `PROCESS` privileges. | +| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | +| [`--userscript`]({% link molt/replicator-flags.md %}#userscript) | Path to a userscript that enables table filtering from MySQL sources. Refer to [Table filter userscript](#table-filter-userscript). | You can find the starting GTID in the `cdc_cursor` field of the `fetch complete` message after the [initial data load](#start-fetch) completes.
-| Flag | Description | -|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------| -| `--sourceSchema` | **Required.** Source schema name on Oracle where tables will be replicated from. | -| `--targetSchema` | **Required.** Target schema name on CockroachDB where tables will be replicated. | -| `--scn` | **Required.** Snapshot System Change Number (SCN) for the initial changefeed starting point. | -| `--backfillFromSCN` | **Required.** SCN of the earliest active transaction at the time of the snapshot. Ensures no transactions are skipped. | -| `--stagingSchema` | **Required.** Staging schema name for replication metadata and checkpoints. | -| `--stagingCreateSchema` | **Required.** Automatically create the staging schema if it does not exist. Include this flag when starting replication for the first time. | -| `--metricsAddr` | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | -| `--userscript` | Path to a userscript that enables table filtering from Oracle sources. Refer to [Table filter userscript](#table-filter-userscript). | +| Flag | Description | +|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`--sourceSchema`]({% link molt/replicator-flags.md %}#source-schema) | **Required.** Oracle user that owns the tables to replicate. Oracle capitalizes identifiers by default, so use uppercase (for example, `MIGRATION_USER`). | +| [`--targetSchema`]({% link molt/replicator-flags.md %}#target-schema) | **Required.** Target schema name on CockroachDB where tables will be replicated. Schema name must be fully qualified in the format `database.schema`. 
| +| [`--scn`]({% link molt/replicator-flags.md %}#scn) | **Required.** Snapshot System Change Number (SCN) for the initial changefeed starting point. | +| [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) | **Required.** SCN of the earliest active transaction at the time of the snapshot. Ensures no transactions are skipped. | +| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for replication metadata and checkpoints. Schema name must be fully qualified in the format `database.schema`. | +| [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) | Automatically create the staging schema if it does not exist. Include this flag when starting replication for the first time. | +| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | +| [`--userscript`]({% link molt/replicator-flags.md %}#userscript) | Path to a userscript that enables table filtering from Oracle sources. Refer to [Table filter userscript](#table-filter-userscript). | You can find the SCN values in the message `replication-only mode should include the following replicator flags` after the [initial data load](#start-fetch) completes.
+{% comment %} {% elsif page.name == "migrate-resume-replication.md" %} -| Flag | Description | -|-------------------|----------------------------------------------------------------------------------------------------------------| -| `--stagingSchema` | **Required.** Staging schema name for the changefeed checkpoint table. | -| `--metricsAddr` | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | +| Flag | Description | +|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for the changefeed checkpoint table. Schema name must be fully qualified in the format `database.schema`. | +| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. | -The staging schema was created during [initial replication setup]({% link molt/migrate-load-replicate.md %}#start-replicator) with `--stagingCreateSchema`. +The staging schema was created during [initial replication setup]({% link molt/migrate-load-replicate.md %}#start-replicator) with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema).
{{site.data.alerts.callout_info}} -When using `--table-filter`, you must also include `--userscript`. Refer to [Table filter userscript]({% link molt/migrate-load-replicate.md %}#table-filter-userscript). +When using `--table-filter`, you must also include [`--userscript`]({% link molt/replicator-flags.md %}#userscript). Refer to [Table filter userscript]({% link molt/migrate-load-replicate.md %}#table-filter-userscript). {{site.data.alerts.end}}
+{% endcomment %}
 {% elsif page.name == "migrate-failback.md" %}
-| Flag | Description |
-|------|-------------|
-| `--stagingSchema` | **Required.** Staging schema name for the changefeed checkpoint table. |
-| `--bindAddr` | **Required.** Network address to bind the webhook sink for the changefeed. For example, `:30004`. |
-| `--tlsCertificate` | Path to the server TLS certificate for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). |
-| `--tlsPrivateKey` | Path to the server TLS private key for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). |
-| `--metricsAddr` | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. |
+| Flag | Description |
+|------|-------------|
+| [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) | **Required.** Staging schema name on CockroachDB for the changefeed checkpoint table. Schema name must be fully qualified in the format `database.schema`. |
+| [`--bindAddr`]({% link molt/replicator-flags.md %}#bind-addr) | **Required.** Network address to bind the webhook sink for the changefeed. For example, `:30004`. |
+| [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) | Path to the server TLS certificate for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). |
+| [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) | Path to the server TLS private key for the webhook sink. Refer to [TLS certificate and key](#tls-certificate-and-key). |
+| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. |
 
-- The staging schema is first created during [initial replication setup]({% link molt/migrate-load-replicate.md %}#start-replicator) with `--stagingCreateSchema`.
+- The staging schema is first created during [initial replication setup]({% link molt/migrate-load-replicate.md %}#start-replicator) with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema).
 
-- When configuring a [secure changefeed](#tls-certificate-and-key) for failback, you **must** include `--tlsCertificate` and `--tlsPrivateKey`, which specify the paths to the server certificate and private key for the webhook sink connection.
+- When configuring a [secure changefeed](#tls-certificate-and-key) for failback, you **must** include [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key), which specify the paths to the server certificate and private key for the webhook sink connection.
 {% else %}
 | Flag | Description |
 |------|-------------|
-| `--metricsAddr` | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. |
+| [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) | Enable Prometheus metrics at a specified `{host}:{port}`. Metrics are served at `http://{host}:{port}/_/varz`. |
 {% endif %}
\ No newline at end of file
diff --git a/src/current/_includes/molt/replicator-flags.md b/src/current/_includes/molt/replicator-flags.md
deleted file mode 100644
index 4dea042a028..00000000000
--- a/src/current/_includes/molt/replicator-flags.md
+++ /dev/null
@@ -1,113 +0,0 @@
-### Global flags
-
-| Flag | Type | Description |
-|------|------|-------------|
-| `--applyTimeout` | `DURATION` | The maximum amount of time to wait for an update to be applied.<br><br>**Default:** `30s` |
-| `--dlqTableName` | `IDENT` | The name of a table in the target schema for storing dead-letter entries.<br><br>**Default:** `replicator_dlq` |
-| `--enableParallelApplies` | `BOOL` | Enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of higher target pool usage and memory usage.<br><br>**Default:** `false` |
-| `--flushPeriod` | `DURATION` | Flush queued mutations after this duration.<br><br>**Default:** `1s` |
-| `--flushSize` | `INT` | Ideal batch size to determine when to flush mutations.<br><br>**Default:** `1000` |
-| `--gracePeriod` | `DURATION` | Allow background processes to exit.<br><br>**Default:** `30s` |
-| `--logDestination` | `STRING` | Write logs to a file. If not specified, write logs to `stdout`. |
-| `--logFormat` | `STRING` | Choose log output format: `"fluent"`, `"text"`.<br><br>**Default:** `"text"` |
-| `--maxRetries` | `INT` | Maximum number of times to retry a failed mutation on the target (for example, due to contention or a temporary unique constraint violation) before treating it as a hard failure.<br><br>**Default:** `10` |
-| `--metricsAddr` | `STRING` | A `host:port` on which to serve metrics and diagnostics. The metrics endpoint is `http://{host}:{port}/_/varz`. |
-| `--parallelism` | `INT` | The number of concurrent database transactions to use.<br><br>**Default:** `16` |
-| `--quiescentPeriod` | `DURATION` | How often to retry deferred mutations.<br><br>**Default:** `10s` |
-| `--retireOffset` | `DURATION` | How long to delay removal of applied mutations.<br><br>**Default:** `24h0m0s` |
-| `--retryInitialBackoff` | `DURATION` | Initial delay before the first retry attempt when applying a mutation to the target database fails due to a retryable error, such as contention or a temporary unique constraint violation.<br><br>**Default:** `25ms` |
-| `--retryMaxBackoff` | `DURATION` | Maximum delay between retry attempts when applying mutations to the target database fails due to retryable errors.<br><br>**Default:** `2s` |
-| `--retryMultiplier` | `INT` | Multiplier that controls how quickly the backoff interval increases between successive retries of failed applies to the target database.<br><br>**Default:** `2` |
-| `--scanSize` | `INT` | The number of rows to retrieve from the staging database used to store metadata for replication.<br><br>**Default:** `10000` |
-| `--schemaRefresh` | `DURATION` | How often a watcher will refresh its schema. If this value is zero or negative, refresh behavior will be disabled.<br><br>**Default:** `1m0s` |
-| `--sourceConn` | `STRING` | The source database's connection string. When replicating from Oracle, this is the connection string of the Oracle container database (CDB). Refer to [Oracle replication flags](#oraclelogminer-replication-flags). |
-| `--stageDisableCreateTableReaderIndex` | `BOOL` | Disable the creation of partial covering indexes to improve read performance on staging tables. Set to `true` if creating indexes on existing tables would cause a significant operational impact.<br><br>**Default:** `false` |
-| `--stageMarkAppliedLimit` | `INT` | Limit the number of mutations to be marked applied in a single statement.<br><br>**Default:** `100000` |
-| `--stageSanityCheckPeriod` | `DURATION` | How often to validate staging table apply order (`-1` to disable).<br><br>**Default:** `10m0s` |
-| `--stageSanityCheckWindow` | `DURATION` | How far back to look when validating staging table apply order.<br><br>**Default:** `1h0m0s` |
-| `--stageUnappliedPeriod` | `DURATION` | How often to report the number of unapplied mutations in staging tables (`-1` to disable).<br><br>**Default:** `1m0s` |
-| `--stagingConn` | `STRING` | The staging database's connection string. |
-| `--stagingCreateSchema` | | Automatically create the staging schema if it does not exist. |
-| `--stagingIdleTime` | `DURATION` | Maximum lifetime of an idle connection.<br><br>**Default:** `1m0s` |
-| `--stagingJitterTime` | `DURATION` | The time over which to jitter database pool disconnections.<br><br>**Default:** `15s` |
-| `--stagingMaxLifetime` | `DURATION` | The maximum lifetime of a database connection.<br><br>**Default:** `5m0s` |
-| `--stagingMaxPoolSize` | `INT` | The maximum number of staging database connections.<br><br>**Default:** `128` |
-| `--stagingSchema` | `STRING` | Name of the CockroachDB schema that stores replication metadata. **Required** each time `replicator` is rerun after being interrupted, as the schema contains a checkpoint table that enables replication to resume from the correct transaction.<br><br>**Default:** `_replicator.public` |
-| `--targetApplyQueueSize` | `INT` | Size of the apply queue that buffers mutations before they are written to the target database. Larger values can improve throughput, but increase memory usage. This flag applies only to CockroachDB and PostgreSQL (`pglogical`) sources, and replaces the deprecated `--copierChannel` and `--stageCopierChannelSize` flags. |
-| `--targetConn` | `STRING` | The target database's connection string. |
-| `--targetIdleTime` | `DURATION` | Maximum lifetime of an idle connection.<br><br>**Default:** `1m0s` |
-| `--targetJitterTime` | `DURATION` | The time over which to jitter database pool disconnections.<br><br>**Default:** `15s` |
-| `--targetMaxLifetime` | `DURATION` | The maximum lifetime of a database connection.<br><br>**Default:** `5m0s` |
-| `--targetMaxPoolSize` | `INT` | The maximum number of target database connections.<br><br>**Default:** `128` |
-| `--targetSchema` | `STRING` | The SQL database schema in the target cluster to update. |
-| `--targetStatementCacheSize` | `INT` | The maximum number of prepared statements to retain.<br><br>**Default:** `128` |
-| `--taskGracePeriod` | `DURATION` | How long to allow for task cleanup when recovering from errors.<br><br>**Default:** `1m0s` |
-| `--timestampLimit` | `INT` | The maximum number of source timestamps to coalesce into a target transaction.<br><br>**Default:** `1000` |
-| `--userscript` | `STRING` | The path to a TypeScript configuration script. For example, `--userscript 'script.ts'`. |
-| `-v`, `--verbose` | `COUNT` | Increase logging verbosity. Use `-v` for `debug` logging or `-vv` for `trace` logging. |
-
-### `pglogical` replication flags
-
-The following flags are used when replicating from a [PostgreSQL source database](#source-connection-strings).
-
-| Flag | Type | Description |
-|------|------|-------------|
-| `--publicationName` | `STRING` | The publication within the source database to replicate. |
-| `--slotName` | `STRING` | The replication slot in the source database.<br><br>**Default:** `"replicator"` |
-| `--standbyTimeout` | `DURATION` | How often to report WAL progress to the source server.<br><br>**Default:** `5s` |
-
-### `mylogical` replication flags
-
-The following flags are used when replicating from a [MySQL source database](#source-connection-strings).
-
-| Flag | Type | Description |
-|------|------|-------------|
-| `--defaultGTIDSet` | `STRING` | Default GTID set, in the format `source_uuid:min(interval_start)-max(interval_end)`. **Required** the first time `replicator` is run, as the GTID set provides a replication marker for streaming changes. |
-| `--fetchMetadata` | | Fetch column metadata explicitly, for older versions of MySQL that do not support `binlog_row_metadata`. |
-| `--replicationProcessID` | `UINT32` | The replication process ID to report to the source database.<br><br>**Default:** `10` |
-
-### `oraclelogminer` replication flags
-
-The following flags are used when replicating from an [Oracle source database](#source-connection-strings).
-
-| Flag | Type | Description |
-|------|------|-------------|
-| `--sourceSchema` | `STRING` | **Required.** Source schema name on Oracle where tables will be replicated from. |
-| `--scn` | `INT` | The snapshot System Change Number (SCN) from the initial data load. **Required** the first time `replicator` is run, as the SCN provides a replication marker for streaming changes. |
-| `--backfillFromSCN` | `INT` | The SCN of the earliest active transaction at the time of the initial snapshot. Ensures no transactions are skipped when starting replication from Oracle. |
-| `--sourcePDBConn` | `STRING` | Connection string for the Oracle pluggable database (PDB). Only required when using an [Oracle multitenant configuration](https://docs.oracle.com/en/database/oracle/oracle-database/21/cncpt/CDBs-and-PDBs.html). [`--sourceConn`](#global-flags) **must** be included. |
-| `--oracle-application-users` | `STRING` | List of Oracle usernames responsible for DML transactions in the PDB schema. Enables replication from the latest-possible starting point. Usernames are case-sensitive and must match the internal Oracle usernames (e.g., `PDB_USER`). |
-
-### `start` failback flags
-
-The following flags are used for failback from CockroachDB.
-
-| Flag | Type | Description |
-|------|------|-------------|
-| `--assumeIdempotent` | | Disable the extra staging table queries that debounce non-idempotent redelivery in changefeeds. |
-| `--bestEffortOnly` | | Eventually-consistent mode; useful for high-throughput, skew-tolerant schemas with [foreign keys]({% link {{ site.current_cloud_version }}/foreign-key.md %}). |
-| `--bestEffortWindow` | `DURATION` | Use an eventually-consistent mode for initial backfill or when replication is behind; `0` to disable.<br><br>**Default:** `1h0m0s` |
-| `--bindAddr` | `STRING` | The network address to bind to.<br><br>**Default:** `":26258"` |
-| `--disableAuthentication` | | Disable authentication of incoming Replicator requests; not recommended for production. |
-| `--enableCheckpointStream` | | Enable checkpoint streaming (use an internal changefeed from the staging table for real-time updates), rather than checkpoint polling (query the staging table for periodic updates), for failback replication.<br><br>**Default:** `false` (use checkpoint polling) |
-| `--discard` | | **Dangerous:** Discard all incoming HTTP requests; useful for changefeed throughput testing. Not intended for production. |
-| `--discardDelay` | `DURATION` | Adds additional delay in discard mode; useful for gauging the impact of changefeed round-trip time (RTT). |
-| `--healthCheckTimeout` | `DURATION` | The timeout for the health check endpoint.<br><br>**Default:** `5s` |
-| `--httpResponseTimeout` | `DURATION` | The maximum amount of time to allow an HTTP handler to execute.<br><br>**Default:** `2m0s` |
-| `--immediate` | | Bypass staging tables and write directly to target; recommended only for KV-style workloads with no [foreign keys]({% link {{ site.current_cloud_version }}/foreign-key.md %}). |
-| `--limitLookahead` | `INT` | Limit number of checkpoints to be considered when computing the resolving range; may cause replication to stall completely if older mutations cannot be applied. |
-| `--ndjsonBufferSize` | `INT` | The maximum amount of data to buffer while reading a single line of `ndjson` input; increase when source cluster has large blob values.<br><br>**Default:** `65536` |
-| `--tlsCertificate` | `STRING` | A path to a PEM-encoded TLS certificate chain. |
-| `--tlsPrivateKey` | `STRING` | A path to a PEM-encoded TLS private key. |
-| `--tlsSelfSigned` | | If true, generate a self-signed TLS certificate valid for `localhost`. |
-
-### `make-jwt` flags
-
-The following flags are used with the [`make-jwt` command](#token-quickstart) to generate JWT tokens for changefeed authentication.
-
-| Flag | Type | Description |
-|------|------|-------------|
-| `-a`, `--allow` | `STRING` | One or more `database.schema` identifiers. Can be repeated for multiple schemas. |
-| `--claim` | | If `true`, print a minimal JWT claim instead of signing. |
-| `-k`, `--key` | `STRING` | The path to a PEM-encoded private key to sign the token with. |
-| `-o`, `--out` | `STRING` | A file to write the token to. |
\ No newline at end of file
diff --git a/src/current/_includes/molt/replicator-metrics.md b/src/current/_includes/molt/replicator-metrics.md
deleted file mode 100644
index 538e0898e3e..00000000000
--- a/src/current/_includes/molt/replicator-metrics.md
+++ /dev/null
@@ -1,37 +0,0 @@
-### Replicator metrics
-
-MOLT Replicator can export [Prometheus](https://prometheus.io/) metrics by setting the `--metricsAddr` flag to a port (for example, `--metricsAddr :30005`). Metrics are not enabled by default. When enabled, metrics are available at the path `/_/varz`. For example: `http://localhost:30005/_/varz`.
-
-Cockroach Labs recommends monitoring the following metrics during replication:
-
-{% if page.name == "migrate-failback.md" %}
-| Metric Name | Description |
-|-------------|-------------|
-| `commit_to_stage_lag_seconds` | Time between when a mutation is written to the source CockroachDB cluster and when it is written to the staging database. |
-| `source_commit_to_apply_lag_seconds` | End-to-end lag from when a mutation is written to the source CockroachDB cluster to when it is applied to the target database. |
-| `stage_mutations_total` | Number of mutations staged for application to the target database. |
-| `apply_conflicts_total` | Number of rows that experienced a compare-and-set (CAS) conflict. |
-| `apply_deletes_total` | Number of rows deleted. |
-| `apply_duration_seconds` | Length of time it took to successfully apply mutations. |
-| `apply_errors_total` | Number of times an error was encountered while applying mutations. |
-| `apply_resolves_total` | Number of rows that experienced a compare-and-set (CAS) conflict and which were resolved. |
-| `apply_upserts_total` | Number of rows upserted. |
-| `target_apply_queue_depth` | Number of batches in the target apply queue. Indicates how backed up the applier flow is between receiving changefeed data and applying it to the target database. |
-| `target_apply_queue_utilization_percent` | Utilization percentage (0.0-100.0) of the target apply queue capacity. Use this to understand how close the queue is to capacity and to set alerting thresholds for backpressure conditions. |
-| `core_parallelism_utilization_percent` | Current utilization percentage of the applier flow parallelism capacity. Shows what percentage of the configured parallelism is actively being used. |
-{% else %}
-| Metric Name | Description |
-|-------------|-------------|
-| `commit_to_stage_lag_seconds` | Time between when a mutation is written to the source database and when it is written to the staging database. |
-| `source_commit_to_apply_lag_seconds` | End-to-end lag from when a mutation is written to the source database to when it is applied to the target CockroachDB. |
-| `apply_conflicts_total` | Number of rows that experienced a compare-and-set (CAS) conflict. |
-| `apply_deletes_total` | Number of rows deleted. |
-| `apply_duration_seconds` | Length of time it took to successfully apply mutations. |
-| `apply_errors_total` | Number of times an error was encountered while applying mutations. |
-| `apply_resolves_total` | Number of rows that experienced a compare-and-set (CAS) conflict and which were resolved. |
-| `apply_upserts_total` | Number of rows upserted. |
-{% endif %}
-
-You can use the [Replicator Grafana dashboard](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json) to visualize the metrics. For Oracle-specific metrics, import the [Oracle Grafana dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json).
-
-To check MOLT Replicator health when metrics are enabled, run `curl http://localhost:30005/_/healthz` (replacing the port with your `--metricsAddr` value). This returns a status code of `200` if Replicator is running.
\ No newline at end of file
diff --git a/src/current/_includes/molt/verify-output.md b/src/current/_includes/molt/verify-output.md
index c1c72c17a18..b8a04eb0842 100644
--- a/src/current/_includes/molt/verify-output.md
+++ b/src/current/_includes/molt/verify-output.md
@@ -48,7 +48,7 @@
     {"level":"info","time":"2025-02-10T15:35:04-05:00","message":"starting verify on public.employees, shard 1/1"}
     ~~~
 
-    A `finished row verification` message containing a summary is written after each table is compared. For details on the summary fields, refer to the [MOLT Verify]({% link molt/molt-verify.md %}#usage) page.
+    A `finished row verification` message is written after each table is compared. If `num_success` equals `num_truth_rows` and the error counters (`num_missing`, `num_mismatch`, `num_extraneous`, and `num_column_mismatch`) are all `0`, the table verified successfully. Any non-zero values in the error counters indicate data discrepancies that need investigation. For details on each field, refer to the [MOLT Verify]({% link molt/molt-verify.md %}#usage) page.
 
     ~~~ json
     {"level":"info","type":"summary","table_schema":"public","table_name":"employees","num_truth_rows":200004,"num_success":200004,"num_conditional_success":0,"num_missing":0,"num_mismatch":0,"num_extraneous":0,"num_live_retry":0,"num_column_mismatch":0,"time":"2025-02-10T15:35:05-05:00","message":"finished row verification on public.employees (shard 1/1)"}
diff --git a/src/current/_includes/v23.1/sidebar-data/migrate.json b/src/current/_includes/v23.1/sidebar-data/migrate.json
index ce9bbcd78fc..81d046ba2d9 100644
--- a/src/current/_includes/v23.1/sidebar-data/migrate.json
+++ b/src/current/_includes/v23.1/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v23.2/sidebar-data/migrate.json b/src/current/_includes/v23.2/sidebar-data/migrate.json
index ce9bbcd78fc..81d046ba2d9 100644
--- a/src/current/_includes/v23.2/sidebar-data/migrate.json
+++ b/src/current/_includes/v23.2/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v24.1/sidebar-data/migrate.json b/src/current/_includes/v24.1/sidebar-data/migrate.json
index ce9bbcd78fc..81d046ba2d9 100644
--- a/src/current/_includes/v24.1/sidebar-data/migrate.json
+++ b/src/current/_includes/v24.1/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v24.2/sidebar-data/migrate.json b/src/current/_includes/v24.2/sidebar-data/migrate.json
index ce9bbcd78fc..81d046ba2d9 100644
--- a/src/current/_includes/v24.2/sidebar-data/migrate.json
+++ b/src/current/_includes/v24.2/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v24.3/sidebar-data/migrate.json b/src/current/_includes/v24.3/sidebar-data/migrate.json
index ce9bbcd78fc..81d046ba2d9 100644
--- a/src/current/_includes/v24.3/sidebar-data/migrate.json
+++ b/src/current/_includes/v24.3/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v25.1/sidebar-data/migrate.json b/src/current/_includes/v25.1/sidebar-data/migrate.json
index 77b969311fb..e6ba00a899c 100644
--- a/src/current/_includes/v25.1/sidebar-data/migrate.json
+++ b/src/current/_includes/v25.1/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v25.2/sidebar-data/migrate.json b/src/current/_includes/v25.2/sidebar-data/migrate.json
index aa7bb4f6646..7693e764268 100644
--- a/src/current/_includes/v25.2/sidebar-data/migrate.json
+++ b/src/current/_includes/v25.2/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v25.3/sidebar-data/migrate.json b/src/current/_includes/v25.3/sidebar-data/migrate.json
index 77b969311fb..e6ba00a899c 100644
--- a/src/current/_includes/v25.3/sidebar-data/migrate.json
+++ b/src/current/_includes/v25.3/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v25.4/sidebar-data/migrate.json b/src/current/_includes/v25.4/sidebar-data/migrate.json
index 77b969311fb..e6ba00a899c 100644
--- a/src/current/_includes/v25.4/sidebar-data/migrate.json
+++ b/src/current/_includes/v25.4/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/_includes/v26.1/sidebar-data/migrate.json b/src/current/_includes/v26.1/sidebar-data/migrate.json
index 77b969311fb..e6ba00a899c 100644
--- a/src/current/_includes/v26.1/sidebar-data/migrate.json
+++ b/src/current/_includes/v26.1/sidebar-data/migrate.json
@@ -60,8 +60,25 @@
     },
     {
       "title": "Replicator",
-      "urls": [
-        "/molt/molt-replicator.html"
+      "items": [
+        {
+          "title": "Guide",
+          "urls": [
+            "/molt/molt-replicator.html"
+          ]
+        },
+        {
+          "title": "Flags",
+          "urls": [
+            "/molt/replicator-flags.html"
+          ]
+        },
+        {
+          "title": "Metrics",
+          "urls": [
+            "/molt/replicator-metrics.html"
+          ]
+        }
       ]
     },
     {
diff --git a/src/current/molt/migrate-bulk-load.md b/src/current/molt/migrate-bulk-load.md
index 666e5d1ac31..3247c617175 100644
--- a/src/current/molt/migrate-bulk-load.md
+++ b/src/current/molt/migrate-bulk-load.md
@@ -37,7 +37,6 @@ Perform the bulk load of the source data.
     molt fetch \
     --source $SOURCE \
     --target $TARGET \
-    --schema-filter 'migration_schema' \
     --table-filter 'employees|payments|orders' \
     --bucket-path 's3://migration/data/cockroach' \
     --table-handling truncate-if-exists \
@@ -68,7 +67,7 @@ Perform the bulk load of the source data.
 
 {% include molt/verify-output.md %}
 
-## Modify the CockroachDB schema
+## Add constraints and indexes
 
 {% include molt/migration-modify-target-schema.md %}
diff --git a/src/current/molt/migrate-failback.md b/src/current/molt/migrate-failback.md
index b6a39d78d24..a14675fa93d 100644
--- a/src/current/molt/migrate-failback.md
+++ b/src/current/molt/migrate-failback.md
@@ -57,7 +57,7 @@ SET CLUSTER SETTING kv.rangefeed.concurrent_catchup_iterators = 64;
 
 ## Grant target database user permissions
 
-You should have already created a migration user on the target database (your original source database) with the necessary privileges. Refer to [Create migration user on source database]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database).
+You should have already created a migration user on the target database (your **original source database**) with the necessary privileges. Refer to [Create migration user on source database]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database).
 
 For failback replication, grant the user additional privileges to write data back to the target database:
 
@@ -74,7 +74,7 @@ ALTER DEFAULT PRIVILEGES IN SCHEMA migration_schema GRANT INSERT, UPDATE ON TABL
 {% include_cached copy-clipboard.html %}
 ~~~ sql
 -- Grant INSERT and UPDATE on tables to fail back to
-GRANT SELECT, INSERT, UPDATE ON source_database.* TO 'migration_user'@'%';
+GRANT SELECT, INSERT, UPDATE ON migration_db.* TO 'migration_user'@'%';
 FLUSH PRIVILEGES;
 ~~~
@@ -95,12 +95,18 @@ When you run `replicator`, you can configure the following options for replicati
 
 - [Connection strings](#connection-strings): Specify URL-encoded source and target connections.
 - [TLS certificate and key](#tls-certificate-and-key): Configure secure TLS connections.
-- [Replication flags](#replication-flags): Specify required and optional flags to configure replicator behavior.
+- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior.
 - [Tuning parameters](#tuning-parameters): Optimize failback performance and resource usage.
 - [Replicator metrics](#replicator-metrics): Monitor failback replication performance.
 
+
+
+
+
+
+
 ### Connection strings
 
 For failback, MOLT Replicator uses `--targetConn` to specify the destination database where you want to replicate CockroachDB changes, and `--stagingConn` for the CockroachDB staging database.
@@ -171,7 +177,7 @@ WITH ...;
 
 For additional details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink).
 
-### Replication flags
+### Replicator flags
 
 {% include molt/replicator-flags-usage.md %}
@@ -181,7 +187,15 @@
 
 {% include molt/optimize-replicator-performance.md %}
 
-{% include molt/replicator-metrics.md %}
+### Replicator metrics
+
+MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`:
+
+~~~
+--metricsAddr :30005
+~~~
+
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=cockroachdb).
 
 ## Stop forward replication
 
@@ -191,14 +205,14 @@
 
 1. Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `start` command to begin failback replication from CockroachDB to your source database. In this example, `--metricsAddr :30005` enables a Prometheus endpoint for monitoring replication metrics, and `--bindAddr :30004` sets up the webhook endpoint for the changefeed.
 
-    `--stagingSchema` specifies the staging database name (`_replicator` in this example) used for replication checkpoints and metadata. This staging database was created during [initial forward replication]({% link molt/migrate-load-replicate.md %}#start-replicator) when you first ran MOLT Replicator with `--stagingCreateSchema`.
+    `--stagingSchema` specifies the staging database name (`defaultdb._replicator` in this example) used for replication checkpoints and metadata. This staging database was created during [initial forward replication]({% link molt/migrate-load-replicate.md %}#start-replicator) when you first ran MOLT Replicator with `--stagingCreateSchema`.
{% include_cached copy-clipboard.html %} ~~~ shell replicator start \ --targetConn $TARGET \ --stagingConn $STAGING \ - --stagingSchema _replicator \ + --stagingSchema defaultdb._replicator \ --metricsAddr :30005 \ --bindAddr :30004 \ --tlsCertificate ./certs/server.crt \ @@ -223,40 +237,42 @@ Create a CockroachDB changefeed to send changes to MOLT Replicator. 1759246920563173000.0000000000 ~~~ -1. Create the CockroachDB changefeed pointing to the MOLT Replicator webhook endpoint. Use `cursor` to specify the logical timestamp from the preceding step. +1. Create the CockroachDB changefeed pointing to the MOLT Replicator webhook endpoint. Use `cursor` to specify the logical timestamp from the preceding step. For details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink). {{site.data.alerts.callout_info}} - Ensure that only **one** changefeed points to MOLT Replicator at a time to avoid mixing streams of incoming data. - {{site.data.alerts.end}} - - {{site.data.alerts.callout_success}} - For details on the webhook sink URI, refer to [Webhook sink]({% link {{ site.current_cloud_version }}/changefeed-sinks.md %}#webhook-sink). + Explicitly set the [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) option to its default value of `10s` in the `CREATE CHANGEFEED` statement. This ensures that the webhook can report failures in unreliable network conditions, and makes crash loops more visible. {{site.data.alerts.end}}
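The `{base64_encoded_cert}`, `{base64_encoded_key}`, and `{base64_encoded_ca}` placeholders in the changefeed URIs are the base64-encoded contents of the client certificate, client key, and CA certificate. As a sketch (the file below is a stand-in; substitute the PEM files you generated for MOLT Replicator), each value can be produced like this:

```shell
# Stand-in for a real PEM file; replace /tmp/client.crt with your
# actual client certificate, key, or CA certificate path.
printf 'dummy-cert' > /tmp/client.crt

# Produce a single unwrapped base64 line suitable for the
# client_cert/client_key/ca_cert query parameters.
CLIENT_CERT=$(base64 < /tmp/client.crt | tr -d '\n')
echo "$CLIENT_CERT"
```

Depending on the PEM contents, the base64 output may contain characters such as `+` or `=` that must additionally be URL-encoded before being placed in the changefeed URI.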
+ The target schema is specified in the webhook URL path in the fully-qualified format `/database/schema`. The path specifies the database and schema on the target PostgreSQL database. For example, `/migration_db/migration_schema` routes changes to the `migration_schema` schema in the `migration_db` database. + {% include_cached copy-clipboard.html %} ~~~ sql CREATE CHANGEFEED FOR TABLE employees, payments, orders \ - INTO 'webhook-https://replicator-host:30004/migration_schema/public?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ - WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}'; + INTO 'webhook-https://replicator-host:30004/migration_db/migration_schema?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ + WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; ~~~
+ MySQL tables belong directly to the database, not to a separate schema. The webhook URL path specifies the database name on the target MySQL database. For example, `/migration_db` routes changes to the `migration_db` database. + {% include_cached copy-clipboard.html %} ~~~ sql CREATE CHANGEFEED FOR TABLE employees, payments, orders \ - INTO 'webhook-https://replicator-host:30004/migration_schema?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ - WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}'; + INTO 'webhook-https://replicator-host:30004/migration_db?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ + WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; ~~~
+ The webhook URL path specifies the schema name on the target Oracle database. Oracle capitalizes identifiers by default. For example, `/MIGRATION_SCHEMA` routes changes to the `MIGRATION_SCHEMA` schema. + {% include_cached copy-clipboard.html %} ~~~ sql CREATE CHANGEFEED FOR TABLE employees, payments, orders \ INTO 'webhook-https://replicator-host:30004/MIGRATION_SCHEMA?client_cert={base64_encoded_cert}&client_key={base64_encoded_key}&ca_cert={base64_encoded_ca}' \ - WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}'; + WITH updated, resolved = '250ms', min_checkpoint_frequency = '250ms', initial_scan = 'no', cursor = '1759246920563173000.0000000000', webhook_sink_config = '{"Flush":{"Bytes":1048576,"Frequency":"1s"}}', webhook_client_timeout = '10s'; ~~~
@@ -268,6 +284,10 @@ Create a CockroachDB changefeed to send changes to MOLT Replicator. 1101234051444375553 ~~~ + {{site.data.alerts.callout_success}} + Ensure that only **one** changefeed points to MOLT Replicator at a time to avoid mixing streams of incoming data. + {{site.data.alerts.end}} + 1. Monitor the changefeed status, specifying the job ID: ~~~ sql diff --git a/src/current/molt/migrate-load-replicate.md b/src/current/molt/migrate-load-replicate.md index d883607d518..f1de67c8175 100644 --- a/src/current/molt/migrate-load-replicate.md +++ b/src/current/molt/migrate-load-replicate.md @@ -40,10 +40,9 @@ Perform the initial load of the source data. molt fetch \ --source $SOURCE \ --target $TARGET \ - --schema-filter 'migration_schema' \ --table-filter 'employees|payments|orders' \ --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists \ + --table-handling truncate-if-exists ~~~
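The `starting fetch` log line reports a `cdc_cursor` value (a GTID set on MySQL, SCN values on Oracle) that is reused when starting Replicator, for example as `--defaultGTIDSet`. As a quick sketch for pulling it out of the structured JSON logs, assuming `python3` is available (the sample line below mirrors the output shown on this page):

```shell
# Extract the cdc_cursor field from a MOLT Fetch JSON log line.
LOG_LINE='{"level":"info","type":"summary","num_tables":3,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}'
CURSOR=$(printf '%s' "$LOG_LINE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["cdc_cursor"])')
echo "$CURSOR"   # 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29
```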
@@ -59,7 +58,7 @@ Perform the initial load of the source data. --schema-filter 'migration_schema' \ --table-filter 'employees|payments|orders' \ --bucket-path 's3://migration/data/cockroach' \ - --table-handling truncate-if-exists \ + --table-handling truncate-if-exists ~~~
@@ -76,12 +75,18 @@ Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the source and When you run `replicator`, you can configure the following options for replication: - [Replication connection strings](#replication-connection-strings): Specify URL-encoded source and target database connections. -- [Replication flags](#replication-flags): Specify required and optional flags to configure replicator behavior. +- [Replicator flags](#replicator-flags): Specify required and optional flags to configure replicator behavior.
- [Tuning parameters](#tuning-parameters): Optimize replication performance and resource usage.
- [Replicator metrics](#replicator-metrics): Monitor replication progress and performance. +
+ + + +
+ ### Replication connection strings MOLT Replicator uses `--sourceConn` and `--targetConn` to specify the source and target database connections. @@ -122,7 +127,7 @@ For Oracle Multitenant databases, also specify `--sourcePDBConn` with the PDB co Follow best practices for securing connection strings. Refer to [Secure connections](#secure-connections). {{site.data.alerts.end}} -### Replication flags +### Replicator flags {% include molt/replicator-flags-usage.md %} @@ -132,7 +137,25 @@ Follow best practices for securing connection strings. Refer to [Secure connecti {% include molt/optimize-replicator-performance.md %}
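The connection strings passed to `--sourceConn` and `--targetConn` must be URL-encoded. As a sketch (the credentials and host below are hypothetical), special characters in a password can be percent-encoded with `python3` before assembling the URI:

```shell
# Hypothetical credentials: percent-encode the password so that
# characters like '@' and '/' do not break URI parsing.
RAW_PASSWORD='p@ss/w0rd'
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$RAW_PASSWORD")
echo "postgres://migration_user:${ENCODED}@source-host:5432/migration_db"
```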
-{% include molt/replicator-metrics.md %} +### Replicator metrics + +MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: + +~~~ +--metricsAddr :30005 +~~~ + +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=postgres). +
+ +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=mysql). +
+ +
+For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}?filters=oracle). +
## Start Replicator @@ -143,16 +166,17 @@ MOLT Fetch captures a consistent point-in-time checkpoint at the start of the da {{site.data.alerts.end}}
-1. Run the `replicator` command, using the same slot name that you specified with `--pglogical-replication-slot-name` in the [Fetch command](#start-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database: +1. Run the `replicator` command, using the same slot name that you specified with `--pglogical-replication-slot-name` and the publication name created by `--pglogical-publication-and-slot-drop-and-recreate` in the [Fetch command](#start-fetch). Use `--stagingSchema` to specify a unique name for the staging database, and include `--stagingCreateSchema` to have MOLT Replicator automatically create the staging database: {% include_cached copy-clipboard.html %} ~~~ shell replicator pglogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ - --targetSchema defaultdb.public \ + --targetSchema defaultdb.migration_schema \ --slotName molt_slot \ - --stagingSchema _replicator \ + --publicationName molt_fetch \ + --stagingSchema defaultdb._replicator \ --stagingCreateSchema \ --metricsAddr :30005 \ -v @@ -169,7 +193,7 @@ MOLT Fetch captures a consistent point-in-time checkpoint at the start of the da --targetConn $TARGET \ --targetSchema defaultdb.public \ --defaultGTIDSet 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29 \ - --stagingSchema _replicator \ + --stagingSchema defaultdb._replicator \ --stagingCreateSchema \ --metricsAddr :30005 \ --userscript table_filter.ts \ @@ -177,7 +201,7 @@ MOLT Fetch captures a consistent point-in-time checkpoint at the start of the da ~~~ {{site.data.alerts.callout_success}} - For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON source_database.* TO 'migration_user'@'localhost';`. 
If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. + For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. {{site.data.alerts.end}}
@@ -190,11 +214,11 @@ MOLT Fetch captures a consistent point-in-time checkpoint at the start of the da --sourceConn $SOURCE \ --sourcePDBConn $SOURCE_PDB \ --targetConn $TARGET \ - --sourceSchema migration_schema \ - --targetSchema defaultdb.public \ + --sourceSchema MIGRATION_USER \ + --targetSchema defaultdb.migration_schema \ --backfillFromSCN 26685444 \ --scn 26685786 \ - --stagingSchema _replicator \ + --stagingSchema defaultdb._replicator \ --stagingCreateSchema \ --metricsAddr :30005 \ --userscript table_filter.ts \ @@ -267,7 +291,7 @@ MOLT Fetch captures a consistent point-in-time checkpoint at the start of the da 1. Repeat [Verify the data load](#verify-the-data-load) to verify the updated data. -## Modify the CockroachDB schema +## Add constraints and indexes {% include molt/migration-modify-target-schema.md %} diff --git a/src/current/molt/migrate-resume-replication.md b/src/current/molt/migrate-resume-replication.md index 2ccd0b68cf3..270ea160e04 100644 --- a/src/current/molt/migrate-resume-replication.md +++ b/src/current/molt/migrate-resume-replication.md @@ -29,9 +29,9 @@ Be sure to specify the same `--slotName` value that you used during your [initia replicator pglogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ ---targetSchema defaultdb.public \ +--targetSchema defaultdb.migration_schema \ --slotName molt_slot \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --metricsAddr :30005 \ -v ~~~ @@ -40,10 +40,10 @@ replicator pglogical \
Run the [MOLT Replicator]({% link molt/molt-replicator.md %}) `mylogical` command using the same `--stagingSchema` value from your [initial replication command]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-replicator). -Replicator will automatically use the saved GTID (Global Transaction Identifier) from the `memo` table in the staging schema (in this example, `_replicator.memo`) and track advancing GTID checkpoints there. To have Replicator start from a different GTID instead of resuming from the checkpoint, clear the `memo` table with `DELETE FROM _replicator.memo;` and run the `replicator` command with a new `--defaultGTIDSet` value. +Replicator will automatically use the saved GTID (Global Transaction Identifier) from the `memo` table in the staging schema (in this example, `defaultdb._replicator.memo`) and track advancing GTID checkpoints there. To have Replicator start from a different GTID instead of resuming from the checkpoint, clear the `memo` table with `DELETE FROM defaultdb._replicator.memo;` and run the `replicator` command with a new `--defaultGTIDSet` value. {{site.data.alerts.callout_success}} -For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON source_database.* TO 'migration_user'@'localhost';`. If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. +For MySQL versions that do not support `binlog_row_metadata`, include `--fetchMetadata` to explicitly fetch column metadata. This requires additional permissions on the source MySQL database. Grant `SELECT` permissions with `GRANT SELECT ON migration_db.* TO 'migration_user'@'localhost';`. 
If that is insufficient for your deployment, use `GRANT PROCESS ON *.* TO 'migration_user'@'localhost';`, though this is more permissive and allows seeing processes and server status. {{site.data.alerts.end}} {% include_cached copy-clipboard.html %} @@ -52,7 +52,7 @@ replicator mylogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ --targetSchema defaultdb.public \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --metricsAddr :30005 \ --userscript table_filter.ts \ -v @@ -69,11 +69,13 @@ Replicator will automatically find the correct restart SCN (System Change Number replicator oraclelogminer \ --sourceConn $SOURCE \ --sourcePDBConn $SOURCE_PDB \ ---sourceSchema migration_schema \ +--sourceSchema MIGRATION_USER \ +--targetSchema defaultdb.migration_schema \ --targetConn $TARGET \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --metricsAddr :30005 \ ---userscript table_filter.ts +--userscript table_filter.ts \ +-v ~~~ {{site.data.alerts.callout_info}} diff --git a/src/current/molt/molt-fetch.md b/src/current/molt/molt-fetch.md index 02052c16583..99a407fd058 100644 --- a/src/current/molt/molt-fetch.md +++ b/src/current/molt/molt-fetch.md @@ -88,54 +88,56 @@ MOLT Fetch loads the exported data into the target CockroachDB database. 
The pro ### Global flags -| Flag | Description | -|---------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--source` | (Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to [Source and target databases](#source-and-target-databases). | -| `--source-cdb` | Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to [Source and target databases](#source-and-target-databases). | -| `--target` | (Required) Connection string for the target database. For details, refer to [Source and target databases](#source-and-target-databases). | -| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | -| `--assume-role` | Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to [Cloud Storage Authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}). 
| -| `--bucket-path` | The path within the [cloud storage](#bucket-path) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`. | -| `--case-sensitive` | Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is **not** included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`).

**Default:** `false` | -| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage](#data-path). **Note:** Cleanup does not occur on [continuation](#fetch-continuation). | -| `--compression` | Compression method for data when using [`IMPORT INTO`](#data-load-mode) (`gzip`/`none`).

**Default:** `gzip` | -| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | -| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | -| `--direct-copy` | Enables [direct copy](#direct-copy), which copies data directly from source to target without using an intermediate store. | -| `--export-concurrency` | Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the [data export phase](#data-export-phase) and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to [Table sharding](#table-sharding).

**Default:** `4` | -| `--export-retry-max-attempts` | Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `3` | -| `--export-retry-max-duration` | Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `5m0s` | -| `--filter-path` | Path to a JSON file defining row-level filters for the [data import phase](#data-import-phase). Refer to [Selective data movement](#selective-data-movement). | -| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | -| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--ignore-replication-check` | Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica. | -| `--import-batch-size` | The number of files to be imported at a time to the target database during the [data import phase](#data-import-phase). This applies only when using [`IMPORT INTO`](#data-load-mode) for data movement. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.

**Default:** `1000` | -| `--import-region` | The region of the [cloud storage](#bucket-path) bucket. This applies only to [Amazon S3 buckets](#bucket-path). Set this flag only if you need to specify an `AWS_REGION` explicitly when using [`IMPORT INTO`](#data-load-mode) for data movement. For example, `--import-region=ap-south-1`. | -| `--local-path` | The path within the [local file server](#local-path) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | -| `--local-path-crdb-access-addr` | Address of a [local file server](#local-path) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | -| `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-path) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | -| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | -| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | -| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to [Monitoring](#monitoring).

**Default:** `'127.0.0.1:3030'` | -| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to [Fetch mode](#fetch-mode).

**Default:** `data-load` | -| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (i.e., a job or continuous integration). | -| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | -| `--row-batch-size` | Number of rows per shard to export at a time. For details on sharding, refer to [Table sharding](#table-sharding). See also [Best practices](#best-practices).

**Default:** `100000` | -| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | -| `--skip-pk-check` | Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to [Skip primary key matching](#skip-primary-key-matching).

**Default:** `false` | -| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

**Default:** `4` | -| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | -| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | -| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling](#target-table-handling).

**Default:** `none` | -| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations](#transformations). | -| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping](#type-mapping). | -| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | -| `--use-copy` | Use [`COPY FROM`](#data-load-mode) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data movement](#data-load-mode). | -| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#bucket-path) URIs. | -| `--use-stats-based-sharding` | Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to [Table sharding](#table-sharding). | +| Flag | Description | +|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `--source` | (Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to [Source and target databases](#source-and-target-databases). | +| `--source-cdb` | Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to [Source and target databases](#source-and-target-databases). 
| +| `--target` | (Required) Connection string for the target database. For details, refer to [Source and target databases](#source-and-target-databases). | +| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | +| `--assume-role` | Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to [Cloud Storage Authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}). | +| `--bucket-path` | The path within the [cloud storage](#bucket-path) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`. | +| `--case-sensitive` | Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is **not** included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`).

**Default:** `false` | +| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage](#data-path). **Note:** Cleanup does not occur on [continuation](#fetch-continuation). | +| `--compression` | Compression method for data when using [`IMPORT INTO`](#data-load-mode) (`gzip`/`none`).

**Default:** `gzip` | +| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | +| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | +| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | +| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | +| `--direct-copy` | Enables [direct copy](#direct-copy), which copies data directly from source to target without using an intermediate store. | +| `--export-concurrency` | Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the [data export phase](#data-export-phase) and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to [Table sharding](#table-sharding).

**Default:** `4` | +| `--export-retry-max-attempts` | Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `3` | +| `--export-retry-max-duration` | Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `5m0s` | +| `--filter-path` | Path to a JSON file defining row-level filters for the [data import phase](#data-import-phase). Refer to [Selective data movement](#selective-data-movement). | +| `--fetch-id` | Restart the fetch task corresponding to the specified ID. If neither `--continuation-file-name` nor `--continuation-token` is specified, fetch restarts for all failed tables. | +| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--ignore-replication-check` | Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica. | +| `--import-batch-size` | The number of files to be imported at a time to the target database during the [data import phase](#data-import-phase). This applies only when using [`IMPORT INTO`](#data-load-mode) for data movement. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.

**Default:** `1000` | +| `--import-region` | The region of the [cloud storage](#bucket-path) bucket. This applies only to [Amazon S3 buckets](#bucket-path). Set this flag only if you need to specify an `AWS_REGION` explicitly when using [`IMPORT INTO`](#data-load-mode) for data movement. For example, `--import-region=ap-south-1`. | +| `--local-path` | The path within the [local file server](#local-path) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | +| `--local-path-crdb-access-addr` | Address of a [local file server](#local-path) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | +| `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-path) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | +| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages are written to `fetch-{datetime}.log`. If `"stdout"` is provided, messages are written to `stdout`. | +| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | +| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to [Monitoring](#monitoring).

**Default:** `'127.0.0.1:3030'` | +| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to [Fetch mode](#fetch-mode).

**Default:** `data-load` | +| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (e.g., a job or continuous integration). | +| `--pglogical-replication-slot-name` | Name of a replication slot that will be created before taking a snapshot of data. Must match the slot name specified with `--slotName` in the [MOLT Replicator command]({% link molt/molt-replicator.md %}#replication-checkpoints). For details, refer to [Load before replication](#load-before-replication). | +| `--pglogical-publication-and-slot-drop-and-recreate` | Drop the publication and replication slot if they exist, then recreate them. Creates a publication named `molt_fetch` and the replication slot specified with `--pglogical-replication-slot-name`. For details, refer to [Load before replication](#load-before-replication).

**Default:** `false` | +| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | +| `--row-batch-size` | Number of rows per shard to export at a time. For details on sharding, refer to [Table sharding](#table-sharding). See also [Best practices](#best-practices).

**Default:** `100000` | +| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression). Not used with MySQL sources. For Oracle sources, this filter is case-insensitive.

**Default:** `'.*'` | +| `--skip-pk-check` | Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to [Skip primary key matching](#skip-primary-key-matching).

**Default:** `false` | +| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

**Default:** `4` | +| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | +| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | +| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling](#target-table-handling).

**Default:** `none` | +| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations](#transformations). | +| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping](#type-mapping). | +| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | +| `--use-copy` | Use [`COPY FROM`](#data-load-mode) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data load mode](#data-load-mode). | +| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#bucket-path) URIs. | +| `--use-stats-based-sharding` | Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to [Table sharding](#table-sharding). | ### `tokens list` flags @@ -383,13 +385,17 @@ For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, By default, MOLT Fetch moves all data from the [`--source`](#source-and-target-databases) database to CockroachDB. Use the following flags to move a subset of data. -`--schema-filter` specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's `public` schema: +`--schema-filter` specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's `migration_schema` schema: {% include_cached copy-clipboard.html %} ~~~ ---schema-filter 'public' +--schema-filter 'migration_schema' ~~~ +{{site.data.alerts.callout_info}} +`--schema-filter` does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. +{{site.data.alerts.end}} + `--table-filter` and `--table-exclusion-filter` specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. 
For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name: {% include_cached copy-clipboard.html %} @@ -406,14 +412,14 @@ Use `--filter-path` to specify the path to a JSON file that defines row-level fi --filter-path 'data-filter.json' ~~~ -The JSON file should contain one or more entries in `filters`, each with a `resource_specifier` (`schema` and `table`) and a SQL expression `expr`. For example, the following example exports only rows from `public.t1` where `v > 100`: +The JSON file should contain one or more entries in `filters`, each with a `resource_specifier` (`schema` and `table`) and a SQL expression `expr`. For example, the following exports only rows from `migration_schema.t1` where `v > 100`: ~~~ json { "filters": [ { "resource_specifier": { - "schema": "public", + "schema": "migration_schema", "table": "t1" }, "expr": "v > 100" @@ -445,18 +451,18 @@ If the expression references columns that are not indexed, MOLT Fetch will emit {% comment %} #### `--filter-path` userscript for replication -To use `--filter-path` with replication, create and save a TypeScript userscript (e.g., `filter-script.ts`). The following script ensures that only rows where `v > 100` are replicated to `defaultdb.public.t1`: +To use `--filter-path` with replication, create and save a TypeScript userscript (e.g., `filter-script.ts`). The following script ensures that only rows where `v > 100` are replicated to `defaultdb.migration_schema.t1`: {% include_cached copy-clipboard.html %} ~~~ ts import * as api from "replicator@v1"; function disp(doc, meta) { if (Number(doc.v) > 100) { - return { "defaultdb.public.t1" : [ doc ] }; + return { "defaultdb.migration_schema.t1" : [ doc ] }; } } // Always put target schema.
-api.configureSource("defaultdb.public", { +api.configureSource("defaultdb.migration_schema", { deletesTo: disp, dispatch: disp, }); @@ -637,11 +643,12 @@ The following JSON example defines two type mappings: ### Transformations -You can define transformation rules to be performed on the target schema during the fetch task. These can be used to: +You can define transformation rules to be performed on the target database during the fetch task. These can be used to: -- Map [computed columns]({% link {{ site.current_cloud_version }}/computed-columns.md %}) to a target schema. +- Map [computed columns]({% link {{ site.current_cloud_version }}/computed-columns.md %}) from source to target. - Map [partitioned tables]({% link {{ site.current_cloud_version }}/partitioning.md %}) to a single target table. -- Rename tables on the target schema. +- Rename tables on the target database. +- Rename database schemas. Transformation rules are defined in the JSON file indicated by the `--transformations-file` flag. For example: @@ -650,7 +657,9 @@ Transformation rules are defined in the JSON file indicated by the `--transforma --transformations-file 'transformation-rules.json' ~~~ -The following JSON example defines two transformation rules: +#### Transformation rules example + +The following JSON example defines three transformation rules: rule `1` [maps computed columns](#column-exclusions-and-computed-columns), rule `2` [renames tables](#table-renaming), and rule `3` [renames schemas](#schema-renaming). ~~~ json { @@ -675,32 +684,99 @@ The following JSON example defines two transformation rules: "table_rename_opts": { "value": "charges" } + }, + { + "id": 3, + "resource_specifier": { + "schema": "previous_schema" + }, + "schema_rename_opts": { + "value": "new_schema" + } } ] } ~~~ -- `resource_specifier` configures the following options for transformation rules: - - `schema` specifies the schemas to be affected by the transformation rule, formatted as a POSIX regex string. 
- - `table` specifies the tables to be affected by the transformation rule, formatted as a POSIX regex string. -- `column_exclusion_opts` configures the following options for column exclusions and computed columns: - - `column` specifies source columns to exclude from being mapped to regular columns on the target schema. It is formatted as a POSIX regex string. - - `add_computed_def`, when set to `true`, specifies that each matching `column` should be mapped to a [computed column]({% link {{ site.current_cloud_version }}/computed-columns.md %}) on the target schema. Instead of being moved from the source, the column data is generated on the target using [`ALTER TABLE ... ADD COLUMN`]({% link {{ site.current_cloud_version }}/alter-table.md %}#add-column) and the computed column definition from the source schema. This assumes that all matching columns are computed columns on the source. +#### Column exclusions and computed columns + +- `resource_specifier`: Identifies which schemas and tables to transform. + - `schema`: POSIX regex matching source schemas. + - `table`: POSIX regex matching source tables. +- `column_exclusion_opts`: Exclude columns or map them as computed columns. + - `column`: POSIX regex matching source columns to exclude. + - `add_computed_def`: When `true`, map matching columns as [computed columns]({% link {{ site.current_cloud_version }}/computed-columns.md %}) on target tables using [`ALTER TABLE ... ADD COLUMN`]({% link {{ site.current_cloud_version }}/alter-table.md %}#add-column) and the source column definition. All matching columns must be computed columns on the source. {{site.data.alerts.callout_danger}} - Columns that match the `column` regex will **not** be moved to CockroachDB if `add_computed_def` is omitted or set to `false` (default), or if a matching column is a non-computed column. 
+ Columns matching `column` are **not** moved to CockroachDB if `add_computed_def` is `false` (default) or if matching columns are not computed columns. {{site.data.alerts.end}} -- `table_rename_opts` configures the following option for table renaming: - - `value` specifies the table name to which the matching `resource_specifier` is mapped. If only one source table matches `resource_specifier`, it is renamed to `table_rename_opts.value` on the target. If more than one table matches `resource_specifier` (i.e., an n-to-1 mapping), the fetch task assumes that all matching tables are [partitioned tables]({% link {{ site.current_cloud_version }}/partitioning.md %}) with the same schema, and moves their data to a table named `table_rename_opts.value` on the target. Otherwise, the task will error. - - Additionally, in an n-to-1 mapping situation: - - Specify [`--use-copy`](#data-load-mode) or [`--direct-copy`](#direct-copy) for data movement. This is because the data from the source tables is loaded concurrently into the target table. - - Create the target table schema manually, and do **not** use [`--table-handling drop-on-target-and-recreate`](#target-table-handling) for target table handling. +[Example rule `1`](#transformation-rules-example) maps all source `age` columns to [computed columns]({% link {{ site.current_cloud_version }}/computed-columns.md %}) on CockroachDB. This assumes that all matching `age` columns are defined as computed columns on the source: -The preceding JSON example therefore defines two rules: +~~~ json +{ + "id": 1, + "resource_specifier": { + "schema": ".*", + "table": ".*" + }, + "column_exclusion_opts": { + "add_computed_def": true, + "column": "^age$" + } +}, +~~~ + +#### Table renaming + +- `resource_specifier`: Identifies which schemas and tables to transform. + - `schema`: POSIX regex matching source schemas. + - `table`: POSIX regex matching source tables. +- `table_rename_opts`: Rename tables on the target. 
+ - `value`: Target table name. For a single matching source table, renames it to this value. For multiple matches (n-to-1), consolidates matching [partitioned tables]({% link {{ site.current_cloud_version }}/partitioning.md %}) with the same table definition into a single table with this name. -- Rule `1` maps all source `age` columns on the source database to [computed columns]({% link {{ site.current_cloud_version }}/computed-columns.md %}) on CockroachDB. This assumes that all matching `age` columns are defined as computed columns on the source. -- Rule `2` maps all table names with prefix `charges_part` from the source database to a single `charges` table on CockroachDB (i.e., an n-to-1 mapping). This assumes that all matching `charges_part.*` tables have the same schema. + For n-to-1 mappings: + + - Use [`--use-copy`](#data-load-mode) or [`--direct-copy`](#direct-copy) for data movement. + - Manually create the target table. Do not use [`--table-handling drop-on-target-and-recreate`](#target-table-handling). + +[Example rule `2`](#transformation-rules-example) maps all table names with prefix `charges_part` to a single `charges` table on CockroachDB (an n-to-1 mapping). This assumes that all matching `charges_part.*` tables have the same table definition: + +~~~ json +{ + "id": 2, + "resource_specifier": { + "schema": "public", + "table": "charges_part.*" + }, + "table_rename_opts": { + "value": "charges" + } +}, +~~~ + +#### Schema renaming + +- `resource_specifier`: Identifies which schemas and tables to transform. + - `schema`: POSIX regex matching source schemas. + - `table`: POSIX regex matching source tables. +- `schema_rename_opts`: Rename database schemas on the target. + - `value`: Target schema name. For example, `previous_schema.table1` becomes `new_schema.table1`. 
+ +[Example rule `3`](#transformation-rules-example) renames the database schema `previous_schema` to `new_schema` on CockroachDB: + +~~~ json +{ + "id": 3, + "resource_specifier": { + "schema": "previous_schema" + }, + "schema_rename_opts": { + "value": "new_schema" + } +} +~~~ + +#### General notes Each rule is applied in the order it is defined. If two rules overlap, the later rule will override the earlier rule. @@ -793,12 +869,20 @@ Continuation Tokens. ### CDC cursor -A change data capture (CDC) cursor is written to the output as `cdc_cursor` at the beginning and end of the fetch task. For example: +A change data capture (CDC) cursor is written to the output as `cdc_cursor` at the beginning and end of the fetch task. + +For MySQL: ~~~ json {"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":1,"tables":["public.employees"],"cdc_cursor":"b7f9e0fa-2753-1e1f-5d9b-2402ac810003:3-21","net_duration_ms":4879.890041,"net_duration":"000h 00m 04s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"} ~~~ +For Oracle: + +~~~ json +{"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":3,"tables":["migration_schema.employees"],"cdc_cursor":"backfillFromSCN=26685444,scn=26685786","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"} +~~~ + Use the `cdc_cursor` value as the checkpoint for MySQL or Oracle replication with [MOLT Replicator]({% link molt/molt-replicator.md %}#replication-checkpoints). You can also use the `cdc_cursor` value with an external change data capture (CDC) tool to continuously replicate subsequent changes from the source database to CockroachDB. @@ -823,15 +907,34 @@ By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmod ### Bulk data load +
+ + + +
+ To perform a bulk data load migration from your source database to CockroachDB, run the `molt fetch` command with the required flags. -Specify the source and target database connections. For connection string formats, refer to [Source and target databases](#source-and-target-databases): +Specify the source and target database connections. For connection string formats, refer to [Source and target databases](#source-and-target-databases). +
{% include_cached copy-clipboard.html %} ~~~ --source $SOURCE --target $TARGET ~~~ +
+ +
+For Oracle Multitenant (CDB/PDB) sources, also include `--source-cdb` to specify the container database (CDB) connection string. + +{% include_cached copy-clipboard.html %} +~~~ +--source $SOURCE +--source-cdb $SOURCE_CDB +--target $TARGET +~~~ +
Specify how to move data to CockroachDB. Use [cloud storage](#bucket-path) for intermediate file storage: @@ -855,13 +958,34 @@ Alternatively, use [direct copy](#direct-copy) to move data directly without int --direct-copy ~~~ -Optionally, filter which schemas and tables to migrate. By default, all schemas and tables are migrated. For details, refer to [Schema and table selection](#schema-and-table-selection): +Optionally, filter the source data to migrate. By default, all schemas and tables are migrated. For details, refer to [Schema and table selection](#schema-and-table-selection). + +
+{% include_cached copy-clipboard.html %} +~~~ +--schema-filter 'migration_schema' +--table-filter '.*user.*' +~~~ +
+ +
+For Oracle sources, `--schema-filter` is case-insensitive. You can use either lowercase or uppercase: {% include_cached copy-clipboard.html %} ~~~ ---schema-filter 'public' +--schema-filter 'migration_schema' --table-filter '.*user.*' ~~~ +
+ +
+For MySQL sources, omit `--schema-filter` because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. If needed, use `--table-filter` to select specific tables: + +{% include_cached copy-clipboard.html %} +~~~ +--table-filter '.*user.*' +~~~ +
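+Before running the task, it can help to preview which table names a POSIX regex such as `'.*user.*'` will match. One quick local sanity check uses `grep -E` (an illustrative sketch only; the table names below are hypothetical, and MOLT applies the regex itself during the fetch task):

```shell
# List hypothetical source table names, one per line, and keep those
# matching the POSIX regex that would be passed to --table-filter.
printf '%s\n' users user_accounts orders temp_users | grep -E '.*user.*'
# matches: users, user_accounts, temp_users
```

+The same syntax applies to `--table-exclusion-filter`, which excludes the matching tables instead.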
Specify how to handle target tables. By default, `--table-handling` is set to `none`, which loads data without changing existing data in the tables. For details, refer to [Target table handling](#target-table-handling): @@ -895,22 +1019,56 @@ For detailed steps, refer to [Bulk load migration]({% link molt/migrate-bulk-loa ### Load before replication +
+ + + +
+ To perform an initial data load before setting up ongoing replication with [MOLT Replicator]({% link molt/molt-replicator.md %}), run the `molt fetch` command without `--ignore-replication-check`. This captures replication checkpoints during the data load. The workflow is the same as [Bulk data load](#bulk-data-load), except: - Exclude `--ignore-replication-check`. MOLT Fetch will query and record replication checkpoints. +
+- You must include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load. +
- After the data load completes, check the [CDC cursor](#cdc-cursor) in the output for the checkpoint value to use with MOLT Replicator. At minimum, the `molt fetch` command should include the source, target, and data path flags: +
+{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source $SOURCE \ +--target $TARGET \ +--bucket-path 's3://bucket/path' \ +--pglogical-replication-slot-name molt_slot \ +--pglogical-publication-and-slot-drop-and-recreate +~~~ +
+ +
+{% include_cached copy-clipboard.html %} +~~~ shell +molt fetch \ +--source $SOURCE \ +--target $TARGET \ +--bucket-path 's3://bucket/path' +~~~ +
+ +
{% include_cached copy-clipboard.html %} ~~~ shell molt fetch \ --source $SOURCE \ +--source-cdb $SOURCE_CDB \ --target $TARGET \ --bucket-path 's3://bucket/path' ~~~ +
The output will include a `cdc_cursor` value at the end of the fetch task: @@ -918,7 +1076,9 @@ The output will include a `cdc_cursor` value at the end of the fetch task: {"level":"info","type":"summary","fetch_id":"735a4fe0-c478-4de7-a342-cfa9738783dc","num_tables":1,"tables":["public.employees"],"cdc_cursor":"b7f9e0fa-2753-1e1f-5d9b-2402ac810003:3-21","net_duration_ms":4879.890041,"net_duration":"000h 00m 04s","time":"2024-03-18T12:37:02-04:00","message":"fetch complete"} ~~~ +
Use this `cdc_cursor` value when starting MOLT Replicator to ensure replication begins from the correct position. For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}). +
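+In automated pipelines, the checkpoint can be pulled out of the structured log output before starting MOLT Replicator. A minimal sketch, assuming the final summary line has been captured as shown above (the `sed` extraction is illustrative, not a MOLT feature):

```shell
# Hypothetical captured "fetch complete" summary line from MOLT Fetch output.
line='{"level":"info","type":"summary","cdc_cursor":"backfillFromSCN=26685444,scn=26685786","message":"fetch complete"}'

# Extract the cdc_cursor field to pass along as the replication checkpoint.
cursor=$(printf '%s' "$line" | sed -n 's/.*"cdc_cursor":"\([^"]*\)".*/\1/p')
echo "$cursor"
# prints: backfillFromSCN=26685444,scn=26685786
```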
## Monitoring diff --git a/src/current/molt/molt-replicator.md b/src/current/molt/molt-replicator.md index 5a0bfacbaee..aa6d1dbe039 100644 --- a/src/current/molt/molt-replicator.md +++ b/src/current/molt/molt-replicator.md @@ -12,7 +12,7 @@ MOLT Replicator consumes change data from PostgreSQL [logical replication](https ## Terminology - *Checkpoint*: The position in the source database's transaction log from which replication begins or resumes: LSN (PostgreSQL), GTID (MySQL), or SCN (Oracle). -- *Staging database*: A CockroachDB database used by Replicator to store replication metadata, checkpoints, and buffered mutations. Specified with `--stagingSchema` and automatically created with `--stagingCreateSchema`. For details, refer to [Staging database](#staging-database). +- *Staging database*: A CockroachDB database used by Replicator to store replication metadata, checkpoints, and buffered mutations. Specified with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) and automatically created with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema). For details, refer to [Staging database](#staging-database). - *Forward replication*: Replicate changes from a source database (PostgreSQL, MySQL, or Oracle) to CockroachDB during a migration. For usage details, refer to [Forward replication with initial load](#forward-replication-with-initial-load). - *Failback*: Replicate changes from CockroachDB back to the source database. Used for migration rollback or to maintain data consistency on the source during migration. For usage details, refer to [Failback to source database](#failback-to-source-database). 
@@ -35,20 +35,20 @@ The source database must be configured for replication: |-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------| | PostgreSQL source |
  • Enable logical replication by setting `wal_level = logical`.
| [Configure PostgreSQL for replication]({% link molt/migrate-load-replicate.md %}#configure-source-database-for-replication) | | MySQL source |
  • Enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) and configure binary logging. Set `binlog-row-metadata` or `binlog-row-image` to `full`.
  • Configure sufficient binlog retention for migration duration.
| [Configure MySQL for replication]({% link molt/migrate-load-replicate.md %}?filters=mysql#configure-source-database-for-replication) | -| Oracle source |
  • Install [Oracle Instant Client]({% link molt/migrate-load-replicate.md %}?filters=oracle#oracle-instant-client).
  • [Enable `ARCHIVELOG` mode]({% link molt/migrate-load-replicate.md %}?filters=oracle#enable-archivelog-and-force-logging), supplemental logging for primary keys, and `FORCE LOGGING`.
  • [Create sentinel table]({% link molt/migrate-load-replicate.md %}#create-source-sentinel-table) (`_replicator_sentinel`) in source schema.
  • Grant and verify [LogMiner privileges]({% link molt/migrate-load-replicate.md %}#grant-logminer-privileges).
| [Configure Oracle for replication]({% link molt/migrate-load-replicate.md %}?filters=oracle#configure-source-database-for-replication) | +| Oracle source |
  • Install [Oracle Instant Client]({% link molt/migrate-load-replicate.md %}?filters=oracle#oracle-instant-client).
  • [Enable `ARCHIVELOG` mode]({% link molt/migrate-load-replicate.md %}?filters=oracle#enable-archivelog-and-force-logging), supplemental logging for primary keys, and `FORCE LOGGING`.
  • [Create sentinel table]({% link molt/migrate-load-replicate.md %}#create-source-sentinel-table) (`REPLICATOR_SENTINEL`) in source schema.
  • Grant and verify [LogMiner privileges]({% link molt/migrate-load-replicate.md %}#grant-logminer-privileges).
| [Configure Oracle for replication]({% link molt/migrate-load-replicate.md %}?filters=oracle#configure-source-database-for-replication) | | CockroachDB source (failback) |
  • [Enable rangefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) (`kv.rangefeed.enabled = true`) (CockroachDB {{ site.data.products.core }} clusters only).
| [Configure CockroachDB for replication]({% link molt/migrate-failback.md %}#prepare-the-cockroachdb-cluster) | ### User permissions The SQL user running MOLT Replicator requires specific privileges on both the source and target databases: -| Database | Required Privileges | Details | -|------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source |
  • `SUPERUSER` role (recommended), or the following granular permissions:
  • `CREATE` and `SELECT` on database and tables to replicate.
  • Table ownership for adding tables to publications.
  • `LOGIN` and `REPLICATION` privileges to create replication slots and access replication data.
| [Create PostgreSQL migration user]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database) | -| MySQL source |
  • `SELECT` on tables to replicate.
  • `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges for binlog access.
  • For `--fetchMetadata`, either `SELECT` on the source database or `PROCESS` globally.
| [Create MySQL migration user]({% link molt/migrate-load-replicate.md %}?filters=mysql#create-migration-user-on-source-database) | -| Oracle source |
  • `SELECT`, `INSERT`, `UPDATE` on `_replicator_sentinel` table.
  • `SELECT` on `V$` views (`V$LOG`, `V$LOGFILE`, `V$LOGMNR_CONTENTS`, `V$ARCHIVED_LOG`, `V$LOG_HISTORY`).
  • `SELECT` on `SYS.V$LOGMNR_*` views (`SYS.V$LOGMNR_DICTIONARY`, `SYS.V$LOGMNR_LOGS`, `SYS.V$LOGMNR_PARAMETERS`, `SYS.V$LOGMNR_SESSION`).
  • `LOGMINING` privilege.
  • `EXECUTE` on `DBMS_LOGMNR`.
  • For Oracle Multitenant, the user must be a common user (prefixed with `C##`) with privileges granted on both CDB and PDB.
| [Create Oracle migration user]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-migration-user-on-source-database)

[Create sentinel table]({% link molt/migrate-load-replicate.md %}#create-source-sentinel-table)

[Grant LogMiner privileges]({% link molt/migrate-load-replicate.md %}#grant-logminer-privileges) | -| CockroachDB target (forward replication) |
  • `ALL` on target database.
  • `CREATE` on schema.
  • `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables.
  • `CREATEDB` privilege for creating staging schema.
| [Create CockroachDB user]({% link molt/migrate-load-replicate.md %}#create-the-sql-user) | -| PostgreSQL, MySQL, or Oracle target (failback) |
  • `SELECT`, `INSERT`, `UPDATE` on tables to fail back to.
  • For Oracle, `FLASHBACK` is also required.
| [Grant PostgreSQL user permissions]({% link molt/migrate-failback.md %}#grant-target-database-user-permissions)

[Grant MySQL user permissions]({% link molt/migrate-failback.md %}?filter=mysql#grant-target-database-user-permissions)

[Grant Oracle user permissions]({% link molt/migrate-failback.md %}?filter=oracle#grant-target-database-user-permissions) | +| Database | Required Privileges | Details | +|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source |
  • `SUPERUSER` role (recommended), or the following granular permissions:
  • `CREATE` and `SELECT` on database and tables to replicate.
  • Table ownership for adding tables to publications.
  • `LOGIN` and `REPLICATION` privileges to create replication slots and access replication data.
| [Create PostgreSQL migration user]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database) | +| MySQL source |
  • `SELECT` on tables to replicate.
  • `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges for binlog access.
  • For [`--fetchMetadata`]({% link molt/replicator-flags.md %}#fetch-metadata), either `SELECT` on the source database or `PROCESS` globally.
| [Create MySQL migration user]({% link molt/migrate-load-replicate.md %}?filters=mysql#create-migration-user-on-source-database) | +| Oracle source |
  • `SELECT`, `INSERT`, `UPDATE` on `REPLICATOR_SENTINEL` table.
  • `SELECT` on `V$` views (`V$LOG`, `V$LOGFILE`, `V$LOGMNR_CONTENTS`, `V$ARCHIVED_LOG`, `V$LOG_HISTORY`).
  • `SELECT` on `SYS.V$LOGMNR_*` views (`SYS.V$LOGMNR_DICTIONARY`, `SYS.V$LOGMNR_LOGS`, `SYS.V$LOGMNR_PARAMETERS`, `SYS.V$LOGMNR_SESSION`).
  • `LOGMINING` privilege.
  • `EXECUTE` on `DBMS_LOGMNR`.
  • For Oracle Multitenant, the user must be a common user (prefixed with `C##`) with privileges granted on both CDB and PDB.
| [Create Oracle migration user]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-migration-user-on-source-database)

[Create sentinel table]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-source-sentinel-table)

[Grant LogMiner privileges]({% link molt/migrate-load-replicate.md %}?filters=oracle#grant-logminer-privileges) | +| CockroachDB target (forward replication) |
  • `ALL` on target database.
  • `CREATE` on schema.
  • `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables.
  • `CREATEDB` privilege for creating staging schema.
| [Create CockroachDB user]({% link molt/migrate-load-replicate.md %}#create-the-sql-user) | +| PostgreSQL, MySQL, or Oracle target (failback) |
  • `SELECT`, `INSERT`, `UPDATE` on tables to fail back to.
  • For Oracle, `FLASHBACK` is also required.
| [Grant PostgreSQL user permissions]({% link molt/migrate-failback.md %}#grant-target-database-user-permissions)

[Grant MySQL user permissions]({% link molt/migrate-failback.md %}?filters=mysql#grant-target-database-user-permissions)

[Grant Oracle user permissions]({% link molt/migrate-failback.md %}?filters=oracle#grant-target-database-user-permissions) |
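As an illustration, the PostgreSQL source privileges listed above could be granted as follows. This is a sketch only; the role, database, and table names are hypothetical, and the linked pages remain the authoritative steps:

~~~ sql
-- Hypothetical names: adjust the role, database, schema, and table to your environment.
CREATE ROLE migration_user WITH LOGIN REPLICATION PASSWORD 'changeme';
GRANT CREATE ON DATABASE molt TO migration_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO migration_user;
-- Table ownership is required to add a table to a publication.
ALTER TABLE employees OWNER TO migration_user;
~~~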
## Usage @@ -143,7 +143,7 @@ replicator start Follow the security recommendations in [Connection security and credentials](#connection-security-and-credentials). {{site.data.alerts.end}} -`--sourceConn` specifies the connection string of the source database for forward replication. +[`--sourceConn`]({% link molt/replicator-flags.md %}#source-conn) specifies the connection string of the source database for forward replication. {{site.data.alerts.callout_info}} The source connection string **must** point to the primary instance of the source database. Replicas cannot provide the necessary replication checkpoints and transaction metadata required for ongoing replication. @@ -170,7 +170,7 @@ Oracle connection string: --sourceConn 'oracle://{username}:{password}@{host}:{port}/{service_name}' ~~~ -For Oracle Multitenant databases, `--sourcePDBConn` specifies the pluggable database (PDB) connection. `--sourceConn` specifies the container database (CDB): +For Oracle Multitenant databases, [`--sourcePDBConn`]({% link molt/replicator-flags.md %}#source-pdb-conn) specifies the pluggable database (PDB) connection. 
[`--sourceConn`]({% link molt/replicator-flags.md %}#source-conn) specifies the container database (CDB): {% include_cached copy-clipboard.html %} ~~~ @@ -178,7 +178,7 @@ For Oracle Multitenant databases, `--sourcePDBConn` specifies the pluggable data --sourcePDBConn 'oracle://{username}:{password}@{host}:{port}/{pdb_service_name}' ~~~ -For failback, `--stagingConn` specifies the CockroachDB connection string: +For failback, [`--stagingConn`]({% link molt/replicator-flags.md %}#staging-conn) specifies the CockroachDB connection string: {% include_cached copy-clipboard.html %} ~~~ @@ -187,7 +187,7 @@ For failback, `--stagingConn` specifies the CockroachDB connection string: ### Target connection strings -`--targetConn` specifies the connection string of the target CockroachDB database for forward replication: +[`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) specifies the connection string of the target CockroachDB database for forward replication: {% include_cached copy-clipboard.html %} ~~~ @@ -195,28 +195,28 @@ For failback, `--stagingConn` specifies the CockroachDB connection string: ~~~ {{site.data.alerts.callout_info}} -For failback, `--targetConn` specifies the original source database (PostgreSQL, MySQL, or Oracle). For details, refer to [Failback to source database](#failback-to-source-database). +For failback, [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) specifies the original source database (PostgreSQL, MySQL, or Oracle). For details, refer to [Failback to source database](#failback-to-source-database). {{site.data.alerts.end}} ### Replication checkpoints MOLT Replicator requires a checkpoint value to start replication from the correct position in the source database's transaction log. -For PostgreSQL, use `--slotName` to specify the [replication slot created during the data load]({% link molt/migrate-load-replicate.md %}#start-fetch). 
The slot automatically tracks the LSN (Log Sequence Number): +For PostgreSQL, use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the [replication slot created during the data load]({% link molt/migrate-load-replicate.md %}#start-fetch). The slot automatically tracks the LSN (Log Sequence Number): {% include_cached copy-clipboard.html %} ~~~ --slotName molt_slot ~~~ -For MySQL, use `--defaultGTIDSet` with the GTID set from the [MOLT Fetch output]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-fetch): +For MySQL, set [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) to the [`cdc_cursor` value]({% link molt/molt-fetch.md %}#cdc-cursor) from the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ --defaultGTIDSet '4c658ae6-e8ad-11ef-8449-0242ac140006:1-29' ~~~ -For Oracle, use `--scn` and `--backfillFromSCN` with the SCN values from the [MOLT Fetch output]({% link molt/migrate-load-replicate.md %}?filters=oracle#start-fetch): +For Oracle, set [`--scn`]({% link molt/replicator-flags.md %}#scn) and [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) to the [`cdc_cursor` values]({% link molt/molt-fetch.md %}#cdc-cursor) from the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ @@ -226,11 +226,11 @@ For Oracle, use `--scn` and `--backfillFromSCN` with the SCN values from the [MO ### Staging database -The staging database stores replication metadata, checkpoints, and buffered mutations. Specify the staging database with `--stagingSchema` and create it automatically with `--stagingCreateSchema`: +The staging database stores replication metadata, checkpoints, and buffered mutations. 
Specify the staging database with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) in fully-qualified `database.schema` format and create it automatically with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema): {% include_cached copy-clipboard.html %} ~~~ ---stagingSchema _replicator +--stagingSchema defaultdb._replicator --stagingCreateSchema ~~~ @@ -255,7 +255,7 @@ For failback scenarios, secure the connection from CockroachDB to MOLT Replicato #### TLS from CockroachDB to Replicator -Configure MOLT Replicator with server certificates using the `--tlsCertificate` and `--tlsPrivateKey` flags to specify the certificate and private key file paths. For example: +Configure MOLT Replicator with server certificates using the [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) flags to specify the certificate and private key file paths. For example: {% include_cached copy-clipboard.html %} ~~~ shell @@ -269,8 +269,8 @@ These server certificates must correspond to the client certificates specified i Encode client certificates for changefeed webhook URLs: -- **Webhook URLs**: Use both URL encoding and base64 encoding: `base64 -i ./client.crt | jq -R -r '@uri'` -- **Non-webhook contexts**: Use base64 encoding only: `base64 -w 0 ca.cert` +- Webhook URLs: Use both URL encoding and base64 encoding: `base64 -i ./client.crt | jq -R -r '@uri'` +- Non-webhook contexts: Use base64 encoding only: `base64 -w 0 ca.cert` #### JWT authentication @@ -320,9 +320,11 @@ replicator make-jwt -k ec.key -a ycsb.public -o out.jwt ##### External JWT providers -The `make-jwt` command also supports a `--claim` [flag](#make-jwt-flags), which prints a JWT claim that can be signed by your existing JWT provider. The PEM-formatted public key or keys for that provider must be inserted into the `_replicator.jwt_public_keys` table. 
The `iss` (issuers) and `jti` (token id) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety. +The `make-jwt` command also supports a [`--claim`]({% link molt/replicator-flags.md %}#claim) flag, which prints a JWT claim that can be signed by your existing JWT provider. The PEM-formatted public key or keys for that provider must be inserted into the `_replicator.jwt_public_keys` table. The `iss` (issuer) and `jti` (JWT ID) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety.
Oracle stores unquoted identifiers in uppercase by default, so specify the name in uppercase: {% include_cached copy-clipboard.html %} ~~~ ---sourceSchema migration_schema +--sourceSchema MIGRATION_USER ~~~
+Specify the target schema on CockroachDB with [`--targetSchema`]({% link molt/replicator-flags.md %}#target-schema) in fully-qualified `database.schema` format: + +{% include_cached copy-clipboard.html %} +~~~ +--targetSchema defaultdb.migration_schema +~~~ + To replicate from the correct position, specify the appropriate checkpoint value.
-Use `--slotName` to specify the slot created during the data load, which automatically tracks the LSN (Log Sequence Number) checkpoint: +Use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the slot [created during the data load]({% link molt/molt-fetch.md %}#load-before-replication), which automatically tracks the LSN (Log Sequence Number) checkpoint: {% include_cached copy-clipboard.html %} ~~~ @@ -442,7 +447,7 @@ Use `--slotName` to specify the slot created during the data load, which automat
-Use `--defaultGTIDSet` from the `cdc_cursor` field in the MOLT Fetch output: +Use [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) from the `cdc_cursor` field in the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ @@ -451,7 +456,7 @@ Use `--defaultGTIDSet` from the `cdc_cursor` field in the MOLT Fetch output:
-Use the `--scn` and `--backfillFromSCN` values from the MOLT Fetch output: +Use the [`--scn`]({% link molt/replicator-flags.md %}#scn) and [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) values from the `cdc_cursor` field in the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ @@ -460,11 +465,11 @@ Use the `--scn` and `--backfillFromSCN` values from the MOLT Fetch output: ~~~
-Use `--stagingSchema` to specify the staging database. Use `--stagingCreateSchema` to create it automatically on first run: +Use [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) to specify the staging database in fully-qualified `database.schema` format. Use [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) to create it automatically on first run: {% include_cached copy-clipboard.html %} ~~~ ---stagingSchema _replicator +--stagingSchema defaultdb._replicator --stagingCreateSchema ~~~ @@ -476,8 +481,9 @@ At minimum, the `replicator` command should include the following flags: replicator pglogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ +--targetSchema defaultdb.migration_schema \ --slotName molt_slot \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --stagingCreateSchema ~~~ @@ -490,8 +496,9 @@ For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-repl replicator mylogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ +--targetSchema defaultdb.public \ --defaultGTIDSet '4c658ae6-e8ad-11ef-8449-0242ac140006:1-29' \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --stagingCreateSchema ~~~ @@ -504,11 +511,12 @@ For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-repl replicator oraclelogminer \ --sourceConn $SOURCE \ --sourcePDBConn $SOURCE_PDB \ ---sourceSchema migration_schema \ --targetConn $TARGET \ +--sourceSchema MIGRATION_USER \ +--targetSchema defaultdb.migration_schema \ --scn 26685786 \ --backfillFromSCN 26685444 \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --stagingCreateSchema ~~~ @@ -525,7 +533,7 @@ For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-repl When resuming replication after an interruption, MOLT Replicator automatically uses the stored checkpoint to resume from the correct position. 
-Rerun the same `replicator` command used during [forward replication](#forward-replication-with-initial-load), specifying the same `--stagingSchema` value as before. Omit `--stagingCreateSchema` and any checkpoint flags. For example: +Rerun the same `replicator` command used during [forward replication](#forward-replication-with-initial-load), specifying the same fully-qualified [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) value as before. Omit [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) and any checkpoint flags. For example:
{% include_cached copy-clipboard.html %} @@ -534,7 +542,7 @@ replicator pglogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ --slotName molt_slot \ ---stagingSchema _replicator +--stagingSchema defaultdb._replicator ~~~ For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}). @@ -546,7 +554,7 @@ For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-re replicator mylogical \ --sourceConn $SOURCE \ --targetConn $TARGET \ ---stagingSchema _replicator +--stagingSchema defaultdb._replicator ~~~ For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}?filters=mysql). @@ -558,9 +566,9 @@ For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-re replicator oraclelogminer \ --sourceConn $SOURCE \ --sourcePDBConn $SOURCE_PDB \ ---sourceSchema migration_schema \ +--sourceSchema MIGRATION_USER \ --targetConn $TARGET \ ---stagingSchema _replicator +--stagingSchema defaultdb._replicator ~~~ For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}?filters=oracle). @@ -577,35 +585,35 @@ Use the `start` command for failback: replicator start ~~~ -Specify the target database connection (the database you originally migrated from). For connection string formats, refer to [Target connection strings](#target-connection-strings): +Specify the target database connection (the database you originally migrated from) with [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn). For connection string formats, refer to [Target connection strings](#target-connection-strings): {% include_cached copy-clipboard.html %} ~~~ --targetConn $TARGET ~~~ -Specify the CockroachDB connection string. For details, refer to [Connect using a URL]({% link {{ site.current_cloud_version }}/connection-parameters.md %}#connect-using-a-url). 
+Specify the CockroachDB connection string with [`--stagingConn`]({% link molt/replicator-flags.md %}#staging-conn). For details, refer to [Connect using a URL]({% link {{ site.current_cloud_version }}/connection-parameters.md %}#connect-using-a-url). {% include_cached copy-clipboard.html %} ~~~ --stagingConn $STAGING ~~~ -Specify the staging database name. This should be the same staging database created during [Forward replication with initial load](#forward-replication-with-initial-load): +Specify the staging database name with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) in fully-qualified `database.schema` format. This should be the same staging database created during [Forward replication with initial load](#forward-replication-with-initial-load): {% include_cached copy-clipboard.html %} ~~~ ---stagingSchema _replicator +--stagingSchema defaultdb._replicator ~~~ -Specify a webhook endpoint address for the changefeed to send changes to. For example: +Specify a webhook endpoint address for the changefeed to send changes to with [`--bindAddr`]({% link molt/replicator-flags.md %}#bind-addr). 
For example: {% include_cached copy-clipboard.html %} ~~~ --bindAddr :30004 ~~~ -Specify TLS certificate and private key file paths for secure webhook connections: +Specify TLS certificate and private key file paths for secure webhook connections with [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key): {% include_cached copy-clipboard.html %} ~~~ @@ -620,28 +628,31 @@ At minimum, the `replicator` command should include the following flags: replicator start \ --targetConn $TARGET \ --stagingConn $STAGING \ ---stagingSchema _replicator \ +--stagingSchema defaultdb._replicator \ --bindAddr :30004 \ --tlsCertificate ./certs/server.crt \ --tlsPrivateKey ./certs/server.key ~~~ -For detailed steps, refer to [Migration failback]({% link molt/migrate-failback.md %}). +After starting `replicator`, create a CockroachDB changefeed to send changes to MOLT Replicator. For detailed steps, refer to [Migration failback]({% link molt/migrate-failback.md %}). -## Monitoring +{{site.data.alerts.callout_info}} +When [creating the CockroachDB changefeed]({% link molt/migrate-failback.md %}#create-the-cockroachdb-changefeed), you specify the target database and schema in the webhook URL path. For PostgreSQL targets, use the fully-qualified format `/database/schema` (`/migration_db/migration_schema`). For MySQL targets, use the database name (`/migration_db`). For Oracle targets, use the uppercase schema name (`/MIGRATION_SCHEMA`). -### Metrics +Explicitly set a default `10s` [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) value in the `CREATE CHANGEFEED` statement. This value ensures that the webhook can report failures in inconsistent networking situations and make crash loops more visible. 
+{{site.data.alerts.end}} -MOLT Replicator can export [Prometheus](https://prometheus.io/) metrics by setting the `--metricsAddr` flag to a port (for example, `--metricsAddr :30005`). Metrics are not enabled by default. When enabled, metrics are available at the path `/_/varz`. For example: `http://localhost:30005/_/varz`. +## Monitoring -For a list of recommended metrics to monitor during replication, refer to: +### Metrics -- [Forward replication metrics]({% link molt/migrate-load-replicate.md %}#replicator-metrics) (PostgreSQL, MySQL, and Oracle sources) -- [Failback replication metrics]({% link molt/migrate-failback.md %}#replicator-metrics) (CockroachDB source) +MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: -You can use the [Replicator Grafana dashboard](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json) to visualize the metrics. For Oracle-specific metrics, import the [Oracle Grafana dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json). +~~~ +--metricsAddr :30005 +~~~ -To check MOLT Replicator health when metrics are enabled, run `curl http://localhost:30005/_/healthz` (replacing the port with your `--metricsAddr` value). This returns a status code of `200` if Replicator is running. +For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). ### Logging @@ -664,7 +675,7 @@ Redirect both streams to ensure all logs are captured for troubleshooting: ./replicator --logDestination replicator.log ... ~~~ -Enable debug logging with `-v`. For more granularity and system insights, enable trace logging with `-vv`. 
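For instance, a changefeed for a PostgreSQL failback target might look like the following sketch, which assumes a table `migration_db.migration_schema.employees`, Replicator listening on port `30004`, and an illustrative `ca_cert` value:

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE migration_db.migration_schema.employees
  INTO 'webhook-https://{replicator-host}:30004/migration_db/migration_schema?ca_cert={base64-encoded CA certificate}'
  WITH updated, resolved = '1s', webhook_client_timeout = '10s';
~~~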
Pay close attention to warning- and error-level logs, as these indicate when Replicator is misbehaving. +Enable debug logging with [`-v`]({% link molt/replicator-flags.md %}#verbose). For more granularity and system insights, enable trace logging with [`-vv`]({% link molt/replicator-flags.md %}#verbose). Pay close attention to warning- and error-level logs, as these indicate when Replicator is misbehaving. ## Best practices diff --git a/src/current/molt/molt-verify.md b/src/current/molt/molt-verify.md index b0618beccd8..185cfb8773b 100644 --- a/src/current/molt/molt-verify.md +++ b/src/current/molt/molt-verify.md @@ -56,7 +56,7 @@ Complete the following items before using MOLT Verify: - Use the encoded password in your connection string. For example: ~~~ - postgres://postgres:a%2452%26@localhost:5432/replicationload + postgres://postgres:a%2452%26@localhost:5432/molt ~~~ ## Flags @@ -66,8 +66,6 @@ Flag | Description `--source` | (Required) Connection string for the source database. `--target` | (Required) Connection string for the target database. `--concurrency` | Number of threads to process at a time when reading the tables.
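For example, a debug-level run that also writes logs to a file might look like the following sketch, assuming the connection strings are set in environment variables as in the earlier examples:

{% include_cached copy-clipboard.html %}
~~~ shell
./replicator pglogical \
--sourceConn $SOURCE \
--targetConn $TARGET \
--slotName molt_slot \
--stagingSchema defaultdb._replicator \
--logDestination replicator.log \
-v
~~~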
**Default:** 16
For faster verification, set this flag to a higher value. {% comment %}
Note: Table splitting by shard only works for [`INT`]({% link {{site.current_cloud_version}}/int.md %}), [`UUID`]({% link {{site.current_cloud_version}}/uuid.md %}), and [`FLOAT`]({% link {{site.current_cloud_version}}/float.md %}) data types.{% endcomment %} -`--continuous` | Verify tables in a continuous loop.
**Default:** `false` -`--live` | Retry verification on rows before emitting warnings or errors. This is useful during live data import, when temporary mismatches can occur.
**Default:** `false` `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `verify-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. `--metrics-listen-addr` | Address of the metrics endpoint, which has the path `{address}/metrics`.

**Default:** `'127.0.0.1:3030'` | `--row-batch-size` | Number of rows to get from a table at a time.
**Default:** 20000 @@ -117,7 +115,7 @@ When verification completes, the output displays a summary message like the foll ## Known limitations -- MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. If `--live` mode is enabled, MOLT Verify retries verification on these rows. To configure the row batch size, use the `--row_batch_size` [flag](#flags). +- MOLT Verify compares 20,000 rows at a time by default, and row values can change between batches, potentially resulting in temporary inconsistencies in data. To configure the row batch size, use the `--row_batch_size` [flag](#flags). - MOLT Verify checks for collation mismatches on [primary key]({% link {{site.current_cloud_version}}/primary-key.md %}) columns. This may cause validation to fail when a [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) is used as a primary key and the source and target databases are using different [collations]({% link {{site.current_cloud_version}}/collate.md %}). - MOLT Verify might give an error in case of schema changes on either the source or target database. - [Geospatial types]({% link {{site.current_cloud_version}}/spatial-data-overview.md %}#spatial-objects) cannot yet be compared. diff --git a/src/current/molt/replicator-flags.md b/src/current/molt/replicator-flags.md new file mode 100644 index 00000000000..5532e0f8502 --- /dev/null +++ b/src/current/molt/replicator-flags.md @@ -0,0 +1,86 @@ +--- +title: Replicator Flags +summary: Flag reference for MOLT Replicator +toc: false +docs_area: migrate +--- + +This page lists all available flags for the [MOLT Replicator commands]({% link molt/molt-replicator.md %}#commands): `start`, `pglogical`, `mylogical`, `oraclelogminer`, and `make-jwt`. 
+ +| Flag | Commands | Type | Description | +|---------------------------------------------------------------------------------------------|-----------------------------------------------------|------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `-a`, `--allow` | `make-jwt` | `STRING` | One or more `database.schema` identifiers. Can be repeated for multiple schemas. | +| `--applyTimeout` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | The maximum amount of time to wait for an update to be applied.

**Default:** `30s` | +| `--assumeIdempotent` | `start` | `BOOL` | Disable the extra staging table queries that debounce non-idempotent redelivery in changefeeds. | +| `--backfillFromSCN` | `oraclelogminer` | `INT` | The SCN of the earliest active transaction at the time of the initial snapshot. Ensures no transactions are skipped when starting replication from Oracle. | +| `--bestEffortOnly` | `start` | `BOOL` | Eventually-consistent mode; useful for high-throughput, skew-tolerant schemas with [foreign keys]({% link {{ site.current_cloud_version }}/foreign-key.md %}). | +| `--bestEffortWindow` | `start` | `DURATION` | Use an eventually-consistent mode for initial backfill or when replication is behind; `0` to disable.

**Default:** `1h0m0s` | +| `--bindAddr` | `start` | `STRING` | The network address to bind to.

**Default:** `":26258"` | +| `--claim` | `make-jwt` | `BOOL` | If `true`, print a minimal JWT claim instead of signing. | +| `--collapseMutations` | `start`, `pglogical`, `mylogical` | `BOOL` | Combine multiple mutations on the same primary key within each batch into a single mutation.

**Default:** `true` | +| `--defaultGTIDSet` | `mylogical` | `STRING` | **Required** the first time `replicator` is run. The default GTID set, in the format `source_uuid:min(interval_start)-max(interval_end)`, which provides a replication marker for streaming changes. | +| `--disableAuthentication` | `start` | `BOOL` | Disable authentication of incoming Replicator requests; not recommended for production. | +| `--discard` | `start` | `BOOL` | **Dangerous:** Discard all incoming HTTP requests; useful for changefeed throughput testing. Not intended for production. | +| `--discardDelay` | `start` | `DURATION` | Adds additional delay in discard mode; useful for gauging the impact of changefeed round-trip time (RTT). | +| `--dlqTableName` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `IDENT` | The name of a table in the target schema for storing dead-letter entries.

**Default:** `replicator_dlq` | +| `--enableCheckpointStream` | `start` | `BOOL` | Enable checkpoint streaming (use an internal changefeed from the staging table for real-time updates), rather than checkpoint polling (query the staging table for periodic updates), for failback replication.

**Default:** `false` (use checkpoint polling) | +| `--enableParallelApplies` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `BOOL` | Enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of higher target pool usage and memory usage.

**Default:** `false` | +| `--fetchMetadata` | `mylogical` | `BOOL` | Fetch column metadata explicitly, for older versions of MySQL that do not support `binlog_row_metadata`. | +| `--flushPeriod` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | Flush queued mutations after this duration.

**Default:** `1s` | +| `--flushSize` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | Ideal batch size to determine when to flush mutations.

**Default:** `1000` | +| `--gracePeriod` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | Allow background processes to exit.

**Default:** `30s` | +| `--healthCheckTimeout` | `start` | `DURATION` | The timeout for the health check endpoint.

**Default:** `5s` | +| `--httpResponseTimeout` | `start` | `DURATION` | The maximum amount of time to allow an HTTP handler to execute.

**Default:** `2m0s` | +| `--immediate` | `start` | `BOOL` | Bypass staging tables and write directly to target; recommended only for KV-style workloads with no [foreign keys]({% link {{ site.current_cloud_version }}/foreign-key.md %}). | +| `-k`, `--key` | `make-jwt` | `STRING` | The path to a PEM-encoded private key to sign the token with. | +| `--limitLookahead` | `start` | `INT` | Limit number of checkpoints to be considered when computing the resolving range; may cause replication to stall completely if older mutations cannot be applied. | +| `--logDestination` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | Write logs to a file. If not specified, write logs to `stdout`. | +| `--logFormat` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | Choose log output format: `"fluent"`, `"text"`.

**Default:** `"text"` | +| `--maxRetries` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | Maximum number of times to retry a failed mutation on the target (for example, due to contention or a temporary unique constraint violation) before treating it as a hard failure.

**Default:** `10` | +| `--metricsAddr` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | A `:port` or `host:port` on which to serve metrics and diagnostics. The metrics endpoint is `http://{host}:{port}/_/varz`. | +| `--ndjsonBufferSize` | `start` | `INT` | The maximum amount of data to buffer while reading a single line of `ndjson` input; increase when source cluster has large blob values.

**Default:** `65536` | +| `--oracle-application-users` | `oraclelogminer` | `STRING` | List of Oracle usernames responsible for DML transactions in the PDB schema. Enables replication from the latest-possible starting point. Usernames are case-sensitive and must match the internal Oracle usernames (for example, `PDB_USER`). | +| `-o`, `--out` | `make-jwt` | `STRING` | A file to write the token to. | +| `--parallelism` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | The number of concurrent database transactions to use.<br><br>

**Default:** `16` | +| `--publicationName` | `pglogical` | `STRING` | The publication within the source database to replicate. | +| `--quiescentPeriod` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How often to retry deferred mutations.

**Default:** `10s` | +| `--replicationProcessID` | `mylogical` | `UINT32` | The replication process ID to report to the source database.

**Default:** `10` | +| `--retireOffset` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How long to delay removal of applied mutations.

**Default:** `24h0m0s` | +| `--retryInitialBackoff` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | Initial delay before the first retry attempt when applying a mutation to the target database fails due to a retryable error, such as contention or a temporary unique constraint violation.

**Default:** `25ms` | +| `--retryMaxBackoff` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | Maximum delay between retry attempts when applying mutations to the target database fails due to retryable errors.

**Default:** `2s` | +| `--retryMultiplier` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | Multiplier that controls how quickly the backoff interval increases between successive retries of failed applies to the target database.

**Default:** `2` | +| `--scanSize` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | The number of rows to retrieve from the staging database used to store metadata for replication.

**Default:** `10000` | +| `--schemaRefresh` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How often a watcher will refresh its schema. If this value is zero or negative, refresh behavior will be disabled.

**Default:** `1m0s` | +| `--scn` | `oraclelogminer` | `INT` | **Required** the first time `replicator` is run. The snapshot System Change Number (SCN) from the initial data load, which provides a replication marker for streaming changes. | +| `--slotName` | `pglogical` | `STRING` | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command]({% link molt/molt-fetch.md %}#load-before-replication).

**Default:** `"replicator"` | +| `--sourceConn` | `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | The source database's connection string. When replicating from Oracle, this is the connection string of the Oracle container database (CDB). | +| `--sourcePDBConn` | `oraclelogminer` | `STRING` | Connection string for the Oracle pluggable database (PDB). Only required when using an [Oracle multitenant configuration](https://docs.oracle.com/en/database/oracle/oracle-database/21/cncpt/CDBs-and-PDBs.html). [`--sourceConn`](#source-conn) **must** be included. | +| `--sourceSchema` | `oraclelogminer` | `STRING` | **Required.** Source schema name on Oracle where tables will be replicated from. | +| `--stageDisableCreateTableReaderIndex` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `BOOL` | Disable the creation of partial covering indexes that improve read performance on staging tables. Set to `true` if creating indexes on existing tables would cause a significant operational impact.<br><br>

**Default:** `false` | +| `--stageMarkAppliedLimit` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | Limit the number of mutations to be marked applied in a single statement.

**Default:** `100000` | +| `--stageSanityCheckPeriod` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How often to validate staging table apply order (`-1` to disable).

**Default:** `10m0s` | +| `--stageSanityCheckWindow` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How far back to look when validating staging table apply order.

**Default:** `1h0m0s` | +| `--stageUnappliedPeriod` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How often to report the number of unapplied mutations in staging tables (`-1` to disable).

**Default:** `1m0s` | +| `--stagingConn` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | The staging database's connection string. | +| `--stagingCreateSchema` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `BOOL` | Automatically create the staging schema if it does not exist. | +| `--stagingIdleTime` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | Maximum lifetime of an idle connection.

**Default:** `1m0s` | +| `--stagingJitterTime` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | The time over which to jitter database pool disconnections.

**Default:** `15s` | +| `--stagingMaxLifetime` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | The maximum lifetime of a database connection.

**Default:** `5m0s` | +| `--stagingMaxPoolSize` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | The maximum number of staging database connections.

**Default:** `128` | +| `--stagingSchema` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | Name of the CockroachDB schema that stores replication metadata. **Required** each time `replicator` is rerun after being interrupted, as the schema contains a checkpoint table that enables replication to resume from the correct transaction.

**Default:** `_replicator.public` | +| `--standbyTimeout` | `pglogical` | `DURATION` | How often to report WAL progress to the source server.

**Default:** `5s` | +| `--targetApplyQueueSize` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | Size of the apply queue that buffers mutations before they are written to the target database. Larger values can improve throughput, but increase memory usage. This flag applies only to CockroachDB and PostgreSQL (`pglogical`) sources, and replaces the deprecated `--copierChannel` and `--stageCopierChannelSize` flags. | +| `--targetConn` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | The target database's connection string. | +| `--targetIdleTime` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | Maximum lifetime of an idle connection.

**Default:** `1m0s` | +| `--targetJitterTime` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | The time over which to jitter database pool disconnections.

**Default:** `15s` | +| `--targetMaxLifetime` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | The maximum lifetime of a database connection.

**Default:** `5m0s` | +| `--targetMaxPoolSize` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | The maximum number of target database connections.

**Default:** `128` | +| `--targetSchema` | `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | **Required.** The SQL database schema in the target cluster to update. CockroachDB schema names must be fully qualified in the format `database.schema`. | +| `--targetStatementCacheSize` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | The maximum number of prepared statements to retain.

**Default:** `128` | +| `--taskGracePeriod` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How long to allow for task cleanup when recovering from errors.

**Default:** `1m0s` | +| `--timestampLimit` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `INT` | The maximum number of source timestamps to coalesce into a target transaction.

**Default:** `1000` | +| `--tlsCertificate` | `start` | `STRING` | A path to a PEM-encoded TLS certificate chain. | +| `--tlsPrivateKey` | `start` | `STRING` | A path to a PEM-encoded TLS private key. | +| `--tlsSelfSigned` | `start` | `BOOL` | If `true`, generate a self-signed TLS certificate valid for `localhost`. | +| `--userscript` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | The path to a TypeScript configuration script. For example, `--userscript 'script.ts'`. | +| `-v`, `--verbose` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `COUNT` | Increase logging verbosity. Use `-v` for `debug` logging or `-vv` for `trace` logging. | diff --git a/src/current/molt/replicator-metrics.md b/src/current/molt/replicator-metrics.md new file mode 100644 index 00000000000..0b95e4c2ff9 --- /dev/null +++ b/src/current/molt/replicator-metrics.md @@ -0,0 +1,303 @@ +--- +title: Replicator Metrics +summary: Learn how to monitor stages of the MOLT Replicator pipeline. +toc: true +docs_area: migrate +--- + +[MOLT Replicator]({% link molt/molt-replicator.md %}) exposes Prometheus metrics at each stage of the [replication pipeline](#replication-pipeline). When using Replicator to perform [forward replication]({% link molt/migrate-load-replicate.md %}#start-replicator) or [failback]({% link molt/migrate-failback.md %}), you should monitor the health of each relevant pipeline stage to quickly detect issues. + +This page describes Replicator metrics and provides usage guidelines, grouped by replication source: + +- PostgreSQL +- MySQL +- Oracle +- CockroachDB (during [failback]({% link molt/migrate-failback.md %})) + +<div class="filters clearfix">
+ + + + +
+ +## Replication pipeline + +[MOLT Replicator]({% link molt/molt-replicator.md %}) replicates data as a pipeline of change events that travel from the source database to the target database where changes are applied. The Replicator pipeline consists of four stages: + +- [**Source read**](#source-read): Connects Replicator to the source database and captures changes via logical replication (PostgreSQL, MySQL), LogMiner (Oracle), or [changefeed messages]({% link {{ site.current_cloud_version }}/changefeed-messages.md %}) (CockroachDB). + +
+- **Staging**: Buffers mutations for ordered processing and crash recovery. +
+ +
+- [**Staging**](#staging): Buffers mutations for ordered processing and crash recovery. +
+ +
+- [**Core sequencer**](#core-sequencer): Processes staged mutations, maintains ordering guarantees, and coordinates transaction application. +
+ +
+- **Core sequencer**: Processes staged mutations, maintains ordering guarantees, and coordinates transaction application. +
+ +- [**Target apply**](#target-apply): Applies mutations to the target database. + +## Set up metrics + +Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following command exposes metrics on port `30005`: + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator start \ +--targetConn $TARGET \ +--stagingConn $STAGING \ +--metricsAddr :30005 +... +~~~ + +To collect Replicator metrics, set up [Prometheus](https://prometheus.io/) to scrape the [Replicator metrics endpoint](#metrics-endpoints). To [visualize Replicator metrics](#visualize-metrics), use [Grafana](https://grafana.com/) to create dashboards. + +## Metrics endpoints + +The following endpoints are available when you [enable Replicator metrics](#set-up-metrics): + +| Endpoint | Description | +|-----------------|----------------------------------------------------------------------------| +| `/_/varz` | Prometheus metrics endpoint. | +| `/_/diag` | Structured diagnostic information (JSON). | +| `/_/healthz` | Health check endpoint. | +| `/debug/pprof/` | Go pprof handlers for profiling. | + +For example, to view the current snapshot of Replicator metrics on port `30005`, open `http://localhost:30005/_/varz` in a browser. To track metrics over time and create visualizations, use Prometheus and Grafana as described in [Set up metrics](#set-up-metrics). + +To check Replicator health: + +{% include_cached copy-clipboard.html %} +~~~ shell +curl http://localhost:30005/_/healthz +~~~ + +~~~ +OK +~~~ + +### Visualize metrics + +
+Use the [Replicator Grafana dashboard](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json) to visualize metrics. +
+ +
+Use the [Replicator Grafana dashboard](https://replicator.cockroachdb.com/replicator_grafana_dashboard.json) to visualize metrics. For Oracle sources, also import the [Oracle Grafana dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json) to visualize [Oracle source metrics](#oracle-source). +
+ +## Overall replication metrics + +### High-level performance metrics + +Monitor the following metrics to track the overall health of the [replication pipeline](#replication-pipeline): + +
+- `core_source_lag_seconds` + - Description: Age of the most recently received checkpoint. This represents the time from source commit to `COMMIT` event processing. + - Interpretation: If consistently increasing, Replicator is falling behind in reading source changes, and cannot keep pace with database changes. +
+
+- `core_source_lag_seconds` + - Description: Age of the most recently received checkpoint. This represents the time elapsed since the latest received resolved timestamp. + - Interpretation: If consistently increasing, Replicator is falling behind in reading source changes, and cannot keep pace with database changes. +
+
+- `target_apply_mutation_age_seconds` + - Description: End-to-end replication lag per mutation from source commit to target apply. Measures the difference between current wall time and the mutation's [MVCC timestamp]({% link {{ site.current_cloud_version }}/architecture/storage-layer.md %}#mvcc). + - Interpretation: Higher values mean that older mutations are being applied, and indicate end-to-end pipeline delays. Compare across tables to find bottlenecks. +
+- `target_apply_queue_utilization_percent` + - Description: Percentage of target apply queue capacity utilization. + - Interpretation: Values approaching 100 percent indicate severe backpressure throughout the pipeline, and potential data processing delays. + +
+### Replication lag + +Monitor the following metric to track end-to-end replication lag: + +- `target_apply_transaction_lag_seconds` + - Description: Age of the transaction applied to the target table, measuring time from source commit to target apply. + - Interpretation: Consistently high values indicate bottlenecks in the pipeline. Compare with `core_source_lag_seconds` to determine if the delay is in source read or target apply. +
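Lag metrics are most useful when wired into alerting. The following Prometheus alerting rule is a sketch that fires when `core_source_lag_seconds` stays elevated; the rule name, 30-second threshold, and 5-minute duration are assumptions, not recommendations:

~~~ yaml
groups:
  - name: replicator
    rules:
      - alert: ReplicatorSourceLagHigh
        # Fires when the most recently received checkpoint is older than
        # 30 seconds for 5 minutes, indicating Replicator is falling
        # behind the source workload.
        expr: core_source_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
~~~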
+ +
+### Progress tracking + +Monitor the following metrics to track checkpoint progress: + +- `target_applied_timestamp_seconds` + - Description: Wall time (Unix timestamp) of the most recently applied resolved timestamp. + - Interpretation: Use to verify continuous progress. Stale values indicate apply stalls. +- `target_pending_timestamp_seconds` + - Description: Wall time (Unix timestamp) of the most recently received resolved timestamp. + - Interpretation: A gap between this metric and `target_applied_timestamp_seconds` indicates apply backlog, meaning that the pipeline cannot keep up with incoming changes. +
+ +## Replication pipeline metrics + +### Source read + +[Source read](#replication-pipeline) metrics track the health of connections to source databases and the volume of incoming changes. + +{{site.data.alerts.callout_info}} +For checkpoint terminology, refer to the [MOLT Replicator documentation]({% link molt/molt-replicator.md %}#terminology). +{{site.data.alerts.end}} + +
+#### CockroachDB source + +- `checkpoint_committed_age_seconds` + - Description: Age of the committed checkpoint. + - Interpretation: Increasing values indicate checkpoint commits are falling behind, which affects crash recovery capability. +- `checkpoint_proposed_age_seconds` + - Description: Age of the proposed checkpoint. + - Interpretation: A gap with `checkpoint_committed_age_seconds` indicates checkpoint commit lag. +- `checkpoint_commit_duration_seconds` + - Description: Amount of time taken to save the committed checkpoint to the staging database. + - Interpretation: High values indicate staging database bottlenecks due to write contention or performance issues. +- `checkpoint_proposed_going_backwards_errors_total` + - Description: Number of times an error condition occurred where the changefeed was restarted. + - Interpretation: Indicates source changefeed restart or time regression. Requires immediate investigation of source changefeed stability. +
+ +
+#### Oracle source + +{{site.data.alerts.callout_success}} +To visualize the following metrics, import the [Oracle Grafana dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json). +{{site.data.alerts.end}} + +- `oraclelogminer_scn_interval_size` + - Description: Size of the interval from the start SCN to the current Oracle SCN. + - Interpretation: Values larger than the `--scnWindowSize` flag value indicate replication lag, or that replication is idle. +- `oraclelogminer_time_per_window_seconds` + - Description: Amount of time taken to fully process an SCN interval. + - Interpretation: Large values indicate Oracle slowdown, blocked replication loop, or slow processing. +- `oraclelogminer_query_redo_logs_duration_seconds` + - Description: Amount of time taken to query redo logs from LogMiner. + - Interpretation: High values indicate Oracle is under load or the SCN interval is too large. +- `oraclelogminer_num_inflight_transactions_in_memory` + - Description: Current number of in-flight transactions in memory. + - Interpretation: High counts indicate long-running transactions on the source. Monitor for memory usage. +- `oraclelogminer_num_async_checkpoints_in_queue` + - Description: Checkpoints queued for processing against the staging database. + - Interpretation: Values close to the `--checkpointQueueBufferSize` flag value indicate checkpoint processing cannot keep up with incoming checkpoints. +- `oraclelogminer_upsert_checkpoints_duration` + - Description: Amount of time taken to upsert a checkpoint batch into the staging database. + - Interpretation: High values indicate the staging database is under heavy load or the batch size is too large. +- `oraclelogminer_delete_checkpoints_duration` + - Description: Amount of time taken to delete old checkpoints from the staging database. + - Interpretation: High values indicate staging database load or long-running transactions preventing checkpoint deletion. +</div>
+ +
+#### MySQL source + +- `mylogical_dial_success_total` + - Description: Number of times Replicator successfully started logical replication. + - Interpretation: Multiple successes may indicate reconnects. Monitor for connection stability. +- `mylogical_dial_failure_total` + - Description: Number of times Replicator failed to start logical replication. + - Interpretation: Nonzero values indicate connection issues. Check network connectivity and source database health. +- `mutations_total` + - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). + - Interpretation: Use to monitor replication throughput and identify traffic patterns. +
+ +
+#### PostgreSQL source + +- `pglogical_dial_success_total` + - Description: Number of times Replicator successfully started logical replication (executed `START_REPLICATION` command). + - Interpretation: Multiple successes may indicate reconnects. Monitor for connection stability. +- `pglogical_dial_failure_total` + - Description: Number of times Replicator failed to start logical replication (failure to execute `START_REPLICATION` command). + - Interpretation: Nonzero values indicate connection issues. Check network connectivity and source database health. +- `mutations_total` + - Description: Total number of mutations processed, labeled by source and mutation type (insert/update/delete). + - Interpretation: Use to monitor replication throughput and identify traffic patterns. +
+ +
+### Staging + +[Staging](#replication-pipeline) metrics track the health of the staging layer where mutations are buffered for ordered processing. + +{{site.data.alerts.callout_info}} +For checkpoint terminology, refer to the [MOLT Replicator documentation]({% link molt/molt-replicator.md %}#terminology). +{{site.data.alerts.end}} + +- `stage_commit_lag_seconds` + - Description: Time between writing a mutation to source and writing it to staging. + - Interpretation: High values indicate delays in getting data into the staging layer. +- `stage_mutations_total` + - Description: Number of mutations staged for each table. + - Interpretation: Use to monitor staging throughput per table. +- `stage_duration_seconds` + - Description: Amount of time taken to successfully stage mutations. + - Interpretation: High values indicate write performance issues on the staging database. +
+ +
+### Core sequencer + +[Core sequencer](#replication-pipeline) metrics track mutation processing, ordering, and transaction coordination. + +- `core_sweep_duration_seconds` + - Description: Duration of each schema sweep operation, which looks for and applies staged mutations. + - Interpretation: Long durations indicate that large backlogs, slow staging reads, or slow target writes are affecting throughput. +- `core_sweep_mutations_applied_total` + - Description: Total count of mutations read from staging and successfully applied to the target database during a sweep. + - Interpretation: Use to monitor processing throughput. A flat line indicates no mutations are being applied. +- `core_sweep_success_timestamp_seconds` + - Description: Wall time (Unix timestamp) at which a sweep attempt last succeeded. + - Interpretation: If this value stops updating and becomes stale, it indicates that the sweep has stopped. +- `core_parallelism_utilization_percent` + - Description: Percentage of the configured parallelism that is actively being used for concurrent transaction processing. + - Interpretation: High utilization indicates bottlenecks in mutation processing. +
+ +### Target apply + +[Target apply](#replication-pipeline) metrics track mutation application to the target database. + +- `target_apply_queue_size` + - Description: Number of transactions waiting in the target apply queue. + - Interpretation: High values indicate target apply cannot keep up with incoming transactions. +- `target_apply_queue_utilization_percent` + - Description: Percentage of apply queue capacity utilization. + - Interpretation: Values above 90 percent indicate severe backpressure. Increase [`--targetApplyQueueSize`]({% link molt/replicator-flags.md %}#target-apply-queue-size) or investigate target database performance. +- `apply_duration_seconds` + - Description: Amount of time taken to successfully apply mutations to a table. + - Interpretation: High values indicate target database performance issues or contention. +- `apply_upserts_total` + - Description: Number of rows upserted to the target. + - Interpretation: Use to monitor write throughput. Should grow steadily during active replication. +- `apply_deletes_total` + - Description: Number of rows deleted from the target. + - Interpretation: Use to monitor delete throughput. Compare with delete operations on the source database. +- `apply_errors_total` + - Description: Number of times an error was encountered while applying mutations. + - Interpretation: Growing error count indicates target database issues or constraint violations. +- `apply_conflicts_total` + - Description: Number of rows that experienced a compare-and-set (CAS) conflict. + - Interpretation: High counts indicate concurrent modifications or stale data conflicts. May require conflict resolution tuning. +- `apply_resolves_total` + - Description: Number of rows that experienced a compare-and-set (CAS) conflict and were successfully resolved. + - Interpretation: Compare with `apply_conflicts_total` to verify conflict resolution is working. Should be close to or equal to conflicts. 
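To track all of the metrics above over time, point a Prometheus scrape job at the `/_/varz` endpoint described in [Set up metrics](#set-up-metrics). A minimal configuration might look like the following; the job name, scrape interval, and target address reuse the example port from this page and are otherwise assumptions:

~~~ yaml
scrape_configs:
  - job_name: replicator
    # Replicator serves Prometheus metrics at /_/varz,
    # not the Prometheus default of /metrics.
    metrics_path: /_/varz
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:30005']
~~~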
+ +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [Replicator Flags]({% link molt/replicator-flags.md %}) +- [Load and Replicate]({% link molt/migrate-load-replicate.md %}) +- [Migration Failback]({% link molt/migrate-failback.md %}) diff --git a/src/current/releases/molt.md b/src/current/releases/molt.md index 734b47fa9c8..e7e6755f594 100644 --- a/src/current/releases/molt.md +++ b/src/current/releases/molt.md @@ -10,7 +10,7 @@ This page has details about each release of the following [MOLT (Migrate Off Leg - `molt`: [MOLT Fetch]({% link molt/molt-fetch.md %}) and [MOLT Verify]({% link molt/molt-verify.md %}) - `replicator`: [MOLT Replicator]({% link molt/molt-replicator.md %}) -Cockroach Labs recommends using the latest available version of each tool. See [Installation](#installation). +Cockroach Labs recommends using the latest available version of each tool. Refer to [Installation](#installation). ## Installation @@ -39,7 +39,7 @@ MOLT Fetch/Verify 1.3.2 is [available](#installation). - Fixed a bug in `escape-password` where passwords that start with a hyphen were not handled correctly. Users must now pass the `--password` flag when running `escape-password`. For example, `molt escape-password --password 'a$52&'`. - Added support for assume role authentication during [data export]({% link molt/molt-fetch.md %}#data-export-phase) with MOLT Fetch. - Added support to `replicator` for retrying unique constraint violations on the target database, which can be temporary in some cases. -- Added exponential backoff to `replicator` for retryable errors when applying mutations to the target database. This reduces load on the target database and prevents exhausting retries prematurely. The new [replication flags]({% link molt/molt-replicator.md %}#flags) `--retryInitialBackoff`, `--retryMaxBackoff`, and `--retryMultiplier` control backoff behavior. The new `--maxRetries` flag configures the maximum number of retries. 
To retain the previous "immediate retry" behavior, set `--retryMaxBackoff 1ns --retryInitialBackoff 1ns --retryMultiplier 1`. +- Added exponential backoff to `replicator` for retryable errors when applying mutations to the target database. This reduces load on the target database and prevents exhausting retries prematurely. The new [replication flags]({% link molt/replicator-flags.md %}) `--retryInitialBackoff`, `--retryMaxBackoff`, and `--retryMultiplier` control backoff behavior. The new `--maxRetries` flag configures the maximum number of retries. To retain the previous "immediate retry" behavior, set `--retryMaxBackoff 1ns --retryInitialBackoff 1ns --retryMultiplier 1`. - Added support to `replicator` for the `source_lag_seconds`, `target_lag_seconds`, `apply_mutation_age_seconds`, and `source_commit_to_apply_lag_seconds` metrics for replication from PostgreSQL and MySQL, and introduced histogram metrics `source_lag_seconds_histogram` and `target_lag_seconds_histogram` for replication from CockroachDB. `source_lag_seconds` measures the delay before data is ready to be processed by `replicator`, while `target_lag_seconds` measures the "end-to-end" delay until `replicator` has written data to the target. A steady increase in `source_lag_seconds` may indicate `replicator` cannot keep up with the source workload, while a steady increase in `target_lag_seconds` may indicate `replicator` cannot keep up with the source workload or that writes on the target database are bottlenecked. @@ -148,7 +148,7 @@ MOLT Fetch/Verify 1.2.1 is [available](#installation). - MySQL columns of type `BIGINT UNSIGNED` or `SERIAL` are now auto-mapped to [`DECIMAL`]({% link {{ site.current_cloud_version }}/decimal.md %}) type in CockroachDB. MySQL regular `BIGINT` types are mapped to [`INT`]({% link {{ site.current_cloud_version }}/int.md %}) type in CockroachDB. 
- The `pglogical` replication workflow was modified in order to enforce safer and simpler defaults for the [`data-load`]({% link molt/molt-fetch.md %}#fetch-mode), `data-load-and-replication`, and `replication-only` workflows for PostgreSQL sources. Fetch now ensures that the publication is created before the slot, and that `replication-only` defaults to using publications and slots created either in previous Fetch runs or manually. - Fixed scan iterator query ordering for `BINARY` and `TEXT` (of same collation) PKs so that they lead to the correct queries and ordering. -- For a MySQL source in `replication-only` mode, the [`--stagingSchema` replicator flag]({% link molt/molt-replicator.md %}#flags) can now be used to resume streaming replication after being interrupted. Otherwise, the [`--defaultGTIDSet` replicator flag]({% link molt/molt-replicator.md %}#mylogical-replication-flags) is used to start initial replication after a previous Fetch run in [`data-load`]({% link molt/molt-fetch.md %}#fetch-mode) mode, or as an override to the current replication stream. +- For a MySQL source in `replication-only` mode, the [`--stagingSchema` replicator flag]({% link molt/replicator-flags.md %}#staging-schema) can now be used to resume streaming replication after being interrupted. Otherwise, the [`--defaultGTIDSet` replicator flag]({% link molt/replicator-flags.md %}#default-gtid-set) is used to start initial replication after a previous Fetch run in [`data-load`]({% link molt/molt-fetch.md %}#fetch-mode) mode, or as an override to the current replication stream. ## October 29, 2024