MOLT Oracle documentation; refactor MOLT tutorials #19918

Open · wants to merge 14 commits into base: main
7 changes: 5 additions & 2 deletions src/current/_data/redirects.yml
@@ -290,10 +290,13 @@
- 'migrate-from-serverless-to-dedicated.md'
versions: ['cockroachcloud']

- destination: molt/migrate-to-cockroachdb.md?filters=mysql
- destination: molt/migrate-data-load-and-replication.md?filters=oracle
sources: [':version/migrate-from-oracle.md']

- destination: molt/migrate-data-load-and-replication.md?filters=mysql
sources: [':version/migrate-from-mysql.md']

- destination: molt/migrate-to-cockroachdb.md
- destination: molt/migrate-data-load-and-replication.md
sources: [':version/migrate-from-postgres.md']

- destination: molt/migration-overview.md
38 changes: 0 additions & 38 deletions src/current/_includes/molt/fetch-data-load-and-replication.md

This file was deleted.

11 changes: 8 additions & 3 deletions src/current/_includes/molt/fetch-data-load-modes.md
@@ -1,5 +1,10 @@
The following example migrates a single `employees` table. The table is exported to an Amazon S3 bucket and imported to CockroachDB using the [`IMPORT INTO`]({% link {{ site.current_cloud_version }}/import-into.md %}) statement, which is the [default MOLT Fetch mode]({% link molt/molt-fetch.md %}#data-movement).
MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to load data into CockroachDB:

- `IMPORT INTO` [takes the target CockroachDB tables offline]({% link {{ site.current_cloud_version }}/import-into.md %}#considerations) to maximize throughput. The tables come back online once the [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. If you need to keep the target tables online, add the `--use-copy` flag to export data with [`COPY FROM`]({% link {{ site.current_cloud_version }}/copy.md %}) instead. For more details, refer to [Data movement]({% link molt/molt-fetch.md %}#data-movement).
| Statement | MOLT Fetch flag | Description |
|---------------|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `IMPORT INTO` | Default mode | <ul><li>Fastest option, but takes target tables offline during load.</li><li>Supports compression using the `--compression` flag to reduce storage used during export.</li><li>Executes as a distributed background job in CockroachDB, so is more efficient for large, wide, or heavily partitioned tables.</li></ul> |
| `COPY FROM` | `--use-copy` or `--direct-copy` | <ul><li>Slower than `IMPORT INTO`, but keeps target tables online and queryable during load.</li><li>Does not support compression.</li><li>Runs on the MOLT host and streams data row-by-row, which can increase memory usage and limit concurrency for large tables (many rows) or wide tables (many columns or large values like `JSONB`).</li></ul> |

- If you cannot move data to a public cloud, specify `--direct-copy` instead of `--bucket-path` in the `molt fetch` command. This flag instructs MOLT Fetch to use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store. For more information, refer to [Direct copy]({% link molt/molt-fetch.md %}#direct-copy).
- Use `IMPORT INTO` (the default mode) for large datasets, wide rows, or partitioned tables.
- Use `--use-copy` when tables must remain online during data load.
- Use `--direct-copy` only when you cannot move data to a public cloud, or want to perform local testing without intermediate storage. In this case, no [intermediate file storage](#intermediate-file-storage) is used.
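As an illustrative sketch, the three modes map onto `molt fetch` invocations like the following. The connection strings and bucket path are placeholders, not values from this tutorial; only the flags (`--bucket-path`, `--use-copy`, `--direct-copy`) come from the text above:

~~~ shell
# Default mode: IMPORT INTO via an intermediate cloud storage bucket.
molt fetch \
  --source $SOURCE_CONNECTION_STRING \
  --target $TARGET_CONNECTION_STRING \
  --bucket-path 's3://migration-bucket/export'

# Keep target tables online during load: COPY FROM via the bucket.
molt fetch \
  --source $SOURCE_CONNECTION_STRING \
  --target $TARGET_CONNECTION_STRING \
  --bucket-path 's3://migration-bucket/export' \
  --use-copy

# No intermediate storage: COPY FROM directly to CockroachDB.
molt fetch \
  --source $SOURCE_CONNECTION_STRING \
  --target $TARGET_CONNECTION_STRING \
  --direct-copy
~~~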
58 changes: 48 additions & 10 deletions src/current/_includes/molt/fetch-data-load-output.md
@@ -4,54 +4,92 @@

<section class="filter-content" markdown="1" data-scope="postgres">
~~~ json
{"level":"info","type":"summary","num_tables":1,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
{"level":"info","type":"summary","num_tables":3,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
~~~
</section>

<section class="filter-content" markdown="1" data-scope="mysql">
~~~ json
{"level":"info","type":"summary","num_tables":1,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-28","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
{"level":"info","type":"summary","num_tables":3,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-28","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
~~~
</section>

<section class="filter-content" markdown="1" data-scope="oracle">
~~~ json
{"level":"info","type":"summary","num_tables":3,"cdc_cursor":"2358840","time":"2025-02-10T14:28:11-05:00","message":"starting fetch"}
~~~
</section>

`data extraction` messages are written for each table that is exported to the location in `--bucket-path`:

~~~ json
{"level":"info","table":"public.employees","time":"2025-02-10T14:28:11-05:00","message":"data extraction phase starting"}
{"level":"info","table":"migration_schema.employees","time":"2025-02-10T14:28:11-05:00","message":"data extraction phase starting"}
~~~

~~~ json
{"level":"info","table":"public.employees","type":"summary","num_rows":200000,"export_duration_ms":1000,"export_duration":"000h 00m 01s","time":"2025-02-10T14:28:12-05:00","message":"data extraction from source complete"}
{"level":"info","table":"migration_schema.employees","type":"summary","num_rows":200000,"export_duration_ms":1000,"export_duration":"000h 00m 01s","time":"2025-02-10T14:28:12-05:00","message":"data extraction from source complete"}
~~~

`data import` messages are written for each table that is loaded into CockroachDB:

~~~ json
{"level":"info","table":"public.employees","time":"2025-02-10T14:28:12-05:00","message":"starting data import on target"}
{"level":"info","table":"migration_schema.employees","time":"2025-02-10T14:28:12-05:00","message":"starting data import on target"}
~~~

<section class="filter-content" markdown="1" data-scope="postgres">
~~~ json
{"level":"info","table":"public.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
{"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"0/43A1960","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
~~~
</section>

<section class="filter-content" markdown="1" data-scope="mysql">
~~~ json
{"level":"info","table":"public.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
{"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
~~~
</section>

<section class="filter-content" markdown="1" data-scope="oracle">
~~~ json
{"level":"info","table":"migration_schema.employees","type":"summary","net_duration_ms":1899.748333,"net_duration":"000h 00m 01s","import_duration_ms":1160.523875,"import_duration":"000h 00m 01s","export_duration_ms":1000,"export_duration":"000h 00m 01s","num_rows":200000,"cdc_cursor":"2358840","time":"2025-02-10T14:28:13-05:00","message":"data import on target for table complete"}
~~~
</section>

A `fetch complete` message is written when the fetch task succeeds:

<section class="filter-content" markdown="1" data-scope="postgres">
~~~ json
{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":1,"tables":["public.employees"],"cdc_cursor":"0/3F41E40","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.orders"],"cdc_cursor":"0/3F41E40","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
~~~
</section>

<section class="filter-content" markdown="1" data-scope="mysql">
~~~ json
{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":1,"tables":["public.employees"],"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.orders"],"cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
~~~

{% if page.name != "migrate-bulk-load.md" %}
This message includes a `cdc_cursor` value. You must set the `--defaultGTIDSet` replication flag to this value when starting [`replication-only` mode](#replicate-changes-to-cockroachdb):

{% include_cached copy-clipboard.html %}
~~~
--defaultGTIDSet 4c658ae6-e8ad-11ef-8449-0242ac140006:1-29
~~~
{% endif %}
</section>

<section class="filter-content" markdown="1" data-scope="oracle">
~~~ json
{"level":"info","type":"summary","fetch_id":"f5cb422f-4bb4-4bbd-b2ae-08c4d00d1e7c","num_tables":3,"tables":["migration_schema.employees","migration_schema.payments","migration_schema.orders"],"cdc_cursor":"2358840","net_duration_ms":6752.847625,"net_duration":"000h 00m 06s","time":"2024-03-18T12:30:37-04:00","message":"fetch complete"}
~~~
</section>
</section>
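Because later steps reuse the `cdc_cursor` value from the `fetch complete` message, it can help to extract it programmatically rather than copy it by hand. A minimal shell sketch (the log line is abbreviated from the examples above; in practice you would read it from your fetch log output):

```shell
# Sample "fetch complete" log line as emitted by MOLT Fetch (abbreviated).
log_line='{"level":"info","type":"summary","cdc_cursor":"4c658ae6-e8ad-11ef-8449-0242ac140006:1-29","message":"fetch complete"}'

# Pull out the cdc_cursor value, e.g. to pass to --defaultGTIDSet (MySQL)
# or to record for later replication steps.
cursor=$(printf '%s' "$log_line" | grep -o '"cdc_cursor":"[^"]*"' | cut -d'"' -f4)
echo "$cursor"
```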

{% if page.name == "migrate-data-load-replicate-only.md" %}
<section class="filter-content" markdown="1" data-scope="oracle">
The following message shows the appropriate values for the `--backfillFromSCN` and `--scn` replication flags to use when [starting `replication-only` mode](#replicate-changes-to-cockroachdb):

{% include_cached copy-clipboard.html %}
~~~
replication-only mode should include the following replicator flags: --backfillFromSCN 26685444 --scn 26685786
~~~
</section>
{% endif %}
14 changes: 14 additions & 0 deletions src/current/_includes/molt/fetch-intermediate-file-storage.md
@@ -0,0 +1,14 @@
MOLT Fetch can write intermediate files to either a cloud storage bucket or a local file server:

| Destination | MOLT Fetch flag(s) | Address and authentication |
|-------------------|---------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Cloud storage | `--bucket-path` | Specify an `s3://bucket/path`, `gs://bucket/path`, or `azure-blob://bucket/path` URL. <ul><li>**AWS S3**: Set `AWS_REGION`, `AWS_SECRET_ACCESS_KEY`, `AWS_ACCESS_KEY_ID` as environment variables or use `--use-implicit-auth`, `--assume-role`, and/or `--import-region`. Refer to [Cloud storage authentication](#cloud-storage-authentication).</li><li>**GCS**: Authenticate with Application Default Credentials or use `--use-implicit-auth`. Refer to [Cloud storage authentication](#cloud-storage-authentication).</li><li>**Azure Blob Storage**: Set `AZURE_ACCOUNT_NAME` and `AZURE_ACCOUNT_KEY` as environment variables or use `--use-implicit-auth`. Refer to [Cloud storage authentication](#cloud-storage-authentication).</li></ul> |
| Local file server | `--local-path`<br>`--local-path-listen-addr`<br>`--local-path-crdb-access-addr` | Write to `--local-path` on a local file server at `--local-path-listen-addr`; if the target CockroachDB cluster cannot reach this address, specify a publicly accessible address with `--local-path-crdb-access-addr`. No additional authentication is required. |

{{site.data.alerts.callout_success}}
Cloud storage is often preferred over a local file server, which may require significant disk space.
{{site.data.alerts.end}}
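As an illustrative sketch of the local file server option (all addresses, paths, and connection strings below are placeholders; only the flags come from the table above):

~~~ shell
molt fetch \
  --source $SOURCE_CONNECTION_STRING \
  --target $TARGET_CONNECTION_STRING \
  --local-path /tmp/molt-export \
  --local-path-listen-addr '0.0.0.0:5000' \
  --local-path-crdb-access-addr '10.0.0.5:5000'
~~~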

#### Cloud storage authentication

{% include molt/fetch-secure-cloud-storage.md %}
21 changes: 21 additions & 0 deletions src/current/_includes/molt/fetch-metrics.md
@@ -0,0 +1,21 @@
By default, MOLT Fetch exports [Prometheus](https://prometheus.io/) metrics at `http://127.0.0.1:3030/metrics`. You can override the address with `--metrics-listen-addr '{host}:{port}'`, where the endpoint will be `http://{host}:{port}/metrics`.

Cockroach Labs recommends monitoring the following metrics during data load:

| Metric Name | Description |
|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| `molt_fetch_num_tables` | Number of tables that will be moved from the source. |
| `molt_fetch_num_task_errors` | Number of errors encountered by the fetch task. |
| `molt_fetch_overall_duration` | Duration (in seconds) of the fetch task. |
| `molt_fetch_rows_exported` | Number of rows that have been exported from a table. For example:<br>`molt_fetch_rows_exported{table="public.users"}` |
| `molt_fetch_rows_imported` | Number of rows that have been imported from a table. For example:<br>`molt_fetch_rows_imported{table="public.users"}` |
| `molt_fetch_table_export_duration_ms` | Duration (in milliseconds) of a table's export. For example:<br>`molt_fetch_table_export_duration_ms{table="public.users"}` |
| `molt_fetch_table_import_duration_ms` | Duration (in milliseconds) of a table's import. For example:<br>`molt_fetch_table_import_duration_ms{table="public.users"}` |

You can also use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view the preceding metrics.
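To have Prometheus scrape these metrics, a minimal scrape configuration might look like the following. The job name is arbitrary, and the target address assumes the default `--metrics-listen-addr`; Prometheus scrapes the standard `/metrics` path by default:

~~~ yaml
scrape_configs:
  - job_name: 'molt-fetch'
    static_configs:
      - targets: ['127.0.0.1:3030']
~~~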

{% if page.name != "migrate-bulk-load.md" %}
{{site.data.alerts.callout_info}}
Metrics from the `replicator` process are enabled by setting the `--metricsAddr` [replication flag](#replication-flags), and are served at `http://{host}:{port}/_/varz`. <section class="filter-content" markdown="1" data-scope="oracle">To view Oracle-specific metrics from `replicator`, import [this Grafana dashboard](https://replicator.cockroachdb.com/replicator_oracle_grafana_dashboard.json).</section>
{{site.data.alerts.end}}
{% endif %}