Integrate/PostgreSQL: Add section with starter tutorial #257
Walkthrough
Adds PostgreSQL documentation across the site: updates ETL docs to reference PostgreSQL, extends the integrations toctree, and introduces a new PostgreSQL integration landing page and tutorial describing ingestion from PostgreSQL into CrateDB. All changes are documentation-only.
Sequence Diagram(s)
sequenceDiagram
participant User
participant PostgreSQL
participant ctk_ingest as CrateDB Toolkit (ctk-ingest)
participant CrateDB
User->>PostgreSQL: Create DB/table and insert sample rows
User->>ctk_ingest: Run load table (source=PostgreSQL URL, target=CrateDB URL)
ctk_ingest->>PostgreSQL: Read rows
ctk_ingest->>CrateDB: Write rows
User->>CrateDB: SELECT to verify data
CrateDB-->>User: Return query results
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Invoke the data transfer pipeline.
```shell
ctk load table \
  "postgresql://postgres:postgres@postgresql/public?table=demo" \
  --cluster-url="crate://crate:crate@cratedb:4200/testdrive/postgresql_demo"
```
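The two URLs passed to the command carry all connection details, so it can help to see how they are put together. The following is a minimal sketch, assuming the URL layouts that appear in the commands quoted later in this thread (`postgresql://USER:PASSWORD@HOST:PORT/DATABASE?table=SCHEMA.TABLE` and `crate://USER:PASSWORD@HOST:PORT/SCHEMA/TABLE`). The function names `pg_source_url` and `cratedb_target_url` are hypothetical helpers, not part of CrateDB Toolkit, and the command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that compose the source and target URLs.
# They are illustrations only and not part of CrateDB Toolkit.

# postgresql://USER:PASSWORD@HOST:PORT/DATABASE?table=SCHEMA.TABLE
pg_source_url() {
  local user="$1" password="$2" host="$3" port="$4" database="$5" table="$6"
  printf 'postgresql://%s:%s@%s:%s/%s?table=%s\n' \
    "$user" "$password" "$host" "$port" "$database" "$table"
}

# crate://USER:PASSWORD@HOST:PORT/SCHEMA/TABLE
cratedb_target_url() {
  local user="$1" password="$2" host="$3" port="$4" schema="$5" table="$6"
  printf 'crate://%s:%s@%s:%s/%s/%s\n' \
    "$user" "$password" "$host" "$port" "$schema" "$table"
}

SOURCE_URL="$(pg_source_url postgres postgres postgresql 5432 test public.demo)"
TARGET_URL="$(cratedb_target_url crate crate cratedb 4200 doc postgresql_demo)"

# Print the command instead of running it, so the sketch works without Docker.
echo "ctk-ingest load table \"$SOURCE_URL\" --cluster-url=\"$TARGET_URL\""
```

The sketch does no escaping, so credentials containing characters such as `@` or `/` would need percent-encoding before being passed in.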
We discovered that this step currently fails, so it needs investigation.
A separate OCI image, `cratedb-toolkit-ingest`, exists now. It includes all requirements for invoking ingestr-based transfer pipelines without further setup.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/integrate/postgresql/tutorial.md (1)
80-83: Call out known ingestion issue and provide a pointer to follow-up.
This step is currently reported to fail in some environments. Recommend adding a short note with a link to the tracker and any workaround, so users aren’t blocked mid-tutorial.
Proposed insertion after the command:
 ctk-ingest load table \
   "postgresql://postgres:postgres@postgresql:5432/test?table=public.demo" \
   --cluster-url="crate://crate:crate@cratedb:4200/doc/postgresql_demo"
+```
+
+```{note}
+This step may currently fail in some environments; see cratedb-toolkit issue 507 for status and workarounds:
+https://github.com/crate/cratedb-toolkit/issues/507
+```
🧹 Nitpick comments (4)
docs/integrate/postgresql/tutorial.md (4)
29-31: Flag insecure auth as demo-only.
Consider adding a brief note that POSTGRES_HOST_AUTH_METHOD=trust is intentionally insecure and for local demos only.
Proposed addition:
 docker run --rm --name=postgresql --network=cratedb-demo \
   --publish=5433:5432 --env "POSTGRES_HOST_AUTH_METHOD=trust" \
   docker.io/postgres postgres -c log_statement=all
+```
+
+```{warning}
+Using `POSTGRES_HOST_AUTH_METHOD=trust` disables password checks. Use only for local demos.
+```
41-45: Consider pinning container image tags for reproducibility.
Using floating tags can cause tutorials to break when images change. Pin to a known-good tag or document the expected minimum version.
Example:
-alias crash="docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit crash"
-alias ctk-ingest="docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit-ingest ctk"
+alias crash="docker run --rm -it --network=cratedb-demo ghcr.io/crate/cratedb-toolkit:latest crash"
+alias ctk-ingest="docker run --rm -i --network=cratedb-demo ghcr.io/crate/cratedb-toolkit-ingest:latest ctk"
44-45: Optional: make psql alias interactive-friendly.
Using -it improves interactive usage. The heredoc in the example still works with -t.
-alias psql="docker run --rm -i --network=cratedb-demo docker.io/postgres psql"
+alias psql="docker run --rm -it --network=cratedb-demo docker.io/postgres psql"
17-18: Make the network creation step idempotent.
If the network already exists, the command errors. Small tweak to keep copy/paste flows smooth.
-docker network create cratedb-demo
+docker network create cratedb-demo 2>/dev/null || true
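As an alternative to silencing every error with `|| true`, the network step can check for the network first. The following is a minimal sketch, assuming a hypothetical helper named `ensure_network`; it relies only on `docker network inspect` and `docker network create`, and it only defines the function without calling Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a Docker network only when it does not exist yet.
ensure_network() {
  local name="$1"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network '$name' already exists"
  else
    # Propagate a real failure (for example, Docker not running) to the caller.
    docker network create "$name" >/dev/null || return 1
    echo "network '$name' created"
  fi
}
```

With Docker available, the tutorial step would become `ensure_network cratedb-demo`. Unlike the `|| true` variant, a failure that is not "network already exists" still produces a non-zero exit status.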
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- docs/ingest/etl/index.md (2 hunks)
- docs/integrate/index.md (1 hunk)
- docs/integrate/postgresql/index.md (1 hunk)
- docs/integrate/postgresql/tutorial.md (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-15T18:56:11.233Z
Learnt from: amotl
PR: crate/cratedb-guide#255
File: docs/integrate/influxdb/tutorial.md:120-123
Timestamp: 2025-08-15T18:56:11.233Z
Learning: In CrateDB integration tutorials, when using `ctk load table` with `--cluster-url="crate://userhost:port/schema/table"`, the verification query should reference the target schema.table combination specified in the cluster URL, not derive it from the source system's naming. For example, if loading from InfluxDB bucket/measurement `testdrive/demo` to CrateDB `doc/testdrive`, the verification query should be `SELECT * FROM doc.testdrive`.
Applied to files:
docs/integrate/postgresql/tutorial.md
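The convention in this learning can be sketched in shell: take the path of the `--cluster-url` value and turn `/schema/table` into the `schema.table` name used by the verification query. The function names `target_from_cluster_url` and `verification_query` are hypothetical, and the parsing assumes a URL of the form `crate://USER:PASSWORD@HOST:PORT/SCHEMA/TABLE` without `/` in the credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive "schema.table" from a crate:// cluster URL
# whose path has the form /schema/table.
target_from_cluster_url() {
  local url="$1"
  local path="${url#*://}"   # drop the scheme
  path="${path#*/}"          # drop user, password, host, and port
  path="${path%%\?*}"        # drop a query string, if any
  local schema="${path%%/*}" # first path segment
  local table="${path#*/}"   # second path segment
  printf '%s.%s\n' "$schema" "$table"
}

# Hypothetical helper: print the verification query for a cluster URL.
verification_query() {
  printf 'SELECT * FROM %s;\n' "$(target_from_cluster_url "$1")"
}

verification_query "crate://crate:crate@cratedb:4200/doc/postgresql_demo"
```

The printed statement targets `doc.postgresql_demo`, the table named in the cluster URL, regardless of what the table is called on the PostgreSQL side.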
🪛 LanguageTool
docs/ingest/etl/index.md
[grammar] ~244-~244: There might be a mistake here.
Context: ... - {ref}nifi
- {ref}node-red
- {ref}postgresql
- {ref}risingwave
- {ref}sql-server
- ...
(QB_NEW_EN)
docs/integrate/postgresql/index.md
[grammar] ~1-~1: There might be a mistake here.
Context: (postgresql)= # PostgreSQL ```{div} .float-right [![pos...
(QB_NEW_EN)
[grammar] ~10-~10: There might be a mistake here.
Context: ...``{div} .clearfix ``` :::{rubric} About ::: [PostgreSQL] is the world's most ad...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: .../postgresql_demo" ``` :::{rubric} Learn ::: ::::{grid} :::{grid-item-card} Tut...
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ...rateDB Toolkit. ::: :::: :::{toctree} :maxdepth: 1 :hidden: Tutorial <tutorial...
(QB_NEW_EN)
[grammar] ~39-~39: There might be a mistake here.
Context: ...t. ::: :::: :::{toctree} :maxdepth: 1 :hidden: Tutorial ::: [Post...
(QB_NEW_EN)
[grammar] ~40-~40: There might be a mistake here.
Context: ...::: :::{toctree} :maxdepth: 1 :hidden: Tutorial ::: [PostgreSQL]: ...
(QB_NEW_EN)
[grammar] ~41-~41: There might be a mistake here.
Context: ...maxdepth: 1 :hidden: Tutorial ::: [PostgreSQL]: https://www.postgres...
(QB_NEW_EN)
docs/integrate/postgresql/tutorial.md
[grammar] ~1-~1: There might be a mistake here.
Context: (postgresql-tutorial)= # Load data from PostgreSQL into CrateDB ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...y. The data transfer is supported by the {ref}`CrateDB Toolkit Ingestr I/O <ctk:i...
(QB_NEW_EN)
[grammar] ~34-~34: There might be a mistake here.
Context: ...ateDB Toolkit, and the PostgreSQL client programs. ::::{tab-set} :::{tab-item} ...
(QB_NEW_EN)
[grammar] ~39-~39: There might be a mistake here.
Context: ...{tab-set} :::{tab-item} Linux and macOS To make the settings persistent, add the...
(QB_NEW_EN)
[grammar] ~46-~46: There might be a mistake here.
Context: ...db-demo docker.io/postgres psql" ``` ::: :::{tab-item} Windows PowerShell To make...
(QB_NEW_EN)
[grammar] ~47-~47: There might be a mistake here.
Context: ...``` ::: :::{tab-item} Windows PowerShell To make the settings persistent, add the...
(QB_NEW_EN)
[grammar] ~54-~54: There might be a mistake here.
Context: ... docker.io/postgres psql @Args } ::: :::{tab-item} Windows Command
shell d...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (5)
docs/integrate/postgresql/tutorial.md (1)
85-88: Verification query target matches cluster-url schema.table.
Good: SELECT targets doc.postgresql_demo, consistent with the cluster URL path. This aligns with prior integration tutorial conventions.
docs/integrate/index.md (1)
55-55: Add PostgreSQL to integrations toctree: looks good.
Entry is correctly referenced and aligns with the new integration section.
docs/ingest/etl/index.md (2)
124-127: Nice addition of PostgreSQL in the Databases category.
Description and ref anchor usage look correct.
244-244: Alphabetical list updated correctly.
Placement after node-red and before risingwave is correct.
docs/integrate/postgresql/index.md (1)
1-6: Intro section and logo block read well.
Anchor, title, and branding look correct.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/integrate/postgresql/tutorial.md (4)
71-80: Make the PostgreSQL setup idempotent and fail-fast.
Re-running the tutorial currently errors on `CREATE DATABASE test;`. Also, making psql stop on the first error improves ergonomics.
Suggested tweaks:
-psql "postgresql://postgres:postgres@postgresql:5432/" <<SQL
+psql -v ON_ERROR_STOP=1 "postgresql://postgres:postgres@postgresql:5432/" <<SQL
-DROP DATABASE IF EXISTS test;
 CREATE DATABASE test;
 \connect test;
 CREATE TABLE IF NOT EXISTS demo (id BIGINT, data JSONB);
 INSERT INTO demo (id, data) VALUES (1, '{"temperature": 42.84, "humidity": 83.1}');
 INSERT INTO demo (id, data) VALUES (2, '{"temperature": 84.84, "humidity": 56.99}');
 SQL
22-25: Optional: Pin container image tags for reproducibility.
Using floating tags can change behavior over time. Consider pinning versions (examples shown; adjust to your desired versions).
-  docker.io/crate -Cdiscovery.type=single-node
+  docker.io/crate:5.7 -Cdiscovery.type=single-node
-  docker.io/postgres postgres -c log_statement=all
+  docker.io/postgres:16 postgres -c log_statement=all
Also applies to: 29-32
30-31: Security note: trust authentication is for local demos only.
`POSTGRES_HOST_AUTH_METHOD=trust` disables password checks. Consider adding a brief warning not to use this in production.
22-24: Minor UX note: heap size might be high for small laptops.
`CRATE_HEAP_SIZE=2g` can be heavy on constrained systems. Consider noting it can be reduced (e.g., `1g`) if necessary.
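The idempotent, fail-fast setup discussed in the first nitpick above (71-80) can be sketched as a shell function that prints the SQL, so it can be piped into psql. This is a sketch under the assumption that the intent is to drop and recreate the `test` database on every run; the function name `setup_sql` is hypothetical, while the statements are taken from the suggested tweak.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a re-runnable setup script for the sample data.
# Dropping the database first means a second run does not fail on CREATE DATABASE.
setup_sql() {
  cat <<'SQL'
DROP DATABASE IF EXISTS test;
CREATE DATABASE test;
\connect test
CREATE TABLE IF NOT EXISTS demo (id BIGINT, data JSONB);
INSERT INTO demo (id, data) VALUES (1, '{"temperature": 42.84, "humidity": 83.1}');
INSERT INTO demo (id, data) VALUES (2, '{"temperature": 84.84, "humidity": 56.99}');
SQL
}
```

With the tutorial's `psql` alias defined in an interactive shell, the setup would run as `setup_sql | psql -v ON_ERROR_STOP=1 "postgresql://postgres:postgres@postgresql:5432/"`, where `ON_ERROR_STOP=1` makes psql stop at the first failing statement.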
📒 Files selected for processing (4)
- docs/ingest/etl/index.md (2 hunks)
- docs/integrate/index.md (1 hunk)
- docs/integrate/postgresql/index.md (1 hunk)
- docs/integrate/postgresql/tutorial.md (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/integrate/index.md
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-15T18:56:11.233Z
Learnt from: amotl
PR: crate/cratedb-guide#255
File: docs/integrate/influxdb/tutorial.md:120-123
Timestamp: 2025-08-15T18:56:11.233Z
Learning: In CrateDB integration tutorials, when using `ctk load table` with `--cluster-url="crate://userhost:port/schema/table"`, the verification query should reference the target schema.table combination specified in the cluster URL, not derive it from the source system's naming. For example, if loading from InfluxDB bucket/measurement `testdrive/demo` to CrateDB `doc/testdrive`, the verification query should be `SELECT * FROM doc.testdrive`.
Applied to files:
docs/integrate/postgresql/index.md
docs/integrate/postgresql/tutorial.md
📚 Learning: 2025-08-09T16:19:43.966Z
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.
Applied to files:
docs/integrate/postgresql/index.md
🪛 LanguageTool
docs/integrate/postgresql/tutorial.md
[grammar] ~1-~1: There might be a mistake here.
Context: (postgresql-tutorial)= # Load data from PostgreSQL into CrateDB ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...y. The data transfer is supported by the {ref}`CrateDB Toolkit Ingestr I/O <ctk:i...
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ...ateDB Toolkit, and the PostgreSQL client programs. ::::{tab-set} :::{tab-item} ...
(QB_NEW_EN)
[grammar] ~43-~43: There might be a mistake here.
Context: ...{tab-set} :::{tab-item} Linux and macOS To make the settings persistent, add the...
(QB_NEW_EN)
[grammar] ~50-~50: There might be a mistake here.
Context: ...db-demo docker.io/postgres psql" ``` ::: :::{tab-item} Windows PowerShell To make...
(QB_NEW_EN)
[grammar] ~51-~51: There might be a mistake here.
Context: ...``` ::: :::{tab-item} Windows PowerShell To make the settings persistent, add the...
(QB_NEW_EN)
[grammar] ~58-~58: There might be a mistake here.
Context: ... docker.io/postgres psql @Args } ::: :::{tab-item} Windows Command
shell d...
(QB_NEW_EN)
🔇 Additional comments (5)
docs/integrate/postgresql/index.md (2)
18-22: Synopsis command looks correct and consistent with existing integrations.
The uvx invocation without the ctk subcommand aligns with the established synopsis pattern used across other integration index pages. No change needed.
29-33: Good cross-linking to the tutorial.
The Learn card correctly links to the tutorial ref and provides a concise description. This matches the documentation structure pattern used elsewhere.
docs/ingest/etl/index.md (2)
124-127: Solid addition of PostgreSQL in the Databases category.
- Placement is alphabetically correct (between MySQL and SQL Server).
- Description is consistent with the PostgreSQL integration page.
244-244: Alphabetical list: placement is correct.
`postgresql` appears in the right spot between `node-red` and `risingwave`.
docs/integrate/postgresql/tutorial.md (1)
84-87: End-to-end flow and verification look correct.
- The ingestion command uses the dedicated ctk-ingest image and resolves services via the Docker network, which is robust.
- The verification query correctly targets `doc.postgresql_demo`, matching the `--cluster-url` destination schema.table.
Also applies to: 91-92
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/integrate/postgresql/tutorial.md (1)
22-25: Run containers in background to avoid blocking the terminal.
Both docker run commands will occupy the shell; use -d for a smoother single-terminal flow (or instruct opening multiple terminals).
Apply:
-docker run --rm --name=cratedb --network=cratedb-demo \
+docker run -d --rm --name=cratedb --network=cratedb-demo \
   --publish=4200:4200 --publish=5432:5432 \
   docker.io/crate -Cdiscovery.type=single-node
-docker run --rm --name=postgresql --network=cratedb-demo \
+docker run -d --rm --name=postgresql --network=cratedb-demo \
   --publish=6432:5432 --env "POSTGRES_HOST_AUTH_METHOD=trust" \
   docker.io/postgres postgres -c log_statement=all
Also applies to: 29-32
🧹 Nitpick comments (3)
docs/integrate/postgresql/tutorial.md (3)
17-18: Make the network creation step idempotent.
Re-running the tutorial currently fails if the network exists. Make it tolerant.
Apply:
-docker network create cratedb-demo
+docker network create cratedb-demo || true
50-53: Enable TTY for psql alias to support interactive use.
Your later heredoc works with -i, but interactive sessions benefit from -t as well.
Apply:
-alias psql="docker run --rm -i --network=cratedb-demo docker.io/postgres psql"
+alias psql="docker run --rm -it --network=cratedb-demo docker.io/postgres psql"
77-84: Simplify psql URL when using trust auth.
Since POSTGRES_HOST_AUTH_METHOD=trust disables password checks, the password segment is unnecessary and can be confusing.
Apply:
-psql "postgresql://postgres:postgres@postgresql:5432/" <<SQL
+psql "postgresql://postgres@postgresql:5432/" <<SQL
📒 Files selected for processing (1)
- docs/integrate/postgresql/tutorial.md (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-15T18:56:11.233Z
Learnt from: amotl
PR: crate/cratedb-guide#255
File: docs/integrate/influxdb/tutorial.md:120-123
Timestamp: 2025-08-15T18:56:11.233Z
Learning: In CrateDB integration tutorials, when using `ctk load table` with `--cluster-url="crate://userhost:port/schema/table"`, the verification query should reference the target schema.table combination specified in the cluster URL, not derive it from the source system's naming. For example, if loading from InfluxDB bucket/measurement `testdrive/demo` to CrateDB `doc/testdrive`, the verification query should be `SELECT * FROM doc.testdrive`.
Applied to files:
docs/integrate/postgresql/tutorial.md
🪛 LanguageTool
docs/integrate/postgresql/tutorial.md
[grammar] ~1-~1: There might be a mistake here.
Context: (postgresql-tutorial)= # Load data from PostgreSQL into CrateDB ...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...y. The data transfer is supported by the {ref}`CrateDB Toolkit Ingestr I/O <ctk:i...
(QB_NEW_EN)
[grammar] ~42-~42: There might be a mistake here.
Context: ...ateDB Toolkit, and the PostgreSQL client programs. ::::{tab-set} :::{tab-item} ...
(QB_NEW_EN)
[grammar] ~47-~47: There might be a mistake here.
Context: ...{tab-set} :::{tab-item} Linux and macOS To make the settings persistent, add the...
(QB_NEW_EN)
[grammar] ~54-~54: There might be a mistake here.
Context: ...db-demo docker.io/postgres psql" ``` ::: :::{tab-item} Windows PowerShell To make...
(QB_NEW_EN)
[grammar] ~55-~55: There might be a mistake here.
Context: ...``` ::: :::{tab-item} Windows PowerShell To make the settings persistent, add the...
(QB_NEW_EN)
[grammar] ~62-~62: There might be a mistake here.
Context: ... docker.io/postgres psql @Args } ::: :::{tab-item} Windows Command
shell d...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (2)
docs/integrate/postgresql/tutorial.md (2)
88-91: Confirm the ingestion step works with current CTK; add a short “Known issues” note if needed.
There was a recent report that this step may fail (see cratedb-toolkit#507). Given you now use the dedicated cratedb-toolkit-ingest image, it might be resolved; please verify end-to-end.
If it still fails, consider adding a brief note below this block linking to the issue and a workaround (e.g., pinning a known-good image tag).
93-96: Verification query correctly targets the intended schema.table.
Good job aligning the SELECT with the cluster URL’s doc/postgresql_demo target; this matches the integration tutorial convention.
LGTM
About
There was no section about how to import data from PostgreSQL yet.
Details
The goal is to present concise walkthroughs, without many bells and
whistles, that get you started quickly.
Use the canonical template to present another data nozzle based on
the CTK Ingestr I/O subsystem.
Preview
https://cratedb-guide--257.org.readthedocs.build/integrate/postgresql/
Backlog