Integrate/Kafka: Rework section. Add Python tutorial. #263
Conversation
Walkthrough

Adds three Kafka documentation pages (attic list of managed offerings, Docker+Python ingestion tutorial, expanded Kafka index), small edits to the Kafka Connect page, and a docs/conf.py linkcheck_ignore entry. All changes are documentation-only; no runtime, API, or exported-public-entity changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant Kafka as Kafka cluster
participant Consumer as Python consumer
participant Crate as CrateDB HTTP API
User->>Kafka: produce JSON messages to topic "sensors"
loop Polling
Consumer->>Kafka: poll()
Kafka-->>Consumer: messages (JSON)
Consumer->>Consumer: parse & batch (<=500)
end
alt Batch ready or shutdown
Consumer->>Crate: POST /_bulk (INSERT ... ON CONFLICT ...)
Crate-->>Consumer: 200 OK / errors
else Idle
Consumer->>Consumer: wait/backoff
end
note over Consumer,Crate: Batched idempotent upserts via HTTP bulk endpoint
```
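To make the final step of the diagram concrete: CrateDB's HTTP endpoint is `/_sql`, which accepts one parameterized statement plus a list of rows in `bulk_args` and executes the statement once per row (the diagram's "`/_bulk`" label refers to this bulk mode). A minimal sketch; the table, columns, and primary key are illustrative assumptions, not taken from the PR:

```python
import requests

# One parameterized statement, many rows: CrateDB runs the statement once per
# entry in "bulk_args" and returns one row count per entry.
# Assumption: doc.sensors has PRIMARY KEY (sensor_id, ts), which ON CONFLICT requires.
payload = {
    "stmt": (
        "INSERT INTO doc.sensors (sensor_id, ts, temperature) VALUES (?, ?, ?) "
        "ON CONFLICT (sensor_id, ts) DO UPDATE SET temperature = excluded.temperature"
    ),
    "bulk_args": [
        [101, "2022-06-12T19:00:00Z", 22.5],
        [102, "2022-06-12T19:00:00Z", 21.9],
    ],
}
resp = requests.post("http://localhost:4200/_sql", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json()["results"])  # one {"rowcount": ...} entry per bulk_args row
```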
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/integrate/kafka/kafka-connect.md (2)
Lines 60-67: Fix invalid JSON payload (trailing comma).

The example JSON has a trailing comma after the last field, which makes it invalid and will break copy/paste usage.

```diff
 {
   "sensor_id":101,
   "timestamp":"2022-06-12T19:00:00Z",
   "temperature":22.5,
-  "humidity":60,
+  "humidity":60
 }
```
Lines 366-369: Fix schema name used in SELECT examples (should be `doc`, not `crate`).

`crate` is a user, not a schema. The default schema for user tables is `doc`. The current examples will fail as written.

```diff
-SELECT *
-FROM "crate"."sensortable"
+SELECT *
+FROM "doc"."sensortable"
 LIMIT 100;
```

```diff
-cr> SELECT * FROM crate.sensortable LIMIT 5;
+cr> SELECT * FROM doc.sensortable LIMIT 5;
```

Also applies to: lines 374-385.
🧹 Nitpick comments (12)
docs/integrate/kafka/kafka-connect.md (4)
Lines 231-237: Use modern Docker Compose syntax consistently.

Prefer `docker compose` (v2) over `docker-compose` (v1) and keep it consistent throughout.

```diff
-docker-compose up -d
+docker compose up -d
```

```diff
-docker-compose ps
+docker compose ps
```

Also applies to: lines 243-245.
Lines 200-209: Pin the CrateDB image to a specific version for reproducibility.

Using `latest` risks drift and inconsistent results across runs. Pin to a known-good CrateDB release.

```diff
   cratedb:
-    image: crate:latest
+    image: crate:5.5.0
```

If a different version is intended, replace with the exact tag you've validated.
Lines 261-281: Be explicit with schema and avoid ambiguous settings in the sink config.

- Explicitly qualify the table with the `doc` schema to match CrateDB defaults.
- Consider removing `schema.pattern` for sink use; it's more typical on source connectors and can be confusing here.

```diff
 "config": {
   "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
   "tasks.max": "1",
   "topics": "sensor-data-topic",
   "connection.url": "jdbc:postgresql://cratedb:5432/crate?user=crate&sslmode=disable",
   "dialect.name": "PostgreSqlDatabaseDialect",
   "auto.create": "true",
   "auto.evolve": "true",
   "insert.mode": "insert",
   "pk.mode": "none",
   "value.converter": "io.confluent.connect.avro.AvroConverter",
   "value.converter.schemas.enable": "true",
   "value.converter.schema.registry.url": "http://schema-registry:8081",
   "key.converter": "org.apache.kafka.connect.storage.StringConverter",
   "key.converter.schemas.enable": "false",
-  "table.name.format": "sensortable",
-  "schema.pattern": "doc"
+  "table.name.format": "doc.sensortable"
 }
```
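For context, a sink configuration like the one above is typically registered against the Kafka Connect REST API. A hedged Python sketch; the connector name `cratedb-sink` and the `localhost:8083` endpoint are assumptions, and the Avro converter settings from the full config are omitted for brevity:

```python
import requests

connector = {
    "name": "cratedb-sink",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "sensor-data-topic",
        "connection.url": "jdbc:postgresql://cratedb:5432/crate?user=crate&sslmode=disable",
        "dialect.name": "PostgreSqlDatabaseDialect",
        "auto.create": "true",
        "auto.evolve": "true",
        "insert.mode": "insert",
        "pk.mode": "none",
        # Schema-qualified table name, per the suggestion above.
        "table.name.format": "doc.sensortable",
    },
}
# POST /connectors creates the connector; Kafka Connect replies 201 on success.
resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print("Registered connector:", resp.json()["name"])
```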
Lines 108-113: Pin the Kafka Connect JDBC plugin version.

`latest` can introduce breaking changes. Pin to a validated version to keep the tutorial stable.

```diff
-  bash -c "confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:latest"
+  bash -c "confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.7.5"
```

Replace with the actual version you've tested.
docs/integrate/kafka/attic.md (1)
Lines 11-21: Double-check provider naming/links and clarify the non-exhaustive list.

Names and URLs for managed services can change. A brief note like "selection, not exhaustive" is already present; consider adding the month/year to set expectations.

Example addition after the dropdown intro: "List verified August 2025."
docs/integrate/kafka/docker-python.md (5)
Lines 32-38: Pin the CrateDB image version for deterministic results.

Avoid `latest` in tutorials.

```diff
   cratedb:
-    image: crate:latest
+    image: crate:5.5.0
```
Lines 50-56: Tighten wording and avoid bare URLs.

- Say "Admin UI" instead of "cloud UI" here.
- Wrap the bare URL in angle brackets to satisfy MD034.

```diff
-* CrateDB Admin UI: http://localhost:4200
+* CrateDB Admin UI: <http://localhost:4200>
@@
-The easiest way to do this is through the CrateDB cloud UI at http://localhost:4200 and execute this using the console:
+The easiest way is through the CrateDB Admin UI at <http://localhost:4200> and execute this using the console:
```
Lines 98-141: Consumer: commit strategy and robustness.

The current flow relies on default auto-commit timing and may reprocess messages on restart. Since writes are idempotent via ON CONFLICT, you can safely commit after a successful batch to reduce lag and speed up recovery. Also log/handle HTTP exceptions.

```diff
 from confluent_kafka import Consumer
 @@
 c = Consumer({
     "bootstrap.servers": "localhost:9092",
     "group.id": "demo",
     "auto.offset.reset": "earliest",
+    "enable.auto.commit": False,
 })
 @@
 def insert_batch(rows):
 @@
     r = requests.post(SQL_ENDPOINT, json=body, timeout=10)
     r.raise_for_status()
+    return True
 @@
-        if len(batch) >= 500:
-            insert_batch(batch)
-            batch.clear()
+        if len(batch) >= 500:
+            if insert_batch(batch):
+                c.commit(asynchronous=False)
+            batch.clear()
 finally:
-    if batch: insert_batch(batch)
+    if batch:
+        if insert_batch(batch):
+            c.commit(asynchronous=False)
     c.close()
```
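Putting those fragments together, here is a hedged end-to-end sketch of the suggested pattern: manual commits only after a successful bulk write. The topic `sensors`, the table `doc.sensor_readings`, its columns, and the composite primary key `(device_id, ts)` are illustrative assumptions, not taken verbatim from the tutorial:

```python
import json

import requests
from confluent_kafka import Consumer

SQL_ENDPOINT = "http://localhost:4200/_sql"
# Assumption: doc.sensor_readings has PRIMARY KEY (device_id, ts).
STMT = (
    "INSERT INTO doc.sensor_readings (device_id, ts, temperature) VALUES (?, ?, ?) "
    "ON CONFLICT (device_id, ts) DO UPDATE SET temperature = excluded.temperature"
)

def insert_batch(rows):
    """Send one bulk request; CrateDB executes the statement once per row."""
    r = requests.post(SQL_ENDPOINT, json={"stmt": STMT, "bulk_args": rows}, timeout=10)
    r.raise_for_status()
    return True

c = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "demo",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,  # commit manually, only after a successful write
})
c.subscribe(["sensors"])

batch = []
try:
    while True:
        msg = c.poll(1.0)
        if msg is None or msg.error():
            continue
        doc = json.loads(msg.value())
        batch.append([doc["device_id"], doc["ts"], doc["temperature"]])
        if len(batch) >= 500:
            if insert_batch(batch):
                c.commit(asynchronous=False)
            batch.clear()
finally:
    if batch and insert_batch(batch):
        c.commit(asynchronous=False)
    c.close()
```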
Lines 1-2: Anchor/name consistency with Guide conventions.

Per prior restructuring, main integration anchors live in the integration's `index.md`, and detailed tutorials often use a `-learn` suffix. Consider renaming the ref to `kafka-python-learn` (or `kafka-docker-python-learn`) to stay consistent, and update the index toctree accordingly.

We learned this convention from earlier feedback and can apply it here for navigational consistency.
Also applies to: 104-109
Lines 144-149: Minor: suggest a virtualenv note before the pip install.

A short note to use a virtual environment helps keep user environments clean.

Example addition above the commands: "Optionally create and activate a virtual environment (e.g., `python -m venv .venv && source .venv/bin/activate`) before installing packages."
docs/integrate/kafka/index.md (2)
Lines 41-43: Soften the "99% compatible" claim and clarify the transaction note.

Claims with precise percentages can mislead and go stale. Recommend "largely compatible" and explicitly mention notable differences (e.g., transactions unsupported) without quantifying.

```diff
 :::{important}
-The Apache Kafka PostgreSQL connector is 99% compatible with CrateDB. CrateDB does not support transactions, but this only becomes notable when either ingestion to CrateDB is failing or there is some other connection issue.
+The Apache Kafka PostgreSQL connector is largely compatible with CrateDB. Note that CrateDB does not support transactions; this generally matters only during failures or connection issues that require retries or rollbacks.
 :::
```
Lines 84-85: Remove the unnecessary escape in "Crate.io".

The backslash before the dot isn't needed.

```diff
-* **Managed CrateDB** – Crate\.io offers CrateDB Cloud, which can pair with either self-managed Kafka or managed Kafka services. This option reduces database operations to a minimum.
+* **Managed CrateDB** – Crate.io offers CrateDB Cloud, which can pair with either self-managed Kafka or managed Kafka services. This option reduces database operations to a minimum.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- `docs/integrate/kafka/attic.md` (1 hunks)
- `docs/integrate/kafka/docker-python.md` (1 hunks)
- `docs/integrate/kafka/index.md` (3 hunks)
- `docs/integrate/kafka/kafka-connect.md` (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.
📚 Learning: 2025-08-09T16:19:43.966Z
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.
Applied to files:
docs/integrate/kafka/index.md
🪛 LanguageTool
docs/integrate/kafka/attic.md
[grammar] ~4-~4: There might be a mistake here.
Context: --- orphan: true --- :::{rubric} About ::: [Apache Kafka] is an open-source di...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...ributed event streaming platform used by thousands of companies for high-performa...
(QB_NEW_EN)
[grammar] ~8-~8: There might be a mistake here.
Context: ...nce data pipelines, streaming analytics, data integration, and mission-critical a...
(QB_NEW_EN)
[grammar] ~11-~11: There might be a mistake here.
Context: ...tions. :::{dropdown} Managed Kafka Several companies provide managed Kafka ...
(QB_NEW_EN)
[grammar] ~12-~12: There might be a mistake here.
Context: ...he [overview of managed Kafka offerings] for a more complete list). - [Aiven for...
(QB_NEW_EN)
[grammar] ~15-~15: There might be a mistake here.
Context: ...mplete list). - [Aiven for Apache Kafka] - [Amazon Managed Streaming for Apache Kafk...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ...Managed Streaming for Apache Kafka (MSK)] - [Apache Kafka on Azure] - [Azure Event Hu...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...he Kafka (MSK)] - [Apache Kafka on Azure] - [Azure Event Hubs for Apache Kafka] - [Co...
(QB_NEW_EN)
[grammar] ~18-~18: There might be a mistake here.
Context: ...re] - [Azure Event Hubs for Apache Kafka] - [Confluent Cloud] - [DoubleCloud Managed ...
(QB_NEW_EN)
[grammar] ~19-~19: There might be a mistake here.
Context: ...ubs for Apache Kafka] - [Confluent Cloud] - [DoubleCloud Managed Service for Apache K...
(QB_NEW_EN)
docs/integrate/kafka/docker-python.md
[grammar] ~1-~1: There might be a mistake here.
Context: (kafka-docker-python)= # Using Kafka with Python This walkthroug...
(QB_NEW_EN)
[grammar] ~4-~4: There might be a mistake here.
Context: ...w to load data from a Kafka topic into a CrateDB table, using a Python consumer a...
(QB_NEW_EN)
[grammar] ~50-~50: There might be a mistake here.
Context: ... CrateDB Admin UI: http://localhost:4200 * Kafka broker (inside-compose hostname): ...
(QB_NEW_EN)
docs/integrate/kafka/index.md
[grammar] ~20-~20: There might be a mistake here.
Context: ...fix ``` :::{include} /_include/links.md ::: :::{div} sd-text-muted Apache Kafka...
(QB_NEW_EN)
[grammar] ~23-~23: There might be a mistake here.
Context: ...ude/links.md ::: :::{div} sd-text-muted Apache Kafka is a widely used open-sourc...
(QB_NEW_EN)
[grammar] ~24-~24: There might be a mistake here.
Context: ...uted event-store and streaming platform. ::: ## Overview [Apache Kafka] is a d...
(QB_NEW_EN)
[grammar] ~35-~35: There might be a mistake here.
Context: ...s, consumers, and CrateDB independently. * Near-real-time analytics – New events ...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ...n, exposed via SQL to standard BI tools. * Operational resilience – Use Kafka as ...
(QB_NEW_EN)
[grammar] ~73-~73: There might be a mistake here.
Context: ...ts directly from queries over live data. * **ETL / ELT landing zone for downstream an...
(QB_NEW_EN)
[grammar] ~82-~82: There might be a mistake here.
Context: ...anage scaling, upgrades, and monitoring. * Managed Kafka services – Use a provide...
(QB_NEW_EN)
[grammar] ~84-~84: There might be a mistake here.
Context: ...providers as well. * Managed CrateDB – Crate.io offers CrateDB Cloud, which can pair...
(QB_NEW_EN)
[grammar] ~91-~91: There might be a mistake here.
Context: ...vice_id) to keep ordering where needed. * Table modelling in CrateDB – Choose pr...
(QB_NEW_EN)
[grammar] ~92-~92: There might be a mistake here.
Context: ...formance - define shard count per table. * Upserts & deduplication – Include a st...
(QB_NEW_EN)
[grammar] ~93-~93: There might be a mistake here.
Context: ...to make writes idempotent when possible. * Batching & back-pressure – Tune sink b...
(QB_NEW_EN)
[grammar] ~94-~94: There might be a mistake here.
Context: ...gest capacity while keeping latency low. * Schema & types – Keep payloads consist...
(QB_NEW_EN)
[grammar] ~95-~95: There might be a mistake here.
Context: ...imestamps/time zones, numerics, arrays). * Retention & replay – Kafka retention d...
(QB_NEW_EN)
[grammar] ~96-~96: There might be a mistake here.
Context: ...plan storage and compaction accordingly. * Observability – Monitor producer lag, ...
(QB_NEW_EN)
[grammar] ~101-~101: There might be a mistake here.
Context: ...ry latency. ## Learn more ::::{grid} 2 :gutter: 2 :::{grid-item-card} Tutorial...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/kafka/docker-python.md
119-119: Bare URL used
(MD034, no-bare-urls)
125-125: Bare URL used
(MD034, no-bare-urls)
144-144: Link and image reference definitions should be needed
Unused link or image reference definition: "cratedb and apache kafka"
(MD053, link-image-reference-definitions)
🔇 Additional comments (4)
docs/integrate/kafka/kafka-connect.md (1)
Line 2: Title update aligns with content; good change.

Renaming to "Using Confluent Kafka Connect" better reflects the page scope and matches the new index card. Please ensure any inbound links or references in the repo that still use the old title are redirected or updated.
docs/integrate/kafka/attic.md (1)
Lines 1-3: Front-matter `orphan: true` is appropriate here.

This keeps the page out of the nav while remaining linkable. LGTM.
docs/integrate/kafka/index.md (2)
Lines 14-15: Nice addition: CI badge for Ingestr + Kafka.

This adds useful social proof and a quick status check. LGTM.
Lines 101-109: Good cross-link: the Docker+Python tutorial is clearly surfaced.

Card copy mirrors the tutorial opening, and the `:link-type: ref` matches the page anchor. Once you finalize the tutorial ref name (see suggestion in that file), remember to update this link if it changes.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/integrate/kafka/docker-python.md (1)
Lines 19-21: Fix Kafka advertised listeners to support host-based Python clients.

The current Compose setup advertises `PLAINTEXT://kafka:9092`, which works inside Docker but breaks a host-based consumer connecting to `localhost:9092` (it will still receive "kafka:9092" in metadata and be unable to resolve that host). We need dual listeners (one internal, one host-accessible), or to align the advertised listeners with the host endpoint.

Affected locations:

- `docs/integrate/kafka/docker-python.md`, lines 19-21: Kafka environment variables
- `docs/integrate/kafka/docker-python.md`, lines 27-28: port mappings
- The Python consumer snippet (`bootstrap.servers`) around line 104 in the same file
Proposed minimal diff:
```diff
 services:
   kafka:
     environment:
-      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
-      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
-      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
+      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,PLAINTEXT_HOST://:9094,CONTROLLER://:9093
+      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:9094
+      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,CONTROLLER:PLAINTEXT
```

```diff
 services:
   kafka:
     ports:
-      - "9092:9092"
+      - "9092:9092"
+      - "9094:9094"  # expose the host listener for external clients
```

Also, ensure the Python consumer's bootstrap setting matches your choice:
- For dual listeners: point `"bootstrap.servers"` at the host listener (`localhost:9094`, or remap it to 9092 and keep `"localhost:9092"`)
- Or run the consumer inside the `demo` network and use `"bootstrap.servers": "kafka:9092"` (single listener)

Update the code sample at line 104 accordingly.
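To sanity-check whichever listener layout you choose, a small hedged probe from the host (assuming the dual-listener setup with the host listener published on port 9094):

```python
from confluent_kafka.admin import AdminClient

# Fetch cluster metadata; a misconfigured advertised listener typically shows
# up here as a timeout, or as broker addresses the host cannot resolve.
admin = AdminClient({"bootstrap.servers": "localhost:9094"})
metadata = admin.list_topics(timeout=10)
for broker in metadata.brokers.values():
    print(f"broker {broker.id} advertises {broker.host}:{broker.port}")
```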
🧹 Nitpick comments (14)
docs/integrate/kafka/attic.md (3)
Lines 11-21: Clarify that Event Hubs is Kafka API-compatible, not Kafka itself.

Readers often assume these are interchangeable. "Azure Event Hubs for Apache Kafka" exposes a Kafka-compatible endpoint but is not an Apache Kafka cluster; feature parity and operational semantics differ (e.g., partitions, offsets, retention, tooling). Add a one-line qualifier in this list or a short note in the dropdown to prevent confusion.

Example minimal tweak:

```diff
-- [Azure Event Hubs for Apache Kafka]
+- [Azure Event Hubs for Apache Kafka] (Kafka protocol-compatible service, not Apache Kafka)
```
Lines 12-14: Add a staleness disclaimer for the third-party managed-offerings link.

The "overview of managed Kafka offerings" blog post may age quickly. Add a brief "not exhaustive / may be outdated" note to set expectations.

```diff
-Several companies provide managed Kafka services (see the [overview of managed Kafka offerings]
-for a more complete list).
+Several companies provide managed Kafka services (see the [overview of managed Kafka offerings]
+for examples; note that offerings and features change frequently).
```
Lines 15-20: Name accuracy: "Apache Kafka on Azure".

The link points to a Canonical marketplace listing. Consider renaming the bullet to reflect that it's a marketplace/offered image rather than an Azure-native managed Kafka.

```diff
-- [Apache Kafka on Azure]
+- [Apache Kafka on Azure (Marketplace image)]
```

docs/integrate/kafka/docker-python.md (5)
Line 14: Pin Docker image versions for reproducibility.

Using `:latest` can introduce breaking changes over time. Pin known-good tags (Kafka/CrateDB versions used in your CI or tested locally).

```diff
-    image: bitnami/kafka:latest
+    image: bitnami/kafka:3.7.0
@@
-    image: crate:latest
+    image: crate:5.7.0
```

If you prefer floating minors, document the expected minimum versions instead.
Also applies to: 32-32
Line 55: Terminology: "Admin UI" instead of "cloud UI".

`http://localhost:4200` is the local Admin UI. Replace "cloud UI" to avoid confusion with CrateDB Cloud.

```diff
-The easiest way to do this is through the CrateDB cloud UI at `http://localhost:4200` and execute this using the console:
+The easiest way to do this is through the CrateDB Admin UI at `http://localhost:4200` and execute this using the console:
```
Lines 121-140: Optional: reduce frequent small flushes on idle polls.

Flushing on every `poll()` timeout can send undersized batches during lulls. Consider a time-based flush (e.g., every N seconds) or a larger `poll()` timeout to improve throughput and reduce HTTP overhead.
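A minimal sketch of that idea, assuming the consumer `c` and the `insert_batch()` helper from the tutorial snippet above; the five-second interval is an arbitrary assumption to tune against your latency budget:

```python
import json
import time

BATCH_SIZE = 500
FLUSH_INTERVAL = 5.0  # seconds; assumed value, tune as needed

batch = []
last_flush = time.monotonic()
while True:
    msg = c.poll(1.0)
    if msg is not None and not msg.error():
        batch.append(json.loads(msg.value()))
    # Flush when the batch is full, or when rows have been pending too long.
    if batch and (len(batch) >= BATCH_SIZE
                  or time.monotonic() - last_flush >= FLUSH_INTERVAL):
        insert_batch(batch)
        batch.clear()
        last_flush = time.monotonic()
```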
Lines 146-148: Note on installing confluent-kafka.

On some platforms `confluent-kafka` needs librdkafka or a matching manylinux wheel. Add a short note with troubleshooting, or suggest a containerized run of the consumer for portability.

Happy to add a minimal Dockerfile or a pip constraints snippet that's known to work cross-platform.
Lines 115-117: Add a minimum CrateDB version for the ON CONFLICT upsert.

CrateDB's `INSERT … ON CONFLICT … DO UPDATE SET … EXCLUDED.*` syntax was introduced in CrateDB 3.0.0 (released May 16, 2018) (cratedb.com). To ensure this example works as shown, please add a version note to the tutorial prerequisites, for example in docs/integrate/kafka/docker-python.md (around the prerequisites section):

```diff
 Prerequisites:
-- CrateDB cluster
+- CrateDB cluster (version 3.0.0 or later)
```

docs/integrate/kafka/index.md (6)
Lines 34-38: Avoid "exactly-once semantics" phrasing here.

Idempotent upserts mitigate duplicates but don't by themselves provide end-to-end exactly-once delivery across Kafka → CrateDB (especially since CrateDB doesn't support transactions). Rephrase to set accurate expectations.

```diff
-* **Operational resilience** – Use Kafka as a durable buffer between CrateDB and data producers. Idempotent upserts (exactly-once semantics) reduce data-loss and duplication risks.
+* **Operational resilience** – Use Kafka as a durable buffer between producers and CrateDB. Idempotent upserts reduce duplication risks and improve recovery from retries.
```
Lines 41-43: Qualify PostgreSQL connector compatibility and link constraints.

"Largely compatible" is true for many cases, but specifics (no transactions, type mappings, DDL features) matter. Add a short bullet list of caveats or link to a compatibility page. This reduces surprises when retries/rollbacks occur.

I can draft a concise "Compatibility notes with PostgreSQL connectors" section and cross-link it here.
Lines 23-25: Remove the duplicated tagline to reduce repetition.

This muted paragraph repeats the page description above. Consider removing it for brevity.

```diff
-:::{div} sd-text-muted
-Apache Kafka is a widely used open-source distributed event-store and streaming platform.
-:::
+<!-- Removed duplicated tagline -->
```
Lines 82-86: Clarify managed-services wording.

Slightly tune the managed-services bullet to note that some offerings are Kafka protocol-compatible (e.g., Event Hubs) rather than Kafka itself; link to the attic page if you want a longer list.

```diff
-* **Managed Kafka services** – Use a provider such as Confluent Cloud or AWS MSK to offload the operational heavy lifting of Kafka. You can still connect these managed clusters directly to a CrateDB deployment that you operate. CrateDB is also available on the major cloud providers as well.
+* **Managed Kafka services** – Use a provider such as Confluent Cloud or AWS MSK to offload Kafka operations. Some services (e.g., Azure Event Hubs) provide Kafka-compatible endpoints rather than Kafka itself. Any of these can connect to a CrateDB deployment you operate or to CrateDB Cloud.
```
Lines 132-136: Optional: consider adding the managed-offerings attic page to the toctree (hidden).

If you expect to reference it elsewhere, adding it here (even hidden) makes discovery in search easier.

````diff
 ```{toctree}
 :hidden:

 docker-python
 kafka-connect
+attic
 ```
````
Lines 6-8: Anchor and naming alignment (minor).

Per the doc restructuring pattern you've used elsewhere, tutorials sometimes follow a `-learn` convention. No action required, but consider whether anchors like `(kafka-docker-python)` should follow that pattern for consistency.

Would you like me to propose a consistent anchor/name scheme across integration tutorials?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c4c1313917079286fe724edf3358ee1fb72f59fa and 9956eb608746d17ba8f6b21b1922223636132526.

📒 Files selected for processing (4)

- `docs/integrate/kafka/attic.md` (1 hunks)
- `docs/integrate/kafka/docker-python.md` (1 hunks)
- `docs/integrate/kafka/index.md` (3 hunks)
- `docs/integrate/kafka/kafka-connect.md` (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

- docs/integrate/kafka/kafka-connect.md

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-09T16:19:43.966Z
Learnt from: amotl
PR: crate/cratedb-guide#238
File: docs/integrate/azure-functions/learn.rst:1-1
Timestamp: 2025-08-09T16:19:43.966Z
Learning: In the CrateDB Guide documentation, main integration anchors (e.g., `azure-functions`) are intentionally placed in the `index.md` files of their respective integration folders, while detailed tutorials use the `-learn` suffix (e.g., `azure-functions-learn`) in their `learn.rst` or `learn.md` files. This is a deliberate architectural pattern for the documentation restructuring.

Applied to files:

- `docs/integrate/kafka/index.md`

🔇 Additional comments (2)

docs/integrate/kafka/docker-python.md (1)

Line 59: Clarify TIMESTAMPTZ alias support (CrateDB ≥ 4.0).

CrateDB recognizes `TIMESTAMPTZ` as an alias for `TIMESTAMP WITH TIME ZONE` since version 4.0 (though it was deprecated in 4.x and may change in future releases) ([cratedb.com](https://cratedb.com/docs/crate/reference/en/4.8/general/ddl/data-types.html)), and continues to do so in 5.x ([cratedb.com](https://cratedb.com/docs/crate/reference/en/5.9/general/ddl/data-types.html)).

- File: docs/integrate/kafka/docker-python.md (line 59)
- Recommend using the explicit type for portability, or, if you keep the alias, note the minimum version.

```diff
-    device_id TEXT, ts TIMESTAMPTZ,
+    device_id TEXT, ts TIMESTAMP WITH TIME ZONE,
```
Optional note if retaining `TIMESTAMPTZ`:

> Note: Requires CrateDB ≥ 4.0 for `TIMESTAMPTZ` support.
101-110
: Nice cross-link to the Python tutorial.The Learn card aligns with the new tutorial and uses consistent wording with the tutorial’s intro. Looks good.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/conf.py (1)
Lines 39-41: Consider removing the old repo entry or consolidating both patterns.

You now ignore both crate-workbench and crate URLs. If the intention was to replace the former, drop the old entry; otherwise, consider a single regex covering both for maintainability.
Option A (remove the outdated entry):

```diff
-    r"https://github.com/crate-workbench/cratedb-toolkit/actions/runs/.*",
     r"https://github.com/crate/cratedb-toolkit/actions/runs/.*",
```
Option B (consolidate into one pattern):

```diff
-    r"https://github.com/crate-workbench/cratedb-toolkit/actions/runs/.*",
-    r"https://github.com/crate/cratedb-toolkit/actions/runs/.*",
+    r"https://github.com/crate(?:-workbench)?/cratedb-toolkit/actions/runs/.*",
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `docs/conf.py` (1 hunks)
> ### Create a demo table in CrateDB
>
> The easiest way to do this is through the CrateDB cloud UI at `http://localhost:4200` and execute this using the console:
```diff
-The easiest way to do this is through the CrateDB cloud UI at `http://localhost:4200` and execute this using the console:
+The easiest way to do this is through the CrateDB Admin UI at `http://localhost:4200` and execute this using the console:
```
All your suggestions have been implemented with 15e80bf, thanks!
> ```
> );
> ```
>
> But this can also be done using curl:
```diff
-But this can also be done using curl:
+But this can also be done using `curl`:
```
> ### Create a Kafka topic and send a couple of messages
>
> This can be done in several ways, but we can use **docker-exec** in this way:
```diff
-This can be done in several ways, but we can use **docker-exec** in this way:
+Creating a Kafka topic can be done in several ways, but we can use **docker-exec** like this:
```
docs/integrate/kafka/index.md (outdated diff)
> For more advanced pipelines, you can process events while they’re still in Kafka before they ever reach CrateDB. Frameworks like Flink, Kafka Streams, or Spark let you enrich records, join multiple streams together, run aggregations, or apply windowing functions in real time.
>
> The processed results are then written into CrateDB, where they’re immediately available for SQL queries and dashboards. This approach is powerful when raw events need to be cleaned, combined, or summarised before storage, though it adds more moving parts compared to a simple connector.
```diff
-The processed results are then written into CrateDB, where they’re immediately available for SQL queries and dashboards. This approach is powerful when raw events need to be cleaned, combined, or summarised before storage, though it adds more moving parts compared to a simple connector.
+The processed results are then written into CrateDB, where they’re immediately available for SQL queries and dashboards. This approach is powerful when raw events need to be cleaned, combined, or summarised before storing them, though it adds moving parts compared to a simple connector.
```
docs/integrate/kafka/index.md (outdated diff)
> How you run Kafka and CrateDB depends a lot on your environment and preferences. The most common approaches are:
>
> * **Containerised on-premise** – Run both Kafka and CrateDB on Docker or Kubernetes in your own data centre or private cloud. This gives you the most control, but also means you manage scaling, upgrades, and monitoring.
```diff
-* **Containerised on-premise** – Run both Kafka and CrateDB on Docker or Kubernetes in your own data centre or private cloud. This gives you the most control, but also means you manage scaling, upgrades, and monitoring.
+* **Containerised on-premise** – Run both Kafka and CrateDB on Docker or Kubernetes in your own data centre or private cloud. This gives you the most control, but also means you manage scaling, upgrading, and monitoring.
```
About
Pulling new content about integrating with Apache Kafka. Thanks, @zolbatar!
Source: https://cratedb.gitbook.io/cratedb-docs/K3l1K4ZBSqj0AL16dbZi/drivers-and-integrations/data-sources/streaming-platforms/kafka-daryl
Preview: https://cratedb-guide--263.org.readthedocs.build/integrate/kafka/
Preview: https://cratedb-guide--263.org.readthedocs.build/integrate/kafka/docker-python.html