
Conversation

amotl
Member

@amotl amotl commented Sep 16, 2025

About

Refactor Text-to-SQL walkthrough into "LlamaIndex" section. Also, add the full tutorial from the community forum.

Preview

References

@amotl amotl added the labels "refactoring" (Changing shape or layout, or moving content around.) and "cross linking" (Linking to different locations of the documentation.) on Sep 16, 2025

coderabbitai bot commented Sep 16, 2025

Warning

Rate limit exceeded

@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 46 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between cbe650c and 2eb9674.

📒 Files selected for processing (4)
  • docs/integrate/llamaindex/index.md (2 hunks)
  • docs/integrate/llamaindex/synopsis.md (1 hunks)
  • docs/integrate/llamaindex/tutorial.md (1 hunks)
  • docs/start/query/ai-integration.md (2 hunks)

Walkthrough

Adds two new LlamaIndex documentation pages (synopsis and tutorial), refactors the LlamaIndex integration landing page into a tile/grid layout (removing heavy inline examples), updates AI integration references to point to the new pages, and appends two external URL patterns to the docs linkcheck ignore list.

Changes

Cohort / File(s): Summary of changes

  • Docs — LlamaIndex landing & content (docs/integrate/llamaindex/index.md): Refactored landing to tile/grid layout; added About and Learn rubrics; removed previous Install/Synopsis runnable example; added tiles linking to the new synopsis and tutorial and to GitHub code examples; updated tags and link types.
  • Docs — New synopsis page (docs/integrate/llamaindex/synopsis.md): Added new Text-to-SQL synopsis: install notes, two-column walkthrough, full end-to-end code example (LLM provisioning, SQLDatabase, NLSQLTableQueryEngine), and links to examples.
  • Docs — New tutorial page (docs/integrate/llamaindex/tutorial.md): Added comprehensive CrateDB + LlamaIndex tutorial demonstrating Azure OpenAI/OpenAI vs. Ollama configuration, embedding model choices, CrateDB setup, engine creation, NLSQLTableQueryEngine usage, example queries, and result/metadata handling.
  • Docs — AI integration overview (docs/start/query/ai-integration.md): Added llamaindex and pycaret to Integrations; removed the large inline Text-to-SQL example and replaced it with concise link references to the new synopsis and tutorial; adjusted related-features links.
  • Docs config — linkcheck ignore (docs/conf.py): Appended two new ignore patterns to linkcheck_ignore: r"https://www.llama.com/.*" (comment: # 400 Client Error: Bad Request for url) and r"https://openai.com/index/gpt-4/.*" (comment: # 403 Client Error: Forbidden for url).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant App as App/Notebook
  participant LI as LlamaIndex
  participant LLM as LLM (OpenAI/Azure/Ollama)
  participant DB as CrateDB (SQLAlchemy)

  User->>App: Ask natural-language question
  App->>LI: Build NLSQLTableQueryEngine (Settings.llm, include_tables)
  App->>LLM: Configure/select provider (API key, endpoint, model)
  LI->>LLM: Generate SQL from NL query
  LI->>DB: Execute generated SQL
  DB-->>LI: Return rows/results
  LI-->>App: Return answer, sources, metadata (SQL)
  App-->>User: Display answer and provenance

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

new content

Suggested reviewers

  • hammerhead
  • kneth
  • bmunkholm

Poem

In tiles I hop and hop once more,
I tile the docs and open doors.
LlamaIndex sings SQL in rhyme,
CrateDB answers in record time.
Two links we skip — a tiny chore. 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title Check: ✅ Passed. The title clearly and concisely summarizes the primary changes in the PR — refactoring the Text-to-SQL synopsis into a LlamaIndex section and adding a full tutorial — and aligns with the modified/added docs in the changeset.
  • Description Check: ✅ Passed. The PR description succinctly states the intent to refactor the Text-to-SQL walkthrough into a LlamaIndex section and to add the full tutorial, includes a preview link and relevant issue references, and is directly related to the changeset.
  • Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@amotl amotl changed the base branch from ml-ng to main September 16, 2025 09:35
@amotl
Member Author

amotl commented Sep 16, 2025

@coderabbitai review and please converge all spots to active voice.


coderabbitai bot commented Sep 16, 2025

@amotl I'll review the PR and focus on converting all instances to active voice as you requested. Let me analyze the changes and provide suggestions for improving the voice and tone throughout the documentation.


🧠 Learnings used
Learnt from: amotl
PR: crate/cratedb-guide#0
File: :0-0
Timestamp: 2025-09-15T17:42:25.449Z
Learning: When reviewing documentation, use active voice instead of first-person or passive voice, and emit corresponding suggestions.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@amotl amotl marked this pull request as ready for review September 16, 2025 10:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (13)
docs/conf.py (1)

74-76: Make the ignore pattern resilient to anchors/query strings.

OpenAI pages often append anchors (e.g., #section) or query params. Widen the regex so linkcheck ignores those variants too.

-    r"https://openai.com/index/gpt-4/",
+    r"https://openai.com/index/gpt-4/.*",
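The widened pattern can be sanity-checked with Python's `re` module. This is a standalone sketch; the pattern strings are copied from the diff above, and the note about `re.match` reflects how Sphinx's linkcheck builder applies `linkcheck_ignore` patterns (as a match anchored at the start of the URL):

```python
import re

# Widened pattern from the suggestion: the trailing ".*" tolerates
# anchors and query strings appended to the base URL.
pattern = re.compile(r"https://openai.com/index/gpt-4/.*")

urls = [
    "https://openai.com/index/gpt-4/",
    "https://openai.com/index/gpt-4/#section",
    "https://openai.com/index/gpt-4/?utm_source=docs",
]
for url in urls:
    # re.match anchors only at the start of the string, so all variants match.
    assert pattern.match(url), url

# Without ".*", the pattern still matches as a prefix under re.match;
# re.fullmatch shows where the two variants actually differ.
assert re.fullmatch(r"https://openai.com/index/gpt-4/", urls[1]) is None
```

Since `re.match` already accepts any suffix, the `.*` mainly documents intent; it becomes load-bearing only if the patterns are ever applied with `re.fullmatch`.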
docs/integrate/llamaindex/synopsis.md (5)

63-70: Parameterize the Ollama model; default to a widely available one.

Hard‑coding “gemma3:1b” risks 404s on common installations. Prefer an env var with a sensible default.

-llm = Ollama(
-    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
-    model="gemma3:1b",
+llm = Ollama(
+    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
+    model=os.getenv("OLLAMA_MODEL", "llama3:8b-instruct"),
     temperature=0.0,
     request_timeout=120.0,
     keep_alive=-1,
 )

80-82: Avoid opening an unused connection.

create_engine is enough; the stray connect() leaks a connection in examples readers may copy/paste.

-database = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL", "crate://crate@localhost:4200"))
-database.connect()
+database = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL", "crate://crate@localhost:4200"))

12-18: Pin compatible minimum versions (prevent accidental downgrades).

Upper bounds alone allow too‑old versions. Add minimal versions matching current APIs used here.

Proposed (verify exact mins in your env):

-langchain-openai<0.3
-llama-index-embeddings-langchain<0.4
-llama-index-embeddings-openai<0.4
-llama-index-llms-azure-openai<0.4
-llama-index-llms-openai<0.4
+langchain-openai>=0.1.6,<0.3
+llama-index-embeddings-langchain>=0.3.0,<0.4
+llama-index-embeddings-openai>=0.2.0,<0.4
+llama-index-llms-azure-openai>=0.2.0,<0.4
+llama-index-llms-openai>=0.2.0,<0.4
 sqlalchemy-cratedb

41-47: Tighten phrasing to active voice.

Small copy tweak.

-Provision LLM using OpenAI model.
+Provision an LLM using an OpenAI model.

56-62: Use active voice here, too.

-Alternatively, provision LLM using self-hosted model.
+Provision an LLM using a self‑hosted model.
docs/start/query/ai-integration.md (1)

214-220: Active-voice polish.

Streamline the intro sentence.

-Text-to-SQL: Talk to your data using human language and contemporary large
-language models, optionally offline.
+Text‑to‑SQL lets you query data in natural language with contemporary large language models, optionally offline.
docs/integrate/llamaindex/index.md (2)

16-28: Clarify “About” and fix branding (“OpenAI”).

LlamaIndex doesn’t ship models; it integrates with them. Also prefer “Azure OpenAI”.

-[LlamaIndex] is a data framework for Large Language Models (LLMs). It comes with
-pre-trained models on massive public datasets such as GPT-4 or Llama 2, and
-provides an interface to external data sources allowing for natural language
-querying on your private data.
+[LlamaIndex] is a data framework for Large Language Models (LLMs). It integrates with
+providers and models such as GPT‑4 or Llama 3, and
+provides interfaces to external data sources for natural‑language querying of your private data.
 
-Azure Open AI Service is a fully managed service that runs on the Azure global
+Azure OpenAI Service is a fully managed service that runs on the Azure global
 infrastructure and allows developers to integrate OpenAI models into their
-applications. Through Azure Open AI API one can easily access a wide range of
+applications. Through the Azure OpenAI API, you can access a wide range of
 AI models in a scalable and reliable way.

55-66: Tighten duplicate phrasing in demo tile.

Combine overlapping sentences.

-- Connect your CrateDB data to an LLM using OpenAI or Azure OpenAI.
-- Text-to-SQL / Talk to your data: Query the database in human language; query CrateDB in plain English.
+- Connect CrateDB to an LLM via OpenAI or Azure OpenAI.
+- Text‑to‑SQL: Query CrateDB in natural language.
docs/integrate/llamaindex/tutorial.md (4)

6-11: Fix branding and shift to active voice.

-[LlamaIndex](https://www.llamaindex.ai/) is a data framework for Large Language Models (LLMs). It comes with pre-trained models on massive public datasets such as [GPT-4](https://openai.com/index/gpt-4/) or [Llama 4](https://www.llama.com/models/llama-4/) and provides an interface to external data sources allowing for natural language querying on your private data.
+[LlamaIndex](https://www.llamaindex.ai/) is a data framework for Large Language Models (LLMs). It integrates with models such as [GPT‑4](https://openai.com/index/gpt-4/) or [Llama 4](https://www.llama.com/models/llama-4/) and provides interfaces to external data sources for natural‑language querying of your private data.
 
-[Azure Open AI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) is a fully managed service that runs on the Azure global infrastructure and allows developers to integrate OpenAI models into their applications. Through Azure Open AI API one can easily access a wide range of AI models in a scalable and reliable way.
+[Azure OpenAI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) is a fully managed service on the Azure global infrastructure that lets developers integrate OpenAI models into applications. Through the Azure OpenAI API, you can access a wide range of AI models in a scalable and reliable way.
 
-In this tutorial, we will illustrate how to augment existing LLMs with data stored in CrateDB through the LlamaIndex framework and Azure Open AI Service. By doing this, you will be able to use the power of generative AI models with your own data in just a few lines of code.
+This tutorial shows how to augment LLMs with data stored in CrateDB using LlamaIndex and Azure OpenAI, enabling natural‑language queries over your data.

137-141: Avoid unused connect(); use the engine directly.

-engine_crate = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL"))
-engine_crate.connect()
+engine_crate = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL"))

156-164: Fix typo and tighten phrasing.

-Then use that to create am instance of `NLSQLTableQueryEngine`:
+Then create an instance of `NLSQLTableQueryEngine`:

190-199: Avoid time‑sensitive promises; keep it evergreen.

[ suggest_nitpick_refactor ]

-This tutorial is just the beginning. You can expect further resources, documentation, and tutorials related to CrateDB and generative AI from us. Also, stay tuned for the CrateDB 5.5 release: we will soon announce the support for the vector store and search, allowing you to implement similarity-based data retrieval efficiently.
+This tutorial is a starting point. Explore additional resources on CrateDB and generative AI as they become available.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bddc9d0 and eb124b6.

📒 Files selected for processing (5)
  • docs/conf.py (1 hunks)
  • docs/integrate/llamaindex/index.md (2 hunks)
  • docs/integrate/llamaindex/synopsis.md (1 hunks)
  • docs/integrate/llamaindex/tutorial.md (1 hunks)
  • docs/start/query/ai-integration.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (5)
docs/start/query/ai-integration.md (2)

51-55: Good additions to Integrations.

New anchors for LlamaIndex and PyCaret are clear and consistent.


59-60: Cross‑links look correct.

References resolve to the new synopsis and tutorial pages.

Please build the docs locally to confirm sphinx refs resolve: make html && rg -n "undefined label" build/ (should output nothing).

docs/integrate/llamaindex/index.md (2)

43-53: LGTM — concise, active copy.

Synopsis tile reads well.


78-83: ToC looks right.

Hidden ToC with synopsis and tutorial is appropriate.

docs/integrate/llamaindex/tutorial.md (1)

85-121: Simplify configure_llm — don’t mutate openai globals; keep LlamaIndex AzureOpenAI engine

Read OPENAI_API_TYPE into a local variable, pick the wrapper, and pass credentials into the OpenAI/AzureOpenAI constructors instead of assigning to openai.api_type / openai.api_key / openai.azure_endpoint. For llama-index-llms-azure-openai, AzureOpenAI expects an engine (your Azure deployment name) — keep using engine, not model/deployment_name. (docs.llamaindex.ai)

File: docs/integrate/llamaindex/tutorial.md (lines 85–121)

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/integrate/llamaindex/tutorial.md (1)

17-23: Prerequisites: list required Python packages explicitly and update LlamaIndex docs URL.

This prevents import errors (LangChain embeddings, OpenAI/Azure clients, CrateDB dialect).

-* Recent version of LlamaIndex, please follow the [installation instructions](https://gpt-index.readthedocs.io/en/latest/getting_started/installation.html)
+* Recent version of LlamaIndex — see [installation instructions](https://docs.llamaindex.ai/en/stable/getting_started/installation/)
+* `openai`
+* `llama-index`
+* `langchain-openai` and `llama-index-embeddings-langchain`
 * `sqlalchemy-cratedb`
 * `SQLAlchemy` (if not pulled transitively)
 * Running instance of [CrateDB](https://console.cratedb.cloud/)
 * [Azure subscription](https://azure.microsoft.com/en-gb/free/cognitive-services/) and [Azure OpenAI resource](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal)
🧹 Nitpick comments (2)
docs/integrate/llamaindex/tutorial.md (1)

34-37: Fix markdown list indentation (MD005).

Align nested list items under step 2.

-2. This will open Azure AI Studio. Azure AI Studio enables developers to build, run, and deploy AI applications. Click on the *Create new deployment* button to deploy the following models:
-  1. **GPT-35-turbo** for text generation tasks
-  2. **text-embedding-ada-002** for generating embeddings
+2. This opens Azure AI Studio. Click *Create new deployment* and deploy:
+    1. A chat/completions model (e.g., **gpt-4o-mini**)
+    2. An embeddings model (e.g., **text-embedding-3-large**)
docs/integrate/llamaindex/index.md (1)

66-72: markdownlint MD034 (bare URL) in card link.

Either suppress MD034 for this line or move the URL into the card body as a named link.

Option A (suppress):

+% markdownlint-disable-next-line MD034
 :link: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index

Option B (move link into body, drop 🔗):

-:::{grid-item-card} LlamaIndex and CrateDB: Code Examples
-:link: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index
-:link-type: url
-NL2SQL with LlamaIndex: Querying CrateDB using natural language.
+:::{grid-item-card} LlamaIndex and CrateDB: Code Examples
+NL2SQL with LlamaIndex: Querying CrateDB using natural language. See
+[LlamaIndex and CrateDB: Code Examples].
 {tags-primary}`Runnable example`
 :::
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb124b6 and 0f4a3b8.

📒 Files selected for processing (5)
  • docs/conf.py (1 hunks)
  • docs/integrate/llamaindex/index.md (2 hunks)
  • docs/integrate/llamaindex/synopsis.md (1 hunks)
  • docs/integrate/llamaindex/tutorial.md (1 hunks)
  • docs/start/query/ai-integration.md (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/integrate/llamaindex/synopsis.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/llamaindex/index.md

67-67: Bare URL used

(MD034, no-bare-urls)

docs/integrate/llamaindex/tutorial.md

35-35: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2

(MD005, list-indent)


36-36: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2

(MD005, list-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
🔇 Additional comments (21)
docs/integrate/llamaindex/tutorial.md (14)

13-13: Active voice: remove first-person phrasing.

[ suggest_nitpick_refactor ]

-If you want to run this in your own environment, we've provided all of the code and supporting resources that you'll need in the [`cratedb-examples`](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index) GitHub repository.
+The [`cratedb-examples`](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index) repository provides all required code and supporting resources to run this tutorial.

46-54: Active voice: remove “Let’s …”.

[ suggest_nitpick_refactor ]

-## Load time-series data to CrateDB
-
-Let’s now create the `time_series_data` table in CrateDB that contains time series data, where each row represents a data point with the following information:
+## Load time-series data into CrateDB
+
+Create the `time_series_data` table. Each row represents a data point with:

64-66: Active voice and clarity.

[ suggest_nitpick_refactor ]

-Import a portion of the data we will use for learning and querying:
+Import sample data for learning and querying:

85-96: Tighten wording and fix product names.

[ suggest_nitpick_refactor ]

-Azure OpenAI resource differs slightly from the standard OpenAI resource as it requires the use of the embedding model, which we deployed in the previous step. The following code illustrates the setup of OpenAI API:
+Azure OpenAI differs slightly from the standard OpenAI setup because you must specify the embedding model deployed earlier. The following code configures the OpenAI API:

88-124: Avoid relying on openai.api_type; drive off an env var and set embedding model consistently.

Using a local api_type var avoids coupling to the OpenAI SDK’s module globals and keeps LlamaIndex wrappers isolated. Also set the embedding model for both branches.

[ suggest_recommended_refactor ]

 def configure_llm():
     """
-    Configure LLM. Use either vanilla Open AI, or Azure Open AI.
+    Configure the LLM. Use either OpenAI or Azure OpenAI.
     """
-
-    openai.api_type = os.getenv("OPENAI_API_TYPE")
-    openai.azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
-    openai.api_version = os.getenv("OPENAI_AZURE_API_VERSION")
-    openai.api_key = os.getenv("OPENAI_API_KEY")
+    api_type = os.getenv("OPENAI_API_TYPE", "openai").lower()
 
-    if openai.api_type == "openai":
+    if api_type == "openai":
         llm = OpenAI(
             api_key=os.getenv("OPENAI_API_KEY"),
             temperature=0.0
         )
-    elif openai.api_type == "azure":
+    elif api_type == "azure":
         llm = AzureOpenAI(
             engine=os.getenv("LLM_INSTANCE"),
             azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
-            api_key = os.getenv("OPENAI_API_KEY"),
-            api_version = os.getenv("OPENAI_AZURE_API_VERSION"),
+            api_key=os.getenv("OPENAI_API_KEY"),
+            api_version=os.getenv("OPENAI_AZURE_API_VERSION"),
             temperature=0.0
         )
     else:
-        raise ValueError(f"Open AI API type not defined or invalid: {openai.api_type}")
+        raise ValueError(f"OpenAI API type not defined or invalid: {api_type}")
 
     Settings.llm = llm
-    if openai.api_type == "openai":
-        Settings.embed_model = LangchainEmbedding(OpenAIEmbeddings())
-    elif openai.api_type == "azure":
+    if api_type == "openai":
+        Settings.embed_model = LangchainEmbedding(
+            OpenAIEmbeddings(model=os.getenv("EMBEDDING_MODEL_INSTANCE", "text-embedding-3-small"))
+        )
+    elif api_type == "azure":
         Settings.embed_model = LangchainEmbedding(
             AzureOpenAIEmbeddings(
                 azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
                 model=os.getenv("EMBEDDING_MODEL_INSTANCE")
             )
         )

Please verify imports in the snippet (LlamaIndex wrappers and LangChain embeddings) align with the packages listed in Prerequisites.


130-133: Editorial: clarify Settings usage and fix minor spacing.

[ suggest_nitpick_refactor ]

-The code also initializes the LLM and embedding models. The value for `EMBEDDING_MODEL_INSTANCE` is the deployed embedding model's name from  Azure OpenAI (e.g., `my_embedding-model`). To set this configuration globally, we use the `llama_index.core.Settings`.
+The code initializes the LLM and embedding models. `EMBEDDING_MODEL_INSTANCE` is the deployed embedding model’s name in Azure OpenAI (e.g., `my-embedding-model`). Use `llama_index.core.Settings` to set this configuration globally.

138-139: Use proper casing for SQLAlchemy.

[ suggest_nitpick_refactor ]

-We use sqlalchemy, a popular SQL database toolkit, to connect to CrateDB and SQLDatabase wrapper that allows CrateDB data to be used within LlamaIndex.
+Use SQLAlchemy to connect to CrateDB, and the `SQLDatabase` wrapper to use CrateDB data within LlamaIndex.

149-156: Minor: article and comma.

[ suggest_nitpick_refactor ]

-To query CrateDB using natural language we make an instance of `SQLDatabase` and provide a list of tables:
+To query CrateDB using natural language, create an instance of `SQLDatabase` and provide a list of tables:

167-169: Active voice.

[ suggest_nitpick_refactor ]

-At this point, we are ready to query CrateDB in plain English!
+You can now query CrateDB in plain English.

171-172: Active voice and specificity.

[ suggest_nitpick_refactor ]

-When dealing with time-series data we are usually interested in aggregate values. For instance, with our query, we are interested in the average value of sensor 1:
+Time‑series analysis often focuses on aggregates. For example, ask for the average value of sensor 1:

176-179: Good: corrected print syntax.

The stray closing parenthesis reported earlier is gone. Copy/paste will work.


184-189: Active voice and neutral phrasing.

[ suggest_nitpick_refactor ]

-Often, we are also interested in the query that produces the output. This is included in the answer's metadata:
+It is often useful to inspect the SQL that produced the result. LlamaIndex includes it in the answer metadata:

193-199: Active voice for the takeaway.

[ suggest_nitpick_refactor ]

-In this tutorial, we've embarked on the journey of using a natural language interface to query CrateDB data. We've explored how to seamlessly connect your data to the power of LLM using LlamaIndex and the capabilities of Azure OpenAI.
-
-This tutorial is a starting point. Explore additional resources on CrateDB and generative AI as they become available.
+This tutorial demonstrated how to query CrateDB data in natural language using LlamaIndex and Azure OpenAI.
+
+Use this as a starting point and explore additional CrateDB and generative AI resources as they become available.

35-37: Update model recommendations to Azure's current names (confirmed Sep 16, 2025).

Use a generic chat/completions model (e.g., gpt-4o-mini) and text-embedding-3-large for embeddings; let env vars control the exact deployment.

-  1. **GPT-35-turbo** for text generation tasks
-  2. **text-embedding-ada-002** for generating embeddings
+    1. A chat/completions model (e.g., **gpt-4o-mini**) for text generation
+    2. An embeddings model (e.g., **text-embedding-3-large**)
docs/start/query/ai-integration.md (3)

51-55: LGTM: new integrations list entries.

Adding LlamaIndex and PyCaret here improves discoverability and aligns with the new pages.


214-220: Concise, active phrasing nit.

[ suggest_nitpick_refactor ]

-Text‑to‑SQL lets you query data in natural language with contemporary
-large language models, optionally offline.
+Text‑to‑SQL lets you query data in natural language with contemporary large language models, including local/offline options.

59-61: Cross-links: anchors present; Sphinx build unavailable.

Found labels (llamaindex-synopsis, llamaindex-tutorial) at docs/integrate/llamaindex/synopsis.md:1 and docs/integrate/llamaindex/tutorial.md:1. sphinx-build is not installed in the verification environment (/bin/bash: line 6: sphinx-build: command not found). Run a local Sphinx build (e.g., sphinx-build -b html docs _build/html or sphinx-build -b linkcheck docs _build/linkcheck) to confirm the {ref} targets resolve with no warnings.

docs/integrate/llamaindex/index.md (3)

19-26: Style nits: tighten wording and consistency.

[ suggest_nitpick_refactor ]

-[LlamaIndex] is a data framework for Large Language Models (LLMs). It integrates
-with providers and models such as GPT‑4 or Llama 3/4, and provides interfaces to
-external data sources for natural‑language querying of your private data.
+[LlamaIndex] is a data framework for Large Language Models (LLMs). It integrates
+with providers and models such as GPT‑4 and Llama, and exposes interfaces to
+external data sources for natural‑language querying of private data.

54-64: Minor: card copy polish (active voice).

[ suggest_nitpick_refactor ]

-- Connect CrateDB to an LLM via OpenAI or Azure OpenAI.
-- Text‑to‑SQL: Query CrateDB in natural language.
+- Connect CrateDB to an LLM via OpenAI or Azure OpenAI.
+- Query CrateDB in natural language (Text‑to‑SQL).

77-82: LGTM: toctree structure.

Hidden toctree cleanly exposes synopsis/tutorial to ref targets.

docs/conf.py (1)

74-78: Tighten linkcheck ignore patterns (avoid masking real issues)

Don't ignore whole domains — narrow ignores to the exact pages that flake on HEAD.

-    # 400 Client Error: Bad Request for url
-    r"https://www.llama.com/.*",
-    # 403 Client Error: Forbidden for url
-    r"https://openai.com/index/gpt-4/.*",
+    # 400 Client Error: Bad Request for url (Meta Llama models page flakes on HEAD)
+    r"https://www\.llama\.com/models/llama-4/.*",
+    # 403 Client Error: Forbidden for url (OpenAI index article flakes on HEAD)
+    r"https://openai\.com/index/gpt-4/.*",

Could not run linkcheck here (sphinx-build not found). Re-run locally to confirm no other links need suppression:

#!/usr/bin/env bash
set -euo pipefail
sphinx-build -b linkcheck docs _build/linkcheck
grep -E "broken|warning" -n _build/linkcheck/output.txt || true


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/integrate/llamaindex/tutorial.md (1)

17-23: Add missing Python packages in prerequisites (prevents import/runtime errors).

Your code uses OpenAI, LangChain embeddings, and LlamaIndex’s LangChain adapter. List those explicitly and drop the stale install link.

 * Python 3.10 or higher
-* Recent version of LlamaIndex, please follow the [installation instructions](https://gpt-index.readthedocs.io/en/latest/getting_started/installation.html)
+* Recent version of LlamaIndex
+* `openai` (Python SDK)
+* `langchain-openai` and `llama-index-embeddings-langchain` (for `LangchainEmbedding` with `OpenAIEmbeddings`/`AzureOpenAIEmbeddings`)
 * `sqlalchemy-cratedb`
 * `SQLAlchemy` (if not pulled transitively)
 * Running instance of [CrateDB](https://console.cratedb.cloud/)
 * [Azure subscription](https://azure.microsoft.com/en-gb/free/cognitive-services/) and [Azure OpenAI resource](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal)

And add a quick-start install block after the list:

+Recommended install:
+
+```bash
+pip install "llama-index" "openai" "langchain-openai" \
+  "llama-index-embeddings-langchain" "SQLAlchemy" "sqlalchemy-cratedb"
+```
🧹 Nitpick comments (10)
docs/integrate/llamaindex/tutorial.md (7)

34-37: Fix nested list indentation (markdownlint MD005).

Insert a blank line and use 4‑space indent for the nested list.

-2. This opens Azure AI Studio. Click *Create new deployment* and deploy:
-  1. A chat/completions model (e.g., **gpt-4o-mini**)
-  2. An embeddings model (e.g., **text-embedding-3-large**)
+2. This opens Azure AI Studio. Click *Create new deployment* and deploy:
+
+    1. A chat/completions model (e.g., **gpt-4o-mini**)
+    2. An embeddings model (e.g., **text-embedding-3-large**)

88-124: Avoid mixing SDK globals; use env var once and fix “Open AI” spelling.

Using openai.api_type requires importing and mutating SDK globals. Read the env var into a local variable instead, and standardize “OpenAI”. Also ensure the required imports (shown below) exist in the snippet or an earlier one.

 def configure_llm():
     """
-    Configure LLM. Use either vanilla Open AI, or Azure Open AI.
+    Configure LLM. Use either OpenAI or Azure OpenAI.
     """
 
-    openai.api_type = os.getenv("OPENAI_API_TYPE")
-    openai.azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
-    openai.api_version = os.getenv("OPENAI_AZURE_API_VERSION")
-    openai.api_key = os.getenv("OPENAI_API_KEY")
+    api_type = os.getenv("OPENAI_API_TYPE")
+    azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
+    api_version = os.getenv("OPENAI_AZURE_API_VERSION")
+    api_key = os.getenv("OPENAI_API_KEY")
 
-    if openai.api_type == "openai":
+    if api_type == "openai":
         llm = OpenAI(
-            api_key=os.getenv("OPENAI_API_KEY"),
+            api_key=api_key,
             temperature=0.0
         )
-    elif openai.api_type == "azure":
+    elif api_type == "azure":
         llm = AzureOpenAI(
             engine=os.getenv("LLM_INSTANCE"),
-            azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
-            api_key = os.getenv("OPENAI_API_KEY"),
-            api_version = os.getenv("OPENAI_AZURE_API_VERSION"),
+            azure_endpoint=azure_endpoint,
+            api_key=api_key,
+            api_version=api_version,
             temperature=0.0
         )
     else:
-        raise ValueError(f"Open AI API type not defined or invalid: {openai.api_type}")
+        raise ValueError(f"OpenAI API type not defined or invalid: {api_type}")
 
     Settings.llm = llm
-    if openai.api_type == "openai":
+    if api_type == "openai":
         Settings.embed_model = LangchainEmbedding(OpenAIEmbeddings())
-    elif openai.api_type == "azure":
+    elif api_type == "azure":
         Settings.embed_model = LangchainEmbedding(
             AzureOpenAIEmbeddings(
-                azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
+                azure_endpoint=azure_endpoint,
                 model=os.getenv("EMBEDDING_MODEL_INSTANCE")
             )
         )

Required imports (add once near the snippet if not already present):

from llama_index.core import Settings
from llama_index.embeddings.langchain import LangchainEmbedding
from langchain_openai import OpenAIEmbeddings, AzureOpenAIEmbeddings
from llama_index.llms.openai import OpenAI
from llama_index.llms.azure_openai import AzureOpenAI
# import os  # if not already imported

138-138: Tighten language and capitalization; switch to active voice.

-We use sqlalchemy, a popular SQL database toolkit, to connect to CrateDB and SQLDatabase wrapper that allows CrateDB data to be used within LlamaIndex.
+Use SQLAlchemy to connect to CrateDB and the SQLDatabase wrapper to expose tables to LlamaIndex.

151-156: Document CRATEDB_TABLE_NAME or inline the table name.

Readers will hit an opaque failure if the env var isn’t set (`os.getenv` returns `None`, so `include_tables=[None]` fails later). Add an export example or inline the literal.

 sql_database = SQLDatabase(
     engine_crate, 
     include_tables=[os.getenv("CRATEDB_TABLE_NAME")]
 )
+
+# or set the variable once:
+# $ export CRATEDB_TABLE_NAME=time_series_data

Would you like me to submit a follow-up commit that inlines "time_series_data" in the snippet to minimize setup friction?

Also applies to: 160-165
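Alternatively, failing fast on a missing variable gives readers a clear message instead of a confusing downstream error; a stdlib-only sketch (the helper name is made up, the variable name comes from the snippet above):

```python
import os


def required_env(name: str) -> str:
    """Return the environment variable's value, or raise a clear error immediately."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


os.environ["CRATEDB_TABLE_NAME"] = "time_series_data"
print(required_env("CRATEDB_TABLE_NAME"))
```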


171-182: Use second person and active voice in the question section.

-When dealing with time-series data we are usually interested in aggregate values. For instance, with our query, we are interested in the average value of sensor 1:
+With time‑series data you often care about aggregates. For example, compute the average value for sensor 1:

184-189: Active voice: describe metadata inspection directly.

-Often, we are also interested in the query that produces the output. This is included in the answer's metadata:
+You can also inspect the SQL that produced the answer; it’s included in `answer.metadata`:
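A minimal stand-in showing the metadata access (the `sql_query` key and the response shape are assumptions based on typical LlamaIndex NL2SQL responses, not verified against this tutorial's code):

```python
class Answer:
    """Stand-in for a query-engine response object, for illustration only."""

    def __init__(self, text, metadata):
        self.response = text
        self.metadata = metadata


answer = Answer(
    "The average value of sensor 1 is 17.03.",
    {"sql_query": "SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1"},
)
# The generated SQL travels alongside the natural-language answer.
print(answer.metadata["sql_query"])
```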

193-199: Close in active voice; remove first‑person phrasing.

-In this tutorial, we've embarked on the journey of using a natural language interface to query CrateDB data. We've explored how to seamlessly connect your data to the power of LLM using LlamaIndex and the capabilities of Azure OpenAI.
-
-This tutorial is a starting point. Explore additional resources on CrateDB and generative AI as they become available.
-
-If you want to try this out yourself, you can find the full example code and supporting resources in the [`cratedb-examples` GitHub repository](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index).
+This tutorial shows how to query CrateDB data using natural language with LlamaIndex and Azure OpenAI.
+
+Explore more CrateDB and generative‑AI resources as they become available.
+
+Find the full example code and supporting resources in the [`cratedb-examples` GitHub repository](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index).
docs/integrate/llamaindex/index.md (3)

23-26: Tighten Azure OpenAI wording; use active voice.

-Azure OpenAI Service is a fully managed service that runs on the Azure global
-infrastructure and allows developers to integrate OpenAI models into their
-applications. Through the Azure OpenAI API, you can access a wide range of
-AI models in a scalable and reliable way.
+Azure OpenAI Service runs on the Azure global infrastructure and lets you integrate OpenAI models into your applications. The Azure OpenAI API provides scalable access to a wide range of models.

54-59: Clarify provider naming in the demo tile.

-:::{grid-item-card} Demo: Using LlamaIndex with OpenAI and CrateDB
+:::{grid-item-card} Demo: Using LlamaIndex with OpenAI/Azure OpenAI and CrateDB

66-71: Silence markdownlint MD034 (bare URL) for 🔗 directive argument.

The directive requires a raw URL; add a one-line suppression to keep CI green.

-:::{grid-item-card} LlamaIndex and CrateDB: Code Examples
+<!-- markdownlint-disable-next-line MD034 -->
+:::{grid-item-card} LlamaIndex and CrateDB: Code Examples
 :link: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index
 :link-type: url
 NL2SQL with LlamaIndex: Querying CrateDB using natural language.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f4a3b8 and cbe650c.

📒 Files selected for processing (4)
  • docs/integrate/llamaindex/index.md (2 hunks)
  • docs/integrate/llamaindex/synopsis.md (1 hunks)
  • docs/integrate/llamaindex/tutorial.md (1 hunks)
  • docs/start/query/ai-integration.md (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/integrate/llamaindex/synopsis.md
  • docs/start/query/ai-integration.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/llamaindex/index.md

67-67: Bare URL used

(MD034, no-bare-urls)

docs/integrate/llamaindex/tutorial.md

35-35: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2

(MD005, list-indent)


36-36: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2

(MD005, list-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build docs
