LlamaIndex: Refactor Text-to-SQL synopsis. Add full tutorial. #316
Conversation
Walkthrough

Adds two new LlamaIndex documentation pages (synopsis and tutorial), refactors the LlamaIndex integration landing page into a tile/grid layout removing heavy inline examples, updates AI integration references to point to the new pages, and appends two external URL patterns to the docs linkcheck ignore list.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant App as App/Notebook
    participant LI as LlamaIndex
    participant LLM as LLM (OpenAI/Azure/Ollama)
    participant DB as CrateDB (SQLAlchemy)
    User->>App: Ask natural-language question
    App->>LI: Build NLSQLTableQueryEngine (Settings.llm, include_tables)
    App->>LLM: Configure/select provider (API key, endpoint, model)
    LI->>LLM: Generate SQL from NL query
    LI->>DB: Execute generated SQL
    DB-->>LI: Return rows/results
    LI-->>App: Return answer, sources, metadata (SQL)
    App-->>User: Display answer and provenance
```
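The sequence above can be mocked end to end without any LLM or database. This is a minimal stdlib sketch of the flow only; `fake_llm`, `fake_db`, and `query` are hypothetical stand-ins, not LlamaIndex APIs.

```python
# A stdlib mock of the diagram: the app hands a natural-language question to a
# text-to-SQL step (fake_llm), executes the result against a stand-in database
# (fake_db), and returns the answer plus provenance (the generated SQL).
# All names here are illustrative.

def fake_llm(question):
    # Stand-in for "LI->>LLM: Generate SQL from NL query".
    return "SELECT avg(value) FROM time_series_data WHERE sensor_id = 1"

def fake_db(sql):
    # Stand-in for "LI->>DB: Execute generated SQL".
    return [(17.5,)]

def query(question):
    sql = fake_llm(question)
    rows = fake_db(sql)
    # "LI-->>App: Return answer, sources, metadata (SQL)"
    return {"answer": rows[0][0], "metadata": {"sql_query": sql}}

result = query("What is the average value of sensor 1?")
print(result["answer"])                  # 17.5
print(result["metadata"]["sql_query"])   # the SQL provenance
```

The point of the shape is that the caller gets both the answer and the SQL that produced it, mirroring the metadata discussed in the review below.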
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
@coderabbitai review and please converge all spots to active voice.

@amotl I'll review the PR and focus on converting all instances to active voice as you requested. Let me analyze the changes and provide suggestions for improving the voice and tone throughout the documentation.

✅ Actions performed: Review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (13)
docs/conf.py (1)
74-76: Make the ignore pattern resilient to anchors/query strings.

OpenAI pages often append anchors (e.g., #section) or query params. Widen the regex so linkcheck ignores those variants too.

```diff
- r"https://openai.com/index/gpt-4/",
+ r"https://openai.com/index/gpt-4/.*",
```

docs/integrate/llamaindex/synopsis.md (5)
63-70: Parameterize the Ollama model; default to a widely available one.

Hard‑coding “gemma3:1b” risks 404s on common installations. Prefer an env var with a sensible default.

```diff
-llm = Ollama(
-    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
-    model="gemma3:1b",
+llm = Ollama(
+    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
+    model=os.getenv("OLLAMA_MODEL", "llama3:8b-instruct"),
     temperature=0.0,
     request_timeout=120.0,
     keep_alive=-1,
 )
```
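The env-var-with-default pattern suggested above can be checked in isolation with only the standard library. `OLLAMA_MODEL` is the variable name proposed in the review; `resolve_model` is a hypothetical helper for illustration.

```python
import os

def resolve_model(default="llama3:8b-instruct"):
    # Resolve the model name from the environment, falling back to a default,
    # exactly as os.getenv(name, default) does in the suggested diff.
    return os.getenv("OLLAMA_MODEL", default)

# Without the variable set, the default applies.
os.environ.pop("OLLAMA_MODEL", None)
print(resolve_model())  # llama3:8b-instruct

# With the variable set, the override wins.
os.environ["OLLAMA_MODEL"] = "gemma3:1b"
print(resolve_model())  # gemma3:1b
```

Readers on installations without the default model can then switch with `export OLLAMA_MODEL=...` rather than editing the snippet.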
80-82: Avoid opening an unused connection.

`create_engine` is enough; the stray `connect()` leaks a connection in examples readers may copy/paste.

```diff
-database = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL", "crate://crate@localhost:4200"))
-database.connect()
+database = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL", "crate://crate@localhost:4200"))
```
12-18: Pin compatible minimum versions (prevent accidental downgrades).

Upper bounds alone allow too‑old versions. Add minimal versions matching current APIs used here. Proposed (verify exact mins in your env):

```diff
-langchain-openai<0.3
-llama-index-embeddings-langchain<0.4
-llama-index-embeddings-openai<0.4
-llama-index-llms-azure-openai<0.4
-llama-index-llms-openai<0.4
+langchain-openai>=0.1.6,<0.3
+llama-index-embeddings-langchain>=0.3.0,<0.4
+llama-index-embeddings-openai>=0.2.0,<0.4
+llama-index-llms-azure-openai>=0.2.0,<0.4
+llama-index-llms-openai>=0.2.0,<0.4
 sqlalchemy-cratedb
```
41-47: Tighten phrasing to active voice.

Small copy tweak.

```diff
-Provision LLM using OpenAI model.
+Provision an LLM using an OpenAI model.
```
56-62: Use active voice here, too.

```diff
-Alternatively, provision LLM using self-hosted model.
+Provision an LLM using a self‑hosted model.
```

docs/start/query/ai-integration.md (1)
214-220: Active-voice polish.

Streamline the intro sentence.

```diff
-Text-to-SQL: Talk to your data using human language and contemporary large
-language models, optionally offline.
+Text‑to‑SQL lets you query data in natural language with contemporary large
+language models, optionally offline.
```

docs/integrate/llamaindex/index.md (2)
16-28: Clarify “About” and fix branding (“OpenAI”).

LlamaIndex doesn’t ship models; it integrates with them. Also prefer “Azure OpenAI”.

```diff
-[LlamaIndex] is a data framework for Large Language Models (LLMs). It comes with
-pre-trained models on massive public datasets such as GPT-4 or Llama 2, and
-provides an interface to external data sources allowing for natural language
-querying on your private data.
+[LlamaIndex] is a data framework for Large Language Models (LLMs). It integrates with
+providers and models such as GPT‑4 or Llama 3, and
+provides interfaces to external data sources for natural‑language querying of your private data.

-Azure Open AI Service is a fully managed service that runs on the Azure global
+Azure OpenAI Service is a fully managed service that runs on the Azure global
 infrastructure and allows developers to integrate OpenAI models into their
-applications. Through Azure Open AI API one can easily access a wide range of
+applications. Through the Azure OpenAI API, you can access a wide range of
 AI models in a scalable and reliable way.
```
55-66: Tighten duplicate phrasing in demo tile.

Combine overlapping sentences.

```diff
-- Connect your CrateDB data to an LLM using OpenAI or Azure OpenAI.
-- Text-to-SQL / Talk to your data: Query the database in human language; query CrateDB in plain English.
+- Connect CrateDB to an LLM via OpenAI or Azure OpenAI.
+- Text‑to‑SQL: Query CrateDB in natural language.
```

docs/integrate/llamaindex/tutorial.md (4)
6-11: Fix branding and shift to active voice.

```diff
-[LlamaIndex](https://www.llamaindex.ai/) is a data framework for Large Language Models (LLMs). It comes with pre-trained models on massive public datasets such as [GPT-4](https://openai.com/index/gpt-4/) or [Llama 4](https://www.llama.com/models/llama-4/) and provides an interface to external data sources allowing for natural language querying on your private data.
+[LlamaIndex](https://www.llamaindex.ai/) is a data framework for Large Language Models (LLMs). It integrates with models such as [GPT‑4](https://openai.com/index/gpt-4/) or [Llama 4](https://www.llama.com/models/llama-4/) and provides interfaces to external data sources for natural‑language querying of your private data.

-[Azure Open AI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) is a fully managed service that runs on the Azure global infrastructure and allows developers to integrate OpenAI models into their applications. Through Azure Open AI API one can easily access a wide range of AI models in a scalable and reliable way.
+[Azure OpenAI Service](https://azure.microsoft.com/en-us/products/ai-services/openai-service) is a fully managed service on the Azure global infrastructure that lets developers integrate OpenAI models into applications. Through the Azure OpenAI API, you can access a wide range of AI models in a scalable and reliable way.

-In this tutorial, we will illustrate how to augment existing LLMs with data stored in CrateDB through the LlamaIndex framework and Azure Open AI Service. By doing this, you will be able to use the power of generative AI models with your own data in just a few lines of code.
+This tutorial shows how to augment LLMs with data stored in CrateDB using LlamaIndex and Azure OpenAI, enabling natural‑language queries over your data.
```
137-141: Avoid unused `connect()`; use the engine directly.

```diff
-engine_crate = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL"))
-engine_crate.connect()
+engine_crate = sa.create_engine(os.getenv("CRATEDB_SQLALCHEMY_URL"))
```
156-164: Fix typo and tighten phrasing.

```diff
-Then use that to create am instance of `NLSQLTableQueryEngine`:
+Then create an instance of `NLSQLTableQueryEngine`:
```
190-199: Avoid time‑sensitive promises; keep it evergreen.

[suggest_nitpick_refactor]

```diff
-This tutorial is just the beginning. You can expect further resources, documentation, and tutorials related to CrateDB and generative AI from us. Also, stay tuned for the CrateDB 5.5 release: we will soon announce the support for the vector store and search, allowing you to implement similarity-based data retrieval efficiently.
+This tutorial is a starting point. Explore additional resources on CrateDB and generative AI as they become available.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- docs/conf.py (1 hunks)
- docs/integrate/llamaindex/index.md (2 hunks)
- docs/integrate/llamaindex/synopsis.md (1 hunks)
- docs/integrate/llamaindex/tutorial.md (1 hunks)
- docs/start/query/ai-integration.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (5)
docs/start/query/ai-integration.md (2)

51-55: Good additions to Integrations.

New anchors for LlamaIndex and PyCaret are clear and consistent.

59-60: Cross‑links look correct.

References resolve to the new synopsis and tutorial pages.
Please build the docs locally to confirm Sphinx refs resolve:

```shell
make html && rg -n "undefined label" build/
```

(should output nothing).

docs/integrate/llamaindex/index.md (2)
43-53: LGTM — concise, active copy.

Synopsis tile reads well.

78-83: ToC looks right.

Hidden ToC with synopsis and tutorial is appropriate.
docs/integrate/llamaindex/tutorial.md (1)

85-121: Simplify configure_llm — don’t mutate openai globals; keep LlamaIndex AzureOpenAI `engine`.

Read OPENAI_API_TYPE into a local variable, pick the wrapper, and pass credentials into the OpenAI/AzureOpenAI constructors instead of assigning to openai.api_type / openai.api_key / openai.azure_endpoint. For llama-index-llms-azure-openai, AzureOpenAI expects an `engine` (your Azure deployment name) — keep using `engine`, not `model`/`deployment_name`. (docs.llamaindex.ai)

File: docs/integrate/llamaindex/tutorial.md (lines 85–121)

Likely an incorrect or invalid review comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/integrate/llamaindex/tutorial.md (1)

17-23: Prerequisites: list required Python packages explicitly and update LlamaIndex docs URL.

This prevents import errors (LangChain embeddings, OpenAI/Azure clients, CrateDB dialect).

```diff
-* Recent version of LlamaIndex, please follow the [installation instructions](https://gpt-index.readthedocs.io/en/latest/getting_started/installation.html)
+* Recent version of LlamaIndex — see [installation instructions](https://docs.llamaindex.ai/en/stable/getting_started/installation/)
+* `openai`
+* `llama-index`
+* `langchain-openai` and `llama-index-embeddings-langchain`
 * `sqlalchemy-cratedb`
 * `SQLAlchemy` (if not pulled transitively)
 * Running instance of [CrateDB](https://console.cratedb.cloud/)
 * [Azure subscription](https://azure.microsoft.com/en-gb/free/cognitive-services/) and [Azure OpenAI resource](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal)
```
🧹 Nitpick comments (2)
docs/integrate/llamaindex/tutorial.md (1)

34-37: Fix markdown list indentation (MD005).

Align nested list items under step 2.

```diff
-2. This will open Azure AI Studio. Azure AI Studio enables developers to build, run, and deploy AI applications. Click on the *Create new deployment* button to deploy the following models:
-    1. **GPT-35-turbo** for text generation tasks
-    2. **text-embedding-ada-002** for generating embeddings
+2. This opens Azure AI Studio. Click *Create new deployment* and deploy:
+    1. A chat/completions model (e.g., **gpt-4o-mini**)
+    2. An embeddings model (e.g., **text-embedding-3-large**)
```

docs/integrate/llamaindex/index.md (1)
66-72: markdownlint MD034 (bare URL) in card link.

Either suppress MD034 for this line or move the URL into the card body as a named link.

Option A (suppress):

```diff
+% markdownlint-disable-next-line MD034
 :link: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index
```

Option B (move link into body, drop the `:link:` option):

```diff
-:::{grid-item-card} LlamaIndex and CrateDB: Code Examples
-:link: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index
-:link-type: url
-NL2SQL with LlamaIndex: Querying CrateDB using natural language.
+:::{grid-item-card} LlamaIndex and CrateDB: Code Examples
+NL2SQL with LlamaIndex: Querying CrateDB using natural language. See
+[LlamaIndex and CrateDB: Code Examples].
 {tags-primary}`Runnable example`
 :::
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- docs/conf.py (1 hunks)
- docs/integrate/llamaindex/index.md (2 hunks)
- docs/integrate/llamaindex/synopsis.md (1 hunks)
- docs/integrate/llamaindex/tutorial.md (1 hunks)
- docs/start/query/ai-integration.md (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/integrate/llamaindex/synopsis.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/llamaindex/index.md
67-67: Bare URL used
(MD034, no-bare-urls)
docs/integrate/llamaindex/tutorial.md
35-35: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
36-36: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (21)
docs/integrate/llamaindex/tutorial.md (14)

13-13: Active voice: remove first-person phrasing.

[suggest_nitpick_refactor]

```diff
-If you want to run this in your own environment, we've provided all of the code and supporting resources that you'll need in the [`cratedb-examples`](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index) GitHub repository.
+The [`cratedb-examples`](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index) repository provides all required code and supporting resources to run this tutorial.
```
46-54: Active voice: remove “Let’s …”.

[suggest_nitpick_refactor]

```diff
-## Load time-series data to CrateDB
-
-Let’s now create the `time_series_data` table in CrateDB that contains time series data, where each row represents a data point with the following information:
+## Load time-series data into CrateDB
+
+Create the `time_series_data` table. Each row represents a data point with:
```
64-66: Active voice and clarity.

[suggest_nitpick_refactor]

```diff
-Import a portion of the data we will use for learning and querying:
+Import sample data for learning and querying:
```
85-96: Tighten wording and fix product names.

[suggest_nitpick_refactor]

```diff
-Azure OpenAI resource differs slightly from the standard OpenAI resource as it requires the use of the embedding model, which we deployed in the previous step. The following code illustrates the setup of OpenAI API:
+Azure OpenAI differs slightly from the standard OpenAI setup because you must specify the embedding model deployed earlier. The following code configures the OpenAI API:
```
88-124: Avoid relying on `openai.api_type`; drive off an env var and set embedding model consistently.

Using a local `api_type` var avoids coupling to the OpenAI SDK’s module globals and keeps LlamaIndex wrappers isolated. Also set the embedding model for both branches.

[suggest_recommended_refactor]

```diff
 def configure_llm():
     """
-    Configure LLM. Use either vanilla Open AI, or Azure Open AI.
+    Configure the LLM. Use either OpenAI or Azure OpenAI.
     """
-
-    openai.api_type = os.getenv("OPENAI_API_TYPE")
-    openai.azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
-    openai.api_version = os.getenv("OPENAI_AZURE_API_VERSION")
-    openai.api_key = os.getenv("OPENAI_API_KEY")
+    api_type = os.getenv("OPENAI_API_TYPE", "openai").lower()

-    if openai.api_type == "openai":
+    if api_type == "openai":
         llm = OpenAI(
             api_key=os.getenv("OPENAI_API_KEY"),
             temperature=0.0
         )
-    elif openai.api_type == "azure":
+    elif api_type == "azure":
         llm = AzureOpenAI(
             engine=os.getenv("LLM_INSTANCE"),
             azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
-            api_key = os.getenv("OPENAI_API_KEY"),
-            api_version = os.getenv("OPENAI_AZURE_API_VERSION"),
+            api_key=os.getenv("OPENAI_API_KEY"),
+            api_version=os.getenv("OPENAI_AZURE_API_VERSION"),
             temperature=0.0
         )
     else:
-        raise ValueError(f"Open AI API type not defined or invalid: {openai.api_type}")
+        raise ValueError(f"OpenAI API type not defined or invalid: {api_type}")

     Settings.llm = llm

-    if openai.api_type == "openai":
-        Settings.embed_model = LangchainEmbedding(OpenAIEmbeddings())
-    elif openai.api_type == "azure":
+    if api_type == "openai":
+        Settings.embed_model = LangchainEmbedding(
+            OpenAIEmbeddings(model=os.getenv("EMBEDDING_MODEL_INSTANCE", "text-embedding-3-small"))
+        )
+    elif api_type == "azure":
         Settings.embed_model = LangchainEmbedding(
             AzureOpenAIEmbeddings(
                 azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
                 model=os.getenv("EMBEDDING_MODEL_INSTANCE")
             )
         )
```

Please verify imports in the snippet (LlamaIndex wrappers and LangChain embeddings) align with the packages listed in Prerequisites.
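The env-driven dispatch recommended above can be exercised without any OpenAI SDK installed. This sketch keeps only the control flow; `make_llm` and the returned dicts are hypothetical stand-ins for the real `OpenAI`/`AzureOpenAI` constructors, while the environment variable names mirror the review suggestion.

```python
import os

def make_llm():
    # Read the provider once into a local variable instead of mutating
    # SDK module globals, as the review recommends.
    api_type = os.getenv("OPENAI_API_TYPE", "openai").lower()
    if api_type == "openai":
        return {"provider": "openai", "api_key": os.getenv("OPENAI_API_KEY")}
    elif api_type == "azure":
        return {
            "provider": "azure",
            "engine": os.getenv("LLM_INSTANCE"),
            "azure_endpoint": os.getenv("OPENAI_AZURE_ENDPOINT"),
        }
    raise ValueError(f"OpenAI API type not defined or invalid: {api_type}")

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["LLM_INSTANCE"] = "my-gpt4-deployment"
print(make_llm()["provider"])  # azure
```

Because nothing global is mutated, switching providers is a matter of changing one environment variable, which also makes the branching trivially unit-testable.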
130-133: Editorial: clarify Settings usage and fix minor spacing.

[suggest_nitpick_refactor]

```diff
-The code also initializes the LLM and embedding models. The value for `EMBEDDING_MODEL_INSTANCE` is the deployed embedding model's name from Azure OpenAI (e.g., `my_embedding-model`). To set this configuration globally, we use the `llama_index.core.Settings`.
+The code initializes the LLM and embedding models. `EMBEDDING_MODEL_INSTANCE` is the deployed embedding model’s name in Azure OpenAI (e.g., `my-embedding-model`). Use `llama_index.core.Settings` to set this configuration globally.
```
138-139: Use proper casing for SQLAlchemy.

[suggest_nitpick_refactor]

```diff
-We use sqlalchemy, a popular SQL database toolkit, to connect to CrateDB and SQLDatabase wrapper that allows CrateDB data to be used within LlamaIndex.
+Use SQLAlchemy to connect to CrateDB, and the `SQLDatabase` wrapper to use CrateDB data within LlamaIndex.
```
149-156: Minor: article and comma.

[suggest_nitpick_refactor]

```diff
-To query CrateDB using natural language we make an instance of `SQLDatabase` and provide a list of tables:
+To query CrateDB using natural language, create an instance of `SQLDatabase` and provide a list of tables:
```
167-169: Active voice.

[suggest_nitpick_refactor]

```diff
-At this point, we are ready to query CrateDB in plain English!
+You can now query CrateDB in plain English.
```
171-172: Active voice and specificity.

[suggest_nitpick_refactor]

```diff
-When dealing with time-series data we are usually interested in aggregate values. For instance, with our query, we are interested in the average value of sensor 1:
+Time‑series analysis often focuses on aggregates. For example, ask for the average value of sensor 1:
```
176-179: Good: corrected print syntax.

The stray closing parenthesis reported earlier is gone. Copy/paste will work.
184-189: Active voice and neutral phrasing.

[suggest_nitpick_refactor]

```diff
-Often, we are also interested in the query that produces the output. This is included in the answer's metadata:
+It is often useful to inspect the SQL that produced the result. LlamaIndex includes it in the answer metadata:
```
193-199: Active voice for the takeaway.

[suggest_nitpick_refactor]

```diff
-In this tutorial, we've embarked on the journey of using a natural language interface to query CrateDB data. We've explored how to seamlessly connect your data to the power of LLM using LlamaIndex and the capabilities of Azure OpenAI.
-
-This tutorial is a starting point. Explore additional resources on CrateDB and generative AI as they become available.
+This tutorial demonstrated how to query CrateDB data in natural language using LlamaIndex and Azure OpenAI.
+
+Use this as a starting point and explore additional CrateDB and generative AI resources as they become available.
```
35-37: Update model recommendations to Azure's current names (confirmed Sep 16, 2025).

Use a generic chat/completions model (e.g., gpt-4o-mini) and text-embedding-3-large for embeddings; let env vars control the exact deployment.

```diff
-    1. **GPT-35-turbo** for text generation tasks
-    2. **text-embedding-ada-002** for generating embeddings
+    1. A chat/completions model (e.g., **gpt-4o-mini**) for text generation
+    2. An embeddings model (e.g., **text-embedding-3-large**)
```

docs/start/query/ai-integration.md (3)
51-55: LGTM: new integrations list entries.

Adding LlamaIndex and PyCaret here improves discoverability and aligns with the new pages.
214-220: Concise, active phrasing nit.

[suggest_nitpick_refactor]

```diff
-Text‑to‑SQL lets you query data in natural language with contemporary
-large language models, optionally offline.
+Text‑to‑SQL lets you query data in natural language with contemporary large language models, including local/offline options.
```
59-61: Cross-links: anchors present; Sphinx build unavailable.

Found labels (llamaindex-synopsis, llamaindex-tutorial) at docs/integrate/llamaindex/synopsis.md:1 and docs/integrate/llamaindex/tutorial.md:1. sphinx-build is not installed in the verification environment (`/bin/bash: line 6: sphinx-build: command not found`). Run a local Sphinx build (e.g., `sphinx-build -b html docs _build/html` or `sphinx-build -b linkcheck docs _build/linkcheck`) to confirm the {ref} targets resolve with no warnings.

docs/integrate/llamaindex/index.md (3)
19-26: Style nits: tighten wording and consistency.

[suggest_nitpick_refactor]

```diff
-[LlamaIndex] is a data framework for Large Language Models (LLMs). It integrates
-with providers and models such as GPT‑4 or Llama 3/4, and provides interfaces to
-external data sources for natural‑language querying of your private data.
+[LlamaIndex] is a data framework for Large Language Models (LLMs). It integrates
+with providers and models such as GPT‑4 and Llama, and exposes interfaces to
+external data sources for natural‑language querying of private data.
```
54-64: Minor: card copy polish (active voice).

[suggest_nitpick_refactor]

```diff
-- Connect CrateDB to an LLM via OpenAI or Azure OpenAI.
-- Text‑to‑SQL: Query CrateDB in natural language.
+- Connect CrateDB to an LLM via OpenAI or Azure OpenAI.
+- Query CrateDB in natural language (Text‑to‑SQL).
```
77-82: LGTM: toctree structure.

Hidden toctree cleanly exposes synopsis/tutorial to ref targets.
docs/conf.py (1)
74-78: Tighten linkcheck ignore patterns (avoid masking real issues).

Don't ignore whole domains — narrow ignores to the exact pages that flake on HEAD.

```diff
-    # 400 Client Error: Bad Request for url
-    r"https://www.llama.com/.*",
-    # 403 Client Error: Forbidden for url
-    r"https://openai.com/index/gpt-4/.*",
+    # 400 Client Error: Bad Request for url (Meta Llama models page flakes on HEAD)
+    r"https://www\.llama\.com/models/llama-4/.*",
+    # 403 Client Error: Forbidden for url (OpenAI index article flakes on HEAD)
+    r"https://openai\.com/index/gpt-4/.*",
```

Could not run linkcheck here (sphinx-build not found). Re-run locally to confirm no other links need suppression:

```shell
#!/usr/bin/env bash
set -euo pipefail
sphinx-build -b linkcheck docs _build/linkcheck
grep -E "broken|warning" -n _build/linkcheck/output.txt || true
```
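The escaped patterns suggested above can be sanity-checked with Python's `re` module before committing them. The URLs are taken from the diff; `is_ignored` is a hypothetical helper approximating how an ignore list is consulted.

```python
import re

# Patterns from the suggested diff: literal dots escaped, trailing .* to
# cover anchors and query strings on the specific flaky pages.
patterns = [
    r"https://www\.llama\.com/models/llama-4/.*",
    r"https://openai\.com/index/gpt-4/.*",
]

def is_ignored(url):
    return any(re.fullmatch(p, url) for p in patterns)

# The target pages match, including anchored/query-string variants ...
assert is_ignored("https://openai.com/index/gpt-4/")
assert is_ignored("https://openai.com/index/gpt-4/?utm_source=docs")
assert is_ignored("https://www.llama.com/models/llama-4/#overview")

# ... while unrelated pages on the same domains still get checked,
# which is the point of narrowing the ignore list.
assert not is_ignored("https://www.llama.com/docs/")
print("pattern checks passed")
```

Note that Sphinx's linkcheck applies its own matching semantics; this snippet only demonstrates that the narrowed regexes distinguish the flaky pages from the rest of each domain.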
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/integrate/llamaindex/tutorial.md (1)

17-23: Add missing Python packages in prerequisites (prevents import/runtime errors).

Your code uses OpenAI, LangChain embeddings, and LlamaIndex’s LangChain adapter. List those explicitly and drop the stale install link.

```diff
 * Python 3.10 or higher
-* Recent version of LlamaIndex, please follow the [installation instructions](https://gpt-index.readthedocs.io/en/latest/getting_started/installation.html)
+* Recent version of LlamaIndex
+* `openai` (Python SDK)
+* `langchain-openai` and `llama-index-embeddings-langchain` (for `LangchainEmbedding` with `OpenAIEmbeddings`/`AzureOpenAIEmbeddings`)
 * `sqlalchemy-cratedb`
 * `SQLAlchemy` (if not pulled transitively)
 * Running instance of [CrateDB](https://console.cratedb.cloud/)
 * [Azure subscription](https://azure.microsoft.com/en-gb/free/cognitive-services/) and [Azure OpenAI resource](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal)
```

And add a quick-start install block after the list:

````diff
+Recommended install:
+
+```bash
+pip install "llama-index" "openai" "langchain-openai" \
+    "llama-index-embeddings-langchain" "SQLAlchemy" "sqlalchemy-cratedb"
+```
````
🧹 Nitpick comments (10)
docs/integrate/llamaindex/tutorial.md (7)

34-37: Fix nested list indentation (markdownlint MD005).

Insert a blank line and use 4‑space indent for the nested list.

```diff
-2. This opens Azure AI Studio. Click *Create new deployment* and deploy:
-  1. A chat/completions model (e.g., **gpt-4o-mini**)
-  2. An embeddings model (e.g., **text-embedding-3-large**)
+2. This opens Azure AI Studio. Click *Create new deployment* and deploy:
+
+    1. A chat/completions model (e.g., **gpt-4o-mini**)
+    2. An embeddings model (e.g., **text-embedding-3-large**)
```
88-124: Avoid mixing SDK globals; use env var once and fix “Open AI” spelling.

Using `openai.api_type` requires importing and mutating SDK globals. Read the env var into a local variable instead, and standardize “OpenAI”. Also ensure the required imports (shown below) exist in the snippet or an earlier one.

```diff
 def configure_llm():
     """
-    Configure LLM. Use either vanilla Open AI, or Azure Open AI.
+    Configure LLM. Use either OpenAI or Azure OpenAI.
     """
-    openai.api_type = os.getenv("OPENAI_API_TYPE")
-    openai.azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
-    openai.api_version = os.getenv("OPENAI_AZURE_API_VERSION")
-    openai.api_key = os.getenv("OPENAI_API_KEY")
+    api_type = os.getenv("OPENAI_API_TYPE")
+    azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
+    api_version = os.getenv("OPENAI_AZURE_API_VERSION")
+    api_key = os.getenv("OPENAI_API_KEY")

-    if openai.api_type == "openai":
+    if api_type == "openai":
         llm = OpenAI(
-            api_key=os.getenv("OPENAI_API_KEY"),
+            api_key=api_key,
             temperature=0.0
         )
-    elif openai.api_type == "azure":
+    elif api_type == "azure":
         llm = AzureOpenAI(
             engine=os.getenv("LLM_INSTANCE"),
-            azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
-            api_key = os.getenv("OPENAI_API_KEY"),
-            api_version = os.getenv("OPENAI_AZURE_API_VERSION"),
+            azure_endpoint=azure_endpoint,
+            api_key=api_key,
+            api_version=api_version,
             temperature=0.0
         )
     else:
-        raise ValueError(f"Open AI API type not defined or invalid: {openai.api_type}")
+        raise ValueError(f"OpenAI API type not defined or invalid: {api_type}")

     Settings.llm = llm

-    if openai.api_type == "openai":
+    if api_type == "openai":
         Settings.embed_model = LangchainEmbedding(OpenAIEmbeddings())
-    elif openai.api_type == "azure":
+    elif api_type == "azure":
         Settings.embed_model = LangchainEmbedding(
             AzureOpenAIEmbeddings(
-                azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
+                azure_endpoint=azure_endpoint,
                 model=os.getenv("EMBEDDING_MODEL_INSTANCE")
             )
         )
```

Required imports (add once near the snippet if not already present):

```python
from llama_index.core import Settings
from llama_index.embeddings.langchain import LangchainEmbedding
from langchain_openai import OpenAIEmbeddings, AzureOpenAIEmbeddings
from llama_index.llms.openai import OpenAI
from llama_index.llms.azure_openai import AzureOpenAI
# import os  # if not already imported
```
138-138: Tighten language and capitalization; switch to active voice.

```diff
-We use sqlalchemy, a popular SQL database toolkit, to connect to CrateDB and SQLDatabase wrapper that allows CrateDB data to be used within LlamaIndex.
+Use SQLAlchemy to connect to CrateDB and the SQLDatabase wrapper to expose tables to LlamaIndex.
```
151-156: Document CRATEDB_TABLE_NAME or inline the table name.

Readers will hit None/KeyError if the env var isn’t set. Add an export example or inline the literal.

```diff
 sql_database = SQLDatabase(
     engine_crate,
     include_tables=[os.getenv("CRATEDB_TABLE_NAME")]
 )
+
+# or set the variable once:
+# $ export CRATEDB_TABLE_NAME=time_series_data
```

Would you like me to submit a follow-up commit that inlines "time_series_data" in the snippet to minimize setup friction?
Also applies to: 160-165
171-182: Use second person and active voice in the question section.

```diff
-When dealing with time-series data we are usually interested in aggregate values. For instance, with our query, we are interested in the average value of sensor 1:
+With time‑series data you often care about aggregates. For example, compute the average value for sensor 1:
```
184-189: Active voice: describe metadata inspection directly.

```diff
-Often, we are also interested in the query that produces the output. This is included in the answer's metadata:
+You can also inspect the SQL that produced the answer; it’s included in `answer.metadata`:
```
193-199: Close in active voice; remove first‑person phrasing.

```diff
-In this tutorial, we've embarked on the journey of using a natural language interface to query CrateDB data. We've explored how to seamlessly connect your data to the power of LLM using LlamaIndex and the capabilities of Azure OpenAI.
-
-This tutorial is a starting point. Explore additional resources on CrateDB and generative AI as they become available.
-
-If you want to try this out yourself, you can find the full example code and supporting resources in the [`cratedb-examples` GitHub repository](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index).
+This tutorial shows how to query CrateDB data using natural language with LlamaIndex and Azure OpenAI.
+
+Explore more CrateDB and generative‑AI resources as they become available.
+
+Find the full example code and supporting resources in the [`cratedb-examples` GitHub repository](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index).
```

docs/integrate/llamaindex/index.md (3)
23-26: Tighten Azure OpenAI wording; use active voice.

```diff
-Azure OpenAI Service is a fully managed service that runs on the Azure global
-infrastructure and allows developers to integrate OpenAI models into their
-applications. Through the Azure OpenAI API, you can access a wide range of
-AI models in a scalable and reliable way.
+Azure OpenAI Service runs on the Azure global infrastructure and lets you integrate OpenAI models into your applications. The Azure OpenAI API provides scalable access to a wide range of models.
```
54-59: Clarify provider naming in the demo tile.

```diff
-:::{grid-item-card} Demo: Using LlamaIndex with OpenAI and CrateDB
+:::{grid-item-card} Demo: Using LlamaIndex with OpenAI/Azure OpenAI and CrateDB
```
66-71: Silence markdownlint MD034 (bare URL) for the card's `:link:` directive argument.

The directive requires a raw URL; add a one-line suppression to keep CI green.

```diff
+<!-- markdownlint-disable-next-line MD034 -->
 :::{grid-item-card} LlamaIndex and CrateDB: Code Examples
 :link: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llama-index
 :link-type: url
 NL2SQL with LlamaIndex: Querying CrateDB using natural language.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/integrate/llamaindex/index.md (2 hunks)
- docs/integrate/llamaindex/synopsis.md (1 hunks)
- docs/integrate/llamaindex/tutorial.md (1 hunks)
- docs/start/query/ai-integration.md (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/integrate/llamaindex/synopsis.md
- docs/start/query/ai-integration.md
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/llamaindex/index.md
67-67: Bare URL used
(MD034, no-bare-urls)
docs/integrate/llamaindex/tutorial.md
35-35: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
36-36: Inconsistent indentation for list items at the same level
Expected: 0; Actual: 2
(MD005, list-indent)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
About
Refactor Text-to-SQL walkthrough into "LlamaIndex" section. Also, add the full tutorial from the community forum.