43 changes: 43 additions & 0 deletions autovec_unstructured/README.md
@@ -0,0 +1,43 @@
# Couchbase Capella AI Services Auto-Vectorization with LangChain

This tutorial shows how to use the auto-vectorization feature of Couchbase Capella AI Services to convert unstructured data into vector embeddings automatically, and then run semantic search over those embeddings with LangChain.

## 🚀 Quick Start

### Prerequisites

- Python 3.8 or higher
- A Couchbase Capella account
- Basic understanding of vector databases and embeddings

### Installation Steps

1. **Clone or download this repository**
```bash
git clone https://github.com/couchbase-examples/vector-search-cookbook.git
cd vector-search-cookbook/autovec_unstructured
```

2. **Install Python dependencies**
```bash
pip install jupyter
pip install couchbase
pip install langchain-couchbase
pip install langchain-nvidia-ai-endpoints
```

3. **Start Jupyter Notebook**
```bash
jupyter notebook
```
or
```bash
jupyter lab
```

4. **Open the tutorial notebook**
- Navigate to `autovec_unstructured.ipynb` in the Jupyter interface
- Follow the step-by-step instructions in the notebook
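
Before opening the notebook, it can help to confirm that the installation steps above actually took effect. The following is a minimal sketch using only the standard library; the mapping from pip package names to import names is an assumption based on each package's usual naming, and `missing_packages` and `python_ok` are hypothetical helper names, not part of the tutorial.

```python
import sys
from importlib.util import find_spec
from typing import Dict, List, Tuple

# pip package name -> top-level import name.
# Assumption: these import names follow each package's usual convention;
# adjust them if your environment differs.
REQUIRED: Dict[str, str] = {
    "jupyter": "jupyter_core",
    "couchbase": "couchbase",
    "langchain-couchbase": "langchain_couchbase",
    "langchain-nvidia-ai-endpoints": "langchain_nvidia_ai_endpoints",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose import name cannot be located."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Check the interpreter against the minimum version listed in Prerequisites."""
    return sys.version_info >= minimum
```

Calling `missing_packages()` in a notebook cell returns the pip names that still need to be installed, and `python_ok()` reports whether the interpreter meets the Python 3.8 prerequisite.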

**Note**: This tutorial is intended for learning purposes. For production deployments, keep SSL/TLS certificate verification enabled and apply appropriate security configuration.
341 changes: 341 additions & 0 deletions autovec_unstructured/autovec_unstructured.ipynb
Contributor

Looks good for the most part.
A couple of questions and suggestions:

  • We should not show the example with TLS disabled. It is insecure, and most end users will never need it: production clusters do not require it because the SDK bundles the certificates for them.
  • Can you try the OpenAI LangChain package instead of the NVIDIA one, since that is what we recommend to end users? You will need to set a few parameters, but it should work, unless I have missed some documentation that favors NVIDIA over OpenAI. The Capella AI notebooks have examples that use the OpenAI package.
  • Can you also use a better search term? The current example looks much more like FTS than semantic search, and we want to show the power of semantic search.
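
A rough sketch of what the first two suggestions might look like in the notebook, under stated assumptions. `check_connection_string` and `openai_embedding_kwargs` are hypothetical helpers; the query parameters treated as insecure are a guess at what the notebook may use to turn verification off; and the keyword arguments are one plausible set for pointing `langchain_openai.OpenAIEmbeddings` at a non-OpenAI endpoint, not a confirmed Capella recipe.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit

# Assumption: connection-string parameters that switch off certificate
# verification. Adjust to whatever the notebook actually passes.
INSECURE_PARAMS = {"tls_verify": {"none"}, "ssl": {"no_verify"}}


def check_connection_string(conn_str: str) -> List[str]:
    """Return a list of TLS problems found in a Couchbase connection string."""
    parts = urlsplit(conn_str)
    problems = []
    # The couchbases:// scheme requests an encrypted connection.
    if parts.scheme != "couchbases":
        problems.append(f"scheme is '{parts.scheme}', expected 'couchbases'")
    for key, values in parse_qs(parts.query).items():
        bad_values = INSECURE_PARAMS.get(key.lower(), set())
        if any(value.lower() in bad_values for value in values):
            problems.append(f"'{key}' disables certificate verification")
    return problems


def openai_embedding_kwargs(endpoint: str, api_key: str, model: str) -> Dict[str, object]:
    """Build keyword arguments for langchain_openai.OpenAIEmbeddings.

    The endpoint, key, and model name are placeholders supplied by the caller.
    """
    return {
        "model": model,
        "openai_api_key": api_key,
        "openai_api_base": endpoint,
        # Send raw text instead of pre-tokenized input, which a
        # non-OpenAI backend may reject.
        "check_embedding_ctx_length": False,
    }
```

With `langchain-openai` installed, the second helper would be used as `OpenAIEmbeddings(**openai_embedding_kwargs(endpoint, api_key, model))`, and the first can guard the connection cell by failing early if `check_connection_string(conn_str)` returns anything.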

Large diffs are not rendered by default.

18 changes: 18 additions & 0 deletions autovec_unstructured/frontmatter.md
@@ -0,0 +1,18 @@
---
# frontmatter
path: "/tutorial-couchbase-autovectorization-langchain"
title: Auto-Vectorization with Couchbase Capella AI Services and LangChain
short_title: Auto-Vectorization with Couchbase and LangChain
description:
- Learn how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings.
Contributor

convert your unstructured data into vector embeddings

- This tutorial demonstrates how to set up automated embedding generation workflows and perform semantic search using LangChain.
content_type: tutorial
filter: sdk
technology:
- vector search
tags:
- LangChain
sdk_language:
- python
length: 20 Mins
---
Binary file added autovec_unstructured/img/S3bucketsuccess.png
Binary file added autovec_unstructured/img/S3credentials.png
Binary file added autovec_unstructured/img/addS3bucket.png
Binary file added autovec_unstructured/img/data_processing.png
Binary file added autovec_unstructured/img/deploying_model.png
Binary file added autovec_unstructured/img/importing_model.png
Binary file added autovec_unstructured/img/model_api_key_form.png
Binary file added autovec_unstructured/img/model_setup_access.png
Binary file added autovec_unstructured/img/start_workflow.png
Binary file added autovec_unstructured/img/workflow.png
Binary file added autovec_unstructured/img/workflow_deployed.png
Binary file added autovec_unstructured/img/workflow_summary.png
Binary file added autovec_unstructured/sample.pdf
Binary file not shown.