Merged
38 commits
5b6fa54
Naming things: Change name in LICENSE file
amotl Dec 13, 2024
0e4386e
Naming things: Update repository URL
amotl Dec 13, 2024
db8f781
Tests: Fix invocation by explicitly pinning version of langchain-tests
amotl Dec 13, 2024
bc09e18
CI: Add GitHub workflow definitions from `langchain-mongodb`
amotl Dec 13, 2024
fd15dcd
Chore: Format code using Ruff
amotl Dec 13, 2024
7fca60c
Tests: Temporary deactivate integration tests
amotl Dec 13, 2024
047fc32
Tests: Update to langchain-tests 0.3.7
amotl Dec 13, 2024
8fd9907
Vector Store: Add CrateDBVectorStore
amotl Dec 14, 2024
77b0f60
Vector Store: Import software test cases from original patch
amotl Dec 14, 2024
16f3b61
Vector Store: Add CrateDBVectorStoreMultiCollection, with software tests
amotl Dec 14, 2024
57a1040
Vector Store: Add SQL-based metadata filtering using CrateDB's OBJECT
amotl Dec 14, 2024
3c45c49
Sandbox: Document how to invoke software tests
amotl Dec 14, 2024
7c9a93b
CrateDB: Remove more details about JSONB
amotl Dec 14, 2024
59b43d3
Tests: Import `_compare_documents` function from `langchain-postgres`
amotl Dec 14, 2024
44076ca
Tests: Configure two flaky tests for re-running
amotl Dec 14, 2024
9ed2dab
Chore: This and that
amotl Dec 14, 2024
72579f1
Documentation: Update README. Add Changelog.
amotl Dec 14, 2024
dbb8f40
Conversational Memory: Add adapter for chat message history
amotl Dec 15, 2024
f2cb800
Document Loader: Add adapter for loading data from database (CrateDB)
amotl Dec 15, 2024
1aec844
Naming things: Use shorter names `chat_history` and `loaders`
amotl Dec 15, 2024
5fe8c9c
Documentation: Update Jupyter Notebooks to refer to downstream docs
amotl Dec 15, 2024
c6503b1
Retriever: Deactivate, and refer to `vectorstore.as_retriever()`
amotl Dec 15, 2024
22d93b3
Documentation: Update README
amotl Dec 15, 2024
cfc9236
Tests: Add code coverage reporting
amotl Dec 15, 2024
713254a
CI: Upload code coverage report to Codecov.io
amotl Dec 15, 2024
b426eb7
CI: Enable Dependabot
amotl Dec 15, 2024
018d16c
Documentation: Improve development sandbox notes
amotl Dec 15, 2024
dcfd0ca
Dependencies: Use version ranges, focusing on the upper bound
amotl Dec 15, 2024
2ae5dc3
Project: Add project metadata to `pyproject.toml`
amotl Dec 15, 2024
43fdc7e
CI: Use Python 3.13
amotl Dec 15, 2024
73da7ec
CI: Run workflow each night, after CrateDB Nightly has been published
amotl Dec 15, 2024
56bc1db
CI: Improve workflow labels (cosmetics)
amotl Dec 15, 2024
dd82356
CI: Don't skip workflow runs for Dependabot
amotl Dec 15, 2024
9423038
Project: Don't use `poetry.lock`, because it's a library
amotl Dec 15, 2024
05ef170
Bump mypy from 1.10.1 to 1.13.0
dependabot[bot] Dec 16, 2024
9528089
Bump ruff from 0.5.7 to 0.8.3
dependabot[bot] Dec 16, 2024
30ca140
Documentation: Minor updates to README.md, DEVELOP.md, and CHANGES.md
amotl Dec 18, 2024
b7ae4c7
Documentation: Add `backlog.md`
amotl Dec 18, 2024
93 changes: 93 additions & 0 deletions .github/actions/poetry_setup/action.yml
@@ -0,0 +1,93 @@
# An action for setting up poetry install with caching.
# Using a custom action since the default action does not
# take poetry install groups into account.
# Action code from:
# https://github.com/actions/setup-python/issues/505#issuecomment-1273013236
name: poetry-install-with-caching
description: Poetry install with support for caching of dependency groups.

inputs:
  python-version:
    description: Python version, supporting MAJOR.MINOR only
    required: true

  poetry-version:
    description: Poetry version
    required: true

  cache-key:
    description: Cache key to use for manual handling of caching
    required: true

  working-directory:
    description: Directory whose poetry.lock file should be cached
    required: true

runs:
  using: composite
  steps:
    - uses: actions/setup-python@v5
      name: Setup python ${{ inputs.python-version }}
      id: setup-python
      with:
        python-version: ${{ inputs.python-version }}

    - uses: actions/cache@v4
      id: cache-bin-poetry
      name: Cache Poetry binary - Python ${{ inputs.python-version }}
      env:
        SEGMENT_DOWNLOAD_TIMEOUT_MIN: "1"
      with:
        path: |
          /opt/pipx/venvs/poetry
        # This step caches the poetry installation, so make sure it's keyed on the poetry version as well.
        key: bin-poetry-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-${{ inputs.poetry-version }}

    - name: Refresh shell hashtable and fixup softlinks
      if: steps.cache-bin-poetry.outputs.cache-hit == 'true'
      shell: bash
      env:
        POETRY_VERSION: ${{ inputs.poetry-version }}
        PYTHON_VERSION: ${{ inputs.python-version }}
      run: |
        set -eux

        # Refresh the shell hashtable, to ensure correct `which` output.
        hash -r

        # `actions/cache@v3` doesn't always seem able to correctly unpack softlinks.
        # Delete and recreate the softlinks pipx expects to have.
        rm /opt/pipx/venvs/poetry/bin/python
        cd /opt/pipx/venvs/poetry/bin
        ln -s "$(which "python$PYTHON_VERSION")" python
        chmod +x python
        cd /opt/pipx_bin/
        ln -s /opt/pipx/venvs/poetry/bin/poetry poetry
        chmod +x poetry

        # Ensure everything got set up correctly.
        /opt/pipx/venvs/poetry/bin/python --version
        /opt/pipx_bin/poetry --version

    - name: Install poetry
      if: steps.cache-bin-poetry.outputs.cache-hit != 'true'
      shell: bash
      env:
        POETRY_VERSION: ${{ inputs.poetry-version }}
        PYTHON_VERSION: ${{ inputs.python-version }}
      # Install poetry using the python version installed by setup-python step.
      run: pipx install "poetry==$POETRY_VERSION" --python '${{ steps.setup-python.outputs.python-path }}' --verbose

    - name: Restore pip and poetry cached dependencies
      uses: actions/cache@v4
      env:
        SEGMENT_DOWNLOAD_TIMEOUT_MIN: "4"
        WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}
      with:
        path: |
          ~/.cache/pip
          ~/.cache/pypoetry/virtualenvs
          ~/.cache/pypoetry/cache
          ~/.cache/pypoetry/artifacts
          ${{ env.WORKDIR }}/.venv
        key: py-deps-${{ runner.os }}-${{ runner.arch }}-py-${{ inputs.python-version }}-poetry-${{ inputs.poetry-version }}-${{ inputs.cache-key }}-${{ hashFiles(format('{0}/**/poetry.lock', env.WORKDIR)) }}
12 changes: 12 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,12 @@
version: 2

updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "daily"

  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
52 changes: 52 additions & 0 deletions .github/scripts/check_diff.py
@@ -0,0 +1,52 @@
import json
import sys
from typing import Dict

LIB_DIRS = ["."]

if __name__ == "__main__":
    files = sys.argv[1:]  # changed files

    dirs_to_run: Dict[str, set] = {
        "lint": set(),
        "test": set(),
    }

    if len(files) == 300:
        # max diff length is 300 files - there are likely files missing
        raise ValueError("Max diff reached. Please manually run CI on changed libs.")

    for file in files:
        if any(
            file.startswith(dir_)
            for dir_ in (
                "pyproject.toml",
                "poetry.lock",
                ".github/workflows",
                ".github/tools",
                ".github/actions",
                ".github/scripts/check_diff.py",
            )
        ):
            # add all LIB_DIRS for infra changes
            dirs_to_run["test"].update(LIB_DIRS)
            dirs_to_run["lint"].update(LIB_DIRS)

        if any(file.startswith(dir_) for dir_ in LIB_DIRS):
            for dir_ in LIB_DIRS:
                if file.startswith(dir_):
                    dirs_to_run["test"].add(dir_)
                    dirs_to_run["lint"].add(dir_)
        elif file.startswith("libs/"):
            raise ValueError(
                f"Unknown lib: {file}. check_diff.py likely needs "
                "an update for this new library!"
            )

    outputs = {
        "dirs-to-lint": list(dirs_to_run["lint"]),
        "dirs-to-test": list(dirs_to_run["test"]),
    }
    for key, value in outputs.items():
        json_output = json.dumps(value)
        print(f"{key}={json_output}")  # noqa: T201
10 changes: 10 additions & 0 deletions .github/scripts/extract_ignored_words_list.py
@@ -0,0 +1,10 @@
import toml

pyproject_toml = toml.load("pyproject.toml")

# Extract the ignore words list (adjust the key as per your TOML structure)
ignore_words_list = (
    pyproject_toml.get("tool", {}).get("codespell", {}).get("ignore-words-list")
)

print(f"::set-output name=ignore_words_list::{ignore_words_list}") # noqa: T201
65 changes: 65 additions & 0 deletions .github/scripts/get_min_versions.py
@@ -0,0 +1,65 @@
import re
import sys

import tomllib
from packaging.version import parse as parse_version

MIN_VERSION_LIBS = ["langchain-core"]


def get_min_version(version: str) -> str:
    # case ^x.x.x
    _match = re.match(r"^\^(\d+(?:\.\d+){0,2})$", version)
    if _match:
        return _match.group(1)

    # case >=x.x.x,<y.y.y
    _match = re.match(r"^>=(\d+(?:\.\d+){0,2}),<(\d+(?:\.\d+){0,2})$", version)
    if _match:
        _min = _match.group(1)
        _max = _match.group(2)
        assert parse_version(_min) < parse_version(_max)
        return _min

    # case x.x.x
    _match = re.match(r"^(\d+(?:\.\d+){0,2})$", version)
    if _match:
        return _match.group(1)

    raise ValueError(f"Unrecognized version format: {version}")


def get_min_version_from_toml(toml_path: str):
    # Parse the TOML file
    with open(toml_path, "rb") as file:
        toml_data = tomllib.load(file)

    # Get the dependencies from tool.poetry.dependencies
    dependencies = toml_data["tool"]["poetry"]["dependencies"]

    # Initialize a dictionary to store the minimum versions
    min_versions = {}

    # Iterate over the libs in MIN_VERSION_LIBS
    for lib in MIN_VERSION_LIBS:
        # Check if the lib is present in the dependencies
        if lib in dependencies:
            # Get the version string
            version_string = dependencies[lib]

            # Use parse_version to get the minimum supported version from version_string
            min_version = get_min_version(version_string)

            # Store the minimum version in the min_versions dictionary
            min_versions[lib] = min_version

    return min_versions


# Get the TOML file path from the command line argument
toml_file = sys.argv[1]

# Call the function to get the minimum versions
min_versions = get_min_version_from_toml(toml_file)

print(" ".join([f"{lib}=={version}" for lib, version in min_versions.items()]))  # noqa: T201
39 changes: 39 additions & 0 deletions .github/workflows/_codespell.yml
@@ -0,0 +1,39 @@
---
name: make spell_check

on:
  workflow_call:
    inputs:
      working-directory:
        required: true
        type: string
        description: "From which folder this pipeline executes"

permissions:
  contents: read

jobs:
  codespell:
    name: (Check for spelling errors)
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Dependencies
        run: |
          pip install toml

      - name: Extract Ignore Words List
        working-directory: ${{ inputs.working-directory }}
        run: |
          # Use a Python script to extract the ignore words list from pyproject.toml
          python ../../.github/scripts/extract_ignored_words_list.py
        id: extract_ignore_words

      - name: Codespell
        uses: codespell-project/actions-codespell@v2
        with:
          skip: guide_imports.json
          ignore_words_list: ${{ steps.extract_ignore_words.outputs.ignore_words_list }}
97 changes: 97 additions & 0 deletions .github/workflows/_lint.yml
@@ -0,0 +1,97 @@
name: lint

on:
  workflow_call:
    inputs:
      working-directory:
        required: true
        type: string
        description: "From which folder this pipeline executes"

env:
  POETRY_VERSION: "1.7.1"
  WORKDIR: ${{ inputs.working-directory == '' && '.' || inputs.working-directory }}

  # This env var allows us to get inline annotations when ruff has complaints.
  RUFF_OUTPUT_FORMAT: github

jobs:
  build:
    name: "make lint #${{ matrix.python-version }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Only lint on the min and max supported Python versions.
        # It's extremely unlikely that there's a lint issue on any version in between
        # that doesn't show up on the min or max versions.
        #
        # GitHub rate-limits how many jobs can be running at any one time.
        # Starting new jobs is also relatively slow,
        # so linting on fewer versions makes CI faster.
        python-version:
          - "3.9"
          - "3.13"
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
        uses: "./.github/actions/poetry_setup"
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: ${{ env.POETRY_VERSION }}
          working-directory: ${{ inputs.working-directory }}
          cache-key: lint-with-extras

      - name: Check Poetry File
        shell: bash
        working-directory: ${{ inputs.working-directory }}
        run: |
          poetry check

      - name: Install dependencies
        # Also installs dev/lint/test/typing dependencies, to ensure we have
        # type hints for as many of our libraries as possible.
        # This helps catch errors that require dependencies to be spotted, for example:
        # https://github.com/langchain-ai/langchain/pull/10249/files#diff-935185cd488d015f026dcd9e19616ff62863e8cde8c0bee70318d3ccbca98341
        #
        # If you change this configuration, make sure to change the `cache-key`
        # in the `poetry_setup` action above to stop using the old cache.
        # It doesn't matter how you change it, any change will cause a cache-bust.
        working-directory: ${{ inputs.working-directory }}
        run: |
          poetry install --with lint,typing

      - name: Get .mypy_cache to speed up mypy
        uses: actions/cache@v4
        env:
          SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
        with:
          path: |
            ${{ env.WORKDIR }}/.mypy_cache
          key: mypy-lint-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}

      - name: Analysing the code with our lint
        working-directory: ${{ inputs.working-directory }}
        run: |
          make lint_package

      - name: Install unit+integration test dependencies
        working-directory: ${{ inputs.working-directory }}
        run: |
          poetry install --with test,test_integration

      - name: Get .mypy_cache_test to speed up mypy
        uses: actions/cache@v4
        env:
          SEGMENT_DOWNLOAD_TIMEOUT_MIN: "2"
        with:
          path: |
            ${{ env.WORKDIR }}/.mypy_cache_test
          key: mypy-test-${{ runner.os }}-${{ runner.arch }}-py${{ matrix.python-version }}-${{ inputs.working-directory }}-${{ hashFiles(format('{0}/poetry.lock', inputs.working-directory)) }}

      - name: Analysing the code with our lint
        working-directory: ${{ inputs.working-directory }}
        run: |
          make lint_tests