Commit f44ca91

Integration test reorg and build sharding (#540)
## Summary

This PR implements a custom pytest plugin for test sharding, allowing tests to be automatically distributed across multiple CI jobs for parallel execution. It replaces the previous manual directory-based test splitting with a more flexible, hash-based distribution system. Integration tests have also been reorganized into top-level folders grouped by the client type and setup requirements they need. This should bring the total CI runtime down to ~8 minutes or less.

## Changes

### Core Implementation

- **New pytest plugin** (`tests/pytest_shard.py`):
  - Implements the `pytest_addoption` hook to add `--splits` and `--group` command-line options
  - Implements the `pytest_collection_modifyitems` hook to filter tests based on shard assignment
  - Uses hash-based distribution (an MD5 hash of each test's node ID) for deterministic test assignment
  - Supports the environment variables `PYTEST_SPLITS` and `PYTEST_GROUP` as alternatives to the command-line options
  - Validates shard parameters and reports helpful error messages
- **Plugin registration** (`tests/conftest.py`):
  - Registers the plugin globally so it is available for all test runs; the plugin is loaded automatically when running pytest

### CI Workflow Updates

- **Updated `.github/workflows/testing-integration.yaml`**:
  - Replaced manual directory-based test splitting with automatic sharding
  - `rest_sync` tests: now run in 8 shards (previously split manually by directory)
  - `rest_asyncio` tests: now run in 5 shards (previously split manually by directory)
  - `grpc` tests: no sharding (all tests run in a single job, including `tests/integration/rest_sync/db/data` with `USE_GRPC='true'`)
- **Updated `.github/actions/run-integration-test/action.yaml`**:
  - Added `pytest_splits` and `pytest_group` input parameters
  - Updated test execution to pass sharding arguments when provided

### Test Reorganization

- **Integration tests reorganized by client type** (`tests/integration/`):
  - **`rest_sync/`**: tests using the synchronous REST client (`Pinecone()`). Uses standard `Index()` objects for database operations, supports an optional GRPC mode via the `USE_GRPC='true'` environment variable, and contains subdirectories for `db/` (control and data operations), `inference/`, and `admin/` tests
  - **`rest_asyncio/`**: tests using the asynchronous REST client (`Pinecone().IndexAsyncio()`). Uses async fixtures and `IndexAsyncio()` objects, requires `pytest-asyncio` for async test execution, and contains subdirectories for `db/` (control and data operations) and `inference/` tests
  - **`grpc/`**: tests using the GRPC client (`PineconeGRPC()`). Uses the `PineconeGRPC()` client and `GRPCIndex` objects, and contains `db/data/` tests for GRPC-specific functionality
- This organization makes it clear which client type each test requires and simplifies fixture setup

### Bug Fixes

- **Fixed a race condition in test cleanup** (`tests/integration/rest_sync/db/control/pod/conftest.py`):
  - Added `NotFoundException` handling to the `attempt_delete_index` function
  - Prevents teardown errors when an index is deleted between the `has_index` check and the `describe_index` call

### Testing

- **Unit tests** (`tests/unit/test_pytest_shard.py`):
  - Tests for the hash-based distribution logic
  - Tests for validation and error handling
  - Tests for deterministic shard assignment
  - Tests for edge cases (single shard, environment variables, etc.)
  - Tests gracefully handle `testdir` limitations (plugin loading in isolated environments)

### Documentation

- **Updated `docs/maintainers/testing-guide.md`**:
  - Added a "Test Sharding" section with usage examples
  - Documented the command-line options and environment variables
  - Explained how sharding works and how it is used in CI
  - Documented the actual shard counts used in the CI workflows
  - Fixed a broken link to `testing-integration.yaml`

## Benefits

1. **Automatic test distribution**: tests are distributed across shards by a deterministic hash algorithm, eliminating the need to manually maintain directory-based splits
2. **Better load balancing**: hash-based distribution spreads tests across shards more evenly than directory-based splitting
3. **Easier maintenance**: no need to update CI workflows when test files are added, removed, or reorganized
4. **Flexibility**: shard counts can be adjusted in CI workflows without code changes
5. **Deterministic**: the same test always goes to the same shard, making debugging easier
6. **Clear test organization**: tests are grouped by client type, making it immediately clear which setup and fixtures each test needs
7. **Simplified fixture management**: each client type has its own `conftest.py` with appropriate fixtures, reducing complexity and potential conflicts

## Usage

### Command-line

```sh
pytest tests/integration/rest_sync --splits=8 --group=1
```

### Environment variables

```sh
export PYTEST_SPLITS=8
export PYTEST_GROUP=1
pytest tests/integration/rest_sync
```

## Testing

- Plugin works correctly in a real pytest environment
- CI workflows updated and ready for use

## Notes

- The plugin is automatically available when running pytest (no installation needed)
- Shard counts in CI can be adjusted based on test suite size and CI capacity
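For illustration, the hash-based assignment the plugin performs can be sketched in plain Python. This is a simplified stand-in, not the actual `tests/pytest_shard.py` (the function names here are invented); the real plugin wires equivalent logic into the `pytest_addoption` and `pytest_collection_modifyitems` hooks:

```python
import hashlib


def shard_for(node_id: str, splits: int) -> int:
    """Deterministically map a pytest node ID to a shard in [1, splits].

    MD5 is used for its stable, well-distributed output (not for
    security), so the same test always lands on the same shard.
    """
    digest = hashlib.md5(node_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % splits + 1


def select_for_group(node_ids, splits, group):
    """Keep only the node IDs assigned to the requested shard."""
    if not (1 <= group <= splits):
        raise ValueError(f"--group must be between 1 and {splits}, got {group}")
    return [nid for nid in node_ids if shard_for(nid, splits) == group]


# Every collected test is assigned to exactly one shard.
tests = [f"tests/test_example.py::test_case_{i}" for i in range(20)]
shards = [select_for_group(tests, 4, g) for g in range(1, 5)]
assert sorted(t for shard in shards for t in shard) == sorted(tests)
```

Because the assignment depends only on the node ID and the shard count, adding or removing a test file never requires touching the CI workflow; only changing `--splits` reshuffles the distribution.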
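The race-condition fix in the cleanup code follows a common "already gone is fine" teardown pattern. A generic sketch (the real fixture catches pinecone's `NotFoundException`; a stand-in exception and duck-typed client are used here so the shape is self-contained):

```python
class NotFoundException(Exception):
    """Stand-in for pinecone's NotFoundException."""


def attempt_delete_index(client, index_name: str) -> None:
    # Another job may delete the index between the has_index() check
    # and the delete call; treat "not found" as successful cleanup.
    try:
        if client.has_index(index_name):
            client.delete_index(index_name)
    except NotFoundException:
        pass  # already deleted elsewhere; nothing left to clean up
```

Without the `except` clause, a parallel shard cleaning up the same index could fail the teardown even though the desired end state (index gone) was reached.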
1 parent d8d68bf commit f44ca91

File tree

199 files changed: +7020 additions, −1784 deletions


.durations_grpc

Lines changed: 3421 additions & 0 deletions (large diff not rendered)

.durations_rest_asyncio

Lines changed: 167 additions & 0 deletions (large diff not rendered)

.durations_rest_sync

Lines changed: 301 additions & 0 deletions (large diff not rendered)

.github/actions/index-create/action.yml

Lines changed: 10 additions & 1 deletion

```diff
@@ -17,11 +17,15 @@ inputs:
   dimension:
     description: 'The dimension of the index'
     required: false
-    default: '3'
+    default: ''
   metric:
     description: 'The metric of the index'
     required: false
     default: 'cosine'
+  vector_type:
+    description: 'The type of the index'
+    required: false
+    default: 'dense'
   PINECONE_API_KEY:
     description: 'The Pinecone API key'
     required: true
@@ -36,6 +40,10 @@ outputs:
     description: 'The name of the index, including randomized suffix'
     value: ${{ steps.create-index.outputs.index_name }}
 
+  index_host:
+    description: 'The host of the index'
+    value: ${{ steps.create-index.outputs.index_host }}
+
 runs:
   using: 'composite'
   steps:
@@ -52,5 +60,6 @@ runs:
       NAME_PREFIX: ${{ inputs.name_prefix }}
       REGION: ${{ inputs.region }}
       CLOUD: ${{ inputs.cloud }}
+      VECTOR_TYPE: ${{ inputs.vector_type }}
       DIMENSION: ${{ inputs.dimension }}
       METRIC: ${{ inputs.metric }}
```

.github/actions/index-create/create.py

Lines changed: 30 additions & 40 deletions

```diff
@@ -1,9 +1,9 @@
 import os
-import re
 import random
 import string
-from datetime import datetime
+import uuid
 from pinecone import Pinecone
+from datetime import datetime
 
 
 def read_env_var(name):
@@ -22,39 +22,9 @@ def write_gh_output(name, value):
         print(f"{name}={value}", file=fh)
 
 
-def generate_index_name(test_name: str) -> str:
-    github_actor = os.getenv("GITHUB_ACTOR", None)
-    user = os.getenv("USER", None)
-    index_owner = github_actor or user
-
-    formatted_date = datetime.now().strftime("%Y%m%d-%H%M%S%f")[:-3]
-
-    github_job = os.getenv("GITHUB_JOB", None)
-
-    if test_name.startswith("test_"):
-        test_name = test_name[5:]
-
-    # Remove trailing underscore, if any
-    if test_name.endswith("_"):
-        test_name = test_name[:-1]
-
-    name_parts = [index_owner, formatted_date, github_job, test_name]
-    index_name = "-".join([x for x in name_parts if x is not None])
-
-    # Remove invalid characters
-    replace_with_hyphen = re.compile(r"[\[\(_,\s]")
-    index_name = re.sub(replace_with_hyphen, "-", index_name)
-    replace_with_empty = re.compile(r"[\]\)\.]")
-    index_name = re.sub(replace_with_empty, "", index_name)
-
-    max_length = 45
-    index_name = index_name[:max_length]
-
-    # Trim final character if it is not alphanumeric
-    if index_name.endswith("_") or index_name.endswith("-"):
-        index_name = index_name[:-1]
-
-    return index_name.lower()
+def generate_index_name(name_prefix: str) -> str:
+    name = name_prefix.lower() + "-" + str(uuid.uuid4())
+    return name[:45]
 
 
 def get_tags():
@@ -74,15 +44,35 @@ def get_tags():
 
 def main():
     pc = Pinecone(api_key=read_env_var("PINECONE_API_KEY"))
-    index_name = generate_index_name(read_env_var("NAME_PREFIX") + random_string(20))
+    index_name = generate_index_name(read_env_var("NAME_PREFIX"))
+    dimension_var = read_env_var("DIMENSION")
+    if dimension_var is not None and dimension_var != "":
+        dimension = int(dimension_var)
+    else:
+        dimension = None
+
+    vector_type_var = read_env_var("VECTOR_TYPE")
+    if vector_type_var is not None and vector_type_var != "":
+        vector_type = vector_type_var
+    else:
+        vector_type = None
+
+    metric = read_env_var("METRIC")
+    cloud = read_env_var("CLOUD")
+    region = read_env_var("REGION")
+    tags = get_tags()
+
     pc.create_index(
         name=index_name,
-        metric=read_env_var("METRIC"),
-        dimension=int(read_env_var("DIMENSION")),
-        spec={"serverless": {"cloud": read_env_var("CLOUD"), "region": read_env_var("REGION")}},
-        tags=get_tags(),
+        metric=metric,
+        dimension=dimension,
+        vector_type=vector_type,
+        tags=tags,
+        spec={"serverless": {"cloud": cloud, "region": region}},
     )
+    description = pc.describe_index(name=index_name)
     write_gh_output("index_name", index_name)
+    write_gh_output("index_host", description.host)
 
 
 if __name__ == "__main__":
```
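The replacement naming helper is small enough to check standalone. This is the new `generate_index_name` copied from the diff above, with a quick sanity check on the 45-character cap (the cap carries over from the old implementation's `max_length = 45`):

```python
import uuid


def generate_index_name(name_prefix: str) -> str:
    # Lowercased prefix plus a UUID suffix, truncated to the
    # 45-character index-name budget these CI helpers use.
    name = name_prefix.lower() + "-" + str(uuid.uuid4())
    return name[:45]


name = generate_index_name("Deps-Test")
assert name.startswith("deps-test-")
assert len(name) <= 45
```

Note that a long prefix eats into the UUID suffix before truncation, so prefixes should stay short to preserve uniqueness across parallel CI jobs.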

.github/actions/run-integration-test/action.yaml

Lines changed: 35 additions & 3 deletions

```diff
@@ -14,11 +14,29 @@ inputs:
   PINECONE_ADDITIONAL_HEADERS:
     description: 'Additional headers to send with the request'
     required: false
-    default: '{"sdk-test-suite": "pinecone-python-client"}'
+    default: '{"sdk-test-suite": "pinecone-python-client", "x-environment": "preprod-aws-0"}'
   use_grpc:
     description: 'Whether to use gRPC or REST'
     required: false
     default: 'false'
+  PINECONE_CLIENT_ID:
+    description: 'The client ID to use for admin tests'
+    required: false
+  PINECONE_CLIENT_SECRET:
+    description: 'The client secret to use for admin tests'
+    required: false
+  INDEX_HOST_DENSE:
+    description: 'The host of the dense index for db data tests'
+    required: false
+  INDEX_HOST_SPARSE:
+    description: 'The host of the sparse index for db data tests'
+    required: false
+  pytest_splits:
+    description: 'Number of shards to split tests into (for test sharding)'
+    required: false
+  pytest_group:
+    description: 'Which shard to run (1-indexed, for test sharding)'
+    required: false
 
 runs:
   using: 'composite'
@@ -33,9 +51,23 @@ runs:
   - name: Run tests
     id: run-tests
     shell: bash
-    run: poetry run pytest tests/integration/${{ inputs.test_suite }} --retries 2 --retry-delay 35 -s -vv --log-cli-level=DEBUG --durations=20
+    run: |
+      PYTEST_ARGS=""
+      if [ -n "${{ inputs.pytest_splits }}" ] && [ -n "${{ inputs.pytest_group }}" ]; then
+        PYTEST_ARGS="--splits=${{ inputs.pytest_splits }} --group=${{ inputs.pytest_group }}"
+      fi
+      poetry run pytest ${{ inputs.test_suite }} \
+        $PYTEST_ARGS \
+        --retries 2 \
+        --retry-delay 35 \
+        --log-cli-level=DEBUG \
+        --durations=25 \
+        -s -vv
     env:
       PINECONE_API_KEY: ${{ steps.decrypt-api-key.outputs.decrypted_secret }}
       PINECONE_ADDITIONAL_HEADERS: ${{ inputs.PINECONE_ADDITIONAL_HEADERS }}
+      PINECONE_CLIENT_ID: ${{ inputs.PINECONE_CLIENT_ID }}
+      PINECONE_CLIENT_SECRET: ${{ inputs.PINECONE_CLIENT_SECRET }}
       USE_GRPC: ${{ inputs.use_grpc }}
-      SKIP_WEIRD: 'true'
+      INDEX_HOST_DENSE: ${{ inputs.INDEX_HOST_DENSE }}
+      INDEX_HOST_SPARSE: ${{ inputs.INDEX_HOST_SPARSE }}
```

.github/actions/setup-poetry/action.yml

Lines changed: 19 additions & 0 deletions

```diff
@@ -21,6 +21,10 @@ inputs:
     description: 'Python version to use'
     required: true
     default: '3.10'
+  enable_cache:
+    description: 'Enable caching of Poetry dependencies and virtual environment'
+    required: true
+    default: 'true'
 
 runs:
   using: 'composite'
@@ -33,6 +37,21 @@ runs:
   - name: Install Poetry
     uses: snok/install-poetry@v1
 
+  - name: Get Poetry cache directory
+    if: ${{ inputs.enable_cache == 'true' }}
+    id: poetry-cache
+    shell: bash
+    run: |
+      echo "dir=$(poetry config cache-dir)" >> $GITHUB_OUTPUT
+
+  - name: Cache Poetry dependencies
+    if: ${{ inputs.enable_cache == 'true' }}
+    uses: actions/cache@v4
+    id: restore-cache-poetry
+    with:
+      path: ${{ steps.poetry-cache.outputs.dir }}
+      key: poetry-${{ runner.os }}-${{ inputs.python_version }}-${{ hashFiles('poetry.lock') }}-grpc-${{ inputs.include_grpc }}-asyncio-${{ inputs.include_asyncio }}-dev-${{ inputs.include_dev }}-types-${{ inputs.include_types }}
+
   - name: Install dependencies
     shell: bash
     env:
```

.github/actions/test-dependency-asyncio-rest/action.yaml

Lines changed: 1 addition & 0 deletions

```diff
@@ -30,6 +30,7 @@ runs:
       include_types: false
       include_asyncio: true
       python_version: ${{ inputs.python_version }}
+      enable_cache: 'false'
 
   - name: 'Install aiohttp ${{ inputs.aiohttp_version }}'
     run: 'poetry add aiohttp==${{ inputs.aiohttp_version }}'
```

.github/actions/test-dependency-grpc/action.yaml

Lines changed: 1 addition & 0 deletions

```diff
@@ -38,6 +38,7 @@ runs:
       include_grpc: true
       include_types: false
       python_version: ${{ inputs.python_version }}
+      enable_cache: 'false'
 
   - name: Install grpcio ${{ inputs.grpcio_version }}
     run: poetry add grpcio==${{ inputs.grpcio_version }}
```

.github/actions/test-dependency-rest/action.yaml

Lines changed: 1 addition & 0 deletions

```diff
@@ -29,6 +29,7 @@ runs:
       include_grpc: false
       include_types: false
       python_version: ${{ inputs.python_version }}
+      enable_cache: 'false'
 
   - name: 'Install urllib3 ${{ matrix.urllib3-version }}'
     run: 'poetry add urllib3==${{ matrix.urllib3-version }}'
```
