chore: align Confluence parameters spec with CQL support #3

robodev-r2d2 · 2025-10-24T08:10:43Z

Summary

- Add an explicit Confluence parameters schema to the extractor OpenAPI spec so CQL support is reflected in generated clients.
- Document the Confluence loader options, including the optional CQL filter, in the libraries README.
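
To make the optional CQL filter concrete, here is a minimal sketch of how such a parameter set could be validated. Everything in it is an assumption for illustration: the class name `ConfluenceParameters`, the field names, and the rule that either a space key or a CQL query must be present are hypothetical and are not taken from the extractor's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfluenceParameters:
    """Hypothetical parameter set; names and rules are illustrative only."""

    url: str
    token: str
    space_key: Optional[str] = None  # assumed alternative to a CQL filter
    cql: Optional[str] = None        # optional CQL filter
    max_pages: Optional[int] = None  # assumed upper bound on fetched pages

    def validate(self) -> None:
        # Assumption: at least one way of selecting pages must be given.
        if not self.space_key and not self.cql:
            raise ValueError("either space_key or cql must be provided")
        # Assumption: a page limit, when set, must be a positive integer.
        if self.max_pages is not None and self.max_pages <= 0:
            raise ValueError("max_pages must be a positive integer")
```

Calling `validate()` before starting an extraction keeps malformed requests from reaching the Confluence API; the real validation in the library may differ.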

Testing

pytest libs/extractor-api-lib/tests -k confluence (fails: ModuleNotFoundError: No module named 'langchain_core')

https://chatgpt.com/codex/tasks/task_e_68f3a27830648326835fe507c2685ad7

… paths in Tiltfile (stackitcloud#143)

Summary: This PR resolves Linux dev environment issues by standardizing the Poetry virtualenv location and ensuring dev dependencies are reliably installed when building with dev=1. It also adjusts the Tiltfile so that Tilt performs clean reloads when changes are triggered.

Changes:
- Standardize Poetry virtualenv to /opt/.venv across services.
- Set POETRY_VIRTUALENVS_CREATE=false and POETRY_VIRTUALENVS_IN_PROJECT=false to reuse the prebuilt venv.
- Export VIRTUAL_ENV and prepend /opt/.venv/bin to PATH in both build and runtime stages, including for nonroot.
- Add cache-busting tied to the dev build arg to force correct installation of dev dependencies.
- Clean up redundant PATH exports and ensure /etc/environment reflects the unified venv path.
- Adjust Tiltfile sync and ignore rules during image build.

Scope:
- services/admin-backend/Dockerfile
- services/document-extractor/Dockerfile
- services/mcp-server/Dockerfile
- services/rag-backend/Dockerfile

Fixes: stackitcloud#142

Co-authored-by: Andreas Klos <[email protected]>

Add copilot instructions file for customized and better copilot generation.

Co-authored-by: Andreas Klos <[email protected]>

Add codex instructions file for customized and better codex generation.

Adjust the rephrasing chain prompt, increase fault tolerance, and adjust the chat graph connections so that the nodes are executed sequentially. Adjust the determine-language node in the answer graph: it is now LLM-based, falls back to langdetect, and defaults to 'en' if langdetect also fails.
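
The fallback order described above (LLM, then langdetect, then 'en') can be sketched roughly as follows. This is not the node's actual code: `determine_language`, `llm_detect`, and `library_detect` are hypothetical names, and the detectors are passed in as plain callables so the sketch stays independent of any particular LLM client.

```python
from typing import Callable, Optional

Detector = Callable[[str], Optional[str]]


def determine_language(
    text: str,
    llm_detect: Detector,
    library_detect: Detector,
    default: str = "en",
) -> str:
    """Try each detector in order and fall back to `default`."""
    for detect in (llm_detect, library_detect):
        try:
            code = detect(text)
        except Exception:
            # A failing detector should not break the graph; try the next one.
            continue
        if code:
            # Normalize whatever the detector returned, e.g. " DE " -> "de".
            return code.strip().lower()
    return default
```

With the langdetect package installed, `library_detect` could simply be `langdetect.detect`, which raises an exception on input it cannot classify and would therefore fall through to the default.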

…elm + deps updates (stackitcloud#148)

Summary
- Adds an optional Semantic Chunker to the admin-api-lib and centralizes embedding implementations in rag-core-lib (rag-core-api now re-exports).
- Helm chart gains chunker selection + tuning; admin container now preloads NLTK data at startup.
- Dependency updates across admin libs/services; new tests for chunking logic.

Motivation
- Provide more accurate chunk boundaries (semantic-aware) while retaining the existing recursive splitter as the default.
- Deduplicate/embedder logic across projects to reduce drift and config duplication.

Key changes
- Admin chunking
  - New `SemanticTextChunker` backed by LangChain’s `SemanticChunker`, with optional min/max enforcement via `RecursiveCharacterTextSplitter`.
  - Trailing undersized chunks are sentence-aware rebalanced (NLTK Punkt with regex fallback) to avoid tiny tails.
  - Configurable via:
    - `CHUNKER_CLASS_TYPE_CHUNKER_TYPE`: `recursive` (default) or `semantic`
    - `CHUNKER_MAX_SIZE` (default `1000`), `CHUNKER_OVERLAP` (default `100`)
    - Semantic-only: `CHUNKER_BREAKPOINT_THRESHOLD_TYPE` (default `percentile`), `CHUNKER_BREAKPOINT_THRESHOLD_AMOUNT` (default `95`), `CHUNKER_BUFFER_SIZE` (default `1`), `CHUNKER_MIN_SIZE` (default `200`)
- DI wiring
  - `DependencyContainer` selects chunker (`recursive` or `semantic`) and, for semantic mode, resolves embeddings via `EmbedderClassTypeSettings`:
    - `stackit` → `StackitEmbedder` (with shared retry settings)
    - `ollama` → `LangchainCommunityEmbedder(OllamaEmbeddings)`
  - Container bootstrapping simplified in `main.py` (internalizes class-type wiring).
- Embeddings centralization
  - New in `rag-core-lib`: `impl/embeddings/*` and embedder settings (`stackit`, `ollama`, `fake`), plus `EmbedderType` and base `Embedder`.
  - `rag-core-api` re-exports these for backward compatibility (no breaking imports).
- Helm / deployment
  - Values (`infrastructure/rag/values.yaml`): new `adminBackend.envs.chunker.*` keys for selection & tuning (chart default `recursive`; overlap default now `100`).
  - Deployment: mounts NLTK data dir and fetches `punkt` + `averaged_perceptron_tagger_eng` at startup; adds `configmap.chunkerName` and `secret.stackitEmbedderName` to env sources.
- Behavior fixes & docs
  - De-duplicate `meta["related"]` in page summaries.
  - Docs: libs README adds “Chunker configuration (multiple chunkers)” and updates DI tables to rag-core-lib classes; admin-backend README adds “Chunking modes”.
- Tests
  - New `semantic_text_chunker_test.py` exercising: supported-kwargs passthrough to LC chunker, empty-input behavior, min/max enforcement + balancing, sentence-aware split.

Configuration / migration
- Default remains `recursive` splitter; to enable semantic chunking:
  1) Set `CHUNKER_CLASS_TYPE_CHUNKER_TYPE=semantic`.
  2) Choose embeddings via `EMBEDDER_CLASS_TYPE_EMBEDDER_TYPE` (`stackit` or `ollama`) and configure:
     - STACKIT: `STACKIT_EMBEDDER_MODEL`, `STACKIT_EMBEDDER_BASE_URL`, `STACKIT_EMBEDDER_API_KEY` (+ optional retry overrides).
     - Ollama: `OLLAMA_EMBEDDER_MODEL`, `OLLAMA_EMBEDDER_BASE_URL`.
  3) Ensure Helm chart has corresponding ConfigMaps/Secrets (`stackitEmbedder`, etc.).
- NLTK data is preloaded on container start; no runtime downloads required.

Dependencies
- Add: `langchain-experimental`, `nltk` (and transitive `joblib`).
- Bump: `fastapi` (0.118.x), `uvicorn` (0.37.x), `langfuse` (3.6.x), `langchain`/`community`/`core` minor versions, `requests` (2.32.5).
- Test note: ensure LC packages (`langchain_core`, etc.) are present to run unit tests locally.

Risks & mitigations
- Startup time increases slightly due to NLTK data fetch → mitigated via one-time download into an emptyDir.
- Semantic mode depends on external embeddings; ensure credentials/secrets are present before switching default.
- Chunk size tuning may affect vector DB costs; start with defaults and adjust based on retrieval quality.

Docs
- libs/README.md: “2.4 Chunker configuration (multiple chunkers)” and corrected DI references.
- services/admin-backend/README.md: “Chunking modes” and Helm guidance.
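
As a rough illustration of the chunker configuration listed above, the following sketch reads the documented environment variables and applies the stated defaults. The `ChunkerSettings` dataclass and `load_chunker_settings` helper are hypothetical names invented for this example; the library itself wires its settings through its own settings classes and `DependencyContainer`, which may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ChunkerSettings:
    """Hypothetical container for the documented chunker options."""

    chunker_type: str
    max_size: int
    overlap: int
    min_size: int
    breakpoint_threshold_type: str
    breakpoint_threshold_amount: float
    buffer_size: int


def load_chunker_settings(env: Mapping[str, str] = os.environ) -> ChunkerSettings:
    """Read the documented variables, using the defaults from the commit message."""
    chunker_type = env.get("CHUNKER_CLASS_TYPE_CHUNKER_TYPE", "recursive").lower()
    if chunker_type not in {"recursive", "semantic"}:
        raise ValueError(f"unsupported chunker type: {chunker_type!r}")
    return ChunkerSettings(
        chunker_type=chunker_type,
        max_size=int(env.get("CHUNKER_MAX_SIZE", "1000")),
        overlap=int(env.get("CHUNKER_OVERLAP", "100")),
        # The remaining options only matter in semantic mode.
        min_size=int(env.get("CHUNKER_MIN_SIZE", "200")),
        breakpoint_threshold_type=env.get("CHUNKER_BREAKPOINT_THRESHOLD_TYPE", "percentile"),
        breakpoint_threshold_amount=float(env.get("CHUNKER_BREAKPOINT_THRESHOLD_AMOUNT", "95")),
        buffer_size=int(env.get("CHUNKER_BUFFER_SIZE", "1")),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to try out a configuration, for example `load_chunker_settings({"CHUNKER_CLASS_TYPE_CHUNKER_TYPE": "semantic"})`.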

…icated documentation for each lib (stackitcloud#151)

This pull request introduces major improvements to documentation, metadata, and configuration for the three main Python libraries in the STACKIT RAG template: `admin-api-lib`, `extractor-api-lib`, and `rag-core-api`. The changes focus on adding comprehensive README files for each library, updating package metadata in `pyproject.toml` for clarity and compliance, and refining dependency and configuration management. These updates make the libraries easier to understand, install, and extend, and improve maintainability for both operators and developers.

**Documentation enhancements:**

* Added detailed `README.md` files for `libs/admin-api-lib`, `libs/extractor-api-lib`, and `libs/rag-core-api`, describing module responsibilities, features, endpoints, configuration, usage, extension, and contribution guidelines.

**Package metadata and configuration improvements:**

* Updated `pyproject.toml` for all three libraries to include new version numbers (`v3.2.1`), expanded author and maintainer information, license, repository, homepage, and readme fields for better package distribution and compliance.
* Improved dependency specification in `libs/extractor-api-lib/pyproject.toml` by switching `fasttext` to a stable PyPI version and adjusting other package versions.
* Refined pytest and flake8 configuration for consistency and clarity, such as changing `log_cli` to boolean and updating exclusions.

These changes collectively strengthen the documentation, usability, and maintainability of the STACKIT RAG template libraries, making them more accessible for new users and contributors.

Co-authored-by: Copilot <[email protected]>

stackitcloud#152)

This pull request primarily updates version numbers and metadata across multiple components to align with the latest release (3.2.1) and standardize package naming and licensing. Additionally, it enhances the documentation by adding useful badges to the `README.md` for better project visibility.

**Version and metadata updates:**

* Updated the version to `3.2.1` in `services/frontend/package.json`, `services/admin-backend/pyproject.toml`, `services/document-extractor/pyproject.toml`, `services/mcp-server/pyproject.toml`, and `services/rag-backend/pyproject.toml` to ensure consistency across all services.
* Changed the license from `MIT` to `Apache-2.0` and added a description in `services/frontend/package.json` for clearer project identification and compliance.
* Standardized the package name from `extractor_api_lib` to `extractor-api-lib` in `libs/extractor-api-lib/pyproject.toml` for consistency with other packages.

**Documentation improvements:**

* Added a set of badges to the top of `README.md` to display license, commit activity, issue closure, discussions, PyPI downloads, Kubernetes readiness, and STACKIT readiness, improving project transparency and accessibility.

robodev-r2d2 and others added 5 commits October 20, 2025 15:40

chore: add copilot style guide (stackitcloud#146)

3a2a134

chore: add codex instructions guide (stackitcloud#147)

3c01ab6

chore: update Confluence parameters documentation

723f821

robodev-r2d2 added the codex label Oct 24, 2025 — with ChatGPT Codex Connector

robodev-r2d2 and others added 9 commits October 27, 2025 12:20

Merge branch 'main' into codex/adjust-confluence-extractor-for-cql-su…

5904ed1

…pport

fix: update validation for max_pages parameter and clarify input plac…

9e1219f

…eholder in Confluence extraction

fix: update chunker type to semantic and set content format in Conflu…

1ae1932

…ence extractor

fix: improve error handling for Confluence extraction parameters and …

651d93b

…enhance test assertion message

refactor: remove confluence_parameters schema from OpenAPI definition

c785610

fix: change chunker type from semantic to recursive in adminBackend c…

8591460

…onfiguration
