diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 3bd2a31..a0ffbfd 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -12,14 +12,21 @@ This document provides guidelines for contributing to this project.
git clone https://github.com/YOUR_USERNAME/moorcheh-python-sdk.git
cd moorcheh-python-sdk
```
-3. Set Up Environment: Follow the "Development Setup" instructions in the README.md file, which uses Poetry to create a virtual environment and install dependencies (including development tools).
+3. Set Up Environment: Install dependencies using uv (this includes development tools like pytest).
```bash
-poetry install --with dev
+uv sync
```
+
4. Create a Branch: Create a new branch for your changes. Use a descriptive name (e.g., fix/issue-123, feat/add-graph-endpoint).
```bash
git checkout -b your-branch-name
```
+
+5. (Optional) Run Examples: Set your MOORCHEH_API_KEY environment variable, then run the examples with uv run:
+```bash
+uv run python examples/quickstart.py
+```
+
# Making Changes
* **Code Style:** Please follow standard Python coding conventions (PEP 8). We recommend using tools like black for formatting and ruff or flake8 for linting (consider adding these to dev-dependencies in pyproject.toml if you haven't already).
* **Testing:**
@@ -28,7 +35,7 @@ git checkout -b your-branch-name
2. Integration Tests: Add or update tests for any bugs you fix to prevent regressions.
3. Ensure all tests pass before submitting a pull request. Run tests using:
```bash
-poetry run pytest tests/
+uv run pytest tests/
```
* **Documentation:** Update docstrings, examples, and the README.md as necessary to reflect your changes.
* **Commit Messages:** Write clear and concise commit messages explaining the "what" and "why" of your changes.
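As an illustration of the regression-test guideline in the Testing section above, a test file under tests/ might look like the following sketch. The helper `validate_document_id` is hypothetical and defined inline so the file runs on its own; it is not part of the SDK and only mirrors the rule, stated in the llms.txt added by this change, that a document id must be a non-empty string or number.

```python
# tests/test_document_id_regression.py (hypothetical file name)


def validate_document_id(doc_id):
    """Hypothetical helper, NOT an SDK function: accept non-empty strings and numbers."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(doc_id, bool):
        raise ValueError("document id must be a non-empty string or number")
    if isinstance(doc_id, (int, float)):
        return doc_id
    if isinstance(doc_id, str) and doc_id.strip():
        return doc_id
    raise ValueError("document id must be a non-empty string or number")


def test_valid_ids_are_accepted():
    # Both strings and numbers are valid IDs and are returned unchanged.
    for good in ["doc1", 42]:
        assert validate_document_id(good) == good


def test_invalid_ids_are_rejected():
    # Each of these inputs should raise ValueError rather than pass silently.
    for bad in ["", "   ", None, True]:
        try:
            validate_document_id(bad)
        except ValueError:
            continue
        raise AssertionError(f"{bad!r} should have been rejected")
```

pytest collects plain `test_*` functions, so a file like this would be picked up by `uv run pytest tests/` without extra configuration.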
diff --git a/README.md b/README.md
index 6efea7d..6a9958b 100644
--- a/README.md
+++ b/README.md
@@ -1,206 +1,149 @@
-# Moorcheh Python SDK
-
-
-
-
+
+
+
+Learn more
+·
+Tutorials
+·
+Join Discord
+
+
+
+
+## Why Moorcheh?
+- **32x Compression Ratio** over traditional Vector DBs
+- **85% Reduced End-to-End Latency** over Pinecone vector search + Cohere reranker
+- **$0 Storage Cost** with a true serverless architecture that scales to zero when idle
+- [Read the full paper](https://www.arxiv.org/abs/2601.11557)
+
+[Moorcheh](https://moorcheh.ai/) is the universal memory layer for agentic AI, providing fast deterministic semantic search with zero‑ops scalability. Its MIB + ITS stack preserves relevance while reducing storage cost and latency, delivering high‑accuracy semantic search without the overhead of managing clusters. This makes it well suited to production‑grade RAG, agentic memory, and semantic analytics.
+
+## 🛠️ Key Capabilities
+
+* **Bring any data:** Ingest raw text, files, or vectors with a unified API.
+* **One-shot RAG:** Go from ingestion to grounded answers in a single flow.
+* **Zero-ops scale:** Serverless architecture that scales up and down automatically.
+* **Infrastructure as code:** Deploy into your cloud with native [IaC templates](https://moorcheh.ai/plans).
+* **Agentic memory:** Stateful context for assistants and long-running agents.
+* **Developer-ready:** Async support, type hints, and clear error handling.
+
+## 🚀 Quickstart Guide
+
+### Hosted Platform
+Use our [hosted platform](https://console.moorcheh.ai) to get up and running fast with managed indexing, zero-ops scaling, and usage-based billing.
+
+### Self-Hosted
+
+1. Install the SDK using pip:
```bash
pip install moorcheh-sdk
```
-## Development
-
-If you want to contribute or run the examples locally, clone the repository and install using uv:
-```bash
-git clone https://github.com/moorcheh-ai/moorcheh-python-sdk.git
-
-cd moorcheh-python-sdk
+2. Sign up and generate an API key through the [Moorcheh](https://moorcheh.ai) platform dashboard.

-uv sync
-```
-
-## Authentication
-The SDK requires a Moorcheh API key for authentication. Obtain an API Key: Sign up and generate an API key through the [Moorcheh.ai](https://moorcheh.ai) platform dashboard.
-
-The recommended way is to set the MOORCHEH_API_KEY environment variable:
+3. Set the MOORCHEH_API_KEY environment variable (the recommended way to authenticate):
```bash
export MOORCHEH_API_KEY="YOUR_API_KEY_HERE"
```
-## Quick Start
-This example demonstrates the basic usage after installing the SDK.
+## Basic Usage
```python
import os
-from moorcheh_sdk import MoorchehClient, MoorchehError, ConflictError
+from moorcheh_sdk import MoorchehClient

api_key = os.environ.get("MOORCHEH_API_KEY")

-try:
-    with MoorchehClient(api_key=api_key) as client:
-        # 1. Create a namespace
-        namespace_name = "my-first-namespace"
-        print(f"Attempting to create namespace: {namespace_name}")
-        try:
-            client.namespaces.create(namespace_name=namespace_name, type="text")
-            print(f"Namespace '{namespace_name}' created.")
-        except ConflictError:
-            print(f"Namespace '{namespace_name}' already exists.")
-        except MoorchehError as e:
-            print(f"Error creating namespace: {e}")
-            exit()
-
-        # 2. List namespaces
-        print("\nListing namespaces...")
-        ns_list = client.namespaces.list()
-        print("Available namespaces:")
-        for ns in ns_list.get('namespaces', []):
-            print(f" - {ns.get('namespace_name')} (Type: {ns.get('type')})")
-
-        # 3. Upload a document
-        print(f"\nUploading document to '{namespace_name}'...")
-        docs = [{"id": "doc1", "text": "This is the first document about Moorcheh."}]
-        upload_res = client.documents.upload(namespace_name=namespace_name, documents=docs)
-        print(f"Upload status: {upload_res.get('status')}")
-
-        # Add a small delay for processing before searching
-        import time
-        print("Waiting briefly for processing...")
-        time.sleep(2)
-
-        # 4. Search the namespace
-        print(f"\nSearching '{namespace_name}' for 'Moorcheh'...")
-        search_res = client.similarity_search.query(namespaces=[namespace_name], query="Moorcheh", top_k=1)
-        print("Search results:")
-        print(search_res)
-
-        # 5. Get a Generative AI Answer
-        print(f"\nGetting a GenAI answer from '{namespace_name}'...")
-        gen_ai_res = client.answer.generate(namespace=namespace_name, query="What is Moorcheh?")
-        print("Generative Answer:")
-        print(gen_ai_res)
-
-        # 6. Delete the document
-        print(f"\nDeleting document 'doc1' from '{namespace_name}'...")
-        delete_res = client.documents.delete(namespace_name=namespace_name, ids=["doc1"])
-        print(f"Delete status: {delete_res.get('status')}")
-
-        # 7. Delete the namespace (optional cleanup)
-        # print(f"\nDeleting namespace '{namespace_name}'...")
-        # client.namespaces.delete(namespace_name)
-        # print("Namespace deleted.")
-
-except MoorchehError as e:
-    print(f"\nAn SDK error occurred: {e}")
-except Exception as e:
-    print(f"\nAn unexpected error occurred: {e}")
-```
-(Note: For more detailed examples covering vector operations, error handling, and logging configuration, please see the examples/ directory in the source repository.)
+with MoorchehClient(api_key=api_key) as client:
+    # Create a namespace
+    namespace_name = "my-first-namespace"
+    client.namespaces.create(namespace_name=namespace_name, type="text")

-## Development Setup
-If you want to contribute, run tests, or run the example scripts directly from the source code:
+    # Upload a document
+    docs = [{"id": "doc1", "text": "This is the first document about Moorcheh."}]
+    upload_res = client.documents.upload(namespace_name=namespace_name, documents=docs)
+    print(f"Upload status: {upload_res.get('status')}")

-Clone the repository:
-```bash
-git clone https://github.com/moorcheh-ai/moorcheh-python-sdk.git
-cd moorcheh-python-sdk
-```
+    # Add a small delay for processing before searching
+    import time
+    print("Waiting briefly for processing...")
+    time.sleep(2)

-Install dependencies using uv (this includes development tools like pytest):
-```bash
-uv sync
-```
+    # Perform semantic search on the namespace
+    search_res = client.similarity_search.query(namespaces=[namespace_name], query="Moorcheh", top_k=1)
+    print("Search results:")
+    print(search_res)

-Set your MOORCHEH_API_KEY environment variable.
-Run examples using uv run:
-```bash
-uv run python examples/quickstart.py
+    # Get a Generative AI Answer
+    gen_ai_res = client.answer.generate(namespace=namespace_name, query="What is Moorcheh?")
+    print("Generative Answer:")
+    print(gen_ai_res)
```

-Run tests using uv run:
-```bash
-uv run pytest tests/
-```
+For more detailed examples covering vector operations, error handling, and logging configuration, please see the [examples directory](https://github.com/moorcheh-ai/moorcheh-python-sdk/tree/main/examples).

## API Client Methods
-The `MoorchehClient` class provides the following methods corresponding to the API v1 endpoints:
-### Namespace Management:
-```python
-namespaces.create(namespace_name, type, vector_dimension=None)
-```
-```python
-namespaces.list()
-```
-```python
-namespaces.delete(namespace_name)
-```
-### Data Ingestion:
-```python
-documents.upload(namespace_name, documents) - For text namespaces (async processing).
-```
-```python
-vectors.upload(namespace_name, vectors) - For vector namespaces (sync processing).
-```
-### Semantic Search
-```python
-similarity_search.query(namespaces, query, top_k=10, threshold=None, kiosk_mode=False) - Handles text or vector queries.
-```
-### Generative AI Response
-```python
-answer.generate(namespace, query, top_k=5, ...)
-- Gets a context-aware answer from an LLM.
-```
-
-### Data Deletion:
-```python
-documents.delete(namespace_name, ids)
-```
-```python
-vectors.delete(namespace_name, ids)
-```
-### Analysis (Planned):
-```python
-get_eigenvectors(namespace_name, n_eigenvectors=1) - Not yet implemented
-```
-```python
-get_graph(namespace_name) - Not yet implemented
-```
-```python
-get_umap_image(namespace_name, n_dimensions=2) - Not yet implemented
-```
-(Refer to method docstrings or full documentation for detailed parameters and return types.)
-
-## Documentation
-Full API reference and further examples can be found at: [https://docs.moorcheh.ai/](https://docs.moorcheh.ai/)
+The `MoorchehClient` and `AsyncMoorchehClient` classes provide the same method signatures. Below is a list of the available methods.
+
+| Methods                   | Required Parameters                    | Description                                        |
+| ------------------------- | -------------------------------------- | -------------------------------------------------- |
+| `namespaces.create`       | namespace_name, type, vector_dimension | Create a text or vector namespace.                 |
+| `namespaces.list`         | N/A                                    | List all available namespaces.                     |
+| `namespaces.delete`       | namespace_name                         | Delete a namespace by name.                        |
+| `documents.upload`        | namespace_name, documents              | Upload text documents to a text namespace.         |
+| `documents.get`           | namespace_name, ids                    | Retrieve documents by ID.                          |
+| `documents.upload_file`   | namespace_name, file_path              | Upload a file for server-side ingestion.           |
+| `documents.delete`        | namespace_name, ids                    | Delete documents by ID.                            |
+| `documents.delete_files`  | namespace_name, file_names             | Delete uploaded files by filename.                 |
+| `vectors.upload`          | namespace_name, vectors=[{id, vector}] | Upload vectors to a vector namespace.              |
+| `vectors.delete`          | namespace_name, ids                    | Delete vectors by ID.                              |
+| `similarity_search.query` | namespaces, query                      | Run semantic search with text or vector queries.   |
+| `answer.generate`         | namespace, query                       | Generate a grounded answer from a namespace.       |
+
+For fully detailed method functionality, please see the [API Reference](https://docs.moorcheh.ai/api-reference/introduction).
+
+## 🔗 Integrations
+- **[LlamaIndex](https://developers.llamaindex.ai/python/framework/integrations/vector_stores/moorchehdemo)**: Use Moorcheh as a vector store inside LlamaIndex pipelines.
+- **[LangChain](https://docs.langchain.com/oss/python/integrations/vectorstores/moorcheh)**: Plug Moorcheh into LangChain retrievers and RAG chains.
+- **[n8n](https://n8n.io/integrations/moorcheh)**: Automate workflows that ingest, search, or answer with Moorcheh.
+- **[MCP](https://github.com/moorcheh-ai/moorcheh-mcp)**: Connect Moorcheh to external tools via Model Context Protocol.
+
+
+## Roadmap (Planned)
+
+| Item               | Required Parameters              | Description                                                       |
+| ------------------ | -------------------------------- | ----------------------------------------------------------------- |
+| `get_eigenvectors` | namespace_name, n_eigenvectors   | Expose top eigenvectors for semantic structure analysis.          |
+| `get_graph`        | namespace_name                   | Provide a graph view of relationships across data in a namespace. |
+| `get_umap_image`   | namespace_name, n_dimensions     | Generate a 2D UMAP projection image for quick visual exploration. |
+
+## Documentation & Support
+Have questions or feedback? We're here to help:
+- Docs: [https://docs.moorcheh.ai](https://docs.moorcheh.ai)
+- Discord: [Join our Discord server](https://lnkd.in/gE_Pz_kb)
+- Appointment: [Book a Discovery Call](https://www.edgeaiinnovations.com/appointments)
+- Email: support@moorcheh.ai

## Contributing
-Contributions are welcome! Please refer to the contributing guidelines (CONTRIBUTING.md) for details on setting up the development environment, running tests, and submitting pull requests.
+Contributions are welcome! Please refer to the contributing guidelines ([CONTRIBUTING.md](CONTRIBUTING.md)) for details on setting up the development environment, running tests, and submitting pull requests.

## License
This project is licensed under the MIT License - See the LICENSE file for details.
diff --git a/llms.txt b/llms.txt
new file mode 100644
index 0000000..8eaf198
--- /dev/null
+++ b/llms.txt
@@ -0,0 +1,719 @@
+# llms.txt
+
+Moorcheh Python SDK. Use this file as guidance for LLMs and tooling when reading or citing docs and code.
+
+## Sources and freshness
+- API behavior: https://docs.moorcheh.ai and https://docs.moorcheh.ai/api-reference/introduction
+- SDK behavior: this repo source (including retry/backoff and logging)
+- Pricing/limits: https://moorcheh.ai/plans
+- If any of these differ, prefer the canonical docs and plan page above.
+
+## Canonical docs
+- https://docs.moorcheh.ai/
+- https://docs.moorcheh.ai/quickstart
+- https://docs.moorcheh.ai/python-sdk/introduction
+- https://docs.moorcheh.ai/api-reference/introduction
+
+## Primary product/site links
+- https://moorcheh.ai/
+- https://console.moorcheh.ai/
+- https://www.youtube.com/@moorchehai/videos
+- https://x.com/moorcheh_ai
+
+## Repository scope
+- Repo: https://github.com/moorcheh-ai/moorcheh-python-sdk
+- SDK package: https://pypi.org/project/moorcheh-sdk/
+
+## API base and auth
+- Base URL: https://api.moorcheh.ai/v1
+- Auth header: x-api-key: YOUR_API_KEY
+- Content-Type: application/json
+- Error format: {"error": "...", "message": "..."}
+- Example error:
+  {
+    "error": "Error description",
+    "message": "Detailed error message"
+  }
+- Common status codes: 200, 201, 202, 207, 400, 401, 403, 404, 409, 413, 429, 500
+- Rate limits depend on subscription tier; contact support for details.
+- Plans page lists API Rate Limit: 100/hour (see https://moorcheh.ai/plans for tier context).
+
+## Pricing and usage units (from plans page)
+- Compute unit: $0.01 per unit.
+- Semantic search: 5 units per 1,000 queries.
+- Ingestion: 1 unit per 10MB (one-time).
+- Management: 5 units per 1,000 listing/updating calls.
+- Gen AI calls (Bedrock models): Standard 1 unit, Advanced 2 units, Premium 3 units.
+- Namespaces and vector storage: listed as $0.00/month on plans page.
+
+## Plan tiers (summary from plans page)
+- Builder: $0/month, shared tenancy, community support, 10k vectors.
+- Production: $29/month + usage, 99.9% uptime SLA, priority support, team roles.
+- Sovereign: custom pricing, native VPC deployment, dedicated instances, audit logs.
+- Verify current limits and features on https://moorcheh.ai/plans.
+
+## Quickstart (SDK)
+- Install: pip install moorcheh-sdk
+- Auth: set MOORCHEH_API_KEY environment variable
+- Client: use context manager (with MoorchehClient() as client)
+
+## Core concepts
+- Namespaces isolate data; type is text or vector.
+- Text namespaces use automatic embeddings; vector namespaces require vector_dimension.
+- Search uses ITS scoring with human-readable labels and numeric scores (0 to 1).
+- Text uploads are async; vector uploads are sync and immediately searchable.
+
+## REST API endpoints (summary)
+
+### Namespaces
+- POST /namespaces (create)
+  - Body:
+    - namespace_name (string, required): unique name (alphanumeric, hyphens, underscores only).
+    - type (string, required): "text" or "vector".
+    - vector_dimension (number, required for vector): embedding dimension (e.g., 1536).
+  - Notes:
+    - Names must be unique in your account.
+    - Type cannot be changed after creation.
+    - Vector dimension cannot be modified after creation.
+    - Creation counts toward tier limits.
+  - Use cases: create isolated datasets for different products, teams, or embedding types.
+  - Common responses: 201, 400, 401, 403 (limit/activation), 409, 500.
+  - Example response:
+    {
+      "status": "success",
+      "message": "Namespace 'my-documents' created successfully. ✅",
+      "namespace_name": "my-documents"
+    }
+  - Example request (curl):
+    curl -X POST "https://api.moorcheh.ai/v1/namespaces" \
+      -H "Content-Type: application/json" \
+      -H "x-api-key: your-api-key-here" \
+      -d '{"namespace_name":"my-documents","type":"text"}'
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.namespaces.create(namespace_name="my-documents", type="text")
+
+  #### documents/get failure modes and error taxonomy
+
+  | Status | Meaning | Retry? | Client Action | Fields Present |
+  |--------|---------|--------|--------------|----------------|
+  | 200 | All requested documents found and returned | No | Use result | items, request_id |
+  | 207 | Partial: some docs missing/unavailable | If retryable | Use found docs, check not_found_ids, retry if retryable | items, not_found_ids, error_code, retryable, request_id |
+  | 404 | Namespace or all docs not found | No | Surface error | error, error_code, request_id |
+  | 429 | Rate limited | Yes | Retry with backoff | error, error_code, retryable, request_id, Retry-After header |
+  | 5xx | Backend/internal error | Yes | Retry with backoff | error, error_code, retryable, request_id, Retry-After header |
+
+  - Error response fields:
+    - error_code (string): stable error code (e.g., RATE_LIMITED, NAMESPACE_NOT_READY, UPLOAD_IN_PROGRESS, INTERNAL_DEPENDENCY_FAILURE, NOT_FOUND, PARTIAL_SUCCESS).
+    - retryable (boolean): true if client should retry (e.g., for 429, 5xx, or transient 207).
+    - request_id (string): unique request/correlation ID (may also be in x-request-id header).
+    - not_found_ids (array): list of missing/unavailable IDs.
+    - error (string): human-readable error message.
+
+  - Consistency/availability:
+    - documents/get reads from the source-of-truth document store (should be available immediately after upload/validation), not the search index.
+    - If get returns 207, not_found_ids may include IDs that are temporarily unavailable (e.g., ingestion in progress, backend degraded) or genuinely missing; error_code and retryable clarify which.
+
+  - Partial success contract (207):
+    - not_found_ids: always present for missing/unavailable IDs.
+    - error_code: present if any unavailable IDs are due to transient/backend issues (e.g., UPLOAD_IN_PROGRESS).
+    - retryable: true if retrying may succeed (e.g., backend recovery, ingestion completion).
+
+  - Retry/backoff:
+    - documents/get is safe to retry on 429, 5xx, and retryable 207 errors.
+    - SDK defaults: max_retries=3, exponential backoff with jitter, honors Retry-After header.
+
+  - Example error response (partial, retryable):
+    {
+      "items": [ ... ],
+      "not_found_ids": ["doc-2"],
+      "error_code": "UPLOAD_IN_PROGRESS",
+      "retryable": true,
+      "request_id": "req-abc123"
+    }
+
+  - Example error response (permanent not found):
+    {
+      "items": [ ... ],
+      "not_found_ids": ["doc-2"],
+      "error_code": "NOT_FOUND",
+      "retryable": false,
+      "request_id": "req-def456"
+    }
+
+  - Example error response (rate limited):
+    {
+      "error": "Rate limit exceeded",
+      "error_code": "RATE_LIMITED",
+      "retryable": true,
+      "request_id": "req-ghi789"
+    }
+
+  - Example error response (internal error):
+    {
+      "error": "Internal dependency failure",
+      "error_code": "INTERNAL_DEPENDENCY_FAILURE",
+      "retryable": true,
+      "request_id": "req-jkl012"
+    }
+- GET /namespaces (list)
+  - Response:
+    - namespaces: array of { namespace_name, type, vector_dimension, itemCount, createdAt }.
+  - Use cases: dashboards, usage analytics, dropdown population, storage monitoring.
+  - Notes: no pagination or continuation tokens documented; returns all namespaces.
+  - Common responses: 200, 401, 403, 429, 500.
+  - Example response:
+    {
+      "namespaces": [
+        {
+          "namespace_name": "my_documents",
+          "type": "text",
+          "vector_dimension": null,
+          "itemCount": 1247,
+          "createdAt": "2024-01-15T10:30:00.000Z"
+        }
+      ]
+    }
+  - Example request (curl):
+    curl -X GET "https://api.moorcheh.ai/v1/namespaces" \
+      -H "x-api-key: your-api-key-here"
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.namespaces.list()
+- DELETE /namespaces/{namespace_name} (delete)
+  - Response (202):
+    - status: "pending".
+    - message: confirmation.
+  - Notes: async deletion; irreversible.
+    - Deletion may take minutes for large namespaces.
+    - Names become reusable after deletion completes.
+  - Best practices: verify name, back up important data, confirm via list after deletion.
+  - Common responses: 202, 401, 403, 404, 429, 500.
+  - Example response:
+    {
+      "status": "pending",
+      "message": "Request accepted. Namespace 'my-documents' has been queued for deletion."
+    }
+  - Example request (curl):
+    curl -X DELETE "https://api.moorcheh.ai/v1/namespaces/my-documents" \
+      -H "x-api-key: your-api-key-here"
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.namespaces.delete(namespace_name="my-documents")
+
+### Data operations: text documents
+- POST /namespaces/{namespace_name}/documents (upload text)
+  - Body: documents (array, required). Each document is a flat object.
+    - id (string or number, required): unique document ID within namespace.
+    - text (string, required): main document text.
+    - Any other fields are treated as metadata (key-value pairs).
+  - Text limits: min 10 chars, max 50,000 chars per doc.
+  - Batch: max 100 docs per request; recommended 25-50.
+  - Metadata limits: max 2KB per doc, up to 50 keys.
+  - Response (202 or 207):
+    - status: "success" or "partial".
+    - message: human-readable confirmation.
+    - upload_id: upload batch ID for tracking.
+    - namespace_name: target namespace.
+    - documents_processed: count of processed docs.
+    - processing_status: in_progress, completed, or failed.
+    - estimated_completion: ISO 8601 timestamp.
+    - uploaded_documents: array of { id, status, character_count }.
+  - Processing pipeline: validation, text processing, embedding generation, index updates, metadata enrichment.
+  - Notes:
+    - Text uploads are async; wait briefly before search.
+    - IDs must be unique; reusing an ID overwrites prior entry.
+    - Document id must be a non-empty string or number.
+  - Best practices:
+    - Chunk long documents into coherent sections.
+    - Use consistent metadata schema across documents.
+    - Upload in batches (25-50) for best throughput.
+  - Use cases: knowledge bases, support docs, research corpora, training materials.
+  - Common responses: 202, 207, 400, 401, 403, 404, 500.
+  - Example response:
+    {
+      "status": "success",
+      "message": "2 documents uploaded successfully to namespace 'technical-docs'",
+      "upload_id": "upload_1234567890",
+      "namespace_name": "technical-docs",
+      "documents_processed": 2,
+      "processing_status": "in_progress",
+      "estimated_completion": "2024-01-15T10:35:00Z",
+      "uploaded_documents": [
+        { "id": "doc_001", "status": "processing", "character_count": 89 },
+        { "id": "doc_002", "status": "processing", "character_count": 112 }
+      ]
+    }
+  - Example request (curl):
+    curl -X POST "https://api.moorcheh.ai/v1/namespaces/demo-namespace/documents" \
+      -H "Content-Type: application/json" \
+      -H "x-api-key: your-api-key-here" \
+      -d '{"documents":[{"id":"doc_001","text":"Machine learning...","title":"Intro","category":"education"}]}'
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.documents.upload(
+            namespace_name="demo-namespace",
+            documents=[{"id":"doc_001","text":"Machine learning...","title":"Intro","category":"education"}]
+        )
+- POST /namespaces/{namespace_name}/documents/get (get documents)
+  - Body: ids (array, required, max 100 IDs per request).
+  - Response (200 or 207):
+    - status: "success" or "partial".
+    - message: confirmation.
+    - requested_ids: count requested.
+    - found_items: count returned.
+    - items: array of documents { id, text, metadata }.
+    - not_found_ids: array of missing IDs (only when partial).
+  - Notes:
+    - Non-existent IDs are ignored and reported in not_found_ids.
+    - Use for retrieval by ID; use search endpoint for semantic retrieval.
+  - Best practices: batch retrieval (up to 100) and cache frequently accessed documents.
+  - Common responses: 200, 207, 400, 401, 403, 404, 429, 500.
+  - Example response:
+    {
+      "status": "success",
+      "message": "Successfully retrieved 1 items from namespace 'demo_docs'.",
+      "requested_ids": 1,
+      "found_items": 1,
+      "items": [
+        { "id": "doc1", "metadata": {}, "text": "This is the first document about Moorcheh." }
+      ]
+    }
+  - Example request (curl):
+    curl -X POST "https://api.moorcheh.ai/v1/namespaces/demo_docs/documents/get" \
+      -H "x-api-key: your-api-key-here" \
+      -H "Content-Type: application/json" \
+      -d '{"ids":["doc1"]}'
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.documents.get(namespace_name="demo_docs", ids=["doc1"])
+- POST /namespaces/{namespace_name}/documents/delete (delete documents)
+  - Body: ids (array, required, max 1000 IDs per request).
+  - Response (200 or 207):
+    - status: "success" or "partial".
+    - message: confirmation.
+    - requested_deletions: count requested.
+    - actual_deletions: count deleted.
+    - remaining_items: namespace item count after deletion.
+    - requested_ids: list of requested IDs.
+    - unprocessed_ids: list of IDs not deleted (only when partial).
+  - Notes:
+    - Deletes are permanent and decrement total item count.
+    - Use documents/delete for text and vectors/delete for vector data.
+  - Best practices: verify IDs, delete in batches, log partial failures.
+  - Common responses: 200, 207, 400, 401, 403, 404, 429, 500.
+  - Example response:
+    {
+      "status": "success",
+      "message": "Successfully processed deletion request. Requested: 1, Actually deleted: 0 items from namespace 'test'.",
+      "requested_deletions": 1,
+      "actual_deletions": 0,
+      "remaining_items": 1,
+      "requested_ids": ["doc-001"]
+    }
+  - Example request (curl):
+    curl -X POST "https://api.moorcheh.ai/v1/namespaces/my-namespace/documents/delete" \
+      -H "Content-Type: application/json" \
+      -H "x-api-key: your-api-key-here" \
+      -d '{"ids":["document-001","document-002"]}'
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.documents.delete(namespace_name="my-namespace", ids=["document-001","document-002"])
+
+### Data operations: vectors
+- POST /namespaces/{namespace_name}/vectors (upload vectors)
+  - Body: vectors (array, required). Each vector is a flat object.
+    - id (string, required): unique vector ID.
+    - vector (array of numbers, required): embedding values.
+    - text (string, optional): original text for display.
+    - Any other fields are treated as metadata.
+  - Dimension must match namespace dimension exactly.
+  - Batch: max 1000 vectors; recommended 100-500.
+  - Value range: normalized vectors preferred (typical -1.0 to 1.0), float32.
+  - Response (201 or 207):
+    - status: "success" or "partial".
+    - message: confirmation.
+    - upload_id: upload batch ID.
+    - namespace_name: target namespace.
+    - vectors_processed: count processed.
+    - processing_status: completed (vectors are sync).
+    - uploaded_vectors: array of { id, status, dimension, created_at }.
+  - Processing pipeline: dimension validation, format validation, index insertion, metadata storage, immediate availability.
+  - Notes:
+    - Vector uploads are sync; immediately searchable.
+    - All vectors in a batch must share the same dimension.
+    - Vector id must be a non-empty string.
+  - Best practices:
+    - Normalize vectors, use consistent preprocessing.
+    - Include original text when possible for display.
+  - Use cases: similarity search, recommendations, deduplication, clustering.
+  - Common responses: 201, 207, 400, 401, 403, 404, 500.
+  - Example response:
+    {
+      "status": "success",
+      "message": "2 vectors uploaded successfully to namespace 'product-embeddings'",
+      "upload_id": "upload_vec_1234567890",
+      "namespace_name": "product-embeddings",
+      "vectors_processed": 2,
+      "processing_status": "completed",
+      "uploaded_vectors": [
+        { "id": "prod_001", "status": "completed", "dimension": 4, "created_at": "2024-01-15T10:30:00Z" },
+        { "id": "prod_002", "status": "completed", "dimension": 4, "created_at": "2024-01-15T10:30:00Z" }
+      ]
+    }
+  - Example request (curl):
+    curl -X POST "https://api.moorcheh.ai/v1/namespaces/my-vectors/vectors" \
+      -H "Content-Type: application/json" \
+      -H "x-api-key: your-api-key-here" \
+      -d '{"vectors":[{"id":"vec_001","vector":[0.1,-0.2,0.3],"text":"Machine learning"}]}'
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.vectors.upload(
+            namespace_name="my-vectors",
+            vectors=[{"id":"vec_001","vector":[0.1,-0.2,0.3],"text":"Machine learning"}]
+        )
+- POST /namespaces/{namespace_name}/vectors/delete (delete vectors)
+  - Body: ids (array, required, max 1000 IDs per request).
+  - Response: same fields as documents delete.
+  - Notes: IDs can be strings or numbers; converted to strings internally.
+  - Common responses: 200, 207, 400, 401, 403, 404, 429, 500.
+  - Example request (curl):
+    curl -X POST "https://api.moorcheh.ai/v1/namespaces/my-vectors/vectors/delete" \
+      -H "Content-Type: application/json" \
+      -H "x-api-key: your-api-key-here" \
+      -d '{"ids":["vec_001","vec_002"]}'
+  - Example request (SDK):
+    with MoorchehClient() as client:
+        client.vectors.delete(namespace_name="my-vectors", ids=["vec_001","vec_002"])
+
+### File uploads
+- POST /namespaces/{namespace_name}/upload-url (get pre-signed S3 URL)
+  - Use this for file uploads; supports up to 5GB.
+  - Body:
+    - fileName (string, required): target filename including extension.
+  - Response:
+    - uploadUrl: pre-signed S3 URL for PUT upload.
+    - key: S3 object key.
+    - contentType: Content-Type to use in PUT.
+    - expiresIn: URL validity in seconds (default 900).
+    - method: HTTP method for upload (PUT).
+    - hint: human-readable upload instructions.
+  - URL expires in 15 minutes; upload via PUT to uploadUrl with Content-Type.
+  - Supported types include: .pdf, .docx, .xlsx, .json, .txt, .csv, .md
+  - Notes: pre-signed URL upload is the supported file upload path; supports up to 5GB.
+    - Upload URL expires in 15 minutes; request a new URL if expired.
+    - After S3 upload, files are processed through the document pipeline.
+    - Use search or get documents after a short delay to confirm ingestion (this is indexing latency, not the S3 upload time).
+  - Common responses: 200, 400 (missing fileName/unsupported type), 401, 404.
+  - Example response:
+    {
+      "uploadUrl": "https://s3.us-east-1.amazonaws.com/...",
+      "key": "ownerId/namespace/document.pdf",
+      "contentType": "application/pdf",
+      "expiresIn": 900,
+      "method": "PUT",
+      "hint": "Upload the file with: PUT