Introduce standardized API-first document processing #22

@janhalen

Proposal

Migrate from custom ingestion scripts to a standard framework like Haystack for document processing.
This transition aims for a future-proof, scalable, and maintainable architecture.


Why this matters

The current ingestion approach, implemented in [tika_owui_service_ingestion.py](https://github.com/itk-ai/owui_doc_ingestion/blob/main/scripts…), is a custom solution that introduces challenges:

  • Knowledge concentration
    The logic is highly specialized and understood by only one developer, creating risk when responsibilities shift.

  • Limited interoperability
    Custom code makes integration with emerging AI tools and standardized APIs time-consuming.

  • Maintenance overhead
    Every new feature or bug fix requires manual updates, increasing technical debt over time.


Long-term gains

Resources saved

  • Reduced development effort
    No need to maintain bespoke ingestion scripts; Haystack provides ready-to-use connectors and pipelines.
  • Lower maintenance costs
    Security patches and feature updates come from upstream, reducing repetitive manual fixes.
  • Time efficiency
    Faster onboarding for new developers thanks to a well-documented, widely adopted framework.

Quality improvements

  • Consistent architecture
    API-first design gives predictable behavior and easier integration.
  • Enhanced features
    Built-in semantic search, question answering, and vector database support can improve retrieval accuracy.
  • Future-proof design
    Scales with project growth without rewriting core ingestion logic.
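
To make the "consistent architecture" point concrete, here is a minimal sketch of what a shared component contract could look like. It uses only the Python standard library and is not Haystack's API; every name in it (`Document`, `Component`, `WhitespaceCleaner`, `WordSplitter`, `Pipeline`, `ingest`) is hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """A unit of text plus metadata passed between components (hypothetical type)."""
    content: str
    meta: dict = field(default_factory=dict)


class Component(Protocol):
    """The shared contract: every step takes documents and returns documents."""
    def run(self, documents: list[Document]) -> list[Document]: ...


class WhitespaceCleaner:
    """Collapses runs of whitespace (spaces, tabs, newlines) into single spaces."""
    def run(self, documents: list[Document]) -> list[Document]:
        return [Document(" ".join(d.content.split()), dict(d.meta)) for d in documents]


class WordSplitter:
    """Splits each document into chunks of at most `chunk_size` words."""
    def __init__(self, chunk_size: int) -> None:
        if chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
        self.chunk_size = chunk_size

    def run(self, documents: list[Document]) -> list[Document]:
        chunks = []
        for d in documents:
            words = d.content.split()
            for i in range(0, len(words), self.chunk_size):
                text = " ".join(words[i:i + self.chunk_size])
                # Record the chunk index so each chunk can be traced to its source.
                chunks.append(Document(text, {**d.meta, "chunk": i // self.chunk_size}))
        return chunks


class Pipeline:
    """Runs components in order, feeding each one's output to the next."""
    def __init__(self, components: list[Component]) -> None:
        self.components = components

    def run(self, documents: list[Document]) -> list[Document]:
        for component in self.components:
            documents = component.run(documents)
        return documents


def ingest(text: str, source: str) -> list[Document]:
    """Cleans and chunks one text; the chunk size of 3 is arbitrary."""
    pipeline = Pipeline([WhitespaceCleaner(), WordSplitter(chunk_size=3)])
    return pipeline.run([Document(text, {"source": source})])
```

Because every step honors the same `run` contract, a step can be swapped or added without touching the others, which is the property a standard framework would provide out of the box.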

Supplier experience

  • Less frustrating maintenance
    Suppliers avoid debugging opaque custom scripts and instead rely on a robust, community-supported framework.
  • Clear upgrade path
    Standardized components make updates more straightforward, reducing the risk of breaking changes.
  • Better interoperability
    Easier integration with other AI tools and APIs, reducing friction in collaborative environments.

Summary

Migrating to an API-first standard framework like Haystack is not just a technical upgrade; it is a strategic investment in scalability, maintainability, and supplier satisfaction.
The proposed migration reduces risk, saves resources, and improves quality, while making the work far less frustrating for everyone involved.
