Semantic, lexical, and multilingual search for your OGD metadata catalog.
# Clone the repository
git clone https://github.com/statistikZH/ogd_ai-search.git
cd ogd_ai-search
# Install dependencies
pip3 install uv
uv venv
source .venv/bin/activate
uv sync
# Create search index
# Run 01_mdv_search.ipynb to create the Weaviate search index
# Start the app
cd _streamlit
streamlit run ai-search.py
Search the Canton of Zurich's open government data catalog using hybrid search that combines lexical keyword matching with semantic similarity. The application supports multiple languages, including German and all European languages.
The search computes embeddings with intfloat/multilingual-e5-small, loaded via sentence-transformers. This is a multilingual model optimized for German with a 512-token context length. Search results are powered by Weaviate, an open-source vector database.
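As a rough illustration of the embedding step, the sketch below loads the model named above through sentence-transformers. The helper names `add_e5_prefix` and `embed` are hypothetical and not taken from this repository. The `query: ` and `passage: ` prefixes follow the usage recommended for E5 models; whether the notebook applies them in the same way is an assumption.

```python
from typing import List

MODEL_NAME = "intfloat/multilingual-e5-small"  # model named in the text above


def add_e5_prefix(texts: List[str], kind: str) -> List[str]:
    """Prepend the 'query: ' or 'passage: ' marker that E5 models expect."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def embed(texts: List[str], kind: str):
    """Return one L2-normalized embedding per text (hypothetical helper)."""
    # Imported lazily so the prefix helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)  # downloads the model on first use
    return model.encode(add_e5_prefix(texts, kind), normalize_embeddings=True)
```

Calling `embed(["Krankheit"], kind="query")` would return one vector for the query, while dataset descriptions would be embedded with `kind="passage"`. Inputs longer than the model's context length are truncated, so long metadata descriptions may only be partially represented.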
Semantic search finds text based on meaning rather than exact keywords. For example, searching for disease can return documents containing illness, virus, infection, treatment, or healthcare without the exact word disease appearing.
Using statistical methods and machine learning, language models learn word and sentence similarities from large text corpora. While semantic search has many advantages, it is approximate rather than exact and may include false positives or miss relevant entries.
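The idea of ranking by meaning can be sketched with cosine similarity between vectors. The three-dimensional vectors and dataset titles below are made up for illustration; real embeddings come from the language model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vector, documents):
    """Sort documents by similarity to the query, most similar first."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up toy "embeddings"; none of the titles contains the word "disease".
TOY_VECTORS = {
    "illness statistics": [0.9, 0.1, 0.0],
    "hospital treatment costs": [0.7, 0.3, 0.1],
    "road traffic counts": [0.0, 0.2, 0.9],
}
QUERY_DISEASE = [0.8, 0.2, 0.0]  # hypothetical vector for the query "disease"

ranking = rank(QUERY_DISEASE, TOY_VECTORS)
for title, score in ranking:
    print(f"{score:.3f}  {title}")
```

With these toy values, the two health-related titles score close to 1 and the traffic dataset scores close to 0, even though no title shares a keyword with the query. The same mechanism explains the false positives mentioned above: anything whose vector happens to lie near the query is returned.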
Hybrid search combines lexical and semantic approaches, delivering both exact keyword matches and semantically similar results.
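One common way to combine the two result lists is to normalize each set of scores and take a weighted sum. The sketch below is a simplified stand-in written for illustration, not the fusion code used by Weaviate or by this app. The function names, the `alpha` weighting (1.0 for purely semantic, 0.0 for purely lexical), and the example scores are all assumptions.

```python
def min_max_normalize(scores):
    """Rescale a {doc: score} mapping to the range 0..1."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend lexical and semantic scores; a missing score counts as 0."""
    keyword_norm = min_max_normalize(keyword_scores)
    vector_norm = min_max_normalize(vector_scores)
    docs = set(keyword_norm) | set(vector_norm)
    fused = {
        doc: alpha * vector_norm.get(doc, 0.0)
        + (1 - alpha) * keyword_norm.get(doc, 0.0)
        for doc in docs
    }
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for the query "disease".
keyword_hits = {"disease_register": 5.0, "disease_costs": 3.0}
vector_hits = {"disease_register": 0.92, "illness_survey": 0.90, "disease_costs": 0.80}

results = hybrid_scores(keyword_hits, vector_hits, alpha=0.5)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

In this toy run, `disease_register` ranks first because both methods agree on it, while `illness_survey` still appears despite having no keyword match. Raising `alpha` favors semantically similar entries, and lowering it favors exact keyword matches.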
Laure Stadler, Chantal Amrhein, Patrick Arnecke – Statistisches Amt Zürich: Team Data
Many thanks to Corinna Grobe and our former colleague Adrian Rupp.
We'd love to hear from you. Share your feedback or ideas by emailing us, opening an issue, or submitting a pull request.
We use Ruff for linting and code formatting with default settings.
This software (the Software) incorporates models (Models) from Hugging Face and others and has been developed according to and with the intent to be used under Swiss law. Please be aware that the EU Artificial Intelligence Act (EU AI Act) may, under certain circumstances, be applicable to your use of the Software. You are solely responsible for ensuring that your use of the Software as well as of the underlying Models complies with all applicable local, national and international laws and regulations. By using this Software, you acknowledge and agree (a) that it is your responsibility to assess which laws and regulations, in particular regarding the use of AI technologies, are applicable to your intended use and to comply therewith, and (b) that you will hold us harmless from any action, claims, liability or loss in respect of your use of the Software.
