This application manages and searches transcript data. It can either build its index on the fly or load a pre-generated flat index file.
- Clone the repository
- Install dependencies:
pip install -r requirements.txt
The application provides a CLI for managing index files:
- Generate an index file from transcript data:
python -m app.cli generate-index /path/to/data/dir explore-index.json.gz
- Validate an existing index file:
python -m app.cli validate-index explore-index.json.gz
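The `.json.gz` extension suggests the generated index is gzip-compressed JSON. Assuming that holds, the sketch below shows one way to peek at such a file by hand using only the standard library. `summarize_index` is a hypothetical helper, not part of `app.cli`, and since the internal layout of the index is not documented here, it reports only the top-level type and size.

```python
import gzip
import json


def summarize_index(path):
    """Load a gzip-compressed JSON file and describe its top-level shape.

    Hypothetical helper for illustration; not part of app.cli.
    """
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # The index layout is unknown here, so report only generic facts.
    size = len(data) if isinstance(data, (list, dict)) else None
    return {"type": type(data).__name__, "top_level_items": size}
```

For example, `summarize_index("explore-index.json.gz")` returns a small dictionary describing the file. For real checks, prefer the `validate-index` command.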
You can run the web application in two modes:
- Building index on-the-fly (default):
export FLASK_APP=app
flask run
- Using a pre-generated index file:
export FLASK_APP=app
export INDEX_FILE=/path/to/explore-index.json.gz
flask run
- INDEX_FILE: Path to a pre-generated index file (optional)
- FLASK_APP: Set to "app" to run the Flask application
- FLASK_ENV: Set to "development" for development mode
- SECRET_KEY: Secret key for Flask sessions
- POSTHOG_API_KEY: PostHog API key for analytics (optional)
- POSTHOG_HOST: PostHog host URL (optional)
- DISABLE_ANALYTICS: Set to "true" to disable analytics
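As a rough illustration of how these variables could be read in one place, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not the application's actual configuration code, and the rule that analytics need both a key and no opt-out is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    index_file: Optional[str]
    secret_key: Optional[str]
    posthog_api_key: Optional[str]
    posthog_host: Optional[str]
    analytics_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical helper: collect the documented variables in one object."""

    def get(name):
        # Treat empty strings the same as unset variables.
        return env.get(name) or None

    api_key = get("POSTHOG_API_KEY")
    disabled = env.get("DISABLE_ANALYTICS", "").strip().lower() == "true"
    return Settings(
        index_file=get("INDEX_FILE"),
        secret_key=get("SECRET_KEY"),
        posthog_api_key=api_key,
        posthog_host=get("POSTHOG_HOST"),
        # Assumption: analytics run only with a key and without the opt-out.
        analytics_enabled=api_key is not None and not disabled,
    )
```

With this sketch, `load_settings().index_file` is `None` when `INDEX_FILE` is unset, which corresponds to the default on-the-fly mode.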
- app/: Main application package
  - services/: Core services including index management
  - routes/: Web application routes
  - templates/: HTML templates
  - static/: Static files
  - cli.py: Command line interface
- data/: Data directory
  - json/: Transcript JSON files
  - audio/: Audio files
- app/: Main application code
  - routes/: Flask route definitions
  - services/: Business logic and data services
  - static/: CSS, JavaScript, and images
  - templates/: HTML templates
The application expects JSON transcript files with the following structure:
[
  {
    "start": 0.0,
    "text": "Transcript text segment"
  },
  ...
]

This project is licensed under the MIT License. The data accessible through this application is licensed under the ivrit.ai license.
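To illustrate the transcript format described above (a JSON array of segments, each with `start` and `text`), here is a minimal sketch that loads one file, checks those two fields, and runs a naive substring search. `load_segments` and `find_segments` are hypothetical helpers; the linear scan is a stand-in for illustration and not the application's index-based search.

```python
import json


def load_segments(path):
    """Read a transcript file and check each segment has the documented fields."""
    with open(path, encoding="utf-8") as fh:
        segments = json.load(fh)
    if not isinstance(segments, list):
        raise ValueError("transcript must be a JSON array")
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict) or "start" not in seg or "text" not in seg:
            raise ValueError(f"segment {i} must be an object with 'start' and 'text'")
    return segments


def find_segments(segments, query):
    """Return (start, text) pairs whose text contains the query, ignoring case."""
    needle = query.casefold()
    return [(s["start"], s["text"]) for s in segments if needle in s["text"].casefold()]
```

For example, `find_segments(load_segments("data/json/example.json"), "segment")` lists the matching segments with their start times; the file name here is a placeholder.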
For support, help, ideas, and contributions, please contact us at info@ivrit.ai.