codefortulsa · helmturner · Mar 24, 2025
diff --git a/README.md b/README.md
@@ -1,69 +1,65 @@
-# Tulsa Transcribe
+# `tgov-scraper-js`
 
-A system for scraping, processing, and serving Tulsa Government meeting videos and documents.
+Scrape and ingest recordings and documents from meetings of the City of Tulsa's municipal Agencies, Boards, and Commissions (ABCs).
 
 ## Architecture
 
-This application is structured as a set of microservices, each with its own responsibility:
+This application is structured as a set of microservices, each with its own responsibility (for more details, see the [architecture documentation](./docs/architecture.md)):
 
 ### 1. TGov Service
-- Scrapes Tulsa Government meeting information
-- Stores committee and meeting data
-- Extracts video URLs from viewer pages
 
 ### 2. Media Service
-- Downloads and processes videos
-- Extracts audio from videos
-- Manages batch processing of videos
 
 ### 3. Documents Service
-- Handles document storage and retrieval
-- Links documents to meeting records
 
 ### 4. Transcription Service
-- Converts audio files to text using the OpenAI Whisper API
-- Stores and retrieves transcriptions with time-aligned segments
-- Manages transcription jobs
-
-For more details, see the [architecture documentation](./docs/architecture.md).
 
 ## Getting Started
 
-### Prerequisites
-
-- Node.js LTS and npm
-- [Encore CLI](https://encore.dev/docs/install)
-- ffmpeg (for video processing)
-- OpenAI API key (for transcription)
-
 ### Setup
 
 1. Clone the repository:
+
 ```bash
-git clone <repository-url>
-cd tulsa-transcribe
+git clone https://github.com/codefortulsa/tgov-scraper-js.git
+cd tgov-scraper-js
 ```
 
-2. Install dependencies:
+2. Install `node` v22 and `npm` v11 using your favorite version manager. If you don't have one, we recommend [nvm](https://github.com/nvm-sh/nvm#installing-and-updating):
+
 ```bash
-npm install
+nvm install 22
+nvm use 22
+nvm install-latest-npm
 ```
 
-3. Run the setup script to configure your environment:
+3. [Install Docker Desktop](https://docs.docker.com/get-docker/)
+
+4. [Install `ffmpeg`](https://ffmpeg.org/download.html)
+
+5. [Install the Encore CLI](https://encore.dev/docs/ts/install#install-the-encore-cli)
+
+6. Install NPM dependencies:
+
 ```bash
-npx ts-node setup.ts
+npm install
 ```
 
-4. Update the `.env` file with your database credentials and API keys:
+7. Copy the example [local secret overrides file](https://encore.dev/docs/ts/primitives/secrets#overriding-local-secrets):
+
+```bash
+cp .secrets.local.cue.EXAMPLE .secrets.local.cue
 ```
-TGOV_DATABASE_URL="postgresql://username:password@localhost:5432/tgov?sslmode=disable"
-MEDIA_DATABASE_URL="postgresql://username:password@localhost:5432/media?sslmode=disable"
-DOCUMENTS_DATABASE_URL="postgresql://username:password@localhost:5432/documents?sslmode=disable"
-TRANSCRIPTION_DATABASE_URL="postgresql://username:password@localhost:5432/transcription?sslmode=disable"
-OPENAI_API_KEY="your-openai-api-key"
+
+8. Set your local secrets:
+
+```cue
+// path: ./.secrets.local.cue
+OPENAI_API_KEY: "<your-openai-api-key>"
 ```
 
-5. Run the application using Encore CLI:
+9. Run the application using the Encore CLI:
+
 ```bash
 encore run
 ```
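+
+Once the app is running, you can smoke-test the API from a script. The following is a minimal sketch, not part of this repository. It assumes the Encore dev server listens on its default local address, `http://localhost:4000` (confirm against the `encore run` output), and it uses hypothetical helper names (`endpointUrl`, `getJson`, `smokeTest`) to call the `/tgov/committees` endpoint documented in the API tables.
+
+```typescript
+// Assumption: `encore run` serves the API on Encore's default local port, 4000.
+const BASE_URL = "http://localhost:4000";
+
+// Resolve an endpoint path against the local API base URL.
+function endpointUrl(path: string, base: string = BASE_URL): URL {
+  return new URL(path, base);
+}
+
+// GET a JSON endpoint and throw on any non-2xx status.
+async function getJson(path: string): Promise<unknown> {
+  const res = await fetch(endpointUrl(path));
+  if (!res.ok) {
+    throw new Error(`GET ${path} failed: ${res.status} ${res.statusText}`);
+  }
+  return res.json();
+}
+
+// Hypothetical smoke test: list committees and print the raw JSON response.
+async function smokeTest(): Promise<void> {
+  const committees = await getJson("/tgov/committees");
+  console.log(JSON.stringify(committees, null, 2));
+}
+```
+
+Calling `smokeTest()` under Node 22, which provides a global `fetch`, should print the committee list; the exact response shape depends on the TGov service.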
@@ -72,43 +68,43 @@ encore run
 
 ### TGov Service
 
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/scrape/tgov` | GET | Trigger a scrape of the TGov website |
-| `/tgov/meetings` | GET | List meetings with filtering options |
-| `/tgov/committees` | GET | List all committees |
-| `/tgov/extract-video-url` | POST | Extract a video URL from a viewer page |
+| Endpoint                  | Method | Description                            |
+| ------------------------- | ------ | -------------------------------------- |
+| `/scrape/tgov`            | GET    | Trigger a scrape of the TGov website   |
+| `/tgov/meetings`          | GET    | List meetings with filtering options   |
+| `/tgov/committees`        | GET    | List all committees                    |
+| `/tgov/extract-video-url` | POST   | Extract a video URL from a viewer page |
 
 ### Media Service
 
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/videos/download` | POST | Download videos from URLs |
-| `/api/media/:blobId/info` | GET | Get information about a media file |
-| `/api/videos` | GET | List all stored videos |
-| `/api/audio` | GET | List all stored audio files |
-| `/api/videos/batch/queue` | POST | Queue a batch of videos for processing |
-| `/api/videos/batch/:batchId` | GET | Get the status of a batch |
-| `/api/videos/batch/process` | POST | Process the next batch of videos |
+| Endpoint                     | Method | Description                            |
+| ---------------------------- | ------ | -------------------------------------- |
+| `/api/videos/download`       | POST   | Download videos from URLs              |
+| `/api/media/:blobId/info`    | GET    | Get information about a media file     |
+| `/api/videos`                | GET    | List all stored videos                 |
+| `/api/audio`                 | GET    | List all stored audio files            |
+| `/api/videos/batch/queue`    | POST   | Queue a batch of videos for processing |
+| `/api/videos/batch/:batchId` | GET    | Get the status of a batch              |
+| `/api/videos/batch/process`  | POST   | Process the next batch of videos       |
 
 ### Documents Service
 
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/api/documents/download` | POST | Download and store a document |
-| `/api/documents` | GET | List documents with filtering options |
-| `/api/documents/:id` | GET | Get a specific document |
-| `/api/documents/:id` | PATCH | Update document metadata |
-| `/api/meeting-documents` | POST | Download and link meeting agenda documents |
+| Endpoint                  | Method | Description                                |
+| ------------------------- | ------ | ------------------------------------------ |
+| `/api/documents/download` | POST   | Download and store a document              |
+| `/api/documents`          | GET    | List documents with filtering options      |
+| `/api/documents/:id`      | GET    | Get a specific document                    |
+| `/api/documents/:id`      | PATCH  | Update document metadata                   |
+| `/api/meeting-documents`  | POST   | Download and link meeting agenda documents |
 
 ### Transcription Service
 
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/transcribe` | POST | Request transcription for an audio file |
-| `/jobs/:jobId` | GET | Get the status of a transcription job |
-| `/transcriptions/:transcriptionId` | GET | Get a transcription by ID |
-| `/meetings/:meetingId/transcriptions` | GET | Get all transcriptions for a meeting |
+| Endpoint                              | Method | Description                             |
+| ------------------------------------- | ------ | --------------------------------------- |
+| `/transcribe`                         | POST   | Request transcription for an audio file |
+| `/jobs/:jobId`                        | GET    | Get the status of a transcription job   |
+| `/transcriptions/:transcriptionId`    | GET    | Get a transcription by ID               |
+| `/meetings/:meetingId/transcriptions` | GET    | Get all transcriptions for a meeting    |
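+
+Several routes in these tables take path parameters written as `:name` (for example `/jobs/:jobId`). A minimal sketch of filling those placeholders before making a request follows; `fillRoute` is a hypothetical helper, not part of the codebase, and the ID in the example is a placeholder.
+
+```typescript
+// Replace each `:param` segment in a route with a URL-encoded value.
+// Throws if the route names a parameter that was not supplied.
+function fillRoute(route: string, params: Record<string, string>): string {
+  return route.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (_match, name: string) => {
+    const value = params[name];
+    if (value === undefined) {
+      throw new Error(`Missing value for route parameter "${name}"`);
+    }
+    return encodeURIComponent(value);
+  });
+}
+
+// Example: build the path for a transcription job's status endpoint.
+const jobPath = fillRoute("/jobs/:jobId", { jobId: "job-123" });
+console.log(jobPath); // prints /jobs/job-123
+```
+
+Encoding each value with `encodeURIComponent` keeps IDs containing spaces or slashes from altering the route structure.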
 
 ## Cron Jobs