25 changes: 25 additions & 0 deletions .copilot-tracking/README.md
@@ -0,0 +1,25 @@
# Copilot Tracking Workflow

This directory structures the workflow for AI agents (Jules) to research, plan, detail, and implement changes in the ProSe repository.

## Structure

- **research/**: Exploratory analysis, code audits, and problem definition.
- **plans/**: High-level step-by-step plans for approved tasks.
- **details/**: Specific technical specs, schemas, and implementation details.
- **prompts/**: Refined prompts for the Implementation Agent.
- **changes/**: Logs of actual file modifications (change logs).
- **templates/**: Standard formats for the above artifacts.

## Workflow

1. **Research**: Agent analyzes codebase and requirements → `research/{date}-{task}.md`
2. **Plan**: Agent proposes a plan → `plans/{task}-plan.md`
3. **Details**: Agent specifies the technical solution → `details/{task}-details.md`
4. **Prompt**: Agent creates the implementation prompt → `prompts/{task}-prompt.md`
5. **Implementation**: Agent executes changes and logs them → `changes/{date}-{task}.md`

## Rules

- Agents must never modify files outside `.copilot-tracking/` unless in the **Implementation** phase.
- All tasks must follow the ProSe Clean Hub standards.
54 changes: 54 additions & 0 deletions .copilot-tracking/details/evidence-exporter-details.md
@@ -0,0 +1,54 @@
# Details: Evidence Index Exporter (Task A1)
**Date:** 2025-11-22
**Based on Plan:** .copilot-tracking/plans/evidence-exporter-plan.md

## 1. Schema Mapping
### Evidence Item
- `id` <- CSV `Evidence_ID`
- `title` <- CSV `Title`
- `category` <- CSV `Category`
- `priority` <- CSV `Priority` (int)
- `description` <- CSV `Description`
- `sources` <- Calculated: `{"csv": True, "stickies": (id in sticky_ids), "timeline": (id in timeline_ids)}`
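
The mapping above can be sketched as a row transform (a sketch, not the final implementation; `sticky_ids` and `timeline_ids` are assumed to be precomputed sets of evidence IDs referenced by stickies and timeline events):

```python
def row_to_evidence(row, sticky_ids, timeline_ids):
    """Map one CSV row to a schema-shaped evidence dict (sketch)."""
    eid = row["Evidence_ID"]
    return {
        "id": eid,
        "title": row["Title"],
        "category": row["Category"],
        # Priority arrives as a string like "2"; coerce, tolerate blanks.
        "priority": int(row["Priority"]) if row.get("Priority") else None,
        "description": row.get("Description", ""),
        "sources": {
            "csv": True,  # the CSV is the master list, so always True here
            "stickies": eid in sticky_ids,
            "timeline": eid in timeline_ids,
        },
    }
```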

### Sticky Item
- `id` <- Generated (e.g. `STICKY-{i}`); the current sticky format carries no ID of its own.
- `evidence_id` <- From `evidence_ids` in `sticky_index.json`. *Schema mismatch*: the source file stores `evidence_ids` as a list, but the schema's `stickies` item defines `evidence_id` as a singular string. *Decision*: To comply with the schema without changing it, flatten — emit one sticky entry per evidence ID referenced, duplicating the note's fields across entries.
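
A minimal sketch of that flattening (function and output field names are illustrative):

```python
def flatten_stickies(sticky_rows):
    """Emit one schema-shaped sticky per referenced evidence ID (sketch)."""
    out = []
    n = 0  # running index so duplicated notes still get unique IDs
    for sticky in sticky_rows:
        for eid in sticky.get("evidence_ids", []):
            out.append({
                "id": f"STICKY-{n}",
                "evidence_id": eid,  # singular, as the schema requires
                "date": sticky.get("date"),
                "note": sticky.get("note"),
                "theme": sticky.get("theme"),
                "priority": sticky.get("priority"),
            })
            n += 1
    return out
```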

### Timeline Item
- `id` <- Generated as `EVT-{date}-{index}`; the timeline CSV typically carries no ID column of its own.
- `evidence_ids` <- CSV `Evidence_IDs` (split by `;`).
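
The timeline transform might look like this (a sketch; field names follow the CSV columns above):

```python
def transform_timeline(timeline_rows):
    """Map timeline CSV rows to schema-shaped events with generated IDs (sketch)."""
    events = []
    for i, row in enumerate(timeline_rows):
        raw = row.get("Evidence_IDs", "")
        events.append({
            "id": f"EVT-{row['Date']}-{i}",  # synthetic ID: date + row index
            "date": row["Date"],
            "label": row.get("Label", ""),
            "category": row.get("Category", ""),
            "details": row.get("Details", ""),
            # Split the semicolon-separated list, dropping blanks and whitespace.
            "evidence_ids": [e.strip() for e in raw.split(";") if e.strip()],
        })
    return events
```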

## 2. Logic Flow
```python
def export_evidence_index(base_path):
validation = validate_case(base_path)
if validation['status'] == 'WARN':
logger.warning("Validation issues found...")

csv_rows = read_csv(base_path / "Custody_Mod_Evidence.csv")
sticky_rows = read_json(base_path / "sticky_index.json")
timeline_rows = read_csv(base_path / "timeline.csv")

evidence_list = []
for row in csv_rows:
# Build evidence object
# Calculate sources
# Find related timeline events
evidence_list.append(obj)

final_json = {
"case_id": base_path.name,
"generated_at": datetime.now().isoformat(),
"evidence": evidence_list,
"stickies": transform_stickies(sticky_rows),
"timeline": transform_timeline(timeline_rows)
}

write_json(base_path / "evidence_index.json", final_json)
return final_json
```

## 3. File Impacts
- `engine/agents/evidence_exporter.py`: Full implementation.
25 changes: 25 additions & 0 deletions .copilot-tracking/plans/evidence-exporter-plan.md
@@ -0,0 +1,25 @@
# Plan: Evidence Index Exporter (Task A1)
**Date:** 2025-11-22
**Based on Research:** .copilot-tracking/research/2025-11-22-evidence-exporter.md

## 1. Goal
Implement `export_evidence_index` to aggregate case data into a schema-compliant JSON file.

## 2. Steps
1. **Define Data Structures**: Create internal dataclasses or dict structures for the schema parts.
2. **Implement Logic**:
- `load_csv_data()`: Returns list of evidence dicts.
- `load_sticky_data()`: Returns list of sticky dicts.
- `load_timeline_data()`: Returns list of timeline dicts.
- `export_evidence_index()`: Orchestrates loading, validation, and writing.
3. **Integration**:
- Import `validate_case` from `evidence_validator.py`.
- Use validation result to warn user.
4. **Validation**:
- Write to `case/evidence_index.json`.

## 3. Verification
- **Test**: `tests/core/test_evidence_exporter.py`
- Case 1: Full valid case (Synthetic). Verify JSON output has all fields.
- Case 2: Missing optional files. Verify output is valid but partial.
- Case 3: CSV missing. Verify `FileNotFoundError` or custom error.
22 changes: 22 additions & 0 deletions .copilot-tracking/plans/evidence-validator-behavior-plan.md
@@ -0,0 +1,22 @@
# Plan: Evidence Index Enhancements (Task 1)
**Date:** 2025-11-22
**Based on Research:** .copilot-tracking/research/2025-11-22-evidence-validator-behavior.md

## 1. Goal
Implement a robust JSON Exporter for the Evidence Index that aggregates data from CSV, stickies, and timeline into a schema-compliant `evidence_index.json`.

## 2. Steps
1. **Design Exporter (Task 1.2)**
- Create detailed specs in `.copilot-tracking/details/exporter-details.md`.
- Define mappings from source columns to schema fields.
2. **Implement Exporter (Task 1.3)**
- Create `engine/agents/evidence_index_exporter.py`.
- Implement `export_evidence_index(base_path)`.
- Include CLI support: `python -m engine.agents.evidence_index_exporter <path>`.
3. **Test Exporter (Task 1.4)**
- Add tests in `tests/core/test_evidence_exporter.py`.
- Verify schema compliance using the sample schema.

## 3. Verification
- **Unit Tests**: Pass with 100% success on happy path and missing file cases.
- **Manual Check**: Run exporter on `case/DivorceFiles` (mocked) and validate output against `case/evidence_index.schema.json`.
120 changes: 120 additions & 0 deletions .copilot-tracking/plans/master-task-queue.md
@@ -0,0 +1,120 @@
# Master Task Queue: ProSe Clean Hub

**Status:** Active
**Last Updated:** 2025-11-22

## A. Evidence System Expansion (Primary Workstream)
### A1. Implement Evidence Index Exporter
- **File:** `engine/agents/evidence_exporter.py`
- **Goal:** Convert CSV + Stickies + Timeline into a compliant `evidence_index.json`.
- [ ] Write `export_evidence_index(base_path)` implementation.
- [ ] Integrate `validate_case()` within exporter preflight.
- [ ] Construct JSON matching `evidence_index.schema.json`.
- [ ] Write to `case/evidence_index.json`.
- [ ] Add error handling for missing fields.
- [ ] Write happy-path and failure tests.

### A2. Schema Validation Tests
- **Files:** `tests/core/test_schema_validation.py`, `case/evidence_index.schema.json`
- [ ] Load schema using `jsonschema` if available.
- [ ] Validate synthetic example index.
- [ ] Validate failure cases (missing IDs, wrong types, etc.).
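
The guarded schema check could take this shape (a sketch; `SCHEMA` here is a reduced stand-in for the real `case/evidence_index.schema.json`, and the test should skip when `jsonschema` is absent):

```python
try:
    import jsonschema
except ImportError:
    jsonschema = None  # optional dependency; callers should skip validation

# Reduced stand-in schema for illustration only.
SCHEMA = {
    "type": "object",
    "required": ["case_id", "evidence"],
    "properties": {
        "case_id": {"type": "string"},
        "evidence": {"type": "array"},
    },
}

def validate_index(index):
    """Validate an index dict; returns True (treated as a skip) if jsonschema is unavailable."""
    if jsonschema is None:
        return True
    jsonschema.validate(index, SCHEMA)  # raises ValidationError on failure
    return True
```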

### A3. Synthetic Example Coverage
- **Files:** `examples/synthetic_case/*`
- [ ] Add test that validates synthetic example returns status “OK”.
- [ ] Add test that exporter writes a valid index from synthetic example.
- [ ] Add test ensuring schema prevents bad data.

---

## B. Core Engine Development
### B1. Flesh Out `engine/core/engine.py`
- [ ] Add class `Engine`.
- [ ] Add plugin loader for agents.
- [ ] Add logging scaffold.
- [ ] Include reference to “Case Manager Mode”.
- [ ] Add docstrings referencing `ENGINE_OVERVIEW.md`.

### B2. Engine Overview Documentation
- **File:** `docs/ENGINE_OVERVIEW.md`
- [ ] Add diagrams (ASCII for now).
- [ ] Document “agent lifecycle”.
- [ ] Document “sync → validate → export → generate” loop.

---

## C. Repo Hygiene & Automation
### C1. Expand `repo_clean.yml` (CI Cleanup Rules)
- **Files:** `.github/workflows/repo_clean.yml`
- [ ] Enforce presence of `docs` directory.
- [ ] Validate `.copilot-tracking` structure.
- [ ] Enforce schema presence.
- [ ] Enforce no missing directories from README tree.
- [ ] Block merge if `.pyc` or `__pycache__` are present.

### C2. Add `scripts/health_report.py`
- [ ] Count number of tests.
- [ ] Count number of agents.
- [ ] Count number of templates.
- [ ] Output markdown to `docs/REPO_HEALTH.md`.
- [ ] Add a `report` target to the Makefile (or `just report` if a Justfile is used).
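
A possible shape for the counter (a sketch; the glob paths are assumptions based on the repo layout described elsewhere in this queue):

```python
from pathlib import Path

def health_report(repo_root):
    """Count tests, agents, and templates; return a short markdown summary (sketch)."""
    root = Path(repo_root)
    counts = {
        "tests": len(list(root.glob("tests/**/test_*.py"))),
        "agents": len(list(root.glob("engine/agents/*.py"))),
        "templates": len(list(root.glob(".copilot-tracking/templates/*.md"))),
    }
    lines = ["# Repo Health", ""]
    lines += [f"- **{name}**: {n}" for name, n in counts.items()]
    return "\n".join(lines)
```

The caller would write the returned string to `docs/REPO_HEALTH.md`.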

### C3. Add formatting tooling
- **Options:** Black, Ruff, isort
- [ ] Add configuration to `pyproject.toml` or individual configs.
- [ ] Add formatting commands to Makefile.
- [ ] Add pre-commit hook file.

---

## D. Copilot AI Workflow System
### D1. Validate Copilot Templates
- **Files:** `.copilot-tracking/templates/*`, `.github/copilot/*`
- [ ] Ensure templates contain no placeholders.
- [ ] Add missing fields to researcher template.
- [ ] Add example tasks to each role guide.
- [ ] Add a “strict chain-of-custody” section to planner guide.
- [ ] Add DO/DON'T blocks to `implementer.md`.

### D2. Add Phase 3 Issue
- **Create:** `.copilot-tracking/plans/phase-3-ai-integration.md`
- [ ] Outline future tasks:
- Case Manager agent
- Motion generator agent
- Natural language timeline builder
- Evidence scoring model
- Hearing-prep packet builder

---

## E. Documentation Expansion
### E1. Rewrite `MAINTENANCE.md`
- [ ] Weekly / monthly / quarterly schedule
- [ ] Script usage examples
- [ ] Repo health KPIs
- [ ] Evidence system KPIs

### E2. Expand `INTEGRATION.md`
- [ ] Mapping rules for ProSe Agent 2 donor imports
- [ ] Mapping rules for PSFO donor imports
- [ ] Donor import checklist
- [ ] “Preflight Clean Hub Check”
- [ ] Troubleshooting section

---

## F. Stretch Tasks (Optional but Powerful)
### F1. Add sample case pack generator
- [ ] A script that creates synthetic timeline, stickies, evidence list.
- [ ] Useful for CI reproducibility.

### F2. Add interactive CLI
- [ ] Command: `prose validate my_case/`, `prose export my_case/`, `prose health`.
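
The command surface above could be scaffolded with `argparse` (a sketch; wiring the subcommands to the actual validator/exporter agents is omitted):

```python
import argparse

def build_cli():
    """Build the hypothetical `prose` CLI parser (sketch)."""
    parser = argparse.ArgumentParser(prog="prose")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("validate", "export"):
        cmd = sub.add_parser(name, help=f"{name} a case directory")
        cmd.add_argument("case_path")
    sub.add_parser("health", help="print a repo health summary")
    return parser
```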

---

## G. Finalization Tasks
- [ ] Regenerate README badges based on CI output.
- [ ] Add versioning boilerplate to `CHANGELOG.md`.
- [ ] Create 0.2.0 section in `CHANGELOG.md`.
52 changes: 52 additions & 0 deletions .copilot-tracking/plans/phase-2-expansion-issue.md
@@ -0,0 +1,52 @@
# Phase 2: ProSe Clean Hub Expansion & AI Workflow Scaffolding

This issue aggregates 10 scaffolding tasks to prepare the repository for full development.

## 1. Normalize engine/ directory
- [ ] Create `engine/core/`
- [ ] Create `engine/core/engine.py` (empty stub)
- [ ] Create `engine/core/README.md` (placeholder)

## 2. Create Missing Docs
Create placeholders for navigation:
- [ ] `docs/INTEGRATION.md`
- [ ] `docs/MAINTENANCE.md`
- [ ] `docs/ENGINE_OVERVIEW.md`
- [ ] `docs/ROADMAP.md`

## 3. Add pytest configuration
- [ ] Create `pytest.ini` with:
```ini
[pytest]
python_files = test_*.py
pythonpath = .
```

## 4. Add schema smoke test
- [ ] Create `tests/core/test_schema_validation.py`
- Skip if jsonschema not installed.

## 5. Add Makefile / Justfile
- [ ] Create `Makefile` with targets: `test`, `audit`, `clean`.

## 6. Create exporter stub
- [ ] Create `engine/agents/evidence_exporter.py` with `NotImplementedError`.

## 7. Create .copilot-tracking/ skeleton
*(Done in previous task, but verify)*
- [ ] Structure exists
- [ ] Templates exist

## 8. Add .github/copilot/
- [ ] Create `.github/copilot/researcher.md`
- [ ] Create `.github/copilot/planner.md`
- [ ] Create `.github/copilot/implementer.md`

## 9. Add synthetic example case
- [ ] Create `examples/synthetic_case/`
- [ ] Add dummy `Custody_Mod_Evidence.csv`
- [ ] Add dummy `sticky_index.json`
- [ ] Add dummy `timeline.csv`

## 10. Add tests for synthetic example
- [ ] Create `tests/core/test_synthetic_case.py` to validate the example case returns OK.
28 changes: 28 additions & 0 deletions .copilot-tracking/prompts/evidence-exporter-prompt.md
@@ -0,0 +1,28 @@
# Implementation Prompt: Evidence Index Exporter (Task A1)
**Date:** 2025-11-22

## Context
Implement the `export_evidence_index` function in `engine/agents/evidence_exporter.py`. This function aggregates data from the CSV, Stickies, and Timeline files into a single JSON file (`case/evidence_index.json`) that strictly follows `case/evidence_index.schema.json`.

## Instructions
1. **Modify** `engine/agents/evidence_exporter.py`:
- Import `validate_case` from `engine.agents.evidence_validator`.
- Implement `export_evidence_index(base_path)`.
- Read `Custody_Mod_Evidence.csv`, `sticky_index.json`, `timeline.csv`.
- Map fields as defined in the Details doc.
- Handle data type conversions (priority -> int, date strings).
- Flatten stickies (one entry per ID referenced).
- Generate synthetic IDs for timeline events (`EVT-{date}-{i}`) and stickies (`STICKY-{i}`) if needed.
- Write the result to `{base_path}/evidence_index.json`.
- Return the dictionary.

2. **Create Test** `tests/core/test_evidence_exporter.py`:
- Use `examples/synthetic_case` as test data (copy to tmp_path).
- Run export.
- Assert keys (`case_id`, `evidence`, `stickies`, `timeline`) exist.
- Assert `evidence` count matches CSV count.

## Constraints
- Use standard libraries (`csv`, `json`, `pathlib`, `datetime`).
- Do not depend on external `jsonschema` for the *export* logic itself (validation is separate).
- Ensure UTF-8 handling.
48 changes: 48 additions & 0 deletions .copilot-tracking/research/2025-11-22-evidence-exporter.md
@@ -0,0 +1,48 @@
# Research: Evidence Index Exporter (Task A1)
**Date:** 2025-11-22
**Agent:** Jules
**Status:** Complete

## 1. Context
We need to implement `engine/agents/evidence_exporter.py` to convert raw CSV/JSON/Timeline inputs into a canonical `evidence_index.json` compliant with `case/evidence_index.schema.json`.

## 2. Analysis
### Input Data Models
- **CSV (`Custody_Mod_Evidence.csv`)**:
- Fields: `Evidence_ID`, `Title`, `Category`, `Priority`, `Description` (and others potentially).
- *Constraint*: This is the master list. If an ID isn't here, it's not in the index.
- **Stickies (`sticky_index.json`)**:
- Array of objects with `evidence_ids` (list), `date`, `note`, `theme`, `priority`.
- *Mapping*: These map to the `stickies` array in the schema.
- **Timeline (`timeline.csv`)**:
- Fields: `Date`, `Label`, `Category`, `Details`, `Evidence_IDs` (semicolon-separated).
- *Mapping*: These map to the `timeline` array in the schema.

### Schema Requirements (`evidence_index.schema.json`)
- Root: `case_id`, `generated_at`, `evidence` (list), `stickies` (list), `timeline` (list).
- `evidence` item:
- Required: `id`, `title`, `category`.
- Optional: `priority`, `description`, `sources` (boolean flags), `timeline_events` (embedded summary?), `files` (file paths), `tags`.
- *Note*: The validator already checks for missing/unknown IDs. The exporter should use `validate_case` to determine the `sources` flags (e.g. `csv=True`, `stickies=True` if referenced).

## 3. Implementation Strategy
1. **Preflight**: Call `validate_case(base_path)`. If status is `WARN` (unknown IDs), decide whether to abort or proceed with warnings. *Decision*: Proceed but log warnings; unknown IDs in stickies/timeline will be dropped or flagged, but they can't be added to the main `evidence` list if not in CSV.
2. **Extraction**:
- Read CSV -> Build `evidence` list skeleton.
- Read Stickies -> Build `stickies` list.
- Read Timeline -> Build `timeline` list.
3. **Enrichment**:
- For each evidence item, calculate `sources` flags based on references.
- Embed `timeline_events` directly into each evidence item for quick lookup; the schema already defines `timeline_events` on the evidence item, so populate it by cross-referencing the timeline.
4. **Output**:
- Construct final dict.
- Write to `case/evidence_index.json`.

## 4. Edge Cases
- **Missing Files**: If CSV missing -> Error. If stickies/timeline missing -> Empty lists.
- **Data Types**: Priority in CSV might be string "1" -> convert to int. Dates should be ISO 8601 string.
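
These two conversions can be captured in small helpers (a sketch; `coerce_priority` and `generated_at` are hypothetical names):

```python
from datetime import datetime, timezone

def coerce_priority(value):
    """Convert CSV priority strings like "1" to int; None for blanks or garbage."""
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return None

def generated_at():
    """Timezone-aware ISO 8601 timestamp for the `generated_at` field."""
    return datetime.now(timezone.utc).isoformat()
```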

## 5. Recommendations
- Use standard `csv` and `json` libs.
- Use `datetime.now(timezone.utc).isoformat()` for `generated_at` (`datetime.utcnow()` is deprecated since Python 3.12 and returns a naive timestamp).
- Ensure `utf-8` encoding.