diff --git a/.copilot-tracking/README.md b/.copilot-tracking/README.md
new file mode 100644
index 0000000..716994f
--- /dev/null
+++ b/.copilot-tracking/README.md
@@ -0,0 +1,25 @@
+# Copilot Tracking Workflow
+
+This directory structures the workflow for AI agents (Jules) to research, plan, detail, and implement changes in the ProSe repository.
+
+## Structure
+
+- **research/**: Exploratory analysis, code audits, and problem definition.
+- **plans/**: High-level step-by-step plans for approved tasks.
+- **details/**: Specific technical specs, schemas, and implementation details.
+- **prompts/**: Refined prompts for the Implementation Agent.
+- **changes/**: Logs of actual file modifications (change logs).
+- **templates/**: Standard formats for the above artifacts.
+
+## Workflow
+
+1. **Research**: Agent analyzes codebase and requirements → `research/{date}-{task}.md`
+2. **Plan**: Agent proposes a plan → `plans/{task}-plan.md`
+3. **Details**: Agent specifies the technical solution → `details/{task}-details.md`
+4. **Prompt**: Agent creates the implementation prompt → `prompts/{task}-prompt.md`
+5. **Implementation**: Agent executes changes and logs them → `changes/{date}-{task}.md`
+
+## Rules
+
+- Agents must never modify files outside `.copilot-tracking/` unless in the **Implementation** phase.
+- All tasks must follow the ProSe Clean Hub standards.
diff --git a/.copilot-tracking/details/evidence-exporter-details.md b/.copilot-tracking/details/evidence-exporter-details.md
new file mode 100644
index 0000000..cd2e044
--- /dev/null
+++ b/.copilot-tracking/details/evidence-exporter-details.md
@@ -0,0 +1,54 @@
+# Details: Evidence Index Exporter (Task A1)
+**Date:** 2025-11-22
+**Based on Plan:** .copilot-tracking/plans/evidence-exporter-plan.md
+
+## 1. Schema Mapping
+### Evidence Item
+- `id` <- CSV `Evidence_ID`
+- `title` <- CSV `Title`
+- `category` <- CSV `Category`
+- `priority` <- CSV `Priority` (int)
+- `description` <- CSV `Description`
+- `sources` <- Calculated: `{"csv": True, "stickies": (id in sticky_ids), "timeline": (id in timeline_ids)}`
+
+### Sticky Item
+- `id` <- Generated (e.g. `STICKY-{i}`); the current sticky format carries no ID of its own.
+- `evidence_id` <- There is a mismatch here: `sticky_index.json` stores an `evidence_ids` *list* per sticky, while the schema's `stickies` items define a singular `evidence_id` *string*. *Decision*: to comply with the schema without changing it, flatten the data: emit one sticky entry per evidence ID the note references, duplicating the note text and metadata (see the example below).
+
+### Timeline Item
+- `id` <- Generated; the timeline CSV has no ID column, so we synthesize one (e.g. `EVT-{date}-{index}`).
+- `evidence_ids` <- CSV `Evidence_IDs` (split by `;`).
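+
+For illustration (synthetic values, not from a real case), a sticky referencing two evidence IDs:
+
+```json
+{ "evidence_ids": ["CUST-001", "SAFE-001"], "date": "2025-01-01", "note": "Example note", "theme": "custody", "priority": 1 }
+```
+
+flattens into two schema-compliant entries:
+
+```json
+[
+  { "id": "STICKY-1-CUST-001", "evidence_id": "CUST-001", "date": "2025-01-01", "note": "Example note", "theme": "custody", "priority": 1 },
+  { "id": "STICKY-1-SAFE-001", "evidence_id": "SAFE-001", "date": "2025-01-01", "note": "Example note", "theme": "custody", "priority": 1 }
+]
+```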
+## 2. Logic Flow
+```python
+def export_evidence_index(base_path):
+    validation = validate_case(base_path)
+    if validation['status'] == 'WARN':
+        logger.warning("Validation issues found...")
+
+    csv_rows = read_csv(base_path / "Custody_Mod_Evidence.csv")
+    sticky_rows = read_json(base_path / "sticky_index.json")
+    timeline_rows = read_csv(base_path / "timeline.csv")
+
+    evidence_list = []
+    for row in csv_rows:
+        obj = build_evidence_item(row)  # map CSV columns to schema fields
+        # ...calculate sources flags, attach related timeline events...
+        evidence_list.append(obj)
+
+    final_json = {
+        "case_id": base_path.name,
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "evidence": evidence_list,
+        "stickies": transform_stickies(sticky_rows),
+        "timeline": transform_timeline(timeline_rows)
+    }
+
+    write_json(base_path / "evidence_index.json", final_json)
+    return final_json
+```
+
+## 3. File Impacts
+- `engine/agents/evidence_exporter.py`: Full implementation.
diff --git a/.copilot-tracking/plans/evidence-exporter-plan.md b/.copilot-tracking/plans/evidence-exporter-plan.md
new file mode 100644
index 0000000..af1fe0a
--- /dev/null
+++ b/.copilot-tracking/plans/evidence-exporter-plan.md
@@ -0,0 +1,25 @@
+# Plan: Evidence Index Exporter (Task A1)
+**Date:** 2025-11-22
+**Based on Research:** .copilot-tracking/research/2025-11-22-evidence-exporter.md
+
+## 1. Goal
+Implement `export_evidence_index` to aggregate case data into a schema-compliant JSON file.
+
+## 2. Steps
+1. **Define Data Structures**: Create internal dataclasses or dict structures for the schema parts.
+2. **Implement Logic**:
+   - `load_csv_data()`: Returns list of evidence dicts.
+   - `load_sticky_data()`: Returns list of sticky dicts.
+   - `load_timeline_data()`: Returns list of timeline dicts.
+   - `export_evidence_index()`: Orchestrates loading, validation, and writing.
+3. **Integration**:
+   - Import `validate_case` from `evidence_validator.py`.
+   - Use the validation result to warn the user.
+4. **Validation**:
+   - Write to `case/evidence_index.json`.
+
+## 3. Verification
+- **Test**: `tests/core/test_evidence_exporter.py`
+  - Case 1: Full valid case (Synthetic). Verify JSON output has all fields.
+  - Case 2: Missing optional files. Verify output is valid but partial.
+  - Case 3: CSV missing. Verify `FileNotFoundError` or custom error.
diff --git a/.copilot-tracking/plans/evidence-validator-behavior-plan.md b/.copilot-tracking/plans/evidence-validator-behavior-plan.md
new file mode 100644
index 0000000..1c2b1dc
--- /dev/null
+++ b/.copilot-tracking/plans/evidence-validator-behavior-plan.md
@@ -0,0 +1,22 @@
+# Plan: Evidence Index Enhancements (Task 1)
+**Date:** 2025-11-22
+**Based on Research:** .copilot-tracking/research/2025-11-22-evidence-validator-behavior.md
+
+## 1. Goal
+Implement a robust JSON Exporter for the Evidence Index that aggregates data from CSV, stickies, and timeline into a schema-compliant `evidence_index.json`.
+
+## 2. Steps
+1. **Design Exporter (Task 1.2)**
+   - Create detailed specs in `.copilot-tracking/details/exporter-details.md`.
+   - Define mappings from source columns to schema fields.
+2. **Implement Exporter (Task 1.3)**
+   - Create `engine/agents/evidence_index_exporter.py`.
+   - Implement `export_evidence_index(base_path)`.
+   - Include CLI support: `python -m engine.agents.evidence_index_exporter <case_dir>`.
+3. **Test Exporter (Task 1.4)**
+   - Add tests in `tests/core/test_evidence_exporter.py`.
+   - Verify schema compliance using the sample schema.
+
+## 3. Verification
+- **Unit Tests**: Pass with 100% success on happy path and missing file cases.
+- **Manual Check**: Run the exporter on `case/DivorceFiles` (mocked) and validate the output against `case/evidence_index.schema.json`.
diff --git a/.copilot-tracking/plans/master-task-queue.md b/.copilot-tracking/plans/master-task-queue.md
new file mode 100644
index 0000000..ffd7359
--- /dev/null
+++ b/.copilot-tracking/plans/master-task-queue.md
@@ -0,0 +1,120 @@
+# Master Task Queue: ProSe Clean Hub
+
+**Status:** Active
+**Last Updated:** 2025-11-22
+
+## A. Evidence System Expansion (Primary Workstream)
+### A1. Implement Evidence Index Exporter
+- **File:** `engine/agents/evidence_exporter.py`
+- **Goal:** Convert CSV + Stickies + Timeline into a compliant `evidence_index.json`.
+- [ ] Write `export_evidence_index(base_path)` implementation.
+- [ ] Integrate `validate_case()` within exporter preflight.
+- [ ] Construct JSON matching `evidence_index.schema.json`.
+- [ ] Write to `case/evidence_index.json`.
+- [ ] Add error handling for missing fields.
+- [ ] Write happy-path and failure tests.
+
+### A2. Schema Validation Tests
+- **Files:** `tests/core/test_schema_validation.py`, `case/evidence_index.schema.json`
+- [ ] Load schema using `jsonschema` if available.
+- [ ] Validate synthetic example index.
+- [ ] Validate failure cases (missing IDs, wrong types, etc.).
+
+### A3. Synthetic Example Coverage
+- **Files:** `examples/synthetic_case/*`
+- [ ] Add test that validates synthetic example returns status “OK”.
+- [ ] Add test that exporter writes a valid index from synthetic example.
+- [ ] Add test ensuring schema prevents bad data.
+
+---
+
+## B. Core Engine Development
+### B1. Flesh Out `engine/core/engine.py`
+- [ ] Add class `Engine`.
+- [ ] Add plugin loader for agents.
+- [ ] Add logging scaffold.
+- [ ] Include reference to “Case Manager Mode”.
+- [ ] Add docstrings referencing `ENGINE_OVERVIEW.md`.
+
+### B2. Engine Overview Documentation
+- **File:** `docs/ENGINE_OVERVIEW.md`
+- [ ] Add diagrams (ASCII for now).
+- [ ] Document “agent lifecycle”.
+- [ ] Document “sync → validate → export → generate” loop.
+
+---
+
+## C. Repo Hygiene & Automation
+### C1. Expand `repo_clean.yml` (CI Cleanup Rules)
+- **File:** `.github/workflows/repo_clean.yml`
+- [ ] Enforce presence of `docs` directory.
+- [ ] Validate `.copilot-tracking` structure.
+- [ ] Enforce schema presence.
+- [ ] Enforce no missing directories from README tree.
+- [ ] Block merge if `.pyc` or `__pycache__` are present.
+
+### C2. Add `scripts/health_report.py`
+- [ ] Count number of tests.
+- [ ] Count number of agents.
+- [ ] Count number of templates.
+- [ ] Output markdown to `docs/REPO_HEALTH.md`.
+- [ ] Add a `report` target to the Makefile (`make report`). See the sketch below.
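+
+A minimal sketch of what `health_report.py` could look like (the globs, paths, and output format below are assumptions, not settled decisions):
+
+```python
+# scripts/health_report.py - sketch only; counts are illustrative
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[1]  # repo root, assuming a scripts/ layout
+
+
+def count(base: Path, pattern: str) -> int:
+    return sum(1 for _ in base.rglob(pattern))
+
+
+def main() -> None:
+    lines = [
+        "# Repo Health",
+        f"- Tests: {count(ROOT / 'tests', 'test_*.py')}",
+        f"- Agents: {count(ROOT / 'engine' / 'agents', '*.py')}",
+        f"- Templates: {count(ROOT / '.copilot-tracking' / 'templates', '*.md')}",
+    ]
+    (ROOT / "docs" / "REPO_HEALTH.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
+
+
+if __name__ == "__main__":
+    main()
+```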
+
+### C3. Add formatting tooling
+- **Options:** Black, Ruff, isort
+- [ ] Add configuration to `pyproject.toml` or individual configs.
+- [ ] Add formatting commands to Makefile.
+- [ ] Add pre-commit hook file.
+
+---
+
+## D. Copilot AI Workflow System
+### D1. Validate Copilot Templates
+- **Files:** `.copilot-tracking/templates/*`, `.github/copilot/*`
+- [ ] Ensure templates contain no placeholders.
+- [ ] Add missing fields to researcher template.
+- [ ] Add example tasks to each role guide.
+- [ ] Add a “strict chain-of-custody” section to planner guide.
+- [ ] Add DO/DON'T blocks to `implementer.md`.
+
+### D2. Add Phase 3 Issue
+- **Create:** `.copilot-tracking/plans/phase-3-ai-integration.md`
+- [ ] Outline future tasks:
+  - Case Manager agent
+  - Motion generator agent
+  - Natural language timeline builder
+  - Evidence scoring model
+  - Hearing-prep packet builder
+
+---
+
+## E. Documentation Expansion
+### E1. Rewrite `MAINTENANCE.md`
+- [ ] Weekly / monthly / quarterly schedule
+- [ ] Script usage examples
+- [ ] Repo health KPIs
+- [ ] Evidence system KPIs
+
+### E2. Expand `INTEGRATION.md`
+- [ ] Mapping rules for ProSe Agent 2 donor imports
+- [ ] Mapping rules for PSFO donor imports
+- [ ] Donor import checklist
+- [ ] “Preflight Clean Hub Check”
+- [ ] Troubleshooting section
+
+---
+
+## F. Stretch Tasks (Optional but Powerful)
+### F1. Add sample case pack generator
+- [ ] A script that creates synthetic timeline, stickies, evidence list.
+- [ ] Useful for CI reproducibility.
+
+### F2. Add interactive CLI
+- [ ] Commands: `prose validate my_case/`, `prose export my_case/`, `prose health`.
+
+---
+
+## G. Finalization Tasks
+- [ ] Regenerate README badges based on CI output.
+- [ ] Add versioning boilerplate to `CHANGELOG.md`.
+- [ ] Create 0.2.0 section in `CHANGELOG.md`.
diff --git a/.copilot-tracking/plans/phase-2-expansion-issue.md b/.copilot-tracking/plans/phase-2-expansion-issue.md
new file mode 100644
index 0000000..51681f7
--- /dev/null
+++ b/.copilot-tracking/plans/phase-2-expansion-issue.md
@@ -0,0 +1,52 @@
+# Phase 2: ProSe Clean Hub Expansion & AI Workflow Scaffolding
+
+This issue aggregates 10 scaffolding tasks to prepare the repository for full development.
+
+## 1. Normalize engine/ directory
+- [ ] Create `engine/core/`
+- [ ] Create `engine/core/engine.py` (empty stub)
+- [ ] Create `engine/core/README.md` (placeholder)
+
+## 2. Create Missing Docs
+Create placeholders for navigation:
+- [ ] `docs/INTEGRATION.md`
+- [ ] `docs/MAINTENANCE.md`
+- [ ] `docs/ENGINE_OVERVIEW.md`
+- [ ] `docs/ROADMAP.md`
+
+## 3. Add pytest configuration
+- [ ] Create `pytest.ini` with:
+  ```ini
+  [pytest]
+  python_files = test_*.py
+  pythonpath = .
+  ```
+
+## 4. Add schema smoke test
+- [ ] Create `tests/core/test_schema_validation.py`
+  - Skip if jsonschema not installed.
+
+## 5. Add Makefile / Justfile
+- [ ] Create `Makefile` with targets: `test`, `audit`, `clean`.
+
+## 6. Create exporter stub
+- [ ] Create `engine/agents/evidence_exporter.py` with `NotImplementedError`.
+
+## 7. Create .copilot-tracking/ skeleton
+*(Done in previous task, but verify)*
+- [ ] Structure exists
+- [ ] Templates exist
+
+## 8. Add .github/copilot/
+- [ ] Create `.github/copilot/researcher.md`
+- [ ] Create `.github/copilot/planner.md`
+- [ ] Create `.github/copilot/implementer.md`
+
+## 9. Add synthetic example case
+- [ ] Create `examples/synthetic_case/`
+- [ ] Add dummy `Custody_Mod_Evidence.csv`
+- [ ] Add dummy `sticky_index.json`
+- [ ] Add dummy `timeline.csv`
+
+## 10. Add tests for synthetic example
+- [ ] Create `tests/core/test_synthetic_case.py` to validate the example case returns OK.
diff --git a/.copilot-tracking/prompts/evidence-exporter-prompt.md b/.copilot-tracking/prompts/evidence-exporter-prompt.md
new file mode 100644
index 0000000..8011c43
--- /dev/null
+++ b/.copilot-tracking/prompts/evidence-exporter-prompt.md
@@ -0,0 +1,28 @@
+# Implementation Prompt: Evidence Index Exporter (Task A1)
+**Date:** 2025-11-22
+
+## Context
+Implement the `export_evidence_index` function in `engine/agents/evidence_exporter.py`.
This function aggregates data from the CSV, Stickies, and Timeline files into a single JSON file (`case/evidence_index.json`) that strictly follows `case/evidence_index.schema.json`. + +## Instructions +1. **Modify** `engine/agents/evidence_exporter.py`: + - Import `validate_case` from `engine.agents.evidence_validator`. + - Implement `export_evidence_index(base_path)`. + - Read `Custody_Mod_Evidence.csv`, `sticky_index.json`, `timeline.csv`. + - Map fields as defined in the Details doc. + - Handle data type conversions (priority -> int, date strings). + - Flatten stickies (one entry per ID referenced). + - Generate synthetic IDs for timeline events (`EVT-{date}-{i}`) and stickies (`STICKY-{i}`) if needed. + - Write the result to `{base_path}/evidence_index.json`. + - Return the dictionary. + +2. **Create Test** `tests/core/test_evidence_exporter.py`: + - Use `examples/synthetic_case` as test data (copy to tmp_path). + - Run export. + - Assert keys (`case_id`, `evidence`, `stickies`, `timeline`) exist. + - Assert `evidence` count matches CSV count. + +## Constraints +- Use standard libraries (`csv`, `json`, `pathlib`, `datetime`). +- Do not depend on external `jsonschema` for the *export* logic itself (validation is separate). +- Ensure UTF-8 handling. diff --git a/.copilot-tracking/research/2025-11-22-evidence-exporter.md b/.copilot-tracking/research/2025-11-22-evidence-exporter.md new file mode 100644 index 0000000..8af9a49 --- /dev/null +++ b/.copilot-tracking/research/2025-11-22-evidence-exporter.md @@ -0,0 +1,48 @@ +# Research: Evidence Index Exporter (Task A1) +**Date:** 2025-11-22 +**Agent:** Jules +**Status:** Complete + +## 1. Context +We need to implement `engine/agents/evidence_exporter.py` to convert raw CSV/JSON/Timeline inputs into a canonical `evidence_index.json` compliant with `case/evidence_index.schema.json`. + +## 2. Analysis +### Input Data Models +- **CSV (`Custody_Mod_Evidence.csv`)**: + - Fields: `Evidence_ID`, `Title`, `Category`, `Priority`, `Description` (and others potentially). + - *Constraint*: This is the master list. If an ID isn't here, it's not in the index. +- **Stickies (`sticky_index.json`)**: + - Array of objects with `evidence_ids` (list), `date`, `note`, `theme`, `priority`. + - *Mapping*: These map to the `stickies` array in the schema. +- **Timeline (`timeline.csv`)**: + - Fields: `Date`, `Label`, `Category`, `Details`, `Evidence_IDs` (semicolon-separated). + - *Mapping*: These map to the `timeline` array in the schema. + +### Schema Requirements (`evidence_index.schema.json`) +- Root: `case_id`, `generated_at`, `evidence` (list), `stickies` (list), `timeline` (list). +- `evidence` item: + - Required: `id`, `title`, `category`. + - Optional: `priority`, `description`, `sources` (boolean flags), `timeline_events` (embedded summary?), `files` (file paths), `tags`. + - *Note*: The validator already checks for missing/unknown IDs. The exporter should use `validate_case` to determine the `sources` flags (e.g. `csv=True`, `stickies=True` if referenced). + +## 3. Implementation Strategy +1. **Preflight**: Call `validate_case(base_path)`. If status is `WARN` (unknown IDs), decide whether to abort or proceed with warnings. *Decision*: Proceed but log warnings; unknown IDs in stickies/timeline will be dropped or flagged, but they can't be added to the main `evidence` list if not in CSV. +2. **Extraction**: + - Read CSV -> Build `evidence` list skeleton. + - Read Stickies -> Build `stickies` list. + - Read Timeline -> Build `timeline` list. +3. 
**Enrichment**:
+   - For each evidence item, calculate `sources` flags based on references.
+   - Embed `timeline_events` directly into each evidence item for quick lookup; the schema defines this field on evidence items, so we should populate it by cross-referencing the timeline.
+4. **Output**:
+   - Construct final dict.
+   - Write to `case/evidence_index.json`.
+
+## 4. Edge Cases
+- **Missing Files**: If CSV missing -> Error. If stickies/timeline missing -> Empty lists.
+- **Data Types**: Priority in CSV might be string "1" -> convert to int. Dates should be ISO 8601 string.
+
+## 5. Recommendations
+- Use standard `csv` and `json` libs.
+- Use `datetime.now(timezone.utc).isoformat()` for `generated_at` (`datetime.utcnow()` is deprecated in recent Python).
+- Ensure `utf-8` encoding.
diff --git a/.copilot-tracking/research/2025-11-22-evidence-validator-behavior.md b/.copilot-tracking/research/2025-11-22-evidence-validator-behavior.md
new file mode 100644
index 0000000..c9c3ca1
--- /dev/null
+++ b/.copilot-tracking/research/2025-11-22-evidence-validator-behavior.md
@@ -0,0 +1,50 @@
+# Research: Evidence Validator Behavior (Task 1.1)
+**Date:** 2025-11-22
+**Agent:** Jules
+**Status:** Complete
+
+## 1. Context
+The goal is to document the current behavior of `engine/agents/evidence_validator.py` to inform the design of the JSON Exporter (Task 1.2). We need to understand how it ingests data and what state it produces.
+
+## 2. Analysis
+
+### `engine/agents/evidence_validator.py`
+- **Inputs**:
+  - `Custody_Mod_Evidence.csv` (Source of Truth for Evidence IDs)
+  - `sticky_index.json` (References Evidence IDs)
+  - `timeline.csv` (References Evidence IDs via semicolon-separated list)
+- **Logic**:
+  - Loads IDs from all three sources.
+  - Computes set differences:
+    - `unknown_in_stickies`: ID in sticky but not in CSV.
+    - `unknown_in_timeline`: ID in timeline but not in CSV.
+    - `unused_evidence`: ID in CSV but not in sticky or timeline.
+  - Determines status: `WARN` if any unknown IDs exist, otherwise `OK`.
+- **Output**:
+  - Returns a dictionary (via `asdict`) of `EvidenceValidationResult`.
+  - Fields: `evidence_count`, `sticky_count`, `timeline_count`, `unknown_in_stickies`, `unknown_in_timeline`, `unused_evidence`, `status`.
+
+### `case/evidence_index.schema.json`
+- Defines the canonical output format we eventually want.
+- Requires: `case_id`, `evidence` (array of objects).
+- The current validator **does not** produce this full JSON structure; it only validates ID consistency.
+
+## 3. Findings
+- The validator is strictly a **consistency checker**, not a data aggregator.
+- It discards the actual content (descriptions, dates, sticky text) and only keeps the IDs for validation.
+- **Gap**: To implement the Exporter (Task 1.3), we need to refactor or extend the loading logic to retain the *data* (titles, dates, notes), not just the IDs.
+- **Edge Cases**:
+  - Missing files are handled gracefully (empty sets).
+  - Whitespace is stripped from IDs.
+
+## 4. Recommendations for Task 1.2 (Design Exporter)
+- The Exporter must re-read the source files to extract full data, as `validate_case` discards it.
+- **Strategy**:
+  - Create a new module `evidence_index_exporter.py` (or extend validator).
+  - Reuse the filename constants.
+  - Implement full parsing logic:
+    - CSV: Read `Evidence_ID`, `Title`, `Category`, `Description`, `Priority`.
+    - Stickies: Read `evidence_ids`, `date`, `note`, `theme`.
+    - Timeline: Read `Evidence_IDs`, `Date`, `Label`, `Category`.
+  - Merge this data into the schema structure.
+ - Embed the validation result (status, warnings) into the `notes` or a new metadata field in the output JSON. diff --git a/.copilot-tracking/templates/changelog-template.md b/.copilot-tracking/templates/changelog-template.md new file mode 100644 index 0000000..ab1504a --- /dev/null +++ b/.copilot-tracking/templates/changelog-template.md @@ -0,0 +1,14 @@ +# Change Log: {Task Name} +**Date:** {YYYY-MM-DD} + +## Files Modified +- `path/to/file_a.py` +- `path/to/file_b.md` + +## Summary of Changes +- Added function X to handle Y. +- Fixed bug Z. + +## Verification +- Ran tests: `pytest tests/core/...` (Passed) +- Lint check: Passed diff --git a/.copilot-tracking/templates/details-template.md b/.copilot-tracking/templates/details-template.md new file mode 100644 index 0000000..63e7d5c --- /dev/null +++ b/.copilot-tracking/templates/details-template.md @@ -0,0 +1,17 @@ +# Details: {Task Name} +**Date:** {YYYY-MM-DD} +**Based on Plan:** {Link to Plan Doc} + +## 1. Schema / API Changes +```json +{ "field": "type" } +``` + +## 2. Logic Flow +- Input: ... +- Process: ... +- Output: ... + +## 3. File Impacts +- `file_a.py`: Add function X +- `file_b.json`: Update schema diff --git a/.copilot-tracking/templates/plan-template.md b/.copilot-tracking/templates/plan-template.md new file mode 100644 index 0000000..22af237 --- /dev/null +++ b/.copilot-tracking/templates/plan-template.md @@ -0,0 +1,15 @@ +# Plan: {Task Name} +**Date:** {YYYY-MM-DD} +**Based on Research:** {Link to Research Doc} + +## 1. Goal +Short summary of what will be achieved. + +## 2. Steps +1. **Step 1**: Description +2. **Step 2**: Description +3. **Step 3**: Description + +## 3. Verification +- How will we know it works? +- Tests to run diff --git a/.copilot-tracking/templates/prompt-template.md b/.copilot-tracking/templates/prompt-template.md new file mode 100644 index 0000000..77ae939 --- /dev/null +++ b/.copilot-tracking/templates/prompt-template.md @@ -0,0 +1,14 @@ +# Implementation Prompt: {Task Name} +**Date:** {YYYY-MM-DD} + +## Context +{Brief summary of the task} + +## Instructions +1. Modify `file_a.py` to ... +2. Create `file_b.py` with content ... +3. Run tests ... + +## Constraints +- Follow Clean Hub standards. +- No backup files. diff --git a/.copilot-tracking/templates/research-template.md b/.copilot-tracking/templates/research-template.md new file mode 100644 index 0000000..b8b0ece --- /dev/null +++ b/.copilot-tracking/templates/research-template.md @@ -0,0 +1,20 @@ +# Research: {Task Name} +**Date:** {YYYY-MM-DD} +**Agent:** Jules +**Status:** In Progress / Complete + +## 1. Context +What is the goal? Which files are involved? + +## 2. Analysis +- **File A**: Observations... +- **File B**: Observations... + +## 3. Findings +- Key constraints +- Edge cases +- Missing information + +## 4. Recommendations +- Approach A vs Approach B +- Proposed next steps diff --git a/.github/copilot/implementer.md b/.github/copilot/implementer.md new file mode 100644 index 0000000..224b8db --- /dev/null +++ b/.github/copilot/implementer.md @@ -0,0 +1,11 @@ +# Copilot Role: Implementer + +**Goal**: Execute the plan and modify the codebase. + +**Output**: Code changes + `.copilot-tracking/changes/{date}-{task}.md` + +**Responsibilities**: +1. Follow the Plan and Details strictly. +2. Modify files safely. +3. Run tests and verification steps. +4. Log all changes. 
diff --git a/.github/copilot/planner.md b/.github/copilot/planner.md new file mode 100644 index 0000000..9d26bfd --- /dev/null +++ b/.github/copilot/planner.md @@ -0,0 +1,10 @@ +# Copilot Role: Planner + +**Goal**: Convert research into a step-by-step execution plan. + +**Output**: `.copilot-tracking/plans/{task}-plan.md` + +**Responsibilities**: +1. Break down the strategy into atomic steps. +2. Define verification criteria for each step. +3. Ensure alignment with Clean Hub standards. diff --git a/.github/copilot/researcher.md b/.github/copilot/researcher.md new file mode 100644 index 0000000..097bf75 --- /dev/null +++ b/.github/copilot/researcher.md @@ -0,0 +1,11 @@ +# Copilot Role: Researcher + +**Goal**: Analyze requirements, audit code, and produce a clear understanding of the task. + +**Output**: `.copilot-tracking/research/{date}-{task}.md` + +**Responsibilities**: +1. Read relevant code and docs. +2. Identify constraints and edge cases. +3. Propose a solution strategy. +4. DO NOT modify code (read-only). diff --git a/.github/workflows/repo_clean.yml b/.github/workflows/repo_clean.yml new file mode 100644 index 0000000..ec94471 --- /dev/null +++ b/.github/workflows/repo_clean.yml @@ -0,0 +1,21 @@ +name: Repo Cleanliness + +on: + push: + branches: [ main ] + pull_request: + branches: [ main ] + +jobs: + clean-check: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v3 + + - name: Check for pycache + run: | + if find . -name "__pycache__" -o -name "*.pyc" | grep .; then + echo "Error: __pycache__ or .pyc files found in repo" + exit 1 + fi diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..c793f4c --- /dev/null +++ b/.gitignore @@ -0,0 +1,7 @@ +__pycache__/ +*.pyc +*.pyo +.venv/ +.DS_Store +.coverage +.pytest_cache/ diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..98c62f7 --- /dev/null +++ b/Makefile @@ -0,0 +1,16 @@ +.PHONY: test audit clean format + +test: + python3 -m pytest tests + +audit: + bash scripts/audit.sh + +clean: + find . -name "__pycache__" -type d -exec rm -rf {} + + find . -name "*.pyc" -delete + find . -name "*.tmp" -delete + +format: + # Placeholder for formatter (e.g. black .) + @echo "No formatter configured yet." 
diff --git a/case/evidence_index.schema.json b/case/evidence_index.schema.json
new file mode 100644
index 0000000..253ff8a
--- /dev/null
+++ b/case/evidence_index.schema.json
@@ -0,0 +1,43 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "title": "ProSe Evidence Index",
+  "description": "Canonical evidence index schema for ProSe Case Manager.",
+  "type": "object",
+  "required": ["case_id", "evidence"],
+  "properties": {
+    "case_id": { "type": "string" },
+    "generated_at": { "type": "string", "format": "date-time" },
+    "notes": { "type": "string" },
+    "evidence": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "required": ["id", "title", "category"],
+        "properties": {
+          "id": { "type": "string" },
+          "title": { "type": "string" },
+          "category": { "type": "string" },
+          "priority": { "type": "number" },
+          "description": { "type": "string" },
+          "sources": {
+            "type": "object",
+            "properties": {
+              "csv": { "type": "boolean" },
+              "stickies": { "type": "boolean" },
+              "timeline": { "type": "boolean" }
+            },
+            "additionalProperties": false
+          },
+          "timeline_events": { "type": "array", "items": { "type": "object" } },
+          "files": { "type": "array", "items": { "type": "object" } },
+          "tags": { "type": "array", "items": { "type": "string" } }
+        },
+        "additionalProperties": false
+      }
+    },
+    "stickies": { "type": "array", "items": { "type": "object" } },
+    "timeline": { "type": "array", "items": { "type": "object" } },
+    "unreferenced_ids": { "type": "array", "items": { "type": "object" } }
+  },
+  "additionalProperties": false
+}
\ No newline at end of file
diff --git a/docs/ENGINE_OVERVIEW.md b/docs/ENGINE_OVERVIEW.md
new file mode 100644
index 0000000..c191912
--- /dev/null
+++ b/docs/ENGINE_OVERVIEW.md
@@ -0,0 +1,3 @@
+# Engine Overview
+
+TODO: This document will describe the architecture of the engine/ directory and agent interactions.
diff --git a/docs/EVIDENCE_INDEX.md b/docs/EVIDENCE_INDEX.md
new file mode 100644
index 0000000..6a4789d
--- /dev/null
+++ b/docs/EVIDENCE_INDEX.md
@@ -0,0 +1,20 @@
+# ProSe Evidence Index
+
+The Evidence Index is the **single source of truth** for a case’s structured evidence in ProSe.
+
+It combines:
+- the master evidence list (`Custody_Mod_Evidence.csv`)
+- sticky-note style facts (`sticky_index.json`)
+- timeline events (`timeline.csv`)
+
+into one canonical JSON document.
+
+## Files
+- `case/evidence_index.schema.json`
+- `engine/agents/evidence_validator.py`
+
+## Validation Logic
+1. Load IDs from CSV.
+2. Load IDs from stickies.
+3. Load IDs from timeline.
+4. Report mismatches (unknown in stickies/timeline, unused in CSV).
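+
+## Example
+
+A sketch of a validator run from the repo root against the bundled synthetic case (the report shape follows `print_report`):
+
+```bash
+python -m engine.agents.evidence_validator examples/synthetic_case
+```
+
+```text
+Evidence IDs in CSV: 2
+Referenced in stickies: 1
+Referenced in timeline: 2
+
+✅ Links look consistent. Nice work.
+```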
diff --git a/docs/INTEGRATION.md b/docs/INTEGRATION.md
new file mode 100644
index 0000000..04a787b
--- /dev/null
+++ b/docs/INTEGRATION.md
@@ -0,0 +1,3 @@
+# Integration Guide
+
+TODO: This document will describe how to integrate donor repositories (ProSe_Agent2, PSFO) into the Clean Hub.
diff --git a/docs/MAINTENANCE.md b/docs/MAINTENANCE.md
new file mode 100644
index 0000000..f5d9c24
--- /dev/null
+++ b/docs/MAINTENANCE.md
@@ -0,0 +1,3 @@
+# Maintenance Guide
+
+TODO: This document will describe weekly, monthly, and quarterly maintenance tasks.
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
new file mode 100644
index 0000000..5d6f6c8
--- /dev/null
+++ b/docs/ROADMAP.md
@@ -0,0 +1,3 @@
+# Roadmap
+
+TODO: This document will track the high-level roadmap for ProSe development.
diff --git a/engine/__init__.py b/engine/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/engine/agents/__init__.py b/engine/agents/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/engine/agents/evidence_exporter.py b/engine/agents/evidence_exporter.py
new file mode 100644
index 0000000..97c1548
--- /dev/null
+++ b/engine/agents/evidence_exporter.py
@@ -0,0 +1,190 @@
+"""
+Evidence Exporter for ProSe.
+
+Aggregates data from:
+- Custody_Mod_Evidence.csv
+- sticky_index.json
+- timeline.csv
+
+Into a single canonical JSON file: evidence_index.json
+"""
+import csv
+import json
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List
+
+from engine.agents.evidence_validator import (
+    CUSTODY_CSV_NAME,
+    STICKY_JSON_NAME,
+    TIMELINE_CSV_NAME,
+    validate_case,
+)
+
+
+def _read_csv(path: Path) -> List[Dict[str, str]]:
+    if not path.exists():
+        return []
+    with path.open(newline="", encoding="utf-8") as f:
+        return list(csv.DictReader(f))
+
+
+def _read_json(path: Path) -> List[Dict[str, Any]]:
+    if not path.exists():
+        return []
+    try:
+        return json.loads(path.read_text(encoding="utf-8"))
+    except json.JSONDecodeError:
+        print(f"Warning: Failed to decode {path}")
+        return []
+
+
+def export_evidence_index(base_path: Path) -> Dict[str, Any]:
+    """
+    Exports the evidence index to a JSON file.
+
+    Args:
+        base_path (Path): The base directory of the case.
+
+    Returns:
+        Dict[str, Any]: The generated evidence index structure.
+    """
+    base_path = Path(base_path)
+
+    # Preflight validation; the result status is currently ignored,
+    # but logging it would be a sensible follow-up.
+    _ = validate_case(base_path)
+
+    # Read inputs
+    csv_rows = _read_csv(base_path / CUSTODY_CSV_NAME)
+    sticky_rows = _read_json(base_path / STICKY_JSON_NAME)
+    timeline_rows = _read_csv(base_path / TIMELINE_CSV_NAME)
+
+    # 1. Build Evidence List
+    evidence_list = []
+
+    # Collect the IDs referenced by stickies and timeline for the sources flags
+    sticky_ref_ids = set()
+    for s in sticky_rows:
+        for eid in s.get("evidence_ids", []):
+            sticky_ref_ids.add(eid)
+
+    timeline_ref_ids = set()
+    for t in timeline_rows:
+        raw_ids = t.get("Evidence_IDs", "")
+        if raw_ids:
+            for x in raw_ids.split(";"):
+                timeline_ref_ids.add(x.strip())
+
+    for row in csv_rows:
+        eid = (row.get("Evidence_ID") or "").strip()
+        if not eid:
+            continue
+
+        # Basic fields
+        item = {
+            "id": eid,
+            "title": row.get("Title", ""),
+            "category": row.get("Category", "uncategorized"),
+            "description": row.get("Description", ""),
+        }
+
+        # Priority (safe conversion; short rows can yield None)
+        try:
+            item["priority"] = int(row.get("Priority", 0))
+        except (TypeError, ValueError):
+            item["priority"] = 0
+
+        # Sources
+        item["sources"] = {
+            "csv": True,
+            "stickies": eid in sticky_ref_ids,
+            "timeline": eid in timeline_ref_ids,
+        }
+
+        # Tags (comma-separated, optional column)
+        if row.get("Tags"):
+            item["tags"] = [t.strip() for t in row["Tags"].split(",") if t.strip()]
+
+        # Files (placeholder; later logic could scan the case directory)
+        item["files"] = []
+
+        # Timeline events (summary). Left empty for now; the schema allows
+        # an embedded summary, so cross-referencing timeline_rows here is a
+        # natural follow-up.
+        item["timeline_events"] = []
+
+        evidence_list.append(item)
+
+    # 2. Build Stickies List (Flattened)
+    stickies_list = []
+    for i, sticky in enumerate(sticky_rows):
+        # The schema requires an id per sticky entry, but the source format
+        # has none, so we synthesize one per (sticky, evidence_id) link.
+
+        note_text = sticky.get("note", "")
+        date_str = sticky.get("date", "")
+        priority = sticky.get("priority", 0)
+        theme = sticky.get("theme", "")
+
+        for eid in sticky.get("evidence_ids", []):
+            s_item = {
+                "id": f"STICKY-{i+1}-{eid}",  # Unique per link
+                "evidence_id": eid,
+                "date": date_str,
+                "note": note_text,
+                "theme": theme,
+                "priority": priority,
+            }
+            stickies_list.append(s_item)
+
+    # 3. Build Timeline List
+    timeline_list = []
+    for i, row in enumerate(timeline_rows):
+        # Generate a synthetic ID (the timeline CSV has no ID column)
+        t_id = f"EVT-{i+1:03d}"
+
+        # Evidence IDs list (semicolon-separated; the column may be absent)
+        eids = [x.strip() for x in (row.get("Evidence_IDs") or "").split(";") if x.strip()]
+
+        t_item = {
+            "id": t_id,
+            "date": row.get("Date", ""),
+            "label": row.get("Label", ""),
+            "category": row.get("Category", ""),
+            "details": row.get("Details", ""),
+            "evidence_ids": eids,
+            "source": "csv",
+        }
+        # Priority (safe conversion)
+        try:
+            t_item["priority"] = int(row.get("Priority", 0))
+        except (TypeError, ValueError):
+            t_item["priority"] = 0
+
+        timeline_list.append(t_item)
+
+    # Construct final object
+    final_json = {
+        "case_id": base_path.name,
+        "generated_at": datetime.now(timezone.utc).isoformat(),
+        "notes": "Generated by ProSe Evidence Exporter",
+        "evidence": evidence_list,
+        "stickies": stickies_list,
+        "timeline": timeline_list,
+        "unreferenced_ids": [],  # could be filled from validate_case()'s unused_evidence
+    }
+
+    # Write to file
+    out_file = base_path / "evidence_index.json"
+    out_file.write_text(json.dumps(final_json, indent=2), encoding="utf-8")
+
+    return final_json
+
+
+if __name__ == "__main__":
+    import sys
+
+    if len(sys.argv) > 1:
+        export_evidence_index(Path(sys.argv[1]))
+    else:
+        print("Usage: python -m engine.agents.evidence_exporter <case_dir>")
diff --git a/engine/agents/evidence_validator.py b/engine/agents/evidence_validator.py
new file mode 100644
index 0000000..5c698fe
--- /dev/null
+++ b/engine/agents/evidence_validator.py
@@ -0,0 +1,166 @@
+"""
+Evidence validator for ProSe.
+
+Cross-checks:
+- Custody_Mod_Evidence.csv (master evidence list)
+- sticky_index.json (sticky notes referencing evidence_ids)
+- timeline.csv (timeline events with Evidence_IDs field)
+
+and reports:
+- unknown IDs in stickies/timeline
+- unused evidence IDs in the CSV
+"""
+
+from __future__ import annotations
+
+import csv
+import json
+from dataclasses import dataclass, asdict
+from pathlib import Path
+from typing import Dict, List, Set
+
+
+CUSTODY_CSV_NAME = "Custody_Mod_Evidence.csv"
+STICKY_JSON_NAME = "sticky_index.json"
+TIMELINE_CSV_NAME = "timeline.csv"
+
+
+@dataclass
+class EvidenceValidationResult:
+    evidence_count: int
+    sticky_count: int
+    timeline_count: int
+    unknown_in_stickies: List[str]
+    unknown_in_timeline: List[str]
+    unused_evidence: List[str]
+    status: str  # "OK" or "WARN"
+
+
+def _load_evidence_ids(base: Path) -> Set[str]:
+    path = base / CUSTODY_CSV_NAME
+    ids: Set[str] = set()
+
+    if not path.exists():
+        print(f"[evidence_validator] Warning: {path} not found")
+        return ids
+
+    with path.open(newline="", encoding="utf-8") as f:
+        reader = csv.DictReader(f)
+        for row in reader:
+            eid = (row.get("Evidence_ID") or "").strip()
+            if eid:
+                ids.add(eid)
+    return ids
+
+
+def _load_sticky_ids(base: Path) -> Set[str]:
+    path = base / STICKY_JSON_NAME
+    ids: Set[str] = set()
+
+    if not path.exists():
+        return ids
+
+    data = json.loads(path.read_text(encoding="utf-8"))
+    # expect a list of objects with "evidence_ids": [...]
+ for sticky in data: + for eid in sticky.get("evidence_ids", []): + eid_clean = (eid or "").strip() + if eid_clean: + ids.add(eid_clean) + return ids + + +def _load_timeline_ids(base: Path) -> Set[str]: + path = base / TIMELINE_CSV_NAME + ids: Set[str] = set() + + if not path.exists(): + return ids + + with path.open(newline="", encoding="utf-8") as f: + reader = csv.DictReader(f) + for row in reader: + field = row.get("Evidence_IDs") or "" + for eid in [x.strip() for x in field.split(";") if x.strip()]: + ids.add(eid) + return ids + + +def validate_case(base: Path) -> Dict[str, object]: + """ + Validate evidence links for a given case directory. + + :param base: Path to directory containing Custody_Mod_Evidence.csv, + sticky_index.json, timeline.csv + :return: dict suitable for JSON or further processing. + """ + base = Path(base) + + evidence_ids = _load_evidence_ids(base) + sticky_ids = _load_sticky_ids(base) + timeline_ids = _load_timeline_ids(base) + + unknown_in_stickies = sticky_ids - evidence_ids + unknown_in_timeline = timeline_ids - evidence_ids + unused_evidence = evidence_ids - sticky_ids - timeline_ids + + status = "OK" + if unknown_in_stickies or unknown_in_timeline: + status = "WARN" + + result = EvidenceValidationResult( + evidence_count=len(evidence_ids), + sticky_count=len(sticky_ids), + timeline_count=len(timeline_ids), + unknown_in_stickies=sorted(unknown_in_stickies), + unknown_in_timeline=sorted(unknown_in_timeline), + unused_evidence=sorted(unused_evidence), + status=status, + ) + return asdict(result) + + +def print_report(result: Dict[str, object]) -> None: + """ + Pretty-print validation results to the console. + """ + print(f"Evidence IDs in CSV: {result['evidence_count']}") + print(f"Referenced in stickies: {result['sticky_count']}") + print(f"Referenced in timeline: {result['timeline_count']}") + + unknown_in_stickies = result["unknown_in_stickies"] + unknown_in_timeline = result["unknown_in_timeline"] + unused_evidence = result["unused_evidence"] + + if unknown_in_stickies: + print("\n⚠ Unknown Evidence_IDs in stickies (not in CSV):") + for eid in unknown_in_stickies: + print(f" - {eid}") + + if unknown_in_timeline: + print("\n⚠ Unknown Evidence_IDs in timeline (not in CSV):") + for eid in unknown_in_timeline: + print(f" - {eid}") + + if unused_evidence: + print("\nℹ Evidence_IDs in CSV not referenced yet (fine, but FYI):") + for eid in unused_evidence: + print(f" - {eid}") + + if not (unknown_in_stickies or unknown_in_timeline): + print("\n✅ Links look consistent. Nice work.") + else: + print("\n⚠ Validation completed with warnings. Review above items.") + + +if __name__ == "__main__": + # CLI usage: python -m engine.agents.evidence_validator my_real_case + import sys + + if len(sys.argv) > 1: + base_dir = Path(sys.argv[1]) + else: + base_dir = Path("my_real_case") + + result_dict = validate_case(base_dir) + print_report(result_dict) diff --git a/engine/core/README.md b/engine/core/README.md new file mode 100644 index 0000000..0df7d4b --- /dev/null +++ b/engine/core/README.md @@ -0,0 +1,3 @@ +# ProSe Core Engine + +Placeholder for core orchestration logic. diff --git a/engine/core/engine.py b/engine/core/engine.py new file mode 100644 index 0000000..09dc904 --- /dev/null +++ b/engine/core/engine.py @@ -0,0 +1,3 @@ +""" +Core engine logic stub. 
+""" diff --git a/examples/synthetic_case/Custody_Mod_Evidence.csv b/examples/synthetic_case/Custody_Mod_Evidence.csv new file mode 100644 index 0000000..052cd19 --- /dev/null +++ b/examples/synthetic_case/Custody_Mod_Evidence.csv @@ -0,0 +1,3 @@ +Evidence_ID,Title,Category,Priority,Description +CUST-001,Sunday Call,custody,1,Missed call on Sunday +SAFE-001,Threatening Text,safety,1,Text message with threat diff --git a/examples/synthetic_case/sticky_index.json b/examples/synthetic_case/sticky_index.json new file mode 100644 index 0000000..321e8e1 --- /dev/null +++ b/examples/synthetic_case/sticky_index.json @@ -0,0 +1,9 @@ +[ + { + "evidence_ids": ["CUST-001"], + "date": "2025-01-01", + "note": "He didn't pick up.", + "theme": "custody", + "priority": 1 + } +] diff --git a/examples/synthetic_case/timeline.csv b/examples/synthetic_case/timeline.csv new file mode 100644 index 0000000..09abe56 --- /dev/null +++ b/examples/synthetic_case/timeline.csv @@ -0,0 +1,3 @@ +Date,Label,Category,Details,Evidence_IDs +2025-01-01,Missed Call,custody,Phone rang 3 times,CUST-001 +2025-01-02,Text Received,safety,Received threat,SAFE-001 diff --git a/pytest.ini b/pytest.ini new file mode 100644 index 0000000..1b9c1e4 --- /dev/null +++ b/pytest.ini @@ -0,0 +1,3 @@ +[pytest] +python_files = test_*.py +python_paths = . diff --git a/tests/core/test_evidence_exporter.py b/tests/core/test_evidence_exporter.py new file mode 100644 index 0000000..ea6763c --- /dev/null +++ b/tests/core/test_evidence_exporter.py @@ -0,0 +1,42 @@ +import json +import shutil +from pathlib import Path + +from engine.agents.evidence_exporter import export_evidence_index + +def test_export_synthetic_case(tmp_path): + """ + Test that the exporter works on the synthetic example case. + """ + # Setup: Copy synthetic case to tmp_path + src_dir = Path("examples/synthetic_case") + case_dir = tmp_path / "synthetic_case" + shutil.copytree(src_dir, case_dir) + + # Execute + result = export_evidence_index(case_dir) + + # Verify Structure + assert "case_id" in result + assert "evidence" in result + assert "stickies" in result + assert "timeline" in result + + # Verify Content (based on synthetic data) + # CUST-001 and SAFE-001 + assert len(result["evidence"]) == 2 + + # Check CUST-001 + cust_item = next(x for x in result["evidence"] if x["id"] == "CUST-001") + assert cust_item["title"] == "Sunday Call" + assert cust_item["sources"]["csv"] is True + assert cust_item["sources"]["stickies"] is True # Referenced in sticky_index.json + assert cust_item["sources"]["timeline"] is True # Referenced in timeline.csv + + # Check File Output + out_file = case_dir / "evidence_index.json" + assert out_file.exists() + + # Verify file content is valid JSON + content = json.loads(out_file.read_text()) + assert content["case_id"] == "synthetic_case" diff --git a/tests/core/test_evidence_validator.py b/tests/core/test_evidence_validator.py new file mode 100644 index 0000000..fd45837 --- /dev/null +++ b/tests/core/test_evidence_validator.py @@ -0,0 +1,20 @@ +from pathlib import Path + +from engine.agents.evidence_validator import validate_case + + +def test_empty_case_directory(tmp_path: Path) -> None: + """ + With no CSV/JSON/Timeline files present, + the validator should not crash and should + report zero counts and OK status. 
+ """ + result = validate_case(tmp_path) + + assert result["evidence_count"] == 0 + assert result["sticky_count"] == 0 + assert result["timeline_count"] == 0 + assert result["unknown_in_stickies"] == [] + assert result["unknown_in_timeline"] == [] + assert result["unused_evidence"] == [] + assert result["status"] == "OK"