docs: overnight Apr 26 research bundle (6 files)#73

Open
TombStoneDash wants to merge 4 commits into
claude/sample-addresses-per-city-018Sy3dmmPibwGHSBHMkh6cQ from
docs/overnight-apr26-research

Conversation

@TombStoneDash
Owner

Summary

Pivoted the overnight Phase A work (ReCollect imports) to Step 6 fallback work, adding six read-only audit/research files. Zero code changes, zero schema changes, zero DB writes.

Files

  • SOURCE_REGISTRY_AUDIT_APR26.md — 108 distinct city slugs in schedule_reports, slug-duplication clusters, recommendation to create the SOURCE_REGISTRY.md file (it doesn't exist anywhere in either repo).
  • BLOCKER_ANALYSIS_APR26.md — the collection_zones migration is already applied to prod; only the zone-polygon imports remain. BLOCKER.md is misleading and needs an update.
  • CITY_GAPS_UPDATE_APR26.md — paste-ready edits to remove Detroit + Tampa (verified covered via city ArcGIS in PR #9, "Link addresses to pickup zones and days"). Documented as a patch, not a competing PR, because CITY_GAPS.md is on HT's in-progress branch.
  • RECOLLECT_RESEARCH_APR26.md — public ReCollect API has no enumeration endpoint; the existing 16 tuples in scripts/import-recollect.mjs are all already imported.
  • RECOLLECT_DISCOVERY_PLAN.md — paste-ready script for HT to scrape the HACS waste-collection-schedule repo for ~70-130 new candidate tuples. Not committed live.
  • QUEUE_TRIAGE_APR26.md — 43 queue items classified; ~17 stale, ~6 out-of-scope, 4 NEEDS-EYES.
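
The slug-duplication clusters flagged in the registry audit reduce to a canonical-slug map. A minimal sketch, assuming only the documented austin/austin-tx and denver/denver-co style pairs; treat the map as illustrative, not the audit's actual output:

```javascript
// Illustrative canonicalization for duplicate city slugs. Only the
// austin-tx -> austin and denver-co -> denver pairs are named in the
// audit; a real map would come from SOURCE_REGISTRY_AUDIT_APR26.md.
const SLUG_CANONICAL = {
  'austin-tx': 'austin',
  'denver-co': 'denver',
};

// Returns the canonical slug, passing unknown slugs through unchanged.
function canonicalSlug(slug) {
  return SLUG_CANONICAL[slug] ?? slug;
}
```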

Why merge to the default branch (claude/sample-addresses-...)

That's the actual trunk of this repo (origin/HEAD); the long branch name is incidental.

Why ReCollect imports were skipped

All 16 known (place_id, service_id) tuples are already in DB. Discovering new ones requires either crawling city websites (slow) or partner credentials (not available). Documented in detail in RECOLLECT_RESEARCH_APR26.md. HT confirmed pivot to Step 6.
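
The skip decision above is essentially a set-difference check: import only tuples not already present, and hard-stop when nothing remains. A minimal sketch, with placeholder tuples (the real 16 live in scripts/import-recollect.mjs and are not reproduced here) and an assumed already-imported set read from the DB beforehand:

```javascript
// Placeholder tuples; the real list is the 16 pairs in scripts/import-recollect.mjs.
const KNOWN_TUPLES = [
  ['example-place', 'example-service'],
  ['another-place', 'another-service'],
];

// alreadyImported: Set of "place_id:service_id" keys, assumed to be
// loaded from the DB before the import step runs.
function tuplesToImport(known, alreadyImported) {
  return known.filter(([place, service]) => !alreadyImported.has(`${place}:${service}`));
}

// Hard stop: when nothing remains, skip the import entirely.
function shouldSkipImport(known, alreadyImported) {
  return tuplesToImport(known, alreadyImported).length === 0;
}
```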

Test plan

  • All files render — markdown only
  • No code or schema changes
  • No DB writes performed during research
  • HT review of NEEDS-EYES items in queue triage
  • HT decides whether to update BLOCKER.md and CITY_GAPS.md per the recommendations

🤖 Generated with Claude Code

TombStoneDash and others added 4 commits April 26, 2026 20:18
Pivoted from ReCollect Batch 1 imports (existing 16 tuples already in
DB, no public discovery endpoint) to Step 6 fallback work. All
documents are read-only audits or paste-ready patches; nothing in here
modifies code, schema, or row data.

Files:
- SOURCE_REGISTRY_AUDIT_APR26.md
    108 distinct city slugs in schedule_reports. SOURCE_REGISTRY.md
    doesn't exist anywhere — recommends creating it. Flags slug
    duplication clusters (austin/austin-tx, denver/denver-co, etc.).
- BLOCKER_ANALYSIS_APR26.md
    The collection_zones migration BLOCKER.md flags as "MANUAL
    APPLICATION REQUIRED" has actually been applied to prod. Table
    exists, lookup_zone RPC works, only the zone polygons remain
    un-imported. BLOCKER.md needs an update.
- CITY_GAPS_UPDATE_APR26.md
    Paste-ready edits to remove Detroit + Tampa from CITY_GAPS.md
    (verified covered via *_arcgis in PR #9). Documented as a patch
    rather than a competing PR because CITY_GAPS.md only lives on
    HT's in-progress feat/phase-c-durham-gaps branch.
- RECOLLECT_RESEARCH_APR26.md
    ReCollect public API surface is too narrow for new-city imports
    without prior knowledge of (place_id, service_id) tuples. All 16
    in scripts/import-recollect.mjs already imported. Hard-stop check
    for Step 3 documented.
- RECOLLECT_DISCOVERY_PLAN.md
    Discovery script HT can run during waking hours to surface ~70-
    130 new candidate tuples from the HACS waste-collection-schedule
    repo. NOT executed — script is paste-ready, not committed live.
- QUEUE_TRIAGE_APR26.md
    43 items in factory/queue/ classified. ~17 stale apr-12/13
    sprints, ~6 out-of-scope (Daisy/LIMS/BotCaptcha/etc.), 4
    NEEDS-EYES, the rest done/reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…raft, discovery script, import audit)

Bundle of follow-up work flagged in the apr-26 overnight bundle (PR #73):

- BLOCKER.md: collection_zones migration is APPLIED in prod, not pending.
  Reframes the readiness-sprint section accordingly. Also reconciles the
  Louisville/Pittsburgh contradiction (RESOLVED ↔ DEAD ENDPOINT in the
  same file) — DEAD is the live state.

- migrations/2026XX_normalize_city_slugs.sql.draft: SQL to consolidate
  duplicate city slugs surfaced by SOURCE_REGISTRY_AUDIT_APR26. Part 1
  is the safe ReCollect-placeholder consolidation (austin-tx → austin,
  etc., 4 rows touched total). Parts 2 & 3 flagged as needing HT
  decisions before execution. Filename suffix `.sql.draft` keeps it out
  of CI globs.

- scripts/discover-recollect-tuples.mjs: runnable script that scrapes
  the HACS waste-collection-schedule repo for ~70-130 candidate
  (place_id, service_id) tuples beyond the 16 already imported.
  NOT executed — promoted from RECOLLECT_DISCOVERY_PLAN.md as a
  reviewable file that HT can run during waking hours.

- IMPORT_AUDIT_APR27.md: 35 import-*.mjs scripts × claimed-vs-actual
  data shape. 28 produced data, 5 are empty/blocked, 13 store zone
  descriptors as synthetic 'address' strings (the structural mismatch
  the Long Beach PoC tonight is testing a fix for).
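
The discovery script's core step can be sketched as a text scan for tuple pairs. The pattern below assumes a hypothetical `place_id: "...", service_id: "..."` source format; the HACS repo's actual file layout (and the real parsing in scripts/discover-recollect-tuples.mjs) will differ:

```javascript
// Hypothetical tuple extraction: the real HACS sources are structured
// differently, so the committed script's parsing is likely more involved.
function extractTuples(text) {
  const re = /place_id\W+([\w-]+)\W+service_id\W+([\w-]+)/g;
  const tuples = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    tuples.push([m[1], m[2]]);
  }
  return tuples;
}
```

Running it over text containing `place_id: "seattle-wa", service_id: "waste"` would yield one candidate tuple; candidates would then be diffed against the 16 already imported.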

Zero DB writes in this commit. schedule_reports row count snapshot:
7,300,730 unchanged.
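
The structural mismatch flagged in IMPORT_AUDIT_APR27.md (zone descriptors stored as synthetic 'address' strings) can be screened for with a simple heuristic. The criterion below is assumed for illustration, not the audit's actual test:

```javascript
// Heuristic: real street addresses almost always begin with a street
// number; zone descriptors ("Zone A", "Monday Route 3") usually don't.
// Assumed for illustration; the audit's actual classification may differ.
function looksLikeZoneDescriptor(addressField) {
  return !/^\d+\s/.test(addressField.trim());
}
```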

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-stage recycling-day schema fix directive halted at Stage 1 per the
"Case C/D = halt" rule. Findings:

1. Source layer (services6.arcgis.com/yCArG7wGXGyWLqav/.../
   Refuse_Collection_Days/FeatureServer/0) has only ONE day field
   (`DAY` esriFieldTypeString). No sibling service in the Long Beach
   catalog carries separate recycling-day or yard-waste-day data.

2. The "Wednesday, Saturday" production output that motivated this
   fix is a parsing bug in src/lib/city-geocode.ts:135-172
   (`normalizeDay`), not real recycling-day data. DAY_MAP only has
   abbreviations; full-word inputs fall through to the multi-day
   decoder, which greedy-matches abbreviations within the word.
   WEDNESDAY/THURSDAY/SATURDAY all decode wrong (3 of 7 days).
   M/T/F/Su decode correctly. Reproduced locally.

3. Long Beach has same-day refuse + recycling per the existing code
   assumption (recyclingDay: day at line 1507). No separate recycling
   data exists in any discoverable source for this city.
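
The greedy-match failure can be reconstructed in miniature. The DAY_MAP and decoder below are hypothetical simplifications (the real ones in src/lib/city-geocode.ts:135-172 differ in detail, so the exact set of misdecoded days may not match production), but they show the failure class and the shape of the ~5-line fix:

```javascript
// Simplified stand-in for the abbreviation-only DAY_MAP (assumed; the
// real map in city-geocode.ts differs in detail).
const DAY_MAP = {
  SU: 'Sunday', M: 'Monday', T: 'Tuesday', W: 'Wednesday',
  TH: 'Thursday', F: 'Friday', S: 'Saturday',
};
const FULL_DAYS = [
  'SUNDAY', 'MONDAY', 'TUESDAY', 'WEDNESDAY', 'THURSDAY', 'FRIDAY', 'SATURDAY',
];

// Buggy path: scans the string and greedy-matches abbreviation keys, so
// letters inside a full word count as extra days (the "S" in WEDNESDAY).
function decodeMultiDay(value) {
  const s = value.toUpperCase();
  const days = [];
  let i = 0;
  while (i < s.length) {
    const two = s.slice(i, i + 2);
    if (DAY_MAP[two]) { days.push(DAY_MAP[two]); i += 2; continue; }
    if (DAY_MAP[s[i]]) days.push(DAY_MAP[s[i]]);
    i += 1;
  }
  return days.join(', ');
}

// Shape of the fix: accept full-word day names before falling through
// to the multi-day abbreviation decoder.
function normalizeDayFixed(value) {
  const upper = value.trim().toUpperCase();
  const idx = FULL_DAYS.indexOf(upper);
  if (idx !== -1) return FULL_DAYS[idx][0] + FULL_DAYS[idx].slice(1).toLowerCase();
  return decodeMultiDay(value);
}
```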

Per directive: do not design schema speculatively for Case D. Stage 2
not executed. No code changes, no migration, no worktree on
trashalert-web. collection_zones row count remains 19,
schedule_reports remains 7,300,730.

Recommendations for HT:

  Branch 1 (highest leverage): fix the normalizeDay bug. It affects
  3 of 7 days for cities that return full-word day values; ~5-line fix.

  Branch 2: generalize collection_zones schema later, alongside the
  Tampa import — Tampa has 3 actual separate layers (trash/recycling/
  yard-waste per src/lib/city-geocode.ts:1515) and is the driver for
  the schema generalization, not Long Beach.

  Branch 3: hunt for separate Long Beach recycling source if HT
  confirms LB has non-same-day collection (no evidence for this yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three read-only audit docs from the long-running zone-expansion
session. Companion to the per-city PRs shipped today (PR #14
Houston, PR #15 Step 0.5 architecture, PR #16 Baltimore).

- IMPORT_DRIFT_AUDIT: all 16 ReCollect tuples healthy, 4 entries
  show city-name drift in the API response (halton-on→burlington,
  saanich-bc→victoria, hardin-id→boise, king-county-wa→des-moines)
  but these are naming differences for service regions, not bugs.
  No action needed.

- SLUG_DUPS: 7 duplicate clusters (unchanged from Apr 26 audit).
  5 are safe ReCollect-placeholder mismatches; 2 (NYC, Portland)
  need HT decisions. Migration draft already exists at
  migrations/2026XX_normalize_city_slugs.sql.draft.

- COVERAGE_HEATMAP: 108 cities, sorted by row count, source-label
  sample, last-updated, zone-priority status. 4 cities now in the
  collection_zones priority set (long-beach, indianapolis, houston,
  baltimore — all live in prod after this session's PRs).

Zero DB writes from these audits. Pure observation.
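
The "no action needed" conclusion can be encoded as an allowlist so future drift checks only flag new divergences. The shape below is an assumed sketch, not the audit's actual tooling:

```javascript
// Known, accepted city-name drift between our slug and the name the
// ReCollect API reports (service-region naming differences, per the audit).
const ACCEPTED_DRIFT = {
  'halton-on': 'burlington',
  'saanich-bc': 'victoria',
  'hardin-id': 'boise',
  'king-county-wa': 'des-moines',
};

// Flags a tuple only when the API name diverges AND isn't on the allowlist.
function isUnexpectedDrift(slug, apiCityName) {
  return apiCityName !== slug && ACCEPTED_DRIFT[slug] !== apiCityName;
}
```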

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>