docs: overnight Apr 26 research bundle (6 files) #73
TombStoneDash wants to merge 4 commits into
Pivoted from ReCollect Batch 1 imports (existing 16 tuples already in
DB, no public discovery endpoint) to Step 6 fallback work. All
documents are read-only audits or paste-ready patches; nothing in here
modifies code, schema, or row data.
Files:
- SOURCE_REGISTRY_AUDIT_APR26.md
108 distinct city slugs in schedule_reports. SOURCE_REGISTRY.md
doesn't exist anywhere — recommends creating it. Flags slug
duplication clusters (austin/austin-tx, denver/denver-co, etc.).
- BLOCKER_ANALYSIS_APR26.md
The collection_zones migration BLOCKER.md flags as "MANUAL
APPLICATION REQUIRED" has actually been applied to prod. Table
exists, lookup_zone RPC works, only the zone polygons remain
un-imported. BLOCKER.md needs an update.
- CITY_GAPS_UPDATE_APR26.md
Paste-ready edits to remove Detroit + Tampa from CITY_GAPS.md
(verified covered via *_arcgis in PR #9). Documented as a patch
rather than a competing PR because CITY_GAPS.md only lives on
HT's in-progress feat/phase-c-durham-gaps branch.
- RECOLLECT_RESEARCH_APR26.md
ReCollect public API surface is too narrow for new-city imports
without prior knowledge of (place_id, service_id) tuples. All 16
in scripts/import-recollect.mjs already imported. Hard-stop check
for Step 3 documented.
- RECOLLECT_DISCOVERY_PLAN.md
Discovery script HT can run during waking hours to surface ~70-
130 new candidate tuples from the HACS waste-collection-schedule
repo. NOT executed — script is paste-ready, not committed live.
- QUEUE_TRIAGE_APR26.md
43 items in factory/queue/ classified. ~17 stale apr-12/13
sprints, ~6 out-of-scope (Daisy/LIMS/BotCaptcha/etc.), 4
NEEDS-EYES, the rest done/reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
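Once a (place_id, service_id) tuple is known, fetching its schedule is mechanical; discovery is the only hard part, per RECOLLECT_RESEARCH_APR26.md. A minimal sketch of the per-place events URL construction (the api.recollect.net path shape here is an assumption based on how such importers commonly call the service; confirm against scripts/import-recollect.mjs before relying on it):

```javascript
// Sketch: build a ReCollect events URL from a known tuple.
// The endpoint path is assumed, not verified against the importer.
function recollectEventsUrl(placeId, serviceId, after, before) {
  const base = "https://api.recollect.net/api/places";
  const qs = new URLSearchParams({ after, before }); // date-window filter
  return `${base}/${placeId}/services/${serviceId}/events?${qs}`;
}

// Hypothetical tuple; the 16 real values live in scripts/import-recollect.mjs.
console.log(recollectEventsUrl("ABC123", "42", "2026-04-01", "2026-05-01"));
```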
…raft, discovery script, import audit)

Bundle of follow-up work flagged in the apr-26 overnight bundle (PR #73):

- BLOCKER.md: collection_zones migration is APPLIED in prod, not pending. Reframes the readiness-sprint section accordingly. Also reconciles the Louisville/Pittsburgh contradiction (RESOLVED ↔ DEAD ENDPOINT in the same file); DEAD is the live state.
- migrations/2026XX_normalize_city_slugs.sql.draft: SQL to consolidate duplicate city slugs surfaced by SOURCE_REGISTRY_AUDIT_APR26. Part 1 is the safe ReCollect-placeholder consolidation (austin-tx → austin, etc., 4 rows touched total). Parts 2 & 3 are flagged as needing HT decisions before execution. The filename suffix `.sql.draft` keeps it out of CI globs.
- scripts/discover-recollect-tuples.mjs: runnable script that scrapes the HACS waste-collection-schedule repo for ~70-130 candidate (place_id, service_id) tuples beyond the 16 already imported. NOT executed; promoted from RECOLLECT_DISCOVERY_PLAN.md as a reviewable file that HT can run during waking hours.
- IMPORT_AUDIT_APR27.md: 35 import-*.mjs scripts × claimed-vs-actual data shape. 28 produced data, 5 are empty/blocked, 13 store zone descriptors as synthetic 'address' strings (the structural mismatch the Long Beach PoC tonight is testing a fix for).

Zero DB writes in this commit. schedule_reports row count snapshot: 7,300,730 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
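The core of the discovery script is the tuple-extraction step. The sketch below covers only that step, over source text fetched separately from the HACS waste-collection-schedule repo; the regex assumes tuples appear as `place_id=…` / `service_id=…` keyword arguments, which may not match that repo's actual layout, so treat it as a starting point rather than the committed script:

```javascript
// Sketch: pull candidate (place_id, service_id) pairs out of source
// text. Input format is assumed; verify against the HACS repo.
function extractTuples(sourceText) {
  const tuples = [];
  // place_id then service_id within 80 chars, quoted or bare values
  const re = /place_id\s*[=:]\s*["']([\w-]+)["'][\s\S]{0,80}?service_id\s*[=:]\s*["']?(\d+)["']?/g;
  let m;
  while ((m = re.exec(sourceText)) !== null) {
    tuples.push({ placeId: m[1], serviceId: m[2] });
  }
  return tuples;
}

// Hypothetical snippet shaped like a per-city config entry:
const sample = `
  place_id="A1B2-C3", service_id=208,
  place_id="ZZ-99", service_id: "315"
`;
console.log(extractTuples(sample));
```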
Two-stage recycling-day schema fix directive halted at Stage 1 per the "Case C/D = halt" rule.

Findings:

1. Source layer (services6.arcgis.com/yCArG7wGXGyWLqav/.../Refuse_Collection_Days/FeatureServer/0) has only ONE day field (`DAY`, esriFieldTypeString). No sibling service in the Long Beach catalog carries separate recycling-day or yard-waste-day data.
2. The "Wednesday, Saturday" production output that motivated this fix is a parsing bug in src/lib/city-geocode.ts:135-172 (`normalizeDay`), not real recycling-day data. DAY_MAP only has abbreviations; full-word inputs fall through to the multi-day decoder, which greedy-matches abbreviations within the word. WEDNESDAY/THURSDAY/SATURDAY all decode wrong (3 of 7 days); M/T/F/Su decode correctly. Reproduced locally.
3. Long Beach has same-day refuse + recycling per the existing code assumption (recyclingDay: day at line 1507). No separate recycling data exists in any discoverable source for this city.

Per directive: do not design schema speculatively for Case D. Stage 2 not executed. No code changes, no migration, no worktree on trashalert-web. collection_zones row count remains 19; schedule_reports remains 7,300,730.

Recommendations for HT:

- Branch 1 (highest leverage): fix the normalizeDay bug. Real bug affecting 3 of 7 days for cities returning full-word day values; ~5 line fix.
- Branch 2: generalize the collection_zones schema later, alongside the Tampa import. Tampa has 3 actual separate layers (trash/recycling/yard-waste per src/lib/city-geocode.ts:1515) and is the driver for the schema generalization, not Long Beach.
- Branch 3: hunt for a separate Long Beach recycling source if HT confirms LB has non-same-day collection (no evidence for this yet).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
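The greedy mis-decode in finding 2 can be demonstrated in a few lines. This is a hypothetical reconstruction, not the actual city-geocode.ts code: the abbreviation set and decoder shape below are assumed for illustration, and they only show how a full-word input that misses the map can pick up a spurious second day:

```javascript
// Assumed abbreviation map; the real DAY_MAP differs in detail.
const DAY_MAP = {
  M: "Monday", T: "Tuesday", W: "Wednesday", TH: "Thursday",
  F: "Friday", S: "Saturday", SU: "Sunday",
};

function normalizeDay(raw) {
  const input = raw.trim().toUpperCase();
  if (DAY_MAP[input]) return [DAY_MAP[input]]; // abbreviation: exact hit
  // Fallback "multi-day" decoder: greedy left-to-right abbreviation
  // scan. For full-word inputs this matches letters INSIDE the word.
  const days = [];
  let i = 0;
  while (i < input.length) {
    const two = input.slice(i, i + 2);
    if (DAY_MAP[two]) { days.push(DAY_MAP[two]); i += 2; continue; }
    if (DAY_MAP[input[i]]) days.push(DAY_MAP[input[i]]);
    i += 1; // advance past separators and unmatched characters
  }
  return days;
}

console.log(normalizeDay("W"));         // → ["Wednesday"]
console.log(normalizeDay("WEDNESDAY")); // → ["Wednesday", "Saturday"]: the stray mid-word S decodes as Saturday
```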
Three read-only audit docs from the long-running zone-expansion session. Companion to the per-city PRs shipped today (PR #14 Houston, PR #15 Step 0.5 architecture, PR #16 Baltimore).

- IMPORT_DRIFT_AUDIT: all 16 ReCollect tuples healthy. 4 entries show city-name drift in the API response (halton-on→burlington, saanich-bc→victoria, hardin-id→boise, king-county-wa→des-moines), but these are naming differences for service regions, not bugs. No action needed.
- SLUG_DUPS: 7 duplicate clusters (unchanged from the Apr 26 audit). 5 are safe ReCollect-placeholder mismatches; 2 (NYC, Portland) need HT decisions. Migration draft already exists at migrations/2026XX_normalize_city_slugs.sql.draft.
- COVERAGE_HEATMAP: 108 cities, sorted by row count, with source-label sample, last-updated, and zone-priority status. 4 cities now in the collection_zones priority set (long-beach, indianapolis, houston, baltimore; all live in prod after this session's PRs).

Zero DB writes from these audits. Pure observation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
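The safe part of the slug consolidation can be previewed in isolation. A sketch, listing only the two pairs the audits name explicitly (austin/austin-tx, denver/denver-co); the remaining clusters are HT decisions and are deliberately left out:

```javascript
// Sketch: canonicalize the ReCollect-placeholder slug duplicates
// ahead of the real SQL migration. Only audit-named pairs included.
const SLUG_CANON = {
  "austin-tx": "austin",
  "denver-co": "denver",
};

function canonicalizeSlug(slug) {
  return SLUG_CANON[slug] ?? slug; // unknown slugs pass through unchanged
}

console.log(canonicalizeSlug("austin-tx")); // → "austin"
console.log(canonicalizeSlug("houston"));   // → "houston"
```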
Summary
Pivoted from overnight Phase A (ReCollect imports) to Step 6 fallback work. Six read-only audit/research files added. Zero code changes, zero schema changes, zero DB writes.
Files
- SOURCE_REGISTRY_AUDIT_APR26.md: 108 distinct city slugs in schedule_reports, slug-duplication clusters, and a recommendation to create the SOURCE_REGISTRY.md file (it doesn't exist anywhere in either repo).
- BLOCKER_ANALYSIS_APR26.md: collection_zones migration is already applied to prod; only the zone-polygon imports remain. BLOCKER.md is misleading and needs an update.
- CITY_GAPS_UPDATE_APR26.md: paste-ready edits to remove Detroit + Tampa (verified covered via city ArcGIS in PR #9, "Link addresses to pickup zones and days"). Documented as a patch, not a competing PR, because CITY_GAPS.md is on HT's in-progress branch.
- RECOLLECT_RESEARCH_APR26.md: the public ReCollect API has no enumeration endpoint; the existing 16 tuples in scripts/import-recollect.mjs are all already imported.
- RECOLLECT_DISCOVERY_PLAN.md: paste-ready script for HT to scrape the HACS waste-collection-schedule repo for ~70-130 new candidate tuples. Not committed live.
- QUEUE_TRIAGE_APR26.md: 43 queue items classified; ~17 stale, ~6 out-of-scope, 4 NEEDS-EYES.

Why merge to the default branch (claude/sample-addresses-...)
That's the actual trunk on this repo (origin/HEAD). Long branch name is incidental.
Why ReCollect imports were skipped
All 16 known (place_id, service_id) tuples are already in DB. Discovering new ones requires either crawling city websites (slow) or partner credentials (not available). Documented in detail in RECOLLECT_RESEARCH_APR26.md. HT confirmed pivot to Step 6.

Test plan
🤖 Generated with Claude Code