
refactor: deduplicate shared utilities across web scraper extractors #100

Merged
chubes4 merged 1 commit into main from refactor/scraper-deduplicate-base-utilities on Mar 16, 2026

chubes4 (Member) commented Mar 16, 2026

Summary

  • Move the duplicated normalizeTime(), inferDate(), and mergePageVenueData() implementations into shared BaseExtractor methods
  • Add a loadDom() helper to eliminate the repeated 5-line DOM bootstrap boilerplate across 7+ extractors
  • Add a fetchUrl() helper with logging to centralize the HTTP fetch pattern
  • Add a resolveUrl() helper for relative URL resolution
  • Fix Squarespace internal duplication: merge extractDateFromText() and parseTextDate() into one method that uses inferDateFromMonthDay()
  • Replace copy-pasted implementations in the RedRocks, Freshtix, RHP, Craftpeak, MusicItem, Microdata, and Timely extractors
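
The general shape of this consolidation can be sketched as follows. This is an illustrative Python sketch, not the code from this PR: the diff is not shown here, so the method bodies, signatures, the accepted time formats, and the `ExampleVenueExtractor` class are all assumptions made for the example. Only the idea (helpers defined once on a base class and inherited by each extractor) comes from the summary above.

```python
import re
from typing import Optional
from urllib.parse import urljoin


class BaseExtractor:
    """Hypothetical base class; the real BaseExtractor may look different."""

    # Assumed input format: "7pm", "7:30 PM", "7:30 p.m." and similar.
    _TIME_RE = re.compile(
        r"^\s*(\d{1,2})(?::(\d{2}))?\s*([ap])\.?m\.?\s*$", re.IGNORECASE
    )

    def normalize_time(self, raw: str) -> Optional[str]:
        """Convert a 12-hour time string to 24-hour 'HH:MM', or None."""
        match = self._TIME_RE.match(raw)
        if match is None:
            return None
        hour = int(match.group(1))
        minute = int(match.group(2) or 0)
        if not (1 <= hour <= 12 and 0 <= minute <= 59):
            return None
        is_pm = match.group(3).lower() == "p"
        # 12am maps to hour 0 and 12pm stays at hour 12.
        hour = hour % 12 + (12 if is_pm else 0)
        return f"{hour:02d}:{minute:02d}"

    def resolve_url(self, href: str, base_url: str) -> str:
        """Resolve a possibly relative link against the page URL."""
        return urljoin(base_url, href)


class ExampleVenueExtractor(BaseExtractor):
    """Hypothetical extractor that reuses the inherited helpers."""

    def build_event(self, raw_time: str, href: str, page_url: str) -> dict:
        return {
            "time": self.normalize_time(raw_time),
            "url": self.resolve_url(href, page_url),
        }


if __name__ == "__main__":
    extractor = ExampleVenueExtractor()
    print(extractor.build_event("7:30 PM", "/tickets/42", "https://example.com/events/"))
```

With this layout, a fix to the time parsing or URL handling lands in one place instead of in each extractor's private copy.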

Why

The universal web scraper has 24 extractors across ~9,600 lines. Multiple extractors had identical implementations of time normalization, year inference, venue merging, DOM loading, and URL resolution. This consolidation makes the codebase more maintainable and reduces the surface area for bugs as we add new extractors and scale venue coverage.
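
To make "year inference" and "venue merging" concrete, here is a minimal Python sketch of what such helpers might do. The rules shown are assumptions for illustration only: the PR text does not state how inferDateFromMonthDay() or mergePageVenueData() behave, so the rollover rule (a month/day that has already passed is taken to mean next year) and the merge rule (event-level values win, page-level values fill gaps) are guesses at a plausible design, and the function signatures are hypothetical.

```python
from datetime import date
from typing import Optional


def infer_date_from_month_day(month: int, day: int, today: date) -> Optional[date]:
    """Pick a year for a listing that only gives month and day.

    Assumed rule: use the current year unless that date has already
    passed, in which case use next year. Returns None when neither
    year yields a valid calendar date (for example Feb 29).
    """
    for year in (today.year, today.year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # not a real date in this year
        if candidate >= today:
            return candidate
    return None


def merge_page_venue_data(event_venue: dict, page_venue: dict) -> dict:
    """Fill missing event venue fields from page-level venue data.

    Assumed rule: non-empty event-level values take precedence.
    """
    populated = {k: v for k, v in event_venue.items() if v not in (None, "")}
    return {**page_venue, **populated}


if __name__ == "__main__":
    today = date(2026, 3, 16)
    print(infer_date_from_month_day(3, 20, today))
    print(infer_date_from_month_day(1, 5, today))
    print(merge_page_venue_data(
        {"name": "The Venue", "address": ""},
        {"address": "1 Main St", "city": "Austin"},
    ))
```

Whatever the actual rules are, keeping them in one shared method means every extractor resolves ambiguous dates and venue fields the same way.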

Net change

+191 lines (shared utilities), -171 lines (duplicated code) = net +20 lines

github-actions bot commented Mar 16, 2026

Homeboy Results — data-machine-events

Pull Request

⚡ Scope: changed files only

lint (changed files only)

test (changed files only)

audit (changed files only)

Tooling versions
  • Homeboy CLI: homeboy 0.78.0+2a48277
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: unknown
  • Action: Extra-Chill/homeboy-action@v1


chubes4 merged commit ea7f58c into main on Mar 16, 2026
2 checks passed
chubes4 deleted the refactor/scraper-deduplicate-base-utilities branch on March 16, 2026 23:14