
feat(#731): delegate title matching to core SimilarityEngine #74

Merged: chubes4 merged 1 commit into main from feature/731-events-unified-dedup on Mar 8, 2026

@chubes4 (Member) commented Mar 8, 2026

Summary

Migrates data-machine-events to use the unified SimilarityEngine from core (data-machine#731, data-machine#732) instead of maintaining its own duplicate title normalization and fuzzy matching.

  • EventIdentifierGenerator rewritten as a thin wrapper: extractCoreTitle(), titlesMatch(), normalize_dashes(), and normalize_text() all delegate to SimilarityEngine. Venue matching stays here (it is domain-specific and not part of core).
  • DuplicateDetectionAbilities: removed the data-machine-events/titles-match ability (superseded by core's datamachine/titles-match) and added an event strategy registration on the datamachine_duplicate_strategies filter.
  • EventUpsert and CheckDuplicatesCommand: no changes needed, since they call EventIdentifierGenerator, which now delegates.
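The thin-wrapper pattern described above can be sketched in PHP as follows. SimilarityEngine here is a hypothetical stand-in: only its method names (normalizeBasic(), normalizeTitle(), titlesMatch()) come from this PR, while the signatures and the simple lowercase/whitespace normalization are assumptions, not core's actual implementation.

```php
<?php
// Hypothetical stand-in for core's SimilarityEngine. Method names come from
// this PR; signatures and bodies are assumptions for illustration only.
final class SimilarityEngine
{
    public static function normalizeBasic(string $text): string
    {
        // Lowercase, collapse runs of whitespace to one space, then trim.
        return trim(preg_replace('/\s+/', ' ', strtolower($text)));
    }

    public static function normalizeTitle(string $title): string
    {
        // Assumed: real core logic is richer (delimiters, dashes, etc.).
        return self::normalizeBasic($title);
    }

    public static function titlesMatch(string $a, string $b): bool
    {
        return self::normalizeTitle($a) === self::normalizeTitle($b);
    }
}

// The events-side class keeps its public method names so callers such as
// EventUpsert need no changes, but holds no title logic of its own.
final class EventIdentifierGenerator
{
    public static function extractCoreTitle(string $title): string
    {
        return SimilarityEngine::normalizeTitle($title);
    }

    public static function titlesMatch(string $a, string $b): bool
    {
        return SimilarityEngine::titlesMatch($a, $b);
    }

    public static function normalize_text(string $text): string
    {
        return SimilarityEngine::normalizeBasic($text);
    }

    // Venue matching (domain-specific) would remain implemented here.
}
```

Keeping the old method names as one-line delegations is what lets the unchanged callers pick up core's behavior automatically.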

What Changed

| File | Change |
|---|---|
| inc/Utilities/EventIdentifierGenerator.php | Thin wrapper → SimilarityEngine for title ops; venue matching stays |
| inc/Abilities/DuplicateDetectionAbilities.php | Removed titles-match ability; added datamachine_duplicate_strategies filter registration |
| tests/Unit/EventIdentifierGeneratorTest.php | Fixed broken test; added venue + delegation tests |

Strategy Registration

The events plugin now registers on core's datamachine_duplicate_strategies filter:

```php
// Register the venue + date + title strategy for the `event` post type.
add_filter( 'datamachine_duplicate_strategies', function( $strategies ) {
    $strategies[] = [
        'id'        => 'event_venue_date_title',
        'post_type' => 'event',
        'callback'  => [ DuplicateDetectionAbilities::class, 'executeEventStrategy' ],
        'priority'  => 10,
    ];
    return $strategies;
} );
```

When core's datamachine/check-duplicate ability is called for the event post type with context.venue and context.startDate, this strategy fires first, using venue + date + fuzzy title matching.
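A minimal sketch of how a core check-duplicate flow might consume this filter. add_filter()/apply_filters() below are tiny stand-ins for WordPress's hook API (no priorities or argument counts). The check_duplicate() function, the in-memory $existing list, the executeEventStrategy() signature, the "lower priority runs first, first non-null match wins" ordering, and the case-insensitive title comparison (standing in for SimilarityEngine::titlesMatch()) are all assumptions, not the plugin's or core's actual code.

```php
<?php
// Simplified stand-ins for WordPress hooks so the sketch runs standalone.
$GLOBALS['sketch_filters'] = [];

function add_filter(string $hook, callable $cb): void
{
    $GLOBALS['sketch_filters'][$hook][] = $cb;
}

function apply_filters(string $hook, $value)
{
    foreach ($GLOBALS['sketch_filters'][$hook] ?? [] as $cb) {
        $value = $cb($value);
    }
    return $value;
}

final class DuplicateDetectionAbilities
{
    // Hypothetical existing events; a real strategy would query posts.
    public static array $existing = [
        ['id' => 42, 'title' => 'Jazz Night', 'venue' => 'The Blue Room', 'startDate' => '2026-03-14'],
    ];

    // Assumed signature: returns the duplicate's post ID, or null if none.
    public static function executeEventStrategy(string $title, array $context): ?int
    {
        foreach (self::$existing as $event) {
            if ($event['venue'] === $context['venue']
                && $event['startDate'] === $context['startDate']
                // Stand-in for fuzzy matching via SimilarityEngine::titlesMatch().
                && strcasecmp(trim($event['title']), trim($title)) === 0) {
                return $event['id'];
            }
        }
        return null;
    }
}

// Hypothetical core dispatcher: filter by post type, order by priority,
// return the first strategy's non-null match.
function check_duplicate(string $post_type, string $title, array $context): ?int
{
    $strategies = apply_filters('datamachine_duplicate_strategies', []);
    $strategies = array_filter($strategies, fn($s) => $s['post_type'] === $post_type);
    usort($strategies, fn($a, $b) => $a['priority'] <=> $b['priority']);
    foreach ($strategies as $strategy) {
        $match = call_user_func($strategy['callback'], $title, $context);
        if ($match !== null) {
            return $match;
        }
    }
    return null;
}

// Registration, mirroring the snippet in this PR.
add_filter('datamachine_duplicate_strategies', function (array $strategies): array {
    $strategies[] = [
        'id'        => 'event_venue_date_title',
        'post_type' => 'event',
        'callback'  => [DuplicateDetectionAbilities::class, 'executeEventStrategy'],
        'priority'  => 10,
    ];
    return $strategies;
});
```

For example, check_duplicate('event', 'jazz night', ['venue' => 'The Blue Room', 'startDate' => '2026-03-14']) finds post 42, while a different date or a non-event post type finds nothing.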

Code Removed

  • normalize_dashes(): was an identical copy of core's implementation
  • normalize_text(): replaced by SimilarityEngine::normalizeBasic()
  • extractCoreTitle() body: replaced by SimilarityEngine::normalizeTitle()
  • titlesMatch() body: replaced by SimilarityEngine::titlesMatch()
  • data-machine-events/titles-match ability: superseded by datamachine/titles-match

Test Fix

test_rightmost_delimiter_used was renamed to test_earliest_delimiter_used. The old test asserted behavior inconsistent with the actual leftmost-wins algorithm (it would have failed had CI been running). The new test correctly asserts that ': ' at position 8 wins over ' - ' at position 20.
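The leftmost-wins rule can be illustrated with a small PHP sketch. The delimiter list, the earliestDelimiter() helper, and the assumption that the core title is the text before the winning delimiter are all hypothetical; the title string is an illustrative example chosen so that ': ' sits at position 8 and ' - ' at position 20, not necessarily the one used in the test.

```php
<?php
// Hypothetical delimiter set; the real list lives in core and may differ.
const TITLE_DELIMITERS = [' - ', ': ', ' | '];

/**
 * Return [delimiter, position] for whichever delimiter occurs earliest in
 * $title, or null if none occurs. The leftmost occurrence wins regardless
 * of the order in which delimiters are listed.
 */
function earliestDelimiter(string $title, array $delimiters = TITLE_DELIMITERS): ?array
{
    $best = null;
    foreach ($delimiters as $delim) {
        $pos = strpos($title, $delim);
        if ($pos !== false && ($best === null || $pos < $best[1])) {
            $best = [$delim, $pos];
        }
    }
    return $best;
}

// Assumption: the core title is the text before the winning delimiter.
function coreTitleBeforeDelimiter(string $title): string
{
    $hit = earliestDelimiter($title);
    return $hit === null ? trim($title) : trim(substr($title, 0, $hit[1]));
}

$title = 'Artist X: Live Shows - Night One';
var_dump(earliestDelimiter($title));        // [': ', 8], even though ' - ' is listed first
var_dump(coreTitleBeforeDelimiter($title)); // "Artist X"
```

A "rightmost delimiter" rule would instead pick ' - ' at position 20, which is the behavior the old test wrongly expected.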

Dependency

Requires data-machine >= 0.39.0 (SimilarityEngine + DuplicateCheckAbility from PR #732).

Relates to Extra-Chill/data-machine#731

Migrates data-machine-events to use the unified SimilarityEngine from
core (Extra-Chill/data-machine#731) instead of maintaining its own
duplicate title normalization and fuzzy matching.

Changed:
- EventIdentifierGenerator: extractCoreTitle(), titlesMatch(),
  normalize_dashes(), normalize_text() replaced with thin wrappers
  delegating to SimilarityEngine. Venue matching stays (domain-specific).
- DuplicateDetectionAbilities: removed data-machine-events/titles-match
  ability (superseded by core datamachine/titles-match). Added event
  strategy registration on datamachine_duplicate_strategies filter so
  core check-duplicate ability can find event duplicates.
- EventIdentifierGeneratorTest: fixed test_rightmost_delimiter_used
  (was testing nonexistent behavior), added venue matching tests and
  SimilarityEngine delegation verification test.

Requires: data-machine >= 0.39.0 (SimilarityEngine + DuplicateCheckAbility)
@chubes4 merged commit 8dfc4c1 into main on Mar 8, 2026
1 of 2 checks passed
@chubes4 deleted the feature/731-events-unified-dedup branch on March 8, 2026 at 15:48