Merge branch 'fuzzy-dedup' of github.com:IBM/data-prep-kit into fuzzy-dedup
cmadam committed Nov 19, 2024
2 parents fb5601a + ed4e9c1 commit 8de14b5
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion transforms/universal/fdedup/python/README.md
@@ -39,7 +39,7 @@ shingles.
`num_minhashes_per_band` minhashes. For each document, generate a unique signature for every band.

The values for `num_bands` and `num_minhashes_per_band` determine the likelihood that documents with a certain Jaccard
-similarity will be marked as duplicates. A Jupyter notebook in the [utils](utils) folder generates a graph of this
+similarity will be marked as duplicates. A Jupyter notebook in the [utils](../utils) folder generates a graph of this
probability function, helping users explore how different settings for `num_bands` and `num_minhashes_per_band` impact
the deduplication process.
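
Reviewer note, not part of the diff: the README context above refers to a probability function without writing it out. The sketch below uses the textbook banded-MinHash LSH formula, `1 - (1 - s^r)^b`, where `s` is the Jaccard similarity, `r` is `num_minhashes_per_band`, and `b` is `num_bands`. It assumes the fdedup transform follows this standard model, which the README text shown here does not confirm. The function names and the parameter values in the loop are illustrative only and are not taken from the repository or its notebook.

```python
def duplicate_probability(jaccard: float, num_bands: int, num_minhashes_per_band: int) -> float:
    """Probability that two documents share at least one identical band signature.

    Assumes the standard banding model: each minhash agrees with probability
    equal to the Jaccard similarity, independently of the others.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be in [0, 1]")
    # All minhashes in one band must agree for that band to match.
    band_match = jaccard ** num_minhashes_per_band
    # The pair becomes a candidate if at least one band matches.
    return 1.0 - (1.0 - band_match) ** num_bands


def approximate_threshold(num_bands: int, num_minhashes_per_band: int) -> float:
    """Rule-of-thumb similarity where the S-curve rises most steeply: (1/b)^(1/r)."""
    return (1.0 / num_bands) ** (1.0 / num_minhashes_per_band)


if __name__ == "__main__":
    # Arbitrary example settings, chosen only to show the shape of the curve.
    bands, rows = 14, 8
    print(f"approximate threshold: {approximate_threshold(bands, rows):.3f}")
    for s in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"jaccard={s:.1f} -> p(duplicate)={duplicate_probability(s, bands, rows):.3f}")
```

Under this model, raising `num_minhashes_per_band` pushes the curve toward higher similarities (fewer false positives), while raising `num_bands` pushes it toward lower similarities (fewer false negatives).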
