Fuzzy dedup #699

Merged 100 commits into dev on Nov 26, 2024

Conversation

Kibnelson
Collaborator

Why are these changes needed?

Provide a fuzzy dedup implementation for the Python, Spark, and Ray runtimes.

Related issue number (if any).

#152
#79

blublinsky and others added 30 commits October 10, 2024 19:05
Signed-off-by: Constantin M Adam <[email protected]>
@touma-I self-requested a review November 15, 2024 19:01
utils folder is one level up from the python folder
@shahrokhDaijavad
Member

@cmadam The documentation looks really good! Thank you. I found one wrong link to the utils folder and fixed it manually (without a change request). Sorry that it triggered a full CI/CD run! If I find other problems, I will make a change request rather than changing them myself.

@touma-I self-requested a review November 19, 2024 01:59

@shahrokhDaijavad left a comment
Member

I approve the three README files.

@shahrokhDaijavad
Member

@cmadam Thank you for the notebook. Looks good. Two comments:

  1. Your notebook is using the Ray runtime. To be consistent with other notebooks, can you please change it to use the Python runtime? (now that we have it because of your work on this PR)
  2. In the cell before setting up the parameters, can you briefly (in a few sentences) describe what some of the key parameters are?

@cmadam
Collaborator

cmadam commented Nov 25, 2024

> @cmadam Thank you for the notebook. Looks good. Two comments:
>
>   1. Your notebook is using the Ray runtime. To be consistent with other notebooks, can you please change it to use the Python runtime? (now that we have it because of your work on this PR)
>   2. In the cell before setting up the parameters, can you briefly (in a few sentences) describe what some of the key parameters are?

@shahrokhDaijavad: I am going to add documentation for the parameters used in the example notebook. Regarding the runtime, let me double-check with @touma-I; my understanding was that he asked me to provide a notebook based on the Ray runtime, not Python.

@shahrokhDaijavad
Member

Thanks, @cmadam. Sure. Maybe @touma-I wants this one to use the Ray runtime as an example of how we can use Ray.

@shahrokhDaijavad
Member

Wow! @cmadam This is outstanding! Three notebooks for the three runtimes.

@touma-I merged commit 2f80d9c into dev Nov 26, 2024
179 checks passed
7 participants