Skip to content

Conversation

Dhruv127
Copy link
Contributor

No description provided.

@Dhruv127 Dhruv127 changed the title Create SimpleImputer Fix 13 : Create SimpleImputer Dec 11, 2023
Copy link
Contributor

@jkfitzsimons jkfitzsimons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a method that is a drop in replacement of SimpleImputer in sklearn with similar interface. It looks like you accidentally submitted rough notes?

@Dhruv127
Copy link
Contributor Author

sorry for this issue , please check now .

Copy link
Contributor

@jkfitzsimons jkfitzsimons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using the numpy methods here and should be using the DP versions from this package - should be a small change


def _impute_mean(self, col, missing_col):
non_missing_values = col[~missing_col]
col_mean = np.nanmean(non_missing_values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are using nanmean from numpy... shouldn't you use the dp version from this library ;)


def _impute_median(self, col, missing_col):
non_missing_values = col[~missing_col]
col_median = np.nanmedian(non_missing_values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are using nanmedian from numpy... shouldn't you use the dp version from this library ;)

def _impute_median(self, col, missing_col):
non_missing_values = col[~missing_col]
col_median = np.nanmedian(non_missing_values)
sensitivity = np.nanmax(np.abs(non_missing_values - col_median))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are using nanmax from numpy... shouldn't you use the dp version from this library ;)

elif self.strategy == 'constant':
self.statistics_ = [self.fill_value] * X.shape[1]
else:
self.statistics_ = [np.nanmean(col) if np.issubdtype(col.dtype, np.number) else np.nan for col in X.T]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are using nanmean from numpy... shouldn't you use the dp version from this library ;)

@Dhruv127
Copy link
Contributor Author

Sorry bhaiya , i think there is no nanmax and nanmedium functions in this lib . Can you please check it once .
However i can try to make them , in another issue ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants