Added Principal Component Analysis #9610

Open
wants to merge 5 commits into
base: master
from

Conversation

kausthub-kannan
Contributor

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

algorithms-keeper bot added the labels "require descriptive names" (This PR needs descriptive function and/or variable names) and "require tests" (Tests [doctest/unittest/pytest] are required) on Oct 3, 2023

@algorithms-keeper algorithms-keeper bot left a comment

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

algorithms-keeper bot added the label "awaiting reviews" (This PR is ready to be reviewed) on Oct 3, 2023
algorithms-keeper bot removed the label "require tests" (Tests [doctest/unittest/pytest] are required) on Oct 3, 2023

@algorithms-keeper algorithms-keeper bot left a comment

algorithms-keeper bot added the label "tests are failing" (Do not merge until tests pass) and removed the label "require descriptive names" (This PR needs descriptive function and/or variable names) on Oct 3, 2023
algorithms-keeper bot added the label "require tests" (Tests [doctest/unittest/pytest] are required) on Oct 3, 2023

@algorithms-keeper algorithms-keeper bot left a comment

algorithms-keeper bot removed the labels "require tests" (Tests [doctest/unittest/pytest] are required) and "tests are failing" (Do not merge until tests pass) on Oct 3, 2023
Contributor

@tianyizheng02 tianyizheng02 left a comment

Small nitpicks, but otherwise LGTM

Comment on lines +1 to +14
"""
Principal Component Analysis (PCA) is an unsupervised learning
algorithm that is used for the dimensionality reduction in machine
learning. It is a statistical procedure that uses an orthogonal
transformation to convert a set of observations of possibly correlated
variables into a set of values of linearly uncorrelated variables called
principal components.

Data: The data used for PCA is a set of 500 data points, each with 4
features. The data is assumed to be in normal form.

Reference: https://en.wikipedia.org/wiki/Principal_component_analysis

"""
Contributor

Suggested change
- """
- Principal Component Analysis (PCA) is an unsupervised learning
- algorithm that is used for the dimensionality reduction in machine
- learning. It is a statistical procedure that uses an orthogonal
- transformation to convert a set of observations of possibly correlated
- variables into a set of values of linearly uncorrelated variables called
- principal components.
- Data: The data used for PCA is a set of 500 data points, each with 4
- features. The data is assumed to be in normal form.
- Reference: https://en.wikipedia.org/wiki/Principal_component_analysis
- """
+ """
+ Principal Component Analysis (PCA) is an unsupervised learning
+ algorithm that is used for the dimensionality reduction in machine
+ learning. It is a statistical procedure that uses an orthogonal
+ transformation to convert a set of observations of possibly correlated
+ variables into a set of values of linearly uncorrelated variables called
+ principal components.
+ Data: The data used for PCA is a set of 500 data points, each with 4
+ features. The data is assumed to be in normal form.
+ Reference: https://en.wikipedia.org/wiki/Principal_component_analysis
+ """

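To make the docstring's claim about "linearly uncorrelated variables" concrete, here is a minimal sketch, not the PR's code. It assumes synthetic data of the shape the docstring mentions (500 points, 4 features) and uses `np.linalg.svd` instead of the eigendecomposition route taken in the PR; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical data: 500 points, 4 features, with features 0 and 1 correlated.
data = rng.normal(size=(500, 4))
data[:, 1] += 0.8 * data[:, 0]

centered = data - data.mean(axis=0)
# Rows of vt are orthonormal principal axes (SVD route, not the PR's eig route).
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T  # data expressed in the principal-axis basis

input_cov = np.cov(centered, rowvar=False)
score_cov = np.cov(scores, rowvar=False)
# Whatever covariance remains between different projected columns.
off_diagonal = score_cov - np.diag(np.diag(score_cov))
```

The input covariance has a clearly non-zero entry between features 0 and 1, while `off_diagonal` should be zero up to floating-point error, which is the sense in which the principal components are uncorrelated.
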
Comment on lines +80 to +81
vector = vector - self.mean
return np.dot(vector, np.transpose(self.components))
Contributor

@tianyizheng02 tianyizheng02 Oct 26, 2023

Suggested change
- vector = vector - self.mean
- return np.dot(vector, np.transpose(self.components))
+ vector -= self.mean
+ return np.dot(vector, np.transpose(self.components))

>>> test_pca.fit(test_data)
"""
self.mean = np.mean(vector, axis=0)
vector = vector - self.mean
Contributor

Suggested change
- vector = vector - self.mean
+ vector -= self.mean
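
One caveat with the `-=` form: `vector = vector - self.mean` and `vector -= self.mean` are not strictly equivalent for NumPy arrays. The in-place form writes into the array the caller passed in, and it cannot store float results in an integer array. The sketch below illustrates the difference with two hypothetical helper functions; it is not taken from the PR.

```python
import numpy as np

original = np.array([[1.0, 2.0], [3.0, 4.0]])


def center_copy(vector: np.ndarray) -> np.ndarray:
    """Rebinding form: builds a new array, the caller's data is untouched."""
    mean = np.mean(vector, axis=0)
    vector = vector - mean
    return vector


def center_in_place(vector: np.ndarray) -> np.ndarray:
    """In-place form: writes into the same buffer the caller passed in."""
    mean = np.mean(vector, axis=0)
    vector -= mean
    return vector


kept = original.copy()
center_copy(kept)  # kept still holds the original values

mutated = original.copy()
center_in_place(mutated)  # mutated now holds the centered values

# With an integer array the in-place form cannot store the float result.
ints = np.array([[1, 2], [3, 5]])
in_place_failed = False
try:
    center_in_place(ints)
except TypeError:  # NumPy raises a TypeError subclass for the failed cast
    in_place_failed = True
```

Whether this matters depends on how `fit` and `transform` are meant to be called; if callers may reuse their input array or pass integer data, the rebinding form is the safer of the two.
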

Comment on lines +54 to +60
eigen_vector, eigen_value = np.linalg.eig(cov)
eigen_vector = eigen_vector.T

indexes = np.argsort(eigen_value)[::-1]
eigen_vector = eigen_vector[indexes]

self.components = eigen_vector[: self.n]
Contributor

Suggested change
- eigen_vector, eigen_value = np.linalg.eig(cov)
- eigen_vector = eigen_vector.T
- indexes = np.argsort(eigen_value)[::-1]
- eigen_vector = eigen_vector[indexes]
- self.components = eigen_vector[: self.n]
+ eigenvector, eigenvalue = np.linalg.eig(cov)
+ eigenvector = eigenvector.T
+ indexes = np.argsort(eigenvalue)[::-1]
+ eigenvector = eigenvector[indexes]
+ self.components = eigenvector[:self.n]

Nitpick: "eigenvector" and "eigenvalue" are each one word (no spaces)
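
Separately from the naming, note that `np.linalg.eig` returns the eigenvalues first and the eigenvectors second, with the eigenvectors stored as columns. Both versions of the snippet above unpack them in the opposite order, so the name holding the eigenvectors actually receives the eigenvalues. A minimal sketch of the intended selection step follows. It assumes a standalone helper (`top_components`, a hypothetical name) and swaps in `np.linalg.eigh`, which is meant for symmetric matrices such as a covariance matrix and returns real values.

```python
import numpy as np


def top_components(data: np.ndarray, n_components: int) -> np.ndarray:
    """Return the n_components leading principal axes, one per row."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh, like eig, returns (eigenvalues, eigenvectors);
    # the eigenvectors are the COLUMNS of the second result.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    return eigenvectors[:, order].T[:n_components]


# Hypothetical data: 500 points, 4 features, variance dominated by feature 0.
rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
components = top_components(sample, 2)
```

With this data the first returned row should point almost entirely along feature 0, since that feature carries most of the variance.
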

Labels
awaiting reviews This PR is ready to be reviewed
3 participants