
Align Mixscape with Seurat’s implementation #710

Merged
13 commits merged into main on Feb 21, 2025

Conversation

Lilly-May
Collaborator

PR Checklist

  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

I made the following updates to pt.tl.Mixscape():

  • Added a de_layer parameter to mixscape(), since DEG computation should be based on adata.X, while the rest of the method operates on adata.layers[X_pert], i.e., the perturbation signature. Seurat’s implementation also includes this parameter (see here).
  • Added a test_method parameter to mixscape() and lda() to specify the test used for DEG computation. Seurat uses Wilcoxon by default (see here), while pertpy previously always used a t-test. Hence, users can now choose their preferred method.
  • Added a scale parameter to mixscape(). Seurat scales DEG expression within the respective group by default (see here), so I introduced the same option in pertpy, enabled by default. A hedged usage sketch of these new parameters follows this list.
  • Fixed an issue in the loop that assigns cells to NP and KO. Previously, the loop always used the original labels at the beginning of each iteration instead of updating them based on the previous iteration’s results. Now, it correctly updates the labels until convergence. This was also mentioned in issue Mixscape classification #688.
  • Implemented a CustomGaussianMixture model. mixscape() fits a Gaussian Mixture Model to perturbed and non-perturbed cells, which is then used to assign cells to NP or KO. However, Seurat’s model fixes the mean and standard deviation of the NT distribution (see here), which Scikit-learn’s GaussianMixture does not support. As a result, our implementation previously fit two free distributions instead of only one as in Seurat. To address this, I created a CustomGaussianMixture class that inherits from GaussianMixture and overrides the M-step of the EM algorithm, allowing selected mean and/or covariance values to be kept fixed (a minimal sketch of the idea is included after this list).
  • Updated the Gaussian Mixture Model initialization to align with Seurat’s approach. Seurat’s model allows specifying initial standard deviation values, while Scikit-learn’s implementation expects precisions (the inverse of the variance). I adjusted our initialization so that it now matches Seurat’s behavior.
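
To make the new options concrete, here is a hedged usage sketch assuming the parameter names introduced above (`de_layer`, `test_method`, `scale`); the remaining argument names follow pertpy's existing `mixscape()` API, and the exact signature and defaults in the merged code may differ.

```python
import pertpy as pt

ms = pt.tl.Mixscape()

# `adata` is assumed to be an AnnData object whose perturbation signature has
# already been computed into adata.layers["X_pert"].
ms.mixscape(
    adata=adata,
    labels="gene_target",    # obs column holding the targeted gene per cell
    control="NT",            # label of the non-targeting control cells
    layer="X_pert",          # perturbation signature used for NP/KO classification
    de_layer=None,           # assumed to fall back to adata.X, which DEGs should be based on
    test_method="wilcoxon",  # Seurat's default test; a t-test can still be requested
    scale=True,              # scale DEG expression within each group, as in Seurat
)
```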

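For context on the CustomGaussianMixture bullet above, here is a minimal sketch of the M-step override technique with scikit-learn. The `fixed_means`/`fixed_covariances` arguments are hypothetical names used only for illustration, and the sketch relies on the private helper `_compute_precision_cholesky` from `sklearn.mixture`; it is not the exact class merged in this PR.

```python
from sklearn.mixture import GaussianMixture
from sklearn.mixture._gaussian_mixture import _compute_precision_cholesky


class CustomGaussianMixture(GaussianMixture):
    """GaussianMixture whose M-step keeps selected means/covariances fixed."""

    def __init__(self, fixed_means=None, fixed_covariances=None, **kwargs):
        super().__init__(**kwargs)
        # Per-component values to clamp; entries set to None are re-estimated.
        self.fixed_means = fixed_means
        self.fixed_covariances = fixed_covariances

    def _m_step(self, X, log_resp):
        # Let scikit-learn update weights, means, and covariances as usual ...
        super()._m_step(X, log_resp)
        # ... then reset the components that must stay fixed (e.g. the NT
        # distribution in Mixscape) to their prescribed values.
        if self.fixed_means is not None:
            for k, mean in enumerate(self.fixed_means):
                if mean is not None:
                    self.means_[k] = mean
        if self.fixed_covariances is not None:
            for k, cov in enumerate(self.fixed_covariances):
                if cov is not None:
                    self.covariances_[k] = cov
        # Keep the cached precision Cholesky factors consistent with the
        # (possibly overwritten) covariances.
        self.precisions_cholesky_ = _compute_precision_cholesky(
            self.covariances_, self.covariance_type
        )
```

With two components and a one-dimensional perturbation score, passing e.g. `fixed_means=[nt_mean, None]` and `fixed_covariances=[nt_var, None]` would clamp only the NT component while the perturbed component is re-estimated. Regarding the initialization bullet: since scikit-learn's `precisions_init` expects precisions rather than standard deviations, an initial standard deviation `sd` from Seurat corresponds to a precision of `1 / sd**2`.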
github-actions bot added the bug (Something isn't working) label on Feb 14, 2025
@Zethson (Member) left a comment

Many great improvements! Thank you so much

Lilly-May marked this pull request as ready for review on February 21, 2025, 08:50
Lilly-May merged commit baf9bb2 into main on Feb 21, 2025 (3 of 5 checks passed)
Labels
bug Something isn't working

2 participants