Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducibility of mixscape #671

Open
jesswhitts opened this issue Oct 16, 2024 · 7 comments · May be fixed by #683
Open

Reproducibility of mixscape #671

jesswhitts opened this issue Oct 16, 2024 · 7 comments · May be fixed by #683
Assignees
Labels
bug Something isn't working

Comments

@jesswhitts
Copy link

Report

Hello,

When running the lda visualisation plot as part of the mixscape workflow, I am not getting consistent results between runs, the plot looks slightly different (although very similar).
Is a random seed used at any point, and is there a way to set this so I can get reproducible results?

Code used:

ms = pt.tl.Mixscape()
ms.perturbation_signature(mdata["rna"], pert_key="perturbation", control="NT")

adata_pert = mdata["rna"].copy()
adata_pert.X = adata_pert.layers["X_pert"]

sc.pp.pca(adata_pert)
sc.pp.neighbors(adata_pert, metric="cosine")
sc.tl.umap(adata_pert)

ms.mixscape(adata=mdata["rna"], control="NT", labels="guide_allocation", layer="X_pert",
logfc_threshold = 0.15, iter_num=100, min_de_genes=5)
ms.lda(adata=mdata["rna"], control="NT", labels="guide_allocation", layer="X_pert")
ms.plot_lda(adata=mdata["rna"], control="NT")

Version information


adjustText 1.2.0
anndata 0.10.8
matplotlib 3.9.2
mudata 0.3.1
muon 0.1.6
numpy 1.26.4
pandas 2.2.2
pertpy 0.9.4
scanpy 1.10.3
session_info 1.0.0

PIL 10.4.0
absl NA
appnope 0.1.4
asttokens NA
attr 24.2.0
blitzgsea NA
certifi 2024.08.30
charset_normalizer 3.3.2
chex 0.1.86
comm 0.2.2
contextlib2 NA
cycler 0.12.1
cython_runtime NA
dateutil 2.9.0
debugpy 1.8.5
decorator 5.1.1
decoupler 1.8.0
docrep 0.3.2
equinox 0.11.7
etils 1.9.4
executing 2.1.0
filelock 3.16.1
flax 0.9.0
fsspec 2024.9.0
h5py 3.11.0
idna 3.10
igraph 0.11.6
ipykernel 6.29.5
jax 0.4.33
jaxlib 0.4.33
jaxopt NA
jaxtyping 0.2.34
jedi 0.19.1
joblib 1.4.2
kiwisolver 1.4.7
lamin_utils 0.13.4
legacy_api_wrap NA
leidenalg 0.10.2
lightning 2.4.0
lightning_fabric 2.4.0
lightning_utilities 0.11.7
lineax 0.0.5
llvmlite 0.43.0
matplotlib_inline 0.1.7
ml_collections NA
ml_dtypes 0.5.0
mpl_toolkits NA
mpmath 1.3.0
msgpack 1.1.0
multipledispatch 0.6.0
natsort 8.4.0
numba 0.60.0
numpyro 0.15.3
opt_einsum v3.3.0
optax 0.2.3
ott 0.4.8
packaging 24.1
parso 0.8.4
patsy 0.5.6
pickleshare 0.7.5
platformdirs 4.3.6
ply 3.11
prompt_toolkit 3.0.47
psutil 6.0.0
pubchempy 1.0.4
pure_eval 0.2.3
pyarrow 17.0.0
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.18.0
pynndescent 0.5.13
pyomo 6.8.0
pyparsing 3.1.4
pyro 1.9.1
pytorch_lightning 2.4.0
pytz 2024.2
requests 2.32.3
rich NA
scipy 1.14.1
scvi 1.1.6.post2
seaborn 0.13.2
six 1.16.0
sklearn 1.5.2
skmisc 0.5.1
sparsecca 0.3.1
stack_data 0.6.2
statsmodels 0.14.3
sympy 1.13.3
texttable 1.7.0
threadpoolctl 3.5.0
toolz 0.12.1
torch 2.4.1
torchgen NA
torchmetrics 1.4.2
tornado 6.4.1
tqdm 4.66.5
traitlets 5.14.3
typeguard NA
typing_extensions NA
umap 0.5.6
urllib3 2.2.3
wcwidth 0.2.13
yaml 6.0.2
zmq 26.2.0

IPython 8.27.0
jupyter_client 8.6.3
jupyter_core 5.7.2

Python 3.12.6 | packaged by conda-forge | (main, Sep 11 2024, 04:55:15) [Clang 17.0.6 ]
macOS-13.0.1-arm64-arm-64bit

@jesswhitts jesswhitts added the bug Something isn't working label Oct 16, 2024
@Zethson Zethson self-assigned this Oct 17, 2024
@Zethson
Copy link
Member

Zethson commented Oct 17, 2024

Dear @jesswhitts

I suspect it might be the LDA. I'll have a look soon.

@jesswhitts
Copy link
Author

Hi @Zethson - any update on this?

@Zethson
Copy link
Member

Zethson commented Oct 28, 2024

Sorry, I might have time for this on Friday. If it's urgent for you, you can check all function calls individually whether you always get the same results or not. Then you can tell me where the differences come from and I can try to get rid of them.

I'm very busy at the moment.

@jesswhitts
Copy link
Author

Hi @Zethson

I think the issue is with the mixscape function itself, not the LDA. Across 4 runs there are a few cells which are classified differently.
Does the random_state parameter need setting in GaussianMixture maybe?
I tried setting a global numpy random state, but this didn't work...

@Zethson
Copy link
Member

Zethson commented Nov 11, 2024

Thanks! @Lilly-May will have a look by the end of the week. Apologies again for taking so long!

@scverse scverse deleted a comment from albacorp Nov 13, 2024
@jesswhitts
Copy link
Author

jesswhitts commented Nov 18, 2024

Hi @Lilly-May,

Have done some of my own testing, and think that setting the GaussianMixture random_state does indeed fix this issue, I amended the following parts of the mixscape function accordingly:

def mixscape(
    self,
    adata: AnnData,
    labels: str,
    control: str,
    new_class_name: str | None = "mixscape_class",
    min_de_genes: int | None = 5,
    layer: str | None = None,
    logfc_threshold: float | None = 0.25,
    iter_num: int | None = 10,
    random_init: int | None = 1,
    split_by: str | None = None,
    pval_cutoff: float | None = 5e-2,
    perturbation_type: str | None = "KO",
    copy: bool | None = False,
):

mm = GaussianMixture(
                        n_components=2,
                        random_state=random_init,
                        covariance_type="spherical",
                        means_init=means_init,
                        precisions_init=precisions_init,
                    ).fit(np.asarray(pvec).reshape(-1, 1))

I hope these changes are appropriate :)

Best,
Jess

@Lilly-May Lilly-May linked a pull request Dec 9, 2024 that will close this issue
3 tasks
@Lilly-May
Copy link
Collaborator

Hi @jesswhitts! Thanks so much for looking into this, and sorry for not getting to it earlier. I tested your suggestion and found that it indeed ensures consistent results across multiple runs. I've created PR #683, so this should be incorporated into pertpy soon. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants