Switch to formulaic-contrasts #682

Merged
merged 13 commits into from
Jan 4, 2025

Conversation

grst
Collaborator

@grst grst commented Dec 1, 2024

PR Checklist

  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

Technical details

Additional context

Close #610

@Zethson
Member

Zethson commented Dec 2, 2024

@grst I'll fix the unrelated tests soon. Sorry about them + the currently uber slow CI.

@grst grst marked this pull request as ready for review December 26, 2024 12:23
@grst grst requested review from Zethson and emdann December 26, 2024 12:25
@grst
Collaborator Author

grst commented Dec 26, 2024

What's left is to update the differential expression tutorial to use a numeric contrast vector instead of a tuple of strings, i.e.

-res_df = pds2.test_contrasts(["Treatment", "Chemo", "Anti-PD-L1+Chemo"])
+res_df = pds2.test_contrasts(pds2.contrast(column="Treatment", baseline="Chemo", group_to_compare="Anti-PD-L1+Chemo"))

I tried a bit, but rerunning the tutorial with edgeR and rpy2 is a huge rabbit hole.

Member

@emdann emdann left a comment

LGTM. I've pushed some edits to pass the rpy2/edgeR tests #692

The main thing to work on seems to be the documentation, e.g. is this still accurate?

def test_contrasts(self, contrasts, **kwargs):
    """
    Perform a comparison as specified in a contrast vector.
    Args:
        contrasts: Either a numeric contrast vector, or a dictionary of numeric contrast vectors.
        **kwargs: passed to the respective implementation.
    Returns:
        A dataframe with the results.
    """

Also typing in base functions like test_contrasts and compare_groups would be helpful.

@emdann
Member

emdann commented Jan 3, 2025

I'm gonna have a go at editing the tutorial.

@emdann
Member

emdann commented Jan 4, 2025

One small note: the model doesn't complain if you specify a complex interaction contrast on a model that wasn't fit with the interaction in the design; it just silently returns nonsense results.

Following the example from the tutorial:

# Exclude patients with progressive disease (PD); otherwise the interaction design is not full rank
pdata2 = pdata[pdata.obs['Efficacy'] != 'PD'].copy()

# Bad design definition without interaction
pds2 = pt.tl.PyDESeq2(adata=pdata2, design="~ Efficacy + Treatment")
pds2.fit()

interaction_contrast = (
    pds2.cond(Treatment="Chemo", Efficacy="PR") - pds2.cond(Treatment="Chemo", Efficacy="SD")
) - (
    pds2.cond(Treatment="Anti-PD-L1+Chemo", Efficacy="PR") - pds2.cond(Treatment="Anti-PD-L1+Chemo", Efficacy="SD")
)
res_df = pds2.test_contrasts(contrasts=interaction_contrast)

No complaint, but the results are broken:

Log2 fold change & Wald test p-value, contrast vector: [0. 0. 0.]
           baseMean  log2FoldChange  lfcSE  stat  pvalue  padj
A1BG      16.408605             0.0    0.0   NaN     NaN   NaN
A1BG-AS1   1.958737             0.0    0.0   NaN     NaN   NaN
A1CF       0.002053             0.0    0.0   NaN     NaN   NaN
A2M       30.296881             0.0    0.0   NaN     NaN   NaN
A2M-AS1    0.557092             0.0    0.0   NaN     NaN   NaN
...             ...             ...    ...   ...     ...   ...
ZXDC       6.114098             0.0    0.0   NaN     NaN   NaN
ZYG11A     0.093600             0.0    0.0   NaN     NaN   NaN
ZYG11B     3.404941             0.0    0.0   NaN     NaN   NaN
ZYX       77.175203             0.0    0.0   NaN     NaN   NaN
ZZEF1      9.752162             0.0    0.0   NaN     NaN   NaN

The results make sense if I specify the design properly: pds2 = pt.tl.PyDESeq2(adata=pdata2, design="~ Efficacy + Treatment + Efficacy*Treatment").
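
A minimal sketch of why the additive design yields `[0. 0. 0.]`, written in plain Python rather than against the pertpy or formulaic API. The helper names (`cond_additive`, `cond_interaction`, `interaction_contrast`) are hypothetical, and the reference levels (`PR` for Efficacy, `Anti-PD-L1+Chemo` for Treatment, chosen alphabetically) and the column order are assumptions made for illustration only.

```python
# Hand-built design rows; this does NOT call pertpy or formulaic.
# Assumed reference levels: Efficacy="PR", Treatment="Anti-PD-L1+Chemo".


def cond_additive(treatment, efficacy):
    """Row for '~ Efficacy + Treatment': [Intercept, Efficacy=SD, Treatment=Chemo]."""
    return [1.0, float(efficacy == "SD"), float(treatment == "Chemo")]


def cond_interaction(treatment, efficacy):
    """Row for the design with interaction: the additive columns plus their product."""
    sd = float(efficacy == "SD")
    chemo = float(treatment == "Chemo")
    return [1.0, sd, chemo, sd * chemo]


def subtract(a, b):
    """Element-wise difference of two equally long vectors."""
    return [x - y for x, y in zip(a, b)]


def interaction_contrast(cond):
    """(Chemo: PR - SD) - (Anti-PD-L1+Chemo: PR - SD), as in the example above."""
    chemo_effect = subtract(cond("Chemo", "PR"), cond("Chemo", "SD"))
    combo_effect = subtract(
        cond("Anti-PD-L1+Chemo", "PR"), cond("Anti-PD-L1+Chemo", "SD")
    )
    return subtract(chemo_effect, combo_effect)


# Without an interaction column, the PR - SD difference is identical in both
# treatment arms, so the difference of differences cancels completely.
additive = interaction_contrast(cond_additive)

# With the interaction column, one non-zero coefficient remains to be tested.
with_interaction = interaction_contrast(cond_interaction)

print(additive)
print(with_interaction)
```

Under these assumptions the additive design has no column that can represent a difference of differences, which is consistent with the all-zero contrast vector and the NaN statistics shown above.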

Do we want this to happen? Can we add an informative error if the contrast vector is all zeros?
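
One possible shape for such a check, as a hedged sketch only: `check_contrast` is a hypothetical helper, not an existing pertpy function, and the tolerance and the error wording are assumptions. It accepts any sequence of numbers, so a NumPy array would work as well as a list.

```python
def check_contrast(contrast, tol=1e-12):
    """Return the contrast as a list of floats, or raise if it cannot be tested.

    Hypothetical helper for illustration; not part of pertpy.
    """
    values = [float(x) for x in contrast]
    if not values:
        raise ValueError("Contrast vector is empty.")
    # An all-zero vector compares nothing: the log fold change is 0 and the
    # test statistic is undefined (0 / 0).
    if all(abs(v) <= tol for v in values):
        raise ValueError(
            "Contrast vector is all zeros; the requested comparison cannot be "
            "expressed with the columns of the fitted design. Check that the "
            "design formula contains the terms (e.g. the interaction) that "
            "the contrast refers to."
        )
    return values


# Usage: a valid contrast passes through, an all-zero one is rejected early.
print(check_contrast([0, 1, -1]))
try:
    check_contrast([0.0, 0.0, 0.0])
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early like this would turn the silent NaN table into an actionable message; whether an error or a warning is the better behaviour is a design decision left to the maintainers.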

emdann and others added 4 commits January 4, 2025 09:54
* fix broken rpy2 edger tests

* updated edger tests
Signed-off-by: zethson <[email protected]>
Member

@Zethson Zethson left a comment

<3 looks great!

pertpy/tools/_differential_gene_expression/_edger.py (outdated, resolved)
Signed-off-by: zethson <[email protected]>
@github-actions github-actions bot added the chore label Jan 4, 2025
@Zethson
Member

Zethson commented Jan 4, 2025

@emdann I added type hints as you requested above and updated the submodule to include your tutorial changes as well. Once CI passes (I'm working on it), I am happy to merge this.

@codecov-commenter

codecov-commenter commented Jan 4, 2025

Codecov Report

Attention: Patch coverage is 86.84211% with 5 lines in your changes missing coverage. Please review.

Project coverage is 64.85%. Comparing base (9bba130) to head (a312bd0).
Report is 5 commits behind head on main.

Files with missing lines                               Patch %  Lines
...ertpy/tools/_differential_gene_expression/_base.py  80.00%   3 Missing ⚠️
...y/tools/_differential_gene_expression/_pydeseq2.py  75.00%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #682      +/-   ##
==========================================
- Coverage   65.56%   64.85%   -0.72%     
==========================================
  Files          47       46       -1     
  Lines        6105     5992     -113     
==========================================
- Hits         4003     3886     -117     
- Misses       2102     2106       +4     
Files with missing lines                               Coverage Δ
...py/tools/_differential_gene_expression/__init__.py  100.00% <100.00%> (ø)
...rtpy/tools/_differential_gene_expression/_edger.py  84.84% <100.00%> (-2.25%) ⬇️
...ools/_differential_gene_expression/_statsmodels.py  100.00% <ø> (ø)
pertpy/tools/_distances/_distances.py                  89.96% <100.00%> (-0.07%) ⬇️
pertpy/tools/_milo.py                                  61.75% <100.00%> (ø)
pertpy/tools/_mixscape.py                              79.12% <100.00%> (ø)
...y/tools/_differential_gene_expression/_pydeseq2.py  92.30% <75.00%> (-1.03%) ⬇️
...ertpy/tools/_differential_gene_expression/_base.py  24.84% <80.00%> (-7.49%) ⬇️

@Zethson
Member

Zethson commented Jan 4, 2025

Merging now. Making a release as well. @emdann I'll create an issue for your concern above.

Development

Successfully merging this pull request may close these issues.

Extend PyDESeq2 class to support arbitrary contrasts and designs
4 participants