Challenges in Using Panpipes for Single-Cell Multiomics Data Analysis #325

Open
liuchuan111 opened this issue Jan 9, 2025 · 3 comments

liuchuan111 commented Jan 9, 2025

Using Panpipes for single-cell omics data analysis presents several challenges, making it difficult for users to effectively apply the tool.

Tutorials and Examples: While the documentation states that Panpipes can ingest various file types, such as outs/filtered_feature_bc_matrix.h5 (from CellRanger) or AnnData h5ad objects (one per sample), the tutorials lack clarity on how to generate the h5ad objects for different modalities. Moreover, the provided examples mainly demonstrate single-sample analysis, with no guidance on analyzing multiple replicate samples from different groups, which would be more practical for most use cases.

Handling .yaml Files: Modifying the .yaml configuration file is also challenging due to the absence of clear instructions on which fields are mandatory or optional when working with scRNA-seq and/or ATAC-seq data.

Given these limitations, we encountered various error messages during the data ingest process. For example, we have two groups of samples, each containing 4 replicates (A1–A4 and B1–B4), and each sample was split into two parts, one for scRNA-seq and one for ATAC-seq. It is unclear which example or workflow we should follow for such a setup. Should we ingest the two modalities separately, or ingest all data through one submission file?

Additionally, while ingesting outs/filtered_feature_bc_matrix.h5 for the scRNA-seq data works fine, Panpipes fails to read the filtered_peak_bc_matrix output that CellRanger generates for the ATAC-seq data. The error reports a missing features.tsv file in this directory, which contains peaks.bed instead. To work around this, we merged the filtered_peak_bc_matrix.h5 files from these samples into a single atac.h5ad file. We ended up with two .h5ad files, one for ATAC (atac.h5ad) and one for RNA (rna.h5ad), each containing the eight samples. However, ingesting both .h5ad file paths through a single sample_file_qc.txt still resulted in the following error:

"ERROR
Exception #1
'builtins.OSError(---------------------------------------
Child was terminated by signal -1:
The stderr was:
"lib/python3.10/site-packages/docrep/decorators.py:43: SyntaxWarning: 'param_categorical_covariate_keys' is not a valid key!
doc = func(self, args[0].__doc__, *args[1:], **kwargs)
computing score 'MarkersNeutro_score'
WARNING: genes are not in var_names and ignored: Index(['ANXA1', 'ARG1', 'BPI', 'CD101', 'CD24', 'CD274', 'CSF3R', 'CXCL8',

I am curious if anyone has successfully used Panpipes for analyzing their own single-cell multiomics data under similar conditions. If so, sharing insights or additional resources would be highly appreciated!

Collaborator

bio-la commented Jan 9, 2025

Hi @liuchuan111, thanks for this thorough description of the issues you faced. I would be happy to help and to work out how to use panpipes in your case.

You mention having multiple samples for RNA and for ATAC.
While reading multiple samples and concatenating them into a single object is not a problem for RNA, the ATAC data needs additional processing to ensure that peak calling is comparable across samples. That is why we recommend aggregating the ATAC data with cellranger before ingesting it (https://panpipes-pipelines.readthedocs.io/en/latest/usage/setup_for_qc_mm.html#additional-input-files-when-processing-atac-data).
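
For illustration only, here is a minimal sketch of preparing the aggregation CSV for the eight samples described above. It assumes the CSV layout that, to my understanding, `cellranger-atac aggr` expects (`library_id`, `fragments`, `cells`); please check the 10x documentation for your CellRanger ATAC version. The `runs/<sample>/outs` directory layout and the helper name `write_aggr_csv` are hypothetical.

```python
import csv
from pathlib import Path


def write_aggr_csv(sample_ids, runs_dir, out_csv):
    """Write one row per sample pointing at its per-sample ATAC outputs.

    Assumed columns: library_id, fragments, cells (verify against the
    cellranger-atac documentation for your version).
    """
    runs_dir = Path(runs_dir)
    rows = []
    for sample_id in sample_ids:
        # Hypothetical layout: <runs_dir>/<sample_id>/outs/...
        outs = runs_dir / sample_id / "outs"
        rows.append(
            {
                "library_id": sample_id,
                "fragments": str(outs / "fragments.tsv.gz"),
                "cells": str(outs / "singlecell.csv"),
            }
        )
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["library_id", "fragments", "cells"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Two groups with four replicates each: A1..A4 and B1..B4.
sample_ids = [f"{group}{rep}" for group in "AB" for rep in range(1, 5)]
# Example call (paths are placeholders):
# write_aggr_csv(sample_ids, "runs", "aggr_libraries.csv")
```

The resulting CSV would then be passed to the aggregation step, and the aggregated outputs ingested as a single ATAC sample.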

Can you clarify whether the RNA and the ATAC data have the same cell barcodes? Is this a multiome experiment, or two libraries from the same sample? If the latter, why would you ingest them together?

Finally, did you try ingesting the ATAC data alone, and if so, can you explain what issues you found when reading the ATAC data from CellRanger? (https://panpipes-tutorials.readthedocs.io/en/latest/ingesting_multiome/ingesting_mome.html)

thank you

Author

liuchuan111 commented Jan 12, 2025

Thank you for your prompt response. I have followed your suggestion to aggregate multiple samples using the "cellranger-atac aggr" command and subsequently ingested the ATAC-seq data alone. However, we encountered a KeyError: 'rna_path', which seems to indicate that there may be incorrectly configured parameters in the pipeline.yml file. Unfortunately, we are unsure which part needs to be modified.

Below are the contents of the submission_atac_file.txt and the main sections of the pipeline.yml file for your reference.
#-------------------submission_atac_file.txt------------------------------
sample_id atac_path atac_filetype fragments_file per_barcode_metrics_file peak_annotation_file
A_vs_B aggr/outs/filtered_peak_bc_matrix.h5 10X_h5 aggr/outs/fragments.tsv.gz aggr/outs/singlecell.csv aggr/outs/peak_annotation.tsv

#-------------------------pipeline.yml-----------------------------------
#Project name and data format
project: "A_vs_B"
sample_prefix: "AvsB"
use_existing_h5mu: False
submission_file: submission_atac_file.txt
metadatacols:
concat_join_type:

#--------------------------
#Modalities in the project
modalities:
  rna: False
  prot: False
  bcr: False
  tcr: False
  atac: True

#------------------------------------------
#Loading Protein data - additional options
protein_metadata_table:
index_col_choice:
load_prot_from_raw: False
subset_prot_barcodes_to_rna: False

#-----------------------------
#Quality Control (QC) options

#-----------------------------------
#Processing of 10X cellranger metrics files
plot_10X_metrics: False

#----------------------------------
#Doublet detection on RNA modality
scr:
  run: False
  expected_doublet_rate: 0.06
  sim_doublet_ratio: 2
  n_neighbours: 20
  min_counts: 2
  min_cells: 3
  min_gene_variability_pctl: 85
  n_prin_comps: 30
  use_thr: True
  call_doublets_thr: 0.25

Here are the details of the KeyError: 'rna_path':
#----------------------KeyError------------------------
Traceback (most recent call last):
  File "/home/liuc/miniconda3/envs/singlepipe/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'rna_path'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/liuc/miniconda3/envs/singlepipe/bin/panpipes", line 8, in <module>
    sys.exit(entry.main())
  File "/home/liuc/pipline/git/panpipes/panpipes/entry.py", line 52, in main
    module.main(sys.argv)
..............similar descriptions..........................
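
A quick way to rule out a header mismatch is to compare the submission file header against the modalities enabled in pipeline.yml. The sketch below is not part of panpipes. It assumes the submission file is tab-delimited and that each enabled modality needs a `<modality>_path` column, a naming pattern inferred from `atac_path` in the file above and `rna_path` in the KeyError. The helper name `missing_path_columns` is hypothetical.

```python
import csv
import io


def missing_path_columns(submission_text, modalities):
    """Return the '<modality>_path' columns that are enabled but absent.

    `submission_text` is the content of a tab-delimited submission file;
    `modalities` maps modality name -> bool, mirroring the `modalities`
    block of pipeline.yml. The '<modality>_path' naming is an assumption.
    """
    reader = csv.reader(io.StringIO(submission_text), delimiter="\t")
    header = next(reader)
    required = [f"{name}_path" for name, enabled in modalities.items() if enabled]
    return [column for column in required if column not in header]


# Header and modality flags as posted above (tab-delimited here by assumption).
header_line = "\t".join(
    [
        "sample_id",
        "atac_path",
        "atac_filetype",
        "fragments_file",
        "per_barcode_metrics_file",
        "peak_annotation_file",
    ]
)
flags = {"rna": False, "prot": False, "bcr": False, "tcr": False, "atac": True}
print(missing_path_columns(header_line + "\n", flags))
```

With the header and flags as posted, this check reports no missing columns, which suggests the `rna_path` lookup is triggered by something other than the header itself.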

We greatly appreciate your assistance and look forward to your guidance.

Collaborator

bio-la commented Jan 15, 2025

Thanks for detailing the error. Can you send me the pipeline.yml and the pipeline.log that you find in the project directory?

Also, after errors or failed pipeline runs, we recommend starting from a clean directory so that the pipeline does not attempt to restart from previous jobs. The pipeline scans the folder for output files from previous tasks and tries to complete them, so if you re-ran the new aggregated ATAC in the same folder you used before, it may have detected the RNA output files and attempted to complete their tasks. https://panpipes-pipelines.readthedocs.io/en/latest/usage/troubleshooting.html
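
As a rough sketch of the clean-directory advice, the helper below creates a fresh run folder and copies only the configuration and submission files into it, refusing to reuse a folder that already has content. The function name `make_clean_run_dir` and the folder names in the usage comment are hypothetical, and the helper is not part of panpipes.

```python
import shutil
from pathlib import Path


def make_clean_run_dir(base_dir, name, files_to_copy):
    """Create an empty run directory and copy the given input files into it.

    Raises FileExistsError if the target already contains files, so that
    outputs from an earlier run cannot be picked up by accident.
    """
    run_dir = Path(base_dir) / name
    if run_dir.exists() and any(run_dir.iterdir()):
        raise FileExistsError(f"{run_dir} is not empty; choose a new name")
    run_dir.mkdir(parents=True, exist_ok=True)
    for source in files_to_copy:
        # Keep the original file name inside the new run directory.
        shutil.copy2(source, run_dir / Path(source).name)
    return run_dir


# Example call (names are placeholders):
# make_clean_run_dir(".", "ingest_atac_clean",
#                    ["pipeline.yml", "submission_atac_file.txt"])
```

After that, launch the pipeline from inside the new folder so that no earlier RNA outputs are present for it to detect.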
