Challenges in Using Panpipes for Single-Cell Multiomics Data Analysis #325
Comments
Hi @liuchuan111, thanks for this thorough description of the issue you faced. I would be happy to provide support and to understand how to use panpipes in your case. You mention having multiple samples for RNA and for ATAC. Can you clarify whether the RNA and the ATAC share the same cell barcodes? Is this a multiome experiment, or two libraries from the same sample? If the latter, why would you ingest them together? Finally, did you try ingesting the ATAC alone, and if so, can you explain what issues you found when reading the ATAC data from CellRanger? (https://panpipes-tutorials.readthedocs.io/en/latest/ingesting_multiome/ingesting_mome.html) Thank you.
Thank you for your prompt response. I have followed your suggestion to aggregate multiple samples using the "cellranger-atac aggr" command and subsequently ingested the ATAC-seq data alone. However, we encountered a KeyError: 'rna_path', which seems to indicate that there may be incorrectly configured parameters in the pipeline.yml file. Unfortunately, we are unsure which part needs to be modified. Below are the contents of the submission_atac_file.txt and the main sections of the pipeline.yml file for your reference. #-------------------------pipeline.yml----------------------------------- #-------------------------- #------------------------------------------ #-----------------------------
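A quick way to narrow down a KeyError like this is to list which columns the submission file actually contains. The sketch below is not part of panpipes. The helper name `missing_columns` is hypothetical, and it assumes the submission file is tab-separated with a header row. Whether a missing column is really the cause here is not established; the check only rules one possibility in or out.

```python
import csv


def missing_columns(submission_path, required):
    """Return the names in `required` that are absent from the header row.

    Assumes a tab-separated file whose first line is the header.
    """
    with open(submission_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in required if name not in header]
```

For example, `missing_columns("submission_atac_file.txt", ["sample_id", "rna_path"])` returns the subset of those two names that the header lacks.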
Thanks for detailing the error. Can you send me the pipeline.yml and the pipeline.log you find in the project directory? Also, in case of errors or failed pipeline runs, we recommend starting from a clean directory, so that the pipeline will not attempt to restart from previous jobs. The pipeline scans the folder for output files from previous tasks and tries to complete them, so if you re-ran the new aggregated ATAC in the same folder you used before, it may have detected the RNA output files and attempted to complete their tasks. https://panpipes-pipelines.readthedocs.io/en/latest/usage/troubleshooting.html
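To check whether a run directory is actually clean before relaunching, something like the following can help. It is a minimal sketch, not a panpipes feature. The function name `leftover_entries` and the default `keep` list are hypothetical, so adjust them to whatever input files you intentionally keep in the folder.

```python
from pathlib import Path


def leftover_entries(project_dir, keep=("pipeline.yml",)):
    """Return the sorted names of files and folders in `project_dir`
    that are not listed in `keep`."""
    root = Path(project_dir)
    return sorted(entry.name for entry in root.iterdir() if entry.name not in keep)
```

An empty list means nothing from an earlier run is left in the folder. Anything else is worth inspecting, or moving away, before starting again.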
Using Panpipes for single-cell omics data analysis presents several challenges, making it difficult for users to effectively apply the tool.
Tutorials and Examples: While the documentation states that Panpipes can ingest various file types, such as outs/filtered_feature_bc_matrix.h5 (from CellRanger) or AnnData h5ad objects (one per sample), the tutorials lack clarity on how to generate the h5ad objects for different modalities. Moreover, the provided examples mainly demonstrate single-sample analysis, with no guidance on analyzing multiple replicate samples from different groups, which would be more practical for most use cases.
Handling .yaml Files: Modifying the .yaml configuration file is also challenging due to the absence of clear instructions on which fields are mandatory or optional when working with scRNA-seq and/or ATAC-seq data.
Given these limitations, we encountered various error messages during the data ingest process. For example, we have two groups of samples, each containing 4 replicates (A1–A4 and B1–B4), and all samples are divided into two parts for scRNA-seq and ATAC-seq separately. It is unclear which example or workflow we should follow for such a setup. Should we separately ingest these two modalities or ingest all data in one submission file?
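Whichever layout turns out to be right, the eight samples can be enumerated programmatically rather than typed by hand. The sketch below builds one table per modality. The `rna_path` column name comes from the error reported in this thread. The other column names (`sample_id`, `rna_filetype`, and the `atac_` equivalents), the filetype value and the path template are assumptions for illustration, and should be checked against the panpipes documentation.

```python
import csv
import io

# Two groups with four replicates each: A1..A4 and B1..B4.
GROUPS = {"A": 4, "B": 4}


def sample_rows(modality, path_template, filetype):
    """Build one row per sample for a single modality ("rna" or "atac").

    `path_template` must contain "{sample}", which is replaced by the sample ID.
    """
    rows = []
    for group, replicates in GROUPS.items():
        for index in range(1, replicates + 1):
            sample_id = f"{group}{index}"
            rows.append({
                "sample_id": sample_id,
                f"{modality}_path": path_template.format(sample=sample_id),
                f"{modality}_filetype": filetype,
            })
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical path layout and filetype label; replace with your own.
rna_tsv = to_tsv(
    sample_rows("rna", "data/{sample}/outs/filtered_feature_bc_matrix.h5", "10X_h5")
)
```

Calling `sample_rows("atac", ...)` with the ATAC paths gives the matching table for the other modality, which keeps the two modalities in separate files until it is clear whether they should be ingested together.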
Additionally, while ingesting outs/filtered_feature_bc_matrix.h5 for the scRNA-seq data works fine, Panpipes fails to read the filtered_peak_bc_matrix output that CellRanger generates for the ATAC-seq data. The error reports a missing features.tsv file in this directory, which contains peaks.bed instead. To work around this, we merged the filtered_peak_bc_matrix.h5 files from these samples into a single atac.h5ad file. We ended up with two .h5ad files, one for ATAC (atac.h5ad) and one for RNA (rna.h5ad), each containing the eight samples. However, ingesting these two .h5ad file paths through a single sample_file_qc.txt still resulted in the following error:
"ERROR \
Exception #1 \
'builtins.OSError(--------------------------------------- \
Child was terminated by signal -1: \
The stderr was: \
"lib/python3.10/site-packages/docrep/decorators.py:43: SyntaxWarning: 'param_categorical_covariate_keys' is not a valid key! \
doc = func(self, args[0].doc, *args[1:], **kwargs) \
computing score 'MarkersNeutro_score' \
WARNING: genes are not in var_names and ignored: Index(['ANXA1', 'ARG1', 'BPI', 'CD101', 'CD24', 'CD274', 'CSF3R', 'CXCL8', \
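Regarding the filtered_peak_bc_matrix error described above, a small check can tell, before ingestion, whether a matrix directory holds a features.tsv or a peaks.bed. This is a sketch under stated assumptions: `matrix_dir_kind` is a hypothetical helper, not a panpipes or CellRanger function, and it only looks at file names, optionally gzip-compressed.

```python
from pathlib import Path


def matrix_dir_kind(matrix_dir):
    """Classify a matrix directory by the index file it contains.

    Returns "features" if features.tsv is present, "peaks" if only
    peaks.bed is present, and "unknown" otherwise. A trailing ".gz"
    on a file name is ignored.
    """
    names = set()
    for entry in Path(matrix_dir).iterdir():
        name = entry.name
        if name.endswith(".gz"):
            name = name[:-3]
        names.add(name)
    if "features.tsv" in names:
        return "features"
    if "peaks.bed" in names:
        return "peaks"
    return "unknown"
```

Running it over each sample's output folder shows up front which directories a reader expecting features.tsv would reject.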
I am curious if anyone has successfully used Panpipes for analyzing their own single-cell multiomics data under similar conditions. If so, sharing insights or additional resources would be highly appreciated!