CellRangerIDE

Hey everybody!

In this section you find explanation about the script that runs CellRanger pipelines such as: converting BCL to fastqs files, alignment with\without multiple samples, various options inside the pipeline like: include introns or not in the alignment, read 3’ or 5’ and so on.

Location: Currently the CellRangerIDE is located in the next GitHub URL: https://github.com/yoavnahum21/CellRangerIDE.git.

you can find the Implementation code in the CellRangerIDE\Backhand\Scripts

Instructions:

Fill out the general config file.
The general_config.yml file is located inside the const_files folder.
In the pipeline choose whatever you desire, your options are:

a) mkfastq - creating fastqs out of bcl

b) count - creating count matrixes and h5 files for single sample

c) multi - creating count matrixes and h5 files for multiple samples using hashtag oligos.

d) vdj - creating count matrixes for vdj

e) mkref - creating custom/filtered transcriptome reference - TBD

f) qc score - returns a score of the specific process

g) cellbender - remove ambient rna noises

according to what you have chosen in the pipeline you should fill out the right second config file named {pipeline)_config.yml
The config.yml file is divided into several types:

a) Optional: means this field is not necessary to the pipeline’s inputs.

b) Required: means that if you submit something not valid, the pipeline won’t work for you.

c) Default: means that if you don’t submit something there is a default input that the pipeline will use. (the default is also written in the comment above).

Requirements

For running the cellranger software, please use:

Option 1: if you are running on wexac just run before:

ml CellRanger/9.0.0
ml CellRanger-ATAC/2.0.0

Option 2: just download cellranger/cellranger-atac and add it to your path: For cellranger:

#!/bin/bash
cd /opt
wget -O cellranger-9.0.1.tar.gz "https://cf.10xgenomics.com/releases/cell-exp/cellranger-9.0.1.tar.gz?Expires=1739846992&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=lWj4nhoNebGYpV7bZzkYzxxmnhdlFD0zuLKtbLQsqPevb5VRscVDDsEdlnXdjnavCB6vyQMJ8DzgH9CHZqFIWwIfJz8jL4iPDGXXIZq4zqW1LG46hR18xergOgDrLaysRxzUZiFI2BOimjDtARViyyxZRSeVEsN3oILMLpWRukOPRt3czKfSbffpRq4Nw-QSlsTQQovruwQ5x27AgZ7ENApYSOgGKF5GF~hbOJYVbTchDUHNvyHChwmLPgENTefM3ZeGS1-Vs0X2XUL~pbIDeVdVUvLrwj~McjPzZuvXq-XB26qbD3jWNQrhmEG31OUVUfGvsbC2xdkur9EJwTCssA__"
tar -xzvf cellranger-9.0.1.tar.gz
export PATH=/opt/cellranger-9.0.1:$PATH

     For cellranger-atac:

#!/bin/bash
cd /opt
wget -O cellranger-atac-2.1.0.tar.gz "https://cf.10xgenomics.com/releases/cell-atac/cellranger-atac-2.1.0.tar.gz?Expires=1739846838&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=e91w5zq4U~sU4G3ibOj1fvO5HW19FrwFMs8WpsresMLUy~IoBbI2FfZbB3QsC1UvrXZjqZ2f4WEkLz36Ww7nfdI37-AkOnpaVZVt3gjwjnoUPfAbLdM3p1S37AgEtGJ00TOS4xzP3l1rxfV-9aGnIlGCVGojtQfT20L3j0mydvUVPmhvs2HXqzdbtgDcUeFU-d8YBt7GvcFrSaM6d4veWXgMKeX1K8fn7s9AlsvBfKeRAKTZu6UPK8w4DTbCpB9--nmTDNyKJjRH9I6AhIXp0NmDDJ81wuhVDQ5f6x0o1q0yX1UWnv8oWbvF5bAtUgLqF68E9v8L-2cANe1pgpjKCA__"
tar -xzvf cellranger-atac-2.1.0.tar.gz
export PATH=/opt/cellranger-atac-2.1.0:$PATH

For running cellbender first use the next environment:

cbenv.yaml

 For running any different pipeline please use the next environment:

crenv.yaml

# In case you are using cellbender
conda env create -f cbenv.yml
conda activate cellbender

# in case you are using something else
conda env create -f crenv.yml
conda activate cellranger

NOTE:

In the config files there are 3 types of files: string, list, int. Don't change it!
If you try to run mkfastq pipeline use before: module load bcl2fastq2 in the terminal, if you rather have another binary file of bcl2fastq2 instead, just make sure you add the parent folder to your path using: export $PATH:={your_bcl2fastq2_parent_folder_path} in the terminal.
In the config files, make sure that several lists need to have the same number of elements. (should be clarified in the comments above each attribute)
If you decide using an aligner, we encourage you to use the multi pipeline. This aligner generalize all kind of feature types including: Gene expression, Antibody Capture, VDJ etc.
Please be in touch with Yoav for any further questions :)

Further details about the inputs of the pipeline:

config_general.yml:

project_name: give your project a name

id: this is your output from cellranger in case you are using one of its pipelines

pipeline: choose your pipeline out of the list: cellbender, demulti, flex, mkfastq, mkref, multi, qc

aligner_software_path: give the path of aligner (in case you are using it), if you have exported your aligner to $PATH environment variable just use the aligner name instead. The aligner options:
- cellranger
- cellranger-atac
running_machine: are you using your Wexac or your PC? for Wexac just write Wexac or use Default, for you PC write PC or leave empty

aws_ec2: TBD

Runtime parameters: Nothing to elaborate
config_count_pipeline.yml:

id: A unique run ID string (e.g., sample345). The name is arbitrary and will be used to name the directory containing all pipeline-generated files and outputs.

alignment_ref_genome_file: Path to folder containing a Cell Ranger GEX/ATAC reference.

fastq_path: Path of the fastq path folder
config_cellbender_pipeline.yml:

data_path: add the path to your cellranger output directory

chosen_pipeline: specify what kind of pipeline you used to align your samples
config_demulti_pipeline.yml:

adata_path: add you h5 you wish to use

sample_names: give an arbitrary name according to the adata_path you are using

hto_list_per_sample: give the hashes names in each sample

demultiplex_method: there are 2 demultiplex algorithm we are using, hashsolo and demultiplex2, check out in both documentation what are the difference between them. hashsolo: https://www.sciencedirect.com/science/article/pii/S2405471220301952?via%3Dihub

demultiplex2: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03177-y

priors: only relevant when using hashsolo, those three elements describes the a-priors probabilities of each hypothesis: singlet, doublet, empty droplet
config_flex_pipeline.yml:

alignment_ref_genome_file: add reference genome path for gene expressions

when using flex you have 2 options: option 1: using INCPM link

INCPM_link: add the link you get from INCPM of the fastqs

INCPM_directory: where all the data will be downloaded

Note: if you are using option 1, pay attention you will be still asked for more inputs while running the main.py

option2: manual option

fastq_path: add the paths for fastqs parent folder

probe_set_path:

create_bam: True/False, do you wish to include bam in your output?

include_intros: True/False, do you wish to include introns while aligning?

expected_cell: estimate the number of expected cells in your experiment, you can leave it blanc

sample_name: arbitrary name for your sample, usually named after the prefix in the fastq

fastq_folders_name: necessarily has to be the prefix of the fastqs E.G: the fastq file is: PAC-i03-HmYC003-UNT-BMd000-xIPx-1-GEX-C_S2_L001_I1_001.fastq.gz The prefix is: PAC-i03-HmYC003-UNT-BMd000-xIPx-1-GEX-C

lanes_used: just leave as Default

multiplexing method: in case you are using hashes, choose your method: “feature_barcode” (hashtag oligos) or “cmo_barcode”

probe_barcode_csv: add a csv file that elaborates your multiplexing experiment according to cellranger csv format.

In case you didn’t uploaded a csv file

sample_id_probe: arbitrary name for your sample

probe_barcode_ids: arbitrary number for your barcode

probe_description: copied from sample_id_probe

config_mkfastq_pipeline.yml:

TBD
config_mkref_pipeline.yml:

TBD

config_multi_pipeline..yml:

alignment_ref_genome_file: add reference genome path for gene expressions

alignment_ref_vdj_file: add reference genome path for vdj

when using multi you have 2 options:

option 1: using INCPM link

INCPM_link: add the link you get from INCPM of the fastqs

INCPM_directory: where all the data will be downloaded

Note: if you are using option 1, pay attention you will be still asked for more inputs while running the main.py

option2: manual option

fastq_path: add the paths for fastqs parent folder

create_bam: True/False, do you wish to include bam in your output?

include_intros: True/False, do you wish to include introns while aligning?

expected_cell: estimate the number of expected cells in your experiment, you can leave it blanc

R1_length: what is the number of counted nucleotides from R1

R2_length: what is the number of counted nucleotides from R2

sample_name: arbitrary name for your sample, usually named after the prefix in the fastq

fastq_folders_name: necessarily has to be the prefix of the fastqs E.G: the fastq file is: PAC-i03-HmYC003-UNT-BMd000-xIPx-1-GEX-C_S2_L001_I1_001.fastq.gz The prefix is: PAC-i03-HmYC003-UNT-BMd000-xIPx-1-GEX-C

lanes_used: just leave as Default

multiplexing method: in case you are using hashes, choose your method: “feature_barcode” (hashtag oligos) or “cmo_barcode”

feature_types: respectively your submit in the sample name/fastq_folders_name please add their types: Gene Expression, Antibody Capture, CRISPR Guide Capture, Multiplexing Capture, VDJ-B, VDJ-T, VDJ-T-GD, Antigen Capture #5', Antigen Capture only

In case you are using feature_barcode:

feature_reference_csv: In case you have a prepared csv please add the path

otherwise do it manually:

hto_id: give the hashes a name

hto_names: usually are the same as hto_id

hto_read: declare which reading you are concatenate your hashes “R1” or “R2”

hto_pattern: explain where exactly the barcode are

hto_sequence: what is the sequence of each hash

HTO_feature_type: most of the time is anybody

In case you are using cmo_barcode:

cmo_barcode_csv: In case you have a prepared csv please add the path

sample_id_cmo:

cmo_id:

config_qc.yml:

Nothing needs to be added

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CellRangerIDE

Requirements

Further details about the inputs of the pipeline:

In case you didn’t uploaded a csv file

Files

README.md

Latest commit

History

README.md

File metadata and controls

CellRangerIDE

Requirements

Further details about the inputs of the pipeline:

In case you didn’t uploaded a csv file