Error related to .check_tsne_params #24

Open
WT215 opened this issue Jan 16, 2019 · 8 comments
Comments

@WT215

WT215 commented Jan 16, 2019

Hello!

I am trying BISCUIT on a dataset, but I get the following error:

[1] "Loading Data"
[1] "Calculating the Fiedler vector of the data"
[1] "Ensuring entire data is numeric and then log transforming it"
[1] "numcells is 74"
[1] "numgenes is 9377"
[1] "Number of gene batches is 62"
[1] "Number of gene subbatches is 2"
[1] "Ensuring user-specified data is numeric"
[1] "Computing t-sne projection of the data"
Error in .check_tsne_params(nrow(X), dims = dims, perplexity = perplexity,  : 
  perplexity is too large for the number of samples
In addition: There were 19 warnings (use warnings() to see them)

Data comes from Grun et al 2014.

The parameter settings are as follows:

## 21st Dec 2016
## BISCUIT R implementation
## Start_file with user inputs
## 
## Code author SP


###
###

############## packages required ##############

library(MCMCpack)
library(mvtnorm)
library(ellipse)
library(coda)
library(Matrix)
library(Rtsne)
library(gtools)
library(foreach)
library(doParallel)
library(doSNOW)
library(snow)
library(lattice)
library(MASS)
library(bayesm)
library(robustbase)
library(chron)
library(mnormt)
library(schoolmath)
library(RColorBrewer)

#############################################


#input_file_name <- "expression_mRNA_17-Aug-2014.txt";

input_data_tab_delimited <- TRUE; #set to TRUE if the input data is tab-delimited

is_format_genes_cells <-  TRUE; #set to TRUE if input data has rows as genes and columns as cells

#choose_cells <- 3000; #comment if you want all the cells to be considered

#choose_genes <- 150; #comment if you want all the genes to be considered

gene_batch <- 150; #number of genes per batch, therefore num_batches = choose_genes (or numgenes)/gene_batch. Max value is 150

num_iter <- 5; #number of iterations, choose based on data size.

num_cores <- detectCores() - 4; #number of cores for parallel processing. Ensure that detectCores() > 1 for parallel processing to work, else set num_cores to 1.

z_true_labels_avl <- FALSE; #set this to TRUE if the true labels of cells are available, else set it to FALSE. If TRUE, ensure to populate 'z_true' with the true labels in 'BISCUIT_process_data.R'

num_cells_batch <- 1000; #set this to 1000 if input number of cells is in the 1000s, else set it to 100.

alpha <- 0.1; #DPMM dispersion parameter. A higher value spins more clusters whereas a lower value spins lesser clusters.

#output_folder_name <- "output"; #give a name for your output folder.

## call BISCUIT
source("BISCUIT_main.R")

The data I used can be found here: https://github.com/WT215/Raw_data (Grun_2i.txt)

Thank you for your help!

Best wishes,
Wenhao
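
A side note on the `num_cores` line in the start file above: `detectCores() - 4` is zero or negative on machines with four or fewer cores, and `detectCores()` can return `NA`. A minimal sketch of a guarded version follows; `safe_num_cores` is a hypothetical helper name, not part of BISCUIT, and the reserve of 4 cores simply mirrors the start file.

```r
library(parallel)

# Hypothetical helper (not part of BISCUIT): leave `reserve` cores free,
# but never return fewer than 1 core, even if detection fails (NA).
safe_num_cores <- function(detected = detectCores(), reserve = 4L) {
  max(1L, detected - reserve, na.rm = TRUE)
}

num_cores <- safe_num_cores()
```

With this guard, a 2-core laptop gets `num_cores = 1` instead of `-2`.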

@sandhya212
Owner

Hi Wenhao,

Since you only have 74 cells, and t-SNE has a default perplexity of 30 (which is normally meant for a larger number of cells), you would need to reduce the perplexity to 10 or 1. The error is thrown by Rtsne().
Set num_cells_batch <- 100; (in the start file)
I could not access the GitHub link to the data you used.

Let me know if these help.
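
To see why 30 is too large for 74 cells, here is a minimal sketch of the sample-size check. It assumes that Rtsne rejects a run when `3 * perplexity` exceeds the number of samples minus one; that condition is my reading of the package and should be confirmed against its source. `perplexity_ok` and `max_perplexity` are hypothetical helper names.

```r
# Assumed condition (check against the Rtsne source):
# 3 * perplexity must not exceed (number of samples - 1).
perplexity_ok <- function(n_samples, perplexity) {
  3 * perplexity <= n_samples - 1
}

# Largest whole-number perplexity that passes the assumed check.
max_perplexity <- function(n_samples) {
  floor((n_samples - 1) / 3)
}

perplexity_ok(74, 30)   # FALSE, since 3 * 30 = 90 > 73
perplexity_ok(74, 10)   # TRUE
max_perplexity(74)      # 24
```

Under that assumption, any perplexity up to 24 would pass for 74 cells, so the suggested values of 10 or 1 are well inside the limit.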

@WT215
Author

WT215 commented Jan 16, 2019

Hi,

Thank you for your reply!

I set num_cells_batch <- 100 and reran the code, but I still get the same error.

Do I also need to modify code in the other R files?

I have updated the link to the data.

Thank you very much!

Best wishes,
Wenhao

@sandhya212
Owner

In https://github.com/sandhya212/BISCUIT_SingleCell_IMM_ICML_2016/blob/master/BISCUIT_process_data.R, on lines 214 and 225, add the argument perplexity = 10 (or 1) to the Rtsne() call. See the Rtsne options here: https://www.rdocumentation.org/packages/Rtsne/versions/0.15/topics/Rtsne

My concern is more on a statistical level: you are clustering a highly sparse matrix where #cells <<< #genes. Any clustering method will give you an answer; the question is how much you can trust the learnt pattern given such a skewed dataset.
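
A minimal sketch of passing `perplexity` to `Rtsne()`. The random matrix `X` and the wrapper `run_tsne_small` are stand-ins for illustration only; the actual calls in BISCUIT_process_data.R use their own variables and may pass other arguments.

```r
# Hypothetical wrapper: runs t-SNE with a reduced perplexity and returns
# the 2-D embedding, or NULL if the Rtsne package is not installed.
run_tsne_small <- function(X, perplexity = 10) {
  if (!requireNamespace("Rtsne", quietly = TRUE)) {
    message("Rtsne is not installed; skipping")
    return(NULL)
  }
  fit <- Rtsne::Rtsne(X, dims = 2, perplexity = perplexity)
  fit$Y  # one row per cell, one column per embedding dimension
}

set.seed(1)
# Stand-in data: 74 cells (rows) by 50 features (columns).
X <- matrix(rnorm(74 * 50), nrow = 74, ncol = 50)
embedding <- run_tsne_small(X, perplexity = 10)
```

The only change relative to a default call is the explicit `perplexity` argument.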

@WT215
Author

WT215 commented Jan 16, 2019

Hi,

So I tried a larger dataset: Tung et al 2017, which is also stored at https://github.com/WT215/Raw_data (Tung.txt).

There are around 500 cells, so I set num_cells_batch <- 1000.

I got the following error:

[1] "Loading Data"
[1] "Calculating the Fiedler vector of the data"
[1] "Ensuring entire data is numeric and then log transforming it"
[1] "numcells is 564"
[1] "numgenes is 13058"
[1] "Number of gene batches is 261"
[1] "Number of gene subbatches is 9"
[1] "Ensuring user-specified data is numeric"
[1] "Computing t-sne projection of the data"
[1] "Monitor log.txt and outputs/plots/ folder for outputs"
[1] "floor(num_gene_batches/num_gene_sub_batches): 29"
[1] "MCMC begins"
[1] "Begin parallel processing of gene splits"
[1] "Beginning of batch  1"
[1] "End of batch  1"
[1] "Beginning of batch  2"
[1] "End of batch  2"
[1] "Beginning of batch  3"
[1] "End of batch  3"
[1] "Beginning of batch  4"
[1] "End of batch  4"
[1] "Beginning of batch  5"
[1] "End of batch  5"
[1] "Beginning of batch  6"
[1] "End of batch  6"
[1] "Beginning of batch  7"
[1] "End of batch  7"
[1] "Beginning of batch  8"
[1] "End of batch  8"
[1] "Beginning of batch  9"
[1] "End of batch  9"
[1] "Beginning of batch  10"
[1] "End of batch  10"
[1] "Beginning of batch  11"
[1] "End of batch  11"
[1] "Beginning of batch  12"
[1] "End of batch  12"
[1] "Beginning of batch  13"
[1] "End of batch  13"
[1] "Beginning of batch  14"
[1] "End of batch  14"
[1] "Beginning of batch  15"
[1] "End of batch  15"
[1] "Beginning of batch  16"
[1] "End of batch  16"
[1] "Beginning of batch  17"
[1] "End of batch  17"
[1] "Beginning of batch  18"
[1] "End of batch  18"
[1] "Beginning of batch  19"
[1] "End of batch  19"
[1] "Beginning of batch  20"
[1] "End of batch  20"
[1] "Beginning of batch  21"
[1] "End of batch  21"
[1] "Beginning of batch  22"
[1] "End of batch  22"
[1] "Beginning of batch  23"
[1] "End of batch  23"
[1] "Beginning of batch  24"
[1] "End of batch  24"
[1] "Beginning of batch  25"
[1] "End of batch  25"
[1] "Beginning of batch  26"
[1] "End of batch  26"
[1] "Beginning of batch  27"
[1] "End of batch  27"
[1] "Beginning of batch  28"
[1] "End of batch  28"
[1] "Beginning of batch  29"
[1] "End of batch  29"
[1] "End of parallel runs"
Time difference of 8.34391 mins
[1] "Merging gene splits"
[1] "Computing the global confusion matrix"
[1] "Monitor log_CM.txt in outputs folder and debug_CM.txt"
 Error in { : task 2 failed - "下标出界" (English: "subscript out of bounds")

Thank you for your help!

@sandhya212
Owner

sandhya212 commented Jan 16, 2019

  1. Set num_cells_batch <- 100, since you still have < 1000 cells.
  2. What is the message after 'task 2 failed -'?
  3. Can you delete the debug files and run again as a fresh instance, preferably with a smaller number of genes (like 2000), just to see whether the code runs to completion?

@WT215
Author

WT215 commented Jan 17, 2019

Hi, for the Tung dataset, I set num_cells_batch <- 100

but got the error:
Error in { : task 3 failed - "无法分配大小为1.3 Gb的矢量", which means "cannot allocate vector of size 1.3 Gb".

When I tried it on a smaller subset of the data, with 2000 genes, I got the error:
Error in { : task 1 failed - "无法分配大小为30.5 Mb的矢量" ("cannot allocate vector of size 30.5 Mb").

Then I reduced the dataset to 1000 genes and it works fine. How can I apply BISCUIT to the whole Tung dataset (13058 genes x 564 cells)?

Do I have to run it on a cluster?

Thanks a lot!

@sandhya212
Owner

Yes, we have run Biscuit on AWS clusters.

@WT215
Author

WT215 commented Jan 17, 2019

Then how can I estimate in advance how much memory to allocate for a dataset like Tung et al.?

Thanks a lot!
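
As a rough starting point, the size of a single dense numeric matrix in R can be computed from its dimensions at 8 bytes per double. The sketch below is only a per-object estimate, not a model of BISCUIT's total memory use. `dense_bytes` is a hypothetical helper, and the idea that the failing allocations are genes-by-genes matrices is an assumption, although the arithmetic is consistent with both error messages reported in this thread.

```r
# Bytes needed for one dense matrix of doubles (8 bytes per value).
dense_bytes <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value
}
to_mib <- function(bytes) bytes / 1024^2
to_gib <- function(bytes) bytes / 1024^3

# Expression matrix for the full Tung data (genes x cells).
to_mib(dense_bytes(13058, 564))    # about 56.2 MiB

# Assumed genes x genes matrices (e.g. a covariance matrix):
to_mib(dense_bytes(2000, 2000))    # about 30.5 MiB
to_gib(dense_bytes(13058, 13058))  # about 1.27 GiB
```

A genes-by-genes matrix grows with the square of the gene count, and each parallel worker may hold its own copies, so peak usage can be several times the size of any single object.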
