RUBic : Rapid Unsupervised Biclustering


Fig1

The unsupervised biclustering strategy works on both interaction data and expression data. RUBic first converts expression data into binary data using a mixture of left-truncated Gaussian distributions (LTMG) model, then finds biclusters using a novel encoding and template-searching strategy, and finally reports them in two modes, base and flex. In base mode RUBic generates maximal biclusters (green borders); in flex mode it produces fewer, biologically significant biclusters (red borders). Coloured cells within the clusters indicate the selected row and column positions.
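To make the binarisation step concrete, here is a minimal Python sketch. It is not the LTMG model: it swaps in a much simpler rule (a cell becomes 1 when it exceeds its own row's mean) purely for illustration. The function names, the toy matrix, and the working assumption that a bicluster in binary data is a block of 1s over a chosen set of rows and columns are all illustrative and are not taken from RUBIC.c.

```python
from statistics import mean


def binarize_by_row_mean(expression):
    """Return a 0/1 matrix: 1 where a value exceeds its own row's mean.

    Simple stand-in for the LTMG step; this is NOT the model RUBic uses.
    """
    binary = []
    for row in expression:
        cutoff = mean(row)
        binary.append([1 if value > cutoff else 0 for value in row])
    return binary


def is_all_ones(binary, rows, cols):
    """True if the sub-matrix selected by rows x cols contains only 1s."""
    return all(binary[r][c] == 1 for r in rows for c in cols)


# Hypothetical toy expression matrix: 3 genes (rows) x 4 conditions (columns).
expression = [
    [5.0, 6.0, 0.5, 0.2],
    [4.8, 5.5, 0.1, 0.3],
    [0.2, 0.1, 3.9, 4.1],
]

binary = binarize_by_row_mean(expression)
print(binary)
print(is_all_ones(binary, rows=[0, 1], cols=[0, 1]))
```

With this toy input, genes 0 and 1 are "on" under the first two conditions, so rows {0, 1} and columns {0, 1} form a candidate bicluster under the stated assumption.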

Citation

If you have used RUBic in your research, please cite the following publication:

Sriwastava, B.K., Halder, A.K., Basu, S., Chakraborti, T. RUBic: rapid unsupervised biclustering. BMC Bioinformatics 24, 435 (2023). DOI: https://doi.org/10.1186/s12859-023-05534-3

Repository Contents and Data Directory

The data directory contains the Dummy data, five expression datasets and a PPI data matrix.

The Dummy data includes two files:

a) SBMat.txt: sample input binary data,

b) resultRB.txt: the corresponding output file generated by RUBic on the dummy input data.

The five experimental datasets and the PPI data are organised as follows:

a) Expression+KEGG: contains the expression matrix, binary matrix and KEGG annotation for each of 4 sets (ecoli_colombos, ecoli_dream5, yeast_dream5 and yeast_gpl2529)

b) Match_score_csv

c) raw datset_CNS 

d) Performance_test_csv

e) Match_score_density_200x200_csv

f) PPI

The RUBIC directory contains the biclustering source (RUBIC.c), installation and run scripts (P1-installandCompile.sh, P2-runwargs.sh) and a Jupyter notebook (RUBIC-Result-Analysis.ipynb) with auxiliary Python scripts (load_matrix_data.py, ParseCluster.py, plotHeatmap.py).

Compiler Installation, RUBic Compilation and Execution Commands

In any Linux environment, open a terminal, navigate to the RUBIC directory and execute the following commands:

chmod +x P1-installandCompile.sh

./P1-installandCompile.sh RUBIC.c

chmod +x P2-runwargs.sh

./P2-runwargs.sh RUBIC inputdata.txt output.txt 2 2 1

Manual Setup

Environment

  1. GCC and/or a C++11-compatible compiler

  2. Result processing and visualisation:

    a) Python >= 3

    b) seaborn

Input

RUBic accepts input in two formats:

  1. Binary matrix (interaction data, e.g. protein-protein interactions, drug-drug interactions)
  2. Non-binary matrix (gene expression data with m rows (genes) and n columns (conditions))

The data file should be comma-delimited. A sample data file is provided in the RUBIC directory.
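As an illustration of the comma-delimited format, the following Python sketch writes a small binary matrix to a file and reads it back with basic validation. It assumes the file holds only the matrix values, with no header row and no row labels; check the sample file to confirm. The helper names and the toy matrix are hypothetical and are not part of the RUBic scripts.

```python
import csv
import os
import tempfile


def write_matrix(path, matrix):
    """Write one matrix row per line, values separated by commas."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(matrix)


def read_binary_matrix(path):
    """Read a comma-delimited 0/1 matrix and check that it is well formed."""
    with open(path, newline="") as handle:
        matrix = [[int(cell) for cell in row] for row in csv.reader(handle) if row]
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("rows have different lengths")
    if any(cell not in (0, 1) for row in matrix for cell in row):
        raise ValueError("matrix is not binary")
    return matrix


# Hypothetical toy interaction matrix (assumption: no header, no labels).
matrix = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
path = os.path.join(tempfile.mkdtemp(), "inputdata.txt")
write_matrix(path, matrix)
loaded = read_binary_matrix(path)
print(loaded == matrix)
```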

Usage

Step 1: Compile RUBIC.c with GCC to produce the executable RUBIC.o.

Step 2: If the input is expression data rather than a binary matrix, convert it into a binary matrix.

Step 3: Place the input file in the same directory (example: inputdata.txt).

Step 4: Execute RUBIC with the command:

    ./RUBIC.o <inputfile> <outputfile> <mnc> <mnr> <threshold>

inputfile: input file name

outputfile: output file name

mnc: minimum number of columns

mnr: minimum number of rows

threshold: use 1 for binary data.
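For scripted runs, the command from Step 4 can be assembled and launched from Python. The sketch below is only a thin wrapper around the documented argument order (inputfile, outputfile, mnc, mnr, threshold); the function names are hypothetical, and the example values 2 2 1 are taken from the run command shown earlier.

```python
import subprocess
from pathlib import Path


def build_rubic_command(executable, input_file, output_file, mnc, mnr, threshold):
    """Return the argument list in the order RUBIC expects."""
    return [str(executable), str(input_file), str(output_file),
            str(mnc), str(mnr), str(threshold)]


def run_rubic(executable, input_file, output_file, mnc=2, mnr=2, threshold=1):
    """Run the compiled binary; raises if it is missing or exits with an error."""
    if not Path(executable).is_file():
        raise FileNotFoundError(f"compile RUBIC.c first: {executable} not found")
    command = build_rubic_command(executable, input_file, output_file,
                                  mnc, mnr, threshold)
    subprocess.run(command, check=True)
    return command


command = build_rubic_command("./RUBIC.o", "inputdata.txt", "output.txt", 2, 2, 1)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.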

Visualisation and Result Processing

To visualise the results we provide a Jupyter notebook (https://github.com/CMATERJU-BIOINFO/RUBic/blob/main/RUBIC/RUBIC-Result-Analysis.ipynb). The notebook first shows the input binary matrix visually and marks the row and column positions of the biclusters identified by RUBic on the dummy data. Its later sections describe figure preparation: expression-level heatmaps and plots.
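The idea of marking bicluster positions on the input matrix can be sketched in plain Python, independent of the notebook. This is not the notebook's code and it does not parse resultRB.txt; the row and column indices below are hypothetical and would in practice come from RUBic's output.

```python
def bicluster_mask(n_rows, n_cols, rows, cols):
    """Boolean grid that is True where a cell belongs to the bicluster."""
    selected_rows, selected_cols = set(rows), set(cols)
    return [[r in selected_rows and c in selected_cols for c in range(n_cols)]
            for r in range(n_rows)]


def render(matrix, mask):
    """Text view of the matrix with bicluster cells wrapped in brackets."""
    lines = []
    for values, flags in zip(matrix, mask):
        cells = [f"[{v}]" if flag else f" {v} " for v, flag in zip(values, flags)]
        lines.append(" ".join(cells))
    return "\n".join(lines)


# Hypothetical binary matrix and bicluster (rows 0-1, columns 0-1).
matrix = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
mask = bicluster_mask(3, 3, rows=[0, 1], cols=[0, 1])
print(render(matrix, mask))
```

The same boolean grid could be used to decide which cells to outline or colour in a heatmap.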