-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathREADME.Rmd
More file actions
107 lines (74 loc) · 4.22 KB
/
Copy pathREADME.Rmd
File metadata and controls
107 lines (74 loc) · 4.22 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# ulrb
<!-- badges: start -->
[](https://github.com/pascoalf/ulrb/actions/workflows/R-CMD-check.yaml)
[](https://cran.r-project.org/package=ulrb)
[](https://github.com/pascoalf/ulrb/actions/workflows/test-coverage.yaml)
[](https://perso.crans.org/besson/LICENSE.html)
[](https://lifecycle.r-lib.org/articles/stages.html#stable)
[](https://doi.org/10.5281/zenodo.14922441)
[](https://cran.r-project.org/package=ulrb)
<!-- badges: end -->
The R package **ulrb** stands for **Unsupervised Learning Based Definition of Microbial Rare Biosphere**. As the name suggests, it applies unsupervised learning principles to define the rare biosphere.
More specifically, the partitioning around medoids (k-medoids) algorithm is
used to divide taxa within a sample into clusters. The clusters are then
ordered based on a user-defined classification vector. By default, our
method classifies all taxa in: "rare", "undetermined" or "abundant". In alternative,
the user can change the number of classifications. To do so, ulrb includes functions to help
the user decide the number of clusters (k), but it is also possible for ulrb to
automatically decide the number of clusters (equivalent to the number of classifications). Besides clustering, ulrb includes functions to
help evaluate the clustering quality (e.g. average Silhouette score).
For more details on the R functions used and data wrangling please see the package
documentation.
For tutorials and documentation of the **ulrb** package, visit our website: https://pascoalf.github.io/ulrb/.
## Installation
To install the last stable version (0.1.6), use:
```{r, eval=FALSE}
install.packages("ulrb")
```
If you want to install the last version available on GitHub (0.1.8), use:
```{r, eval=FALSE}
# install.packages("devtools")
devtools::install_github("pascoalf/ulrb")
```
## Example
This is a basic example which shows you how to use ulrb to divide taxa into three classifications (rare, undetermined and abundant):
```{r example}
library(ulrb)
## basic example
define_rb(nice_tidy)
```
With ulrb, you can also format your original abundance table, get an automatic number of clusters and plot the results:
```{r, fig.width= 10}
# nice is an OTU table in wide format
head(nice)
# first, we tidy the "nice" OTU table
sample_names <- c("ERR2044662", "ERR2044663", "ERR2044664",
"ERR2044665", "ERR2044666", "ERR2044667",
"ERR2044668", "ERR2044669", "ERR2044670")
# If data is in wide format, with samples in cols
nice_tidy <- prepare_tidy_data(nice,
sample_names = sample_names,
samples_in = "cols")
# second, we apply ulrb algorithm in automatic setting
nice_classification_results <- define_rb(nice_tidy)
# third, we plot microbial community and the quality of k-medoids clustering
plot_ulrb(nice_classification_results, taxa_col = "OTU", plot_all = TRUE)
# In case you want to inspect the result of a particular sample, do:
plot_ulrb(nice_classification_results, taxa_col = "OTU",
sample_id = "ERR2044662", plot_all = FALSE, log_scaled = TRUE)
```
## How to cite ulrb
Pascoal, F., Branco, P., Torgo, L. et al. Definition of the microbial rare biosphere through unsupervised machine learning. Commun Biol 8, 544 (2025). https://doi.org/10.1038/s42003-025-07912-4
Pascoal, F., Costa, R., Torgo, L., Magalhães, C., & Branco, P. (2025). Architecture and implementation of ulrb algorithm in R. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2025.103229