-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdatabase_structure.qmd
376 lines (256 loc) · 13.3 KB
/
database_structure.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
# Data structure
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
results = "asis",
echo = FALSE,
message = FALSE,
warning = FALSE
)
library(traits.build)
my_kable_styling <- function(...) {
kableExtra::kable(..., booktabs=TRUE, format="latex", longtable = TRUE) %>%
kableExtra::kable_styling(font_size = 10, latex_options = c("hold_position", "repeat_header")) %>%
kableExtra::column_spec(1, width = "5cm") %>%
kableExtra::column_spec(2, width = "11cm") #%>%
# kableExtra::kable_styling(font_size = 10, full_width = TRUE, latex_options = "striped", position = "float_left")
}
if(knitr::is_html_output()) {
my_kable_styling <- util_kable_styling_html
}
```
```{r, echo=FALSE, results='hide', message=FALSE}
## Loads austraits into global name space
austraits <- austraits:::austraits_5.0.0_lite
schema <- get_schema()
```
This chapter describes the structure of the output of a `traits.build` compilation.
Note that the information below is based on the information provided within the file `traits.build_schema.yml`, which can be accessed by running `get_schema` or `system.file("support", "traits.build_schema.yml", package = "traits.build")`.
A `traits.build` compilation results in a series of linked components, which cross link against each other:
```{r, results="hide", comment = '', eval=FALSE}
names(austraits) %>%
create_tree_branch("traits.build") %>%
writeLines()
```
```
austraits
├── traits
├── locations
├── contexts
├── methods
├── excluded_data
├── taxonomic_updates
├── taxa
├── contributors
├── sources
├── definitions
├── schema
├── metadata
└── build_info
```
These include all the data and contextual information submitted with each contributed dataset.
## Components
The core components are defined as follows.
```{r}
print_schema_element <- function(elements) {
if (elements$type == "character") {
sprintf("**Content:** %s\n", elements$value) %>%
writeLines()
}
if (elements$type == "table") {
sprintf("**Content:** \n") %>%
writeLines()
elements$elements %>%
austraits::convert_list_to_df1() %>%
my_kable_styling() %>%
writeLines()
}
}
```
## Traits {#traits}
```{r}
elements <- schema$austraits$elements$traits
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
### Entity type {#entity_type}
An entity is the `feature of interest`, indicating what a trait value applies to. While an entity can be just a component of an organism, within the scope of AusTraits, an `individual` is the finest scale entity that can be documented. The same study might measure some traits at a population-level (`entity = population`) and others at an individual-level (`entity = individual`).
In detail:
- `entity_type` is `r tolower(schema$metadata$elements$traits$elements$entity_type)` Possible values are:
```{r entity_type}
schema$entity_type$values %>%
austraits::convert_list_to_df1() %>%
my_kable_styling() %>%
writeLines()
```
### Identifiers
The traits table includes 12 identifiers, `dataset_id`, `observation_id`, `taxon_name`, `population_id`, `individual_id`, `temporal_context_id`, `source_id`, `location_id`, `entity_context_id`, `plot_context_id`, `treatment_context_id`, and `method_context_id`.
`dataset_id`, `source_id` and `taxon_name` have easy-to-interpret values. The others are simply integral identifiers that link groups of measurements and are automatically generated through the AusTraits workflow (`individual_id` can be assigned in the metadata file or automatically generated.)
To expand on the definitions provided above,
- `observation_id` links measurements made on the same entity (individual, population, or species) at a single point in time.
- `population_id` indicates entities that share a common `location_id`, `plot_context_id`, and `treatment_context_id`. It is used to align measurements and `observation_id`'s for `individuals` versus `populations` (i.e. distinct `entity_types`) that share a common `population_id`. It is numbered sequentially within a dataset.
- `individual_id` indicates a unique organism. It is numbered sequentially within a dataset by population. Multiple observations on the same organism across time (with distinct `observation_id` values), share a common `individual_id`.
- `temporal_context_id` indicates a distinct point in time and is used only if there are repeat measurements on a population or individual across time. The identifier links to context properties (and their associated information) in the `contexts` table for context properties of type `temporal`.
- `source_id` is applied if not all data within a single dataset (`dataset_id`) is from the same source, such as when a dataset represents a compilation for a meta-analysis.
- `location_id` links to a distinct `location_name` and associated `location_properties` in the `location` table.
- `entity_context_id` links to information in the `contexts` table for context properties (& associated values/descriptions) with category `entity_context`. `Entity_contexts` include organism sex, organism caste and any other features of an entity that need to be documented.
- `plot_context_id` links to information in the `contexts` table for context properties (& associated values/descriptions) with category `plot`. `Plot contexts` include both blocks/plots within an experimental design as well as any stratified variation within a location that needs to be documented (e.g. slope position).
- `treatment_context_id`links to information in the `contexts` table for context properties (& associated values/descriptions) with category `treatment`. `Treatment contexts` are experimental manipulations applied to groups of individuals.
- `method_context_id`links to information in the `contexts` table for context properties (& associated values/descriptions) with category `method`. A `method context` indicates that the same trait was measured on or across individuals using different methods.
Additionally, `measurement_remarks` is used to document `r tolower(schema$metadata$elements$traits$elements$measurement_remarks)`
### Life stage, basis of record {#life_stage}
- `life_stage`: `r stringr::str_replace(schema$metadata$elements$traits$elements$life_stage,"^A", "a")`
- `basis_of_record`: `r tolower(schema$metadata$elements$traits$elements$basis_of_record)`
Possible values are:
```{r basis_of_record}
schema$basis_of_record$values %>%
austraits::convert_list_to_df1() %>%
my_kable_styling() %>%
writeLines()
```
### Values, value types, basis of value {#value_types}
Each record in the table of trait data has an associated `value`, `value_type`, and `basis_of_value`.
`Values:` A trait's values are either `numeric` or `categorical`. For traits with numerical values, the recorded value has been converted into standardised units and the AusTraits workflow has confirmed the value can be converted into a number and lies within the allowable range. For categorical variables, records have been aligned through substitutions to values listed as allowable values (terms) in a trait's definition.
- we use `_` for multi-word terms, e.g. `semi_deciduous`
- we use a space for situations where two values co-occur for the same entity. For instance, a flora might indicate that a plant species can be either annual or biennial, in which case the trait is scored as `annual biennial`.
`Value types:` Each trait measurement has an associated `value_type`, which is `r tolower(schema$metadata$elements$traits$elements$value_type)`
Possible value types are:
```{r value_type}
schema$value_type$values %>%
austraits::convert_list_to_df1() %>%
my_kable_styling() %>%
writeLines()
```
Each trait measurement also has an associated `basis_of_value`, which is `r tolower(schema$metadata$elements$traits$elements$basis_of_value)`
Possible values are:
```{r basis_of_value}
schema$basis_of_value$values %>%
austraits::convert_list_to_df1() %>%
my_kable_styling() %>%
writeLines()
```
## Locations
```{r}
elements <- schema$austraits$elements$locations
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
## Contexts {#contexts}
```{r}
elements <- schema$austraits$elements$contexts
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
## Methods {#methods}
```{r}
elements <- schema$austraits$elements$methods
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
## Excluded_data
```{r}
elements <- schema$austraits$elements$excluded_data
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
## Taxa
**Description:** A table containing details on taxa that are included in the table [`traits`](#traits). We have attempted to align species names with known taxonomic units in the [`Australian Plant Census` (APC)](https://biodiversity.org.au/nsl/services/apc) and/or the [`Australian Plant Names Index` (APNI)](https://biodiversity.org.au/nsl/services/APNI); the sourced information is released under a CC-BY3 license.
Version `r desc::desc_get_field("Version")` of AusTraits contains records for `r austraits$traits$taxon_name %>% unique() %>% length()` different taxa.
```{r}
elements <- schema$austraits$elements$taxa
elements %>%
print_schema_element()
writeLines(c(""))
```
## Taxonomic_updates
```{r}
elements <- schema$austraits$elements$taxonomic_updates
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
Both the original and the updated taxon names are included in the [`traits`](#traits) table.
## Definitions
```{r}
elements <- schema$austraits$elements$definitions
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
**Details on trait definitions:** The allowable trait names and trait values are defined in the definitions file. Each trait is labelled as either `numeric` or `categorical`. An example of each type is as follows. For an example, see the the [Trait definitions for AusTraits](http://traitecoevo.github.io/austraits.build/articles/trait_definitions.html).
```{r, traits}
for (trait in c("leaf_mass_per_area", "woodiness")) {
elements <- austraits$schema$traits$elements[[trait]]
data_trait <- austraits$traits %>% filter(trait_name == trait)
c(sprintf("**%s**\n\n", trait), sprintf("- label: %s", elements$label), sprintf("- description: %s", elements$description), sprintf("- number of records: %s", data_trait %>% nrow()), sprintf("- number of studies: %s", data_trait %>% pull(dataset_id) %>% unique() %>% length()), sprintf("- type: %s%s", elements$type, ifelse(elements$type == "numeric", sprintf("\n- units: %s", elements$units), "")), ifelse(elements$type == "numeric", sprintf("- allowable range: %s - %s %s", elements$values$minimum, elements$values$maximum, elements$units), sprintf("- allowable values:\n%s\n", paste0(" - *", elements$values %>% names(), "*: ", elements$values %>% unlist(), collapse = "\n"))), "") %>% writeLines()
}
```
## Contributors
```{r}
elements <- schema$austraits$elements$contributors
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```
## Sources
For each dataset in the compilation there is the option to list primary and secondary citations. The primary citation is defined as, `r austraits$schema$metadata$elements$source$values$primary$description` The secondary citation is defined as, `r austraits$schema$metadata$elements$source$values$secondary$description`
The element `sources` includes bibtex versions of all sources which can be imported into your reference library:
```{r, eval=FALSE}
# write all sources to file
RefManageR::WriteBib(austraits$sources, "refs.bib")
# write a single reference to a file
RefManageR::WriteBib(austraits$sources["Falster_2005_1"], "refs.bib")
```
Or individually viewed:
```{r, eval=FALSE}
austraits$sources["Falster_2005_1"]
```
A formatted version of the sources also exists within the table [methods](#methods).
## Metadata
```{r}
elements <- schema$austraits$elements$metadata
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
writeLines(c(""))
```
## Build_info
```{r}
elements <- schema$austraits$elements$build_info
sprintf("**Description:** %s.\n", elements$description) %>%
stringr::str_replace_all("\\.\\.", "\\.") %>%
writeLines()
elements %>%
print_schema_element()
writeLines(c(""))
```