6 changes: 3 additions & 3 deletions config.yaml
Original file line number Diff line number Diff line change
@@ -59,10 +59,10 @@ contact: 'andree.valle-campos@lshtm.ac.uk'

# Order of episodes in your lesson
episodes:
- read-cases.Rmd
- read-case-data.Rmd
- clean-data.Rmd
- validate.Rmd
- describe-cases.Rmd
- tag-validate.Rmd
- aggregate-visualize.Rmd

# Information for Learners
learners:
@@ -24,7 +24,7 @@ exercises: 10
In an analytic pipeline, exploratory data analysis (EDA) is an important step before formal modelling. EDA helps determine relationships between variables and summarize their main characteristics, often by means of data visualization.

This episode focuses on EDA of outbreak data using R packages.
A key aspect of EDA in epidemic analysis is 'person, place and time'. It is useful to identify how observed events - such as confirmed cases, hospitalizations, deaths, and recoveries - change over time, and how these vary across different locations and demographic factors, including gender, age, and more.
Key aspects of EDA in epidemic analysis are **person, place and time**. It is useful to identify how observed events, such as confirmed cases, hospitalizations, deaths, and recoveries, change over time, and how these vary across different locations and demographic factors, including gender, age, and more.

Let's start by loading the `{incidence2}` package to aggregate the linelist data according to specific characteristics, and visualize the resulting epidemic curves (epicurves) that plot the number of new events (i.e. case incidence over time).
We'll use the `{simulist}` package to simulate the outbreak data to analyse, and `{tracetheme}` for figure formatting. We'll use the pipe operator (`%>%`) to connect some of their functions, including others from the `{dplyr}` and `{ggplot2}` packages, so let's also load the `{tidyverse}` package.
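The package setup described above can be sketched as follows (a minimal sketch, assuming the packages are already installed):

```r
# Load packages for aggregation, simulation, and plotting
library(incidence2) # aggregate linelist data into incidence objects
library(simulist)   # simulate outbreak linelist data
library(tracetheme) # figure formatting theme
library(tidyverse)  # loads dplyr, ggplot2, and the %>% pipe operator
```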
@@ -66,9 +66,9 @@ You can also find data sets from past real outbreaks within the [`{outbreaks}`](



## Aggregating the data
## Aggregating the linelist

Often we want to analyse and visualise the number of events that occur on a particular day or week, rather than focusing on individual cases. This requires grouping the linelist data into incidence data. The [{incidence2}]((https://www.reconverse.org/incidence2/articles/incidence2.html){.external target="_blank"}) package offers a useful function called `incidence2::incidence()` for grouping case data, usually based around dated events and/or other characteristics. The code chunk provided below demonstrates the creation of an `<incidence2>` class object from the simulated Ebola `linelist` data based on the date of onset.
Often we want to analyse and visualise the number of events that occur on a particular day or week, rather than focusing on individual cases. This requires converting the linelist data into incidence data. The [{incidence2}](https://www.reconverse.org/incidence2/articles/incidence2.html){.external target="_blank"} package offers a useful function called `incidence2::incidence()` for aggregating case data, usually based around dated events and/or other characteristics. The code chunk provided below demonstrates the creation of an `<incidence2>` class object from the simulated Ebola `linelist` data based on the date of onset.

```{r}
# Create an incidence object by aggregating case data based on the date of onset
@@ -82,7 +82,7 @@ daily_incidence <- incidence2::incidence(
daily_incidence
```

With the `{incidence2}` package, you can specify the desired interval (e.g. day, week) and categorize cases by one or more factors. Below is a code snippet demonstrating weekly cases grouped by the date of onset, sex, and type of case.
With the `{incidence2}` package, you can specify the desired interval (e.g., day, week) and categorize cases by one or more factors. Below is a code snippet demonstrating weekly cases grouped by the date of onset, sex, and type of case.

```{r}
# Group incidence data by week, accounting for sex and case type
@@ -150,7 +150,7 @@ base::plot(daily_incidence) +
x = "Time (in days)", # x-axis label
y = "Daily cases" # y-axis label
) +
theme_bw()
tracetheme::theme_trace()
```


@@ -161,7 +161,7 @@ base::plot(weekly_incidence) +
x = "Time (in weeks)", # x-axis label
y = "Weekly cases" # y-axis label
) +
theme_bw()
tracetheme::theme_trace()
```

:::::::::::::::::::::::: callout
@@ -200,7 +200,7 @@ base::plot(cum_df) +
x = "Time (in days)", # x-axis label
y = "Cumulative cases" # y-axis label
) +
theme_bw()
tracetheme::theme_trace()
```

Note that this function preserves grouping, i.e., if the `incidence2` object contains groups, it will accumulate the cases accordingly.
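As a sketch of that behaviour, cumulative counts can be computed directly from a grouped incidence object; the grouping variables used when the object was created (here assumed to be sex and case type, as above) carry through to the accumulated result:

```r
# Accumulate case counts over time within each existing group;
# assumes `weekly_incidence` was built with incidence2::incidence()
# using groups = c("sex", "case_type")
cum_by_group <- incidence2::cumulate(weekly_incidence)
```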
File renamed without changes.
50 changes: 26 additions & 24 deletions episodes/validate.Rmd → episodes/tag-validate.Rmd
@@ -1,20 +1,20 @@
---
title: 'Validate case data'
teaching: 10
exercises: 2
teaching: 20
exercises: 10
---


:::::::::::::::::::::::::::::::::::::: questions

- How to convert a raw dataset into a `linelist` object?
- How can raw case data be converted into a `linelist` object?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Demonstrate how to covert case data into `linelist` data
- Demonstrate how to tag and validate data to make analysis more reliable
- Demonstrate how to convert case data into a `linelist` object
- Demonstrate how to tag and validate data to improve the reliability of downstream analysis


::::::::::::::::::::::::::::::::::::::::::::::::
@@ -30,14 +30,13 @@

## Introduction

In outbreak analysis, once you have completed the initial steps of reading and cleaning the case data, it's essential to establish an additional fundamental layer to ensure the integrity and reliability of subsequent analyses. Otherwise you might encounter issues during the analysis process due to creation or removal of specific variables, changes in their underlying data types (like `<date>` or `<chr>`), etc. Specifically, this additional step involves:
In outbreak analysis, once you have completed the initial steps of reading and cleaning the case data, it's essential to establish an additional fundamental layer to ensure the integrity and reliability of subsequent analyses. Without this step, you may encounter issues later: for example, variables may be unintentionally modified or removed, or their data types (e.g., `<date>`, `<chr>`) may change during processing. This additional layer typically involves two key steps:

1. Verifying the presence and correct data type of certain columns within
your dataset, a process commonly referred to as **tagging**;
2. Implementing measures to make sure that these tagged columns are not inadvertently deleted during further data processing steps, known as **validation**.
1. **tagging**: Verifying that required columns are present in the dataset and confirming that they have the correct data types.
2. **validation**: Implementing safeguards to ensure that tagged columns are not accidentally deleted or altered during subsequent data manipulation steps.


This episode focuses on tagging and validating outbreak data using the [linelist](https://epiverse-trace.github.io/linelist/) package. Let's start by loading the package `{rio}` to read data and the `{linelist}` package
This episode focuses on creating a `linelist` object using the [linelist](https://epiverse-trace.github.io/linelist/) package, which natively supports tagging and validating outbreak data to ensure data integrity throughout the analysis workflow. Let's start by loading the `{rio}` package to read data and the `{linelist}` package
to create a linelist object. We'll use the pipe operator (`%>%`) to connect some of their functions, including others from the `{dplyr}` package. For this reason, we will also load the `{tidyverse}` package.


@@ -54,7 +53,7 @@

### The double-colon (`::`) operator

The`::`in R lets you access functions or objects from a specific package without attaching the entire package to the search path. It offers several important
The `::` operator in R lets you access functions or objects from a specific package without attaching the entire package to the search path. It offers several important
advantages, including the following:

* Telling explicitly which package a function comes from, reducing ambiguity and potential conflicts when several packages have functions with the same name.
@@ -66,7 +65,7 @@

:::::::::::::::::::

Import the dataset following the guidelines outlined in the [Read case data](../episodes/read-case-data.Rmd) episode. This involves loading the dataset into the working environment and viewing its structure and content.


```{r, eval=FALSE}
# Read data
@@ -110,7 +109,7 @@

## Creating a linelist and tagging columns

Once the data is loaded and cleaned, we can convert the cleaned case data into a `linelist` object using `{linelist}` package, as in the below code chunk.
Once the data is loaded and cleaned, it can be converted into a `linelist` object using the `{linelist}` package, as illustrated in the code chunk below.

```{r}
# Create a linelist object from cleaned data
@@ -125,17 +124,15 @@
linelist_data
```

The `{linelist}` package supplies tags for common epidemiological variables
and a set of appropriate data types for each. You can view the list of available tags by the variable name and their acceptable data types using the `linelist::tags_types()` function.
The `{linelist}` package provides predefined tags for common epidemiological variables, along with the appropriate data types for each. You can view all available tags and their corresponding acceptable data types using the `linelist::tags_types()` function.
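For instance, a quick way to inspect the predefined tags and their accepted types is:

```r
# List the predefined epidemiological tags and the
# data types each one accepts
linelist::tags_types()
```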

::::::::::::::::::::::::::::::::::::: challenge

Let's **tag** more variables. In some datasets, it is possible to encounter variable names that are different from the available tag names. In such cases, we can associate them based on how variables were defined for data collection.
Let's now **tag** additional variables. In some datasets, variable names may not exactly match the predefined tag names. In these cases, you can map them based on how the variables were defined during data collection. You need to:

Now:
-**Explore** the available tag names in `{linelist}`.
-**Find** what other variables in the input dataset can be associated with any of these available tags.
-**Tag** those variables as shown above using the `linelist::make_linelist()`
- **Explore** the available tag names in `{linelist}`.
- **Find** what other variables in the input dataset can be associated with any of these available tags.
- **Tag** those variables as shown above using the `linelist::make_linelist()`
function.

:::::::::::::::::::: hint
@@ -165,9 +162,9 @@
```


Are these additional tags visible in the output?
Are the additional tags visible in the output?

< !--Do you want to see a display of available and tagged variables? You can explore the function `linelist::tags()` and read its [reference documentation](https://epiverse-trace.github.io/linelist/reference/tags.html).- ->
Do you want to see a display of available and tagged variables? You can explore the function `linelist::tags()` and read its [reference documentation](https://epiverse-trace.github.io/linelist/reference/tags.html).
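As a sketch (assuming the `linelist_data` object created earlier in this episode):

```r
# Display which variables of the linelist object are
# currently tagged, and with which tag names
linelist::tags(linelist_data)
```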

:::::::::::::::::::::

@@ -176,7 +173,7 @@

## Validation

To ensure that all tagged variables are standardized and have the correct data
To validate that all tagged variables are standardized and have the correct data
types, use the `linelist::validate_linelist()` function, as shown in the example below:

```{r}
@@ -190,6 +187,7 @@

::::::::::::::::::::::::: challenge

## Changes in Variable Types During Linelist Validation
Let's assume the following scenario during an ongoing outbreak. You notice at some point that the data stream you have been relying on has a set of new entries (i.e., rows or observations), and the data type of one variable has changed.

Let's consider the example where the type of the `age` variable has changed from a double (`<dbl>`) to a character (`<chr>`).
@@ -310,18 +308,20 @@

## Safeguarding

Safeguarding is implicitly built into the linelist objects. If you try to drop any of the tagged columns, you will receive an error or warning message, as shown in the example below.
Safeguarding is implicitly built into the linelist objects. If you try to delete or modify any of the tagged columns, you will receive an error or warning message, as shown in the example below.

```{r, warning=TRUE}
new_df <- linelist_data %>%
dplyr::select(case_id, gender)
```

This `Warning` message above is the default output option when we lose tags in a `linelist` object. However, it can be changed to an `Error` message using the `linelist::lost_tags_action()` function.
This `Warning` is the default option when we lose tags in a `linelist` object. However, it can be changed to an `Error` message using the `linelist::lost_tags_action()` function.
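A minimal sketch of switching between the two behaviours:

```r
# Escalate lost-tag events from the default warning to an error
linelist::lost_tags_action("error")

# Restore the default behaviour (a warning)
linelist::lost_tags_action("warning")
```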


::::::::::::::::::::::::::::::::::::: challenge

## Exploring Safeguarding Behavior for Lost Tags

Let's test the implications of changing the **safeguarding** configuration from a `Warning` to an `Error` message.

- First, run this code to count the frequency of each category within a categorical variable:
@@ -388,6 +388,8 @@
- Use the `{linelist}` package to tag,
validate,
and prepare case data for downstream analysis.
- Explore and map dataset variables to predefined tags for standardization.
- Understand how warnings vs. errors affect the data processing workflow.

::::::::::::::::::::::::::::::::::::::::::::::::
