[custom] fix lesson contents
zkamvar authored and Carpentries Apprentice committed Feb 7, 2023
1 parent a4bdf28 commit e5e2191
Showing 19 changed files with 2,172 additions and 230 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
^renv$
^renv\.lock$
^\.travis\.yml$
^tic\.R$
5 changes: 5 additions & 0 deletions config.yaml
@@ -70,6 +70,11 @@ episodes:

# Information for Learners
learners:
- reference.md
- intro-R-handout.Rmd
- starting-with-data-handout.Rmd
- data-wrangling-handout.Rmd
- data-visualisation-handout.Rmd

# Information for Instructors
instructors:
Empty file added episodes/.here
Empty file.
87 changes: 47 additions & 40 deletions episodes/.ignore-05-databases.Rmd
@@ -18,21 +18,19 @@ keypoints:
- "First key point."
---


```{r, include=FALSE}
source("../bin/chunk-options.R")
knitr_fig_path("06-")
source("../bin/download_data.R")
```


## Introduction

A common problem with R is that all operations are conducted in-memory. Here, "memory" means a computer
drive called Random Access Memory (RAM for short) which is physically installed on your computer. RAM
A common problem with R is that all operations are conducted in-memory. Here, "memory" means a computer
drive called Random Access Memory (RAM for short) which is physically installed on your computer. RAM
allows for data to be read in nearly the same amount of time regardless of its physical location
inside the memory. This is in stark contrast to the speed data could be read from CDs, DVDs, or other
storage media, where the speed of transfer depended on how quickly the drive could rotate or the
storage media, where the speed of transfer depended on how quickly the drive could rotate or the
arm could move. Unfortunately, your computer has a limited amount of RAM, so the amount of data you
can work with is limited by the available memory. So far, we have used small datasets that can
easily fit into your computer's memory. But what about datasets that are too large for your
@@ -47,12 +45,10 @@ Once we have made the connection to the database, much of what we do will look f

In this lesson, we will be connecting to an SQLite database, which allows us to send strings containing SQL statements directly from R to the database and receive the results. In addition, we will be connecting to the database in such a way that we can use 'dplyr' functions to operate directly on the database tables.
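
As a rough sketch of that workflow (not part of this commit), the snippet below opens a connection with `DBI`, sends an SQL string, and reads the result back as a data frame. It assumes the `DBI` and `RSQLite` packages are installed. The in-memory database and the toy `SN7577` rows are invented for illustration so the example does not depend on the lesson's data file.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the lesson's file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy rows; the real SN7577 table has many more columns.
toy <- data.frame(sex = c(1, 2, 2, 1), numage = c(34, 51, 27, 62))
dbWriteTable(con, "SN7577", toy)

# Send an SQL string to the database and collect the result in R.
older <- dbGetQuery(con, "SELECT sex, numage FROM SN7577 WHERE numage > 40")
print(older)

dbDisconnect(con)
```

`dbGetQuery()` is a convenience wrapper that sends the statement, fetches all rows, and clears the result in one call, which is usually enough for small queries.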


## Preliminaries

First, install and load the necessary packages. You can install the `RSQLite` package with


```{r, eval=FALSE}
install.packages("RSQLite")
```
Expand Down Expand Up @@ -141,18 +137,22 @@ data
dbClearResult(results)
```

> ## Exercise
>
> What happens if you send invalid SQL syntax?
>
> > ## Solution
> >
> > An error message is returned from SQLite.
> > Notice that R is just the conduit; it cannot check the SQL syntax.
> >
> >
> {: .solution}
{: .challenge}
::::::::::::::::::::::::::::::::::::::::: callout

## Exercise

What happens if you send invalid SQL syntax?

::::::::::::::: solution

## Solution

An error message is returned from SQLite.
Notice that R is just the conduit; it cannot check the SQL syntax.

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
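
To make the point about R being only a conduit concrete, here is a minimal sketch (an illustration, not part of this commit) that sends a deliberately misspelled statement and captures whatever error SQLite reports. It assumes `DBI` and `RSQLite` are installed and uses an empty in-memory database; the exact wording of the message depends on the SQLite version, so none is shown here.

```r
library(DBI)

# Assumption: an empty in-memory database is enough to trigger a syntax error.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# "SELEC" is a deliberate typo; R passes the string through unchanged.
msg <- tryCatch(
  {
    dbGetQuery(con, "SELEC * FROM SN7577")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

dbDisconnect(con)
```

Wrapping the call in `tryCatch()` keeps a script running after the failure, which is handy when many statements are sent in a loop.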

We can also create a new database and add tables to it. Let's base this new dataframe on the Question1 table that can be found in our existing database.

@@ -213,7 +213,7 @@ SN7577_d %>%
summarize(avg_age = mean(numage))
```

Notice that on the `nrow` command we get NA rather than a count of rows. This is because `dplyr` doesn't hold the full table even after the 'Select * ...'
Notice that on the `nrow` command we get NA rather than a count of rows. This is because `dplyr` doesn't hold the full table even after the 'Select \* ...'

If you need the row count you can use

@@ -222,25 +222,32 @@ SN7577_d %>%
tally()
```
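
The difference between `nrow()` and `tally()` on a database-backed table can be seen in a small self-contained sketch (an illustration, not part of this commit). It assumes `DBI`, `RSQLite`, `dplyr`, and `dbplyr` are installed, and the three toy rows are invented.

```r
library(DBI)
library(dplyr)

# Assumption: toy data in an in-memory database, not the lesson's SN7577 file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "SN7577", data.frame(sex = c(1, 2, 2), numage = c(34, 51, 27)))

lazy_tbl <- tbl(con, "SN7577")   # a lazy reference; no rows are fetched yet

n_lazy <- nrow(lazy_tbl)         # NA: the row count is unknown until a query runs
n_rows <- lazy_tbl %>%
  tally() %>%                    # translated to a COUNT query run by the database
  pull(n)                        # bring the single count back into R

print(n_lazy)
print(n_rows)

dbDisconnect(con)
```

Counting inside the database avoids pulling the whole table into memory just to learn its size.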

> ## Exercise
>
> Store the SN7577 table as an object for `dplyr` use.
>
> Write a query using `dplyr` functions that will return the average age (`numage`) by sex for all records where
> the response for Q2 is missing (missing values are indicated by a value of -1).
>
> > ## Solution
> >
> > ```{r}
> > SN7577_d <- tbl(mydb_dplyr, sql("SELECT * FROM SN7577"))
> >
> > SN7577_d %>%
> > filter(Q2 == -1) %>%
> > group_by(sex) %>%
> > summarize(avg_age = mean(numage))
> > ```
> >
> {: .solution}
{: .challenge}
::::::::::::::::::::::::::::::::::::::::: callout

## Exercise

Store the SN7577 table as an object for `dplyr` use.

Write a query using `dplyr` functions that will return the average age (`numage`) by sex for all records where
the response for Q2 is missing (missing values are indicated by a value of -1).

::::::::::::::: solution

## Solution

```{r}
SN7577_d <- tbl(mydb_dplyr, sql("SELECT * FROM SN7577"))
SN7577_d %>%
filter(Q2 == -1) %>%
group_by(sex) %>%
summarize(avg_age = mean(numage))
```

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
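
For readers who want to check what such a pipeline sends to the database, the sketch below (an illustration, not part of this commit) runs the same verbs against invented toy data and prints the generated SQL with `show_query()`. It assumes `DBI`, `RSQLite`, `dplyr`, and `dbplyr` are installed; the column values are hypothetical, with -1 standing for a missing response as in the exercise.

```r
library(DBI)
library(dplyr)

# Assumption: a four-row stand-in for the SN7577 table.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy <- data.frame(
  sex    = c(1, 1, 2, 2),
  numage = c(30, 50, 44, 60),
  Q2     = c(-1, -1, -1, 3)
)
dbWriteTable(con, "SN7577", toy)

query <- tbl(con, "SN7577") %>%
  filter(Q2 == -1) %>%                 # keep rows where Q2 is coded as missing
  group_by(sex) %>%
  summarize(avg_age = mean(numage, na.rm = TRUE))

show_query(query)                      # print the SQL that dplyr generated

# collect() runs the query; arrange() then sorts the local result by sex.
result <- collect(query) %>% arrange(sex)
print(result)

dbDisconnect(con)
```

Nothing is executed until `collect()` is called, so `show_query()` is a cheap way to inspect a query before running it on a large table.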

{% include links.md %}


12 changes: 4 additions & 8 deletions episodes/00-intro.Rmd
@@ -5,8 +5,8 @@ exercises: 15
source: Rmd
---

```{r, include=FALSE}
source("../bin/download_data.R")
```{r setup, include=FALSE}
source("data/download_data.R")
```

::::::::::::::::::::::::::::::::::::::: objectives
@@ -122,15 +122,15 @@ freely available to extend R's native capabilities.

<div class="col-md-6">

```{r rstudio-analogy, echo = FALSE, fig.show = "hold", out.width = "100%", fig.alt = "RStudio extends what R can do, and makes it easier to write R code and interact with R."}
```{r rstudio-analogy, echo=FALSE, fig.show="hold", out.width="100%", fig.alt="RStudio extends what R can do, and makes it easier to write R code and interact with R."}
knitr::include_graphics("fig/r-manual.jpeg")
```

</div>

<div class="col-md-6">

```{r rstudio-analogy-2, echo = FALSE, fig.show = "hold", fig.alt = "automatic car gear shift representing the ease of RStudio", out.width="100%"}
```{r rstudio-analogy-2, echo=FALSE, fig.show="hold", fig.alt="automatic car gear shift representing the ease of RStudio", out.width="100%"}
knitr::include_graphics("fig/r-automatic.jpeg")
```

@@ -407,8 +407,6 @@ proceeds, messages relating to its progress will be written to the console.
You will be able to see all of the packages which are actually being
installed.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -432,8 +430,6 @@ was written to the console before the start of the installation messages.

You could also have installed the **`tidyverse`** packages by running this command directly at the R terminal.



:::::::::::::::::::::::::::::::::::::::: keypoints

- Use RStudio to write and run R programs.
21 changes: 6 additions & 15 deletions episodes/01-intro-to-r.Rmd
@@ -5,8 +5,8 @@ exercises: 30
source: Rmd
---

```{r, include=FALSE}
source("../bin/download_data.R")
```{r setup, include=FALSE}
source("data/download_data.R")
```

::::::::::::::::::::::::::::::::::::::: objectives
@@ -104,7 +104,6 @@ have drastically different meanings. However, in this lesson, the two words
are used synonymously. For more information see:
[https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects](https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects)


::::::::::::::::::::::::::::::::::::::::::::::::::

When assigning a value to an object, R does not print anything. You
@@ -160,8 +159,6 @@ The value of `area_acres` is still 6.175 because you have not
re-run the line `area_acres <- 2.47 * area_hectares` since
changing the value of `area_hectares`.



:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::
@@ -306,7 +303,6 @@ Type in `?round` at the console and then look at the output in the Help pane.
What other functions exist that are similar to `round`?
How do you use the `digits` parameter in the round function?


::::::::::::::::::::::::::::::::::::::::::::::::::

## Vectors and data types
@@ -414,7 +410,6 @@ single vector?

R implicitly converts them to all be the same type.


:::::::::::::::::::::::::

What will happen in each of these examples? (hint: use `class()`
@@ -437,7 +432,6 @@ Vectors can be of only one data type. R tries to
convert (coerce) the content of this vector to find a "common
denominator" that doesn't lose any information.


:::::::::::::::::::::::::

How many values in `combined_logical` are `"TRUE"` (as a character) in the
@@ -460,7 +454,6 @@ first time the vector is evaluated. Therefore, the `TRUE` in
gets converted into a `1` before it gets converted into `"1"` in
`combined_logical`.


:::::::::::::::::::::::::

You've probably noticed that objects of different types get
@@ -604,10 +597,10 @@ Recall that you can use the `typeof()` function to find the type of your atomic
## Exercise

1. Using this vector of rooms, create a new vector with the NAs removed.
```r
rooms <- c(1, 2, 1, 1, NA, 3, 1, 3, 2, 1, 1, 8, 3, 1, NA, 1)
```

```r
rooms <- c(1, 2, 1, 1, NA, 3, 1, 3, 2, 1, 1, 8, 3, 1, NA, 1)
```

2. Use the function `median()` to calculate the median of the `rooms` vector.

@@ -637,8 +630,6 @@ Now that we have learned how to write scripts, and the basics of R's data
structures, we are ready to start working with the SAFI dataset we have been
using in the other lessons, and learn about data frames.



:::::::::::::::::::::::::::::::::::::::: keypoints

- Access individual values by location using `[]`.
20 changes: 8 additions & 12 deletions episodes/02-starting-with-data.Rmd
@@ -5,8 +5,8 @@ exercises: 30
source: Rmd
---

```{r, include=FALSE}
source("../bin/download_data.R")
```{r setup, include=FALSE}
source("data/download_data.R")
```

::::::::::::::::::::::::::::::::::::::: objectives
@@ -181,7 +181,6 @@ the help for `read_csv()` by typing `?read_csv` to learn more. There is also
the `read_tsv()` for tab-separated data files, and `read_delim()` allows you
to specify more details about the structure of your file.


::::::::::::::::::::::::::::::::::::::::::::::::::

Note that `read_csv()` actually loads the data as a tibble.
@@ -259,7 +258,6 @@ Different ways of specifying these coordinates can lead to results with
different classes. This is covered in the Software Carpentry lesson
[R for Reproducible Scientific Analysis](https://swcarpentry.github.io/r-novice-gapminder/).


::::::::::::::::::::::::::::::::::::::::::::::::::

```{r, purl=FALSE}
@@ -320,12 +318,12 @@ names of the columns.
row 100 of the `interviews` dataset.

2. Notice how `nrow()` gave you the number of rows in the tibble?
- Use that number to pull out just that last row in the tibble.
- Compare that with what you see as the last row using `tail()` to make
sure it's meeting expectations.
- Pull out that last row using `nrow()` instead of the row number.
- Create a new tibble (`interviews_last`) from that last row.

- Use that number to pull out just that last row in the tibble.
- Compare that with what you see as the last row using `tail()` to make
sure it's meeting expectations.
- Pull out that last row using `nrow()` instead of the row number.
- Create a new tibble (`interviews_last`) from that last row.

3. Using the number of rows in the interviews dataset that you found in
question 2, extract the row that is in the middle of the dataset. Store
@@ -655,8 +653,6 @@ variables to date.
mdy(char_dates)
```



:::::::::::::::::::::::::::::::::::::::: keypoints

- Use read\_csv to read tabular data in R.
8 changes: 2 additions & 6 deletions episodes/03-dplyr.Rmd
@@ -5,8 +5,8 @@ exercises: 15
source: Rmd
---

```{r, include=FALSE}
source("../bin/download_data.R")
```{r setup, include=FALSE}
source("data/download_data.R")
```

::::::::::::::::::::::::::::::::::::::: objectives
@@ -46,7 +46,6 @@ variants of different function and option names. For this lesson, we utilize
the American spellings of different functions; however, feel free to use
the regional variant for where you are teaching.


::::::::::::::::::::::::::::::::::::::::::::::::::

## What is an R package?
@@ -79,7 +78,6 @@ the package [`data.table`](https://rdatatable.gitlab.io/data.table/). See this
for example to get a sense of the differences between using `base`, `tidyverse`, and
`data.table`.


::::::::::::::::::::::::::::::::::::::::::::::::::

## Learning **`dplyr`**
@@ -504,8 +502,6 @@ interviews %>%

::::::::::::::::::::::::::::::::::::::::::::::::::



:::::::::::::::::::::::::::::::::::::::: keypoints

- Use the `dplyr` package to manipulate dataframes.
6 changes: 2 additions & 4 deletions episodes/04-tidyr.Rmd
@@ -5,8 +5,8 @@ exercises: 15
source: Rmd
---

```{r, include=FALSE}
source("../bin/download_data.R")
```{r setup, include=FALSE}
source("data/download_data.R")
```

::::::::::::::::::::::::::::::::::::::: objectives
@@ -485,8 +485,6 @@ if (!dir.exists("data_output")) dir.create("data_output")
write_csv(interviews_plotting, "data_output/interviews_plotting.csv")
```



:::::::::::::::::::::::::::::::::::::::: keypoints

- Use the `tidyr` package to change the layout of dataframes.