forked from UofT-DSI/r
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathassignment_2.qmd
More file actions
103 lines (70 loc) · 7.5 KB
/
assignment_2.qmd
File metadata and controls
103 lines (70 loc) · 7.5 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
---
title: "Assignment #2"
format: html
---
**Objective:** In this assignment, your task is to use R to analyze library usage data.
## Submission Information
🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.
### Submission Parameters:
- Submission Due Date: \`FILL IN\`
- The branch name for your repo should be: `assignment-2`
- What to submit for this assignment:
- This Quarto document (assignment_1.qmd) should be populated and should be the only change in your pull request.
- What the pull request link should look like for this assignment: `https://github.com/<your_github_username>/r/pull/<pr_id>`
- Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.
### Checklist:
- Create a branch called `assignment-2`.
- Ensure that your repository is public.
- Review [the PR description guidelines](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md#guidelines-for-pull-request-descriptions) and adhere to them.
- Verify that your link is accessible in a private browser window.
### Grading criteria:
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| Criteria | Pass | Fail |
+======================================+========================================================================================================================+==================================================================================+
| **General criteria** | | |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| Code execution | All code cells execute without errors. | Any code cell produces an error upon execution. |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| Code quality | Code is well-organized, concise, and includes necessary comments for clarity. | Code is unorganized, verbose, or lacks necessary comments. |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| **Specific criteria** | | |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| 1. Reading in data | Data is correctly read in to R. | Data is not correctly read in to R. |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| 2. Wrangling data | Code manipulates data as required. A single dataframe containing all the data for physical locations only is produced. | Resulting dataframe(s) are incomplete or do not follow the specified format. |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| 3. The largest and busiest branches | The top 5 library branches for each question are identified and printed clearly. | The required library branches identified are not correct or not printed clearly. |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
| 4. Library usage over the years | The plot clearly communicates the number of workstation sessions at the two library branches. | The plot shows does not correctly and clearly show the desired information. |
+--------------------------------------+------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+
## Part 1: Reading in data
There are two data files you need for this assignment:
- `tpl_branch_general_info_2023.csv`
- `tpl_branch_workstation_usage_2018-2023.csv`
They are both located under `r/05_src/assignment-2-data`. Write R code to read these files.
```{r}
branch_info <- # Your code here
branch_workstation_usage <- # Your code here
```
## Part 2: Wrangling data
Before we are able to draw conclusions from the data, make the following modifications to the data:
- The branch general information dataset contains virtual library services. **Drop these rows and keep only physical library locations.**
- **Join the two datasets** so that all the information is in a single dataframe.
```{r}
# Your code below
```
## Part 3: The largest and busiest branches
Which library branches...
1. ... the most workstations available?
2. ... the highest numbers of workstation usage sessions?
3. ... the most usage sessions per workstation?
Write R code to show the names of the top 5 library branches for each question. Print only the names of the branches.
**Hint**: `head(my_data, 5)` will return the first 5 rows of the dataframe `my_data`.
```{r}
# Your code below
```
## Part 4: Library usage over the years
How has library workstation usage changed between 2018-2023? Use ggplot to chart the number of workstation sessions annually at the Toronto Reference Library and North York Central Library branches during these years.
```{r}
# Your code below
```