BIOL 548O is a short module designed to help students work with datasets more effectively and efficiently.
Dr. Matthew Pennell
Assistant Professor, Department of Zoology
Email: [email protected]
Office hours: By appointment
Office: Biodiversity 208
The module will run from February 4th to March 5th, 2020. Note that this differs slightly from the dates in the UBC Course calendar. There will be no classes during Reading Week.
Classes will be in Biosciences 4223 on Tuesdays and Thursdays from 1500-1630.
80% Homework assignments
20% In-class participation
This module is focused on skill development. I recognize that different students will be coming from very different academic backgrounds and have various levels of experience with the tools we are working with. And that's great -- we are all here to learn! As such, assessment for this module will be primarily about the process (are you putting effort into developing your skillset?) and not the product (how elegant is your code?).
This module is designed to be primarily a "workshop"-style course. I will expect you to have read the assigned materials beforehand. During class, I will review some key points and we'll work through problems together.
It is not necessary to bring your own datasets to work with; I know that many of you might be just starting your studies or otherwise do not currently have datasets that are in need of cleaning up. However, if you already have data, either from your own thesis work or perhaps some other lab project, please bring it along -- it is far more motivating and interesting to work with data you really care about.
Note: Much of the course material is adapted from the Data Carpentry for Biologists course developed by Ethan White and Zachary Brym.
In the first lecture, we are going to:
- Run through the objectives of the module so you can get a sense of where we are going;
- Discuss the data management challenges that you face (or will likely face) when working with data specific to your research topic;
- Take a brief tour of RStudio + git/GitHub and learn how we can make them talk to one another.
In preparation for the first lecture, I would ask you to please do the following:
- Download and install the R base system and the RStudio Desktop IDE. Both are needed. Note that installing RStudio will not automatically install R;
- Download and install git;
- If you haven't already, set up an account on GitHub and send your username to the Instructor ([email protected]).
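If you would like to confirm that the installations worked, the following is a minimal sketch of a check you could run in a terminal. It assumes a POSIX-style shell (Terminal on macOS or Linux, Git Bash on Windows) and that the installers placed `git` and `R` on your PATH, which is not always the case on Windows. `check_tool` is a hypothetical helper name used only for illustration. RStudio is a desktop application, so it is easiest to verify by simply opening it.

```shell
#!/bin/sh
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper, not part of git or R.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
        return 0
    else
        echo "$1: NOT found"
        return 1
    fi
}

# Check the two command-line tools the module relies on.
for tool in git R; do
    check_tool "$tool" || echo "  -> install $tool before the first lecture"
done

# Optional, once git is installed: tell git who you are.
# The name and email below are placeholders; substitute your own.
# git config --global user.name "Your Name"
# git config --global user.email "you@example.com"
```

A "NOT found" message for R on Windows does not necessarily mean R is missing; it may only mean R is not on the PATH, in which case opening RStudio is the better test.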
Topic: Best practices for version control and project organization
Readings:
Git Basics in RStudio
Assignment:
Complete Exercises 1-4 in the Lecture.
Additional readings:
Happy git with R (ebook) by UBC Stat 545 instructors
Topic: Principles of tidy data
Readings:
R for Data Science - Tidy Data
Additional readings:
Data organization in Spreadsheets (general paper)
Data organization in Spreadsheets (for Ecologists)
Topic: Transforming Data in R
Readings:
R for Data Science - Transforming Data
Additional readings:
Data Carpentry: dplyr
Topic: Relational databases
Readings:
R for Data Science - Relational Data
Additional readings:
Data Carpentry - Working with SQL databases in R
Topic: Working with text (or, why regular expressions are your best friends)
Readings:
R for Data Science - Strings
Topic: Scripting - Part I - Functions
Readings:
R for Data Science - Functions
Topic: Scripting - Part II - Conditionals and Iteration
Readings:
R for Data Science - Iteration
I regularly refer to all three of the following books and cannot recommend them strongly enough.
- Hadley Wickham. Advanced R (ebook)
- Garrett Grolemund and Hadley Wickham. R for Data Science (ebook)
- Stephen Haddock and Casey Dunn. Practical Computing for Biologists