| layout | default |
|---|---|
| title | Home |
| nav_order | 1 |

Hello everyone! This website will serve as the home base for our course. Here you'll find the syllabus, schedule, assignments, and any updates throughout the semester.

Data science exists to make meaning out of ever-growing pools of data. Researchers quickly discover, however, that examining data by hand, while useful for granular analysis, does not scale to large samples. To do data science at scale, researchers must make effective use of workflows, pipelines, and processes that ingest, parse, and transform data with tools and automation.

This course exposes students to contemporary pipelines for data analysis through a series of steadily escalating use cases. It begins with simple local database construction and progresses to cloud-based infrastructure such as AWS or Google Cloud. Along the way, students will learn a variety of systems for data collection, orchestration, transformation, and consumption, among others as appropriate. We will also encounter concepts of data wrangling, cleansing, ETL, and some applications of machine learning, though teaching machine learning is not the focus of this course.