Spring 2021: Tues/Thurs 9.30-10.45
Instructor: Adam Millard-Ball, he/him
Office hours: Mondays 2.30-4.30. Sign up here
About this course: New data sources are a potential goldmine for urban planners and policy makers. But sometimes they are large, sometimes they are messy, sometimes they are awkward to access, and often they are all of these things. In this hands-on course, we’ll develop skills in scraping, processing, and managing urban data, and using tools such as natural language processing, geospatial analysis, and machine learning. We’ll use examples from transit, housing, and equity planning, and build competence in open-source tools and languages such as Python and SQL. We’ll also consider the limits to data science, and the biases and pitfalls that "big data" can entail.
Prerequisites: Basic Python programming experience, for example through the 2020-21 version of 206A (Introduction to Geographic Information Systems and Spatial Data Science), or an introductory Python course. One good, free option is offered by Data Carpentry. Another is the University of Michigan Introduction to Data Science in Python course; if you have no prior knowledge, you should take Programming for Everyone first. (You can take these for free if you choose the "audit" option.) Whichever option you choose, before starting this course you should be familiar with Python syntax, Jupyter notebooks, plotting via matplotlib, and pandas and geopandas dataframes.
- Expand your urban data analysis, visualization, and Python skills, regardless of your starting point
- Identify applications of these techniques to urban planning challenges
- Know how to read API documentation and where to get more information
- Understand how to collaborate using git and other software tools
- Critically analyze the constraints to data science methods, particularly in terms of ethics and causal inference
All the readings, assignments, and lecture notebooks will be posted on this GitHub site.
I ask you to do the assignments using a Jupyter Notebook in Anconda. That helps make sure that we all have a consistent Python setup across different computers (Mac, Windows, etc.). If you are unable to install Anaconda or are uncomfortable doing so, please talk to me about alternatives before the quarter starts.
Each class will be held synchronously via Zoom. Login details are on CCLE. Class sessions will be recorded and posted to CCLE.
We will primarily use Slack to maintain communication outside of scheduled class times. In particular, it provides a space for problem solving and for you to help each other. You’ll receive an email invitation from me to join our Slack workspace. If you have not used Slack before, please familiarize yourself with the tool using the resources here.
There is no textbook for the class. All required readings will be posted here on GitHub. However, you may find some of the following books helpful.
- Python for Data Analysis, 2nd edition. This focuses on the
pandaslibrary. The UCLA Library has a copy. - Think Python, 2nd edition. This is a free ebook and is an excellent general refresher on Python syntax and data structures.
Your active participation is essential to making this course successful and enjoyable. I expect you to:
- Actively follow the examples in class through your own copy of the Jupyter notebook posted in advance. Tweak and experiment with the examples. If you don't follow an example, let me know in the Zoom chat—others will undoubtedly have the same question.
- Discuss the substantive readings on Slack. Post a question, idea, or comment by 8.30am on the day of class. Please engage with the posts of others as well as writing your own—this is a discussion board, not a repository of essays.
- Use Slack to help each other out with questions on the assignments and projects.
Weekly homeworks (25%). Each homework will be posted on Monday and due at 5pm on the following Monday. You must submit at least 4 out of 5 homeworks on time (but please do them all). We'll spend time in class on Thursday working through the assignment, so please make a start on it before then.
Challenge problems (25%). Most homeworks will include a challenge problem, which is more open ended. You must do at least 2 of these, and present one in class.
Final project (35%). Working in groups of 2-3, you'll conceptualize and implement an urban data science project. You'll submit a proposal (Week 3), and make lightening presentations of your interim (Weeks 6-7) and final analysis (Weeks 9-10).
Class participation (15%). Your class participation grade will consider attendance and active participation in class and on Slack.
If you require any accommodations because of a disability, please talk to me within the first two weeks of the quarter if possible. The sooner that I am aware of any accessibility needs, the quicker I can try and accommodate them.
Students can make a formal request to the instructor for special consideration for an extension to an assignment due date. This request should be received at least 48 hours in advance. Otherwise, one partial grade will be deducted for every 24-hour period an assignment is late. For example, an A- will go to a B+. Note that no extensions are possible for the weekly homeworks (because I post the solutions promptly)—that's why only 7 out of 9 homeworks are required.
UCLA’s policy about plagiarism is clear: the sources of all ideas, text, pictures, or graphics that are not your (or your team’s) own must be fully cited, all passages copied from other sources must be in quotation marks with the source cited, and you absolutely cannot submit materials that have previously been submitted by other students in previous iterations of this course, even if you have re-worked this material for your submission. Being in this class constitutes an acknowledgment and willingness to abide by UCLA’s academic integrity policies.1 Should you have any questions about these policies, see here. (Thanks to Prof. Mike Manville for permission to use this text.)
The same principles of academic integrity apply to your code. If you borrow any code snippets or ideas, acknoweldge them with a comment in the code. E.g.
# Iteration code from: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
The schedule is preliminary and subject to change, depending on how quickly or slowly we move through the material. In general, class on Tuesdays will cover the theory and worked examples, and discuss applications from the readings. Class on Thursdays will be more hands-on, and devote time to working through the weekly homework assignment.
A Jupyter notebook will be posted on GitHub. The course readings are all posted on CCLE.
Pre-course:
- Please fill out the pre-course survey
- Create a GitHub account if you don't have one
- Download and install Anaconda
- Homework due 9.30am, Apr 8
- How to submit your homework on GitHub
- Homework, due 5pm, Apr 19
- Homework, due 5pm, May 3
Week 5: Natural language processing (1): parsing
- Guest: Zachary Steinert-Threlkeld, Department of Public Policy, UCLA (April 27)
Week 6: Natural language processing (2): topic modeling and sentiment analysis
- Guest: Claudia Preciado, Remix (May 4)
- Homework, due 5pm, May 17
Week 7: Machine learning (1): supervised algorithms
Week 8: Machine learning (2): unsupervised algorithms
- Homework, due 5pm, June 2
Week 10: Big data, privacy, and ethics
- Makeup homework, due 9am, June 15