Schedule
- Readings in normal font should be completed and annotated ahead of lecture.
- Readings in italic provide optional additional depth on the material.
- Assignments are listed on the day when I suggest you begin working on them.
Reading sources:
- PSC: Lecture notes I’ve written for this course, hosted here.
- PDSH: The Python Data Science Handbook by Vanderplas (2016).
- BHN: Fairness and Machine Learning: Limitations and Opportunities by Barocas, Hardt, and Narayanan (2023).
Week 1
Tue Feb. 13
Welcome!
We introduce our topic and discuss how the course works.
Learning Objectives: Getting Oriented
Reading: Course syllabus; Collaboration; Why I Don't Grade by Jesse Stommel
Notes: Welcome slides; Data, Patterns, and Models
Warmup: Set up your software.
Assignments: Math pre-assessment.
Thu Feb. 15
The Classification Workflow in Python
We work through a simple, complete example of training and evaluating a classification model on a small data set.
Learning Objectives: Navigation; Experimentation
Reading: PDSH: Data Manipulation with Pandas (through "Aggregation and Grouping")
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Meet the Palmer Penguins!
Assignments: Blog Post: Penguins
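For a taste of the workflow this day covers, here is a minimal sketch of train/evaluate classification in scikit-learn. This is not the course's notebook: a synthetic two-blob data set stands in for the penguins, and all model choices are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# two Gaussian blobs standing in for two classes of penguins
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.repeat([0, 1], 100)

# hold out 30% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression()
model.fit(X_train, y_train)             # train on the training split
accuracy = model.score(X_test, y_test)  # evaluate on held-out data
```

The key habit here is evaluating on data the model never saw during training.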
Week 2
Tue Feb. 20
Linear Score-Based Classification
We study a fundamental method for binary classification in which data points are assigned scores. Scores above a certain threshold are assigned to one class; scores below are assigned to the other.
Learning Objectives: Theory; Experimentation
Reading: Linear Classifiers from MITx.
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Graphing Decision Boundaries
Assignments: ACTUAL REAL DUE DATE: Reflective Goal-Setting due 2/27
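The score-then-threshold idea can be sketched in a few lines of numpy. The weight vector, bias, and threshold below are made-up values for illustration, not anything from the course notes.

```python
import numpy as np

w = np.array([1.0, -2.0])   # weight vector (assumed, for illustration)
b = 0.5                      # bias term
threshold = 0.0

X = np.array([[2.0, 0.0],
              [0.0, 2.0]])
scores = X @ w + b                       # linear score for each point
y_pred = (scores > threshold).astype(int)  # above threshold -> class 1
```

Here the first point scores 2.5 and lands in class 1; the second scores -3.5 and lands in class 0.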
Thu Feb. 22
Statistical Decision Theory and Automated Decision-Making
We discuss the theory of making automated decisions based on a score function, going into detail on thresholding, error rates, and cost-based optimization.
Learning Objectives: Theory; Experimentation
Reading: PDSH: Introduction to NumPy
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Choosing a Threshold
Assignments: Blog Post: Whose Costs?
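One way to picture cost-based threshold choice: sweep over candidate thresholds and pick the one minimizing total cost. The data, costs, and threshold grid below are all illustrative assumptions; the point is that when false negatives are costlier than false positives, the best threshold shifts downward.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic scores: negatives centered at -1, positives at +1
scores = np.concatenate([rng.normal(-1, 1, 500), rng.normal(1, 1, 500)])
labels = np.repeat([0, 1], 500)

c_fp, c_fn = 1.0, 5.0   # assumed costs: a missed positive is 5x worse
thresholds = np.linspace(-3, 3, 121)

def total_cost(t):
    pred = scores > t
    fp = np.sum(pred & (labels == 0))    # false positives
    fn = np.sum(~pred & (labels == 1))   # false negatives
    return c_fp * fp + c_fn * fn

costs = np.array([total_cost(t) for t in thresholds])
best_t = thresholds[np.argmin(costs)]    # lands below 0 since c_fn > c_fp
```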
Week 3
Tue Feb. 27
Auditing Fairness
We introduce the topics of fairness and disparity in automated decision systems using a famous case study.
Learning Objectives: Social Responsibility; Experimentation
Reading: BHN: Introduction; Machine Bias by Julia Angwin et al. for ProPublica
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Experiencing (Un)Fairness
Assignments: Reflective Goal-Setting due today
Thu Feb. 29
Statistical Definitions of Fairness in Automated Decision-Making
We offer formal mathematical definitions of several natural intuitions of fairness, review how to assess them empirically on data in Python, and prove that two major definitions are incompatible with each other.
Learning Objectives: Social Responsibility; Theory
Reading: BHN: Classification (ok to skip "Relationships between criteria" and below)
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Reading Check
Assignments: Blog Post: Bias Replication Study and/or Blog Post: Women in Data Science Conference
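Assessing one of these definitions empirically can be as simple as comparing error rates across groups. The sketch below uses random synthetic labels and predictions (purely illustrative) to compute group-wise false positive rates, one ingredient of the error-rate-balance criteria from the reading.

```python
import numpy as np

rng = np.random.default_rng(1)
group = rng.integers(0, 2, 1000)    # group membership (synthetic)
y = rng.integers(0, 2, 1000)        # true labels (synthetic)
y_pred = rng.integers(0, 2, 1000)   # predictions of some model (synthetic)

def fpr(y, y_pred, mask):
    """False positive rate restricted to the points where mask is True."""
    negatives = mask & (y == 0)
    return np.sum(negatives & (y_pred == 1)) / np.sum(negatives)

fpr_0 = fpr(y, y_pred, group == 0)
fpr_1 = fpr(y, y_pred, group == 1)
disparity = abs(fpr_0 - fpr_1)      # gap in this one error-rate metric
```

An audit would compute several such metrics (FPR, FNR, calibration) and compare them across groups.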
Week 4
Tue Mar. 05
Normative Theory of Fairness
We discuss some of the broad philosophical and political positions that underlie the theory of fairness, and connect these positions to statistical definitions.
Learning Objectives: Social Responsibility
Reading: BHN: Relative Notions of Fairness
Notes: Discussion guide shared in Slack
Warmup: COMPAS and Equality of Opportunity
Thu Mar. 07
Critical Perspectives: Interrogate Your Task
We discuss several critical views that seek to move our attention beyond the fairness of algorithms and toward their role in sociotechnical systems. We center two questions: Who benefits from a given data science task? What tasks could we approach instead if our aims were to uplift the oppressed?
Learning Objectives: Social Responsibility
Reading: Data Feminism: "The Power Chapter" by Catherine D'Ignazio and Lauren Klein; "The Digital Poorhouse" by Virginia Eubanks; "Studying Up: Reorienting the study of algorithmic fairness around issues of power" by Barabas et al.
Notes: Discussion guide shared in Slack
Warmup: Power, Data, and Studying Up
Assignments: Blog Post: Limitations of the Quantitative Approach
Week 5
Tue Mar. 12
Critical Perspectives: Interrogate Your Data
We discuss the importance of understanding the context of data when planning and executing data science, and of effectively communicating this context when sharing our findings.
Learning Objectives: Social Responsibility
Reading: Data Feminism: "The Numbers Don't Speak for Themselves" by Catherine D'Ignazio and Lauren Klein; Datasheets for Datasets by Timnit Gebru et al.
Notes: Discussion guide shared in Slack
Warmup: Data Context and Data Sheets
Thu Mar. 14
Introduction to Model Training: The Perceptron
We study the perceptron as an example of a linear model with a training algorithm. Our understanding of this algorithm and its shortcomings will form the foundation of our future explorations in empirical risk minimization.
Learning Objectives: Theory
Reading: No reading today, but please be ready to put some extra time into the warmup. It may be useful to review our lecture notes on score-based classification and decision theory when completing it.
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Linear Models, Perceptron, and Torch
Assignments: Blog Post: Implementing Perceptron
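The heart of the perceptron is a single update rule: when a point is misclassified, nudge the weights toward it. In class we implement this with torch; the sketch below uses numpy and a synthetic separable data set to stay self-contained, with labels coded as ±1.

```python
import numpy as np

rng = np.random.default_rng(3)
# two well-separated blobs, so the perceptron is guaranteed to converge
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.repeat([-1, 1], 50)

w = np.zeros(2)
b = 0.0
for _ in range(100):                   # passes over the data
    updated = False
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:     # misclassified (or on the boundary)
            w += yi * xi               # the perceptron update
            b += yi
            updated = True
    if not updated:                    # a full clean pass: converged
        break

accuracy = np.mean(np.sign(X @ w + b) == y)
```

The shortcoming previewed in the description: on data that is not linearly separable, this loop never terminates on its own, which motivates empirical risk minimization.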
Week 6
Tue Mar. 19
Spring Break!
Warmup: TBD
Thu Mar. 21
Spring Break!
Warmup: TBD
Week 7
Tue Mar. 26
Convex Empirical Risk Minimization
We introduce the framework of convex empirical risk minimization, which offers a principled approach to overcoming the many limitations of the perceptron algorithm.
Learning Objectives: Theory
Reading: Convexity Examples by Stephen D. Boyles, pages 1–7 (ok to stop when the notes start discussing gradients and Hessians).
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Practice with Convex Functions
Assignments: ACTUAL REAL DUE DATE: Mid-semester reflection due 04/02
Thu Mar. 28
Gradient Descent
We study a method for finding the minima of convex functions using techniques from calculus and linear algebra.
Learning Objectives: Theory
Reading: No reading today, but please budget some extra time for the warmup.
Notes: Lecture notes; Live notes (Google Colab)
Warmup: A First Look at Gradient Descent
Assignments: Blog Post: Implementing Logistic Regression
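In the spirit of the logistic regression blog post, here is a hedged sketch of gradient descent on the convex logistic loss. The synthetic data, learning rate, and step count are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.repeat([0.0, 1.0], 100)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(2)
alpha = 0.1                                  # learning rate (assumed)
for _ in range(500):
    p = sigmoid(X @ w)                       # predicted probabilities
    grad = X.T @ (p - y) / len(y)            # gradient of mean logistic loss
    w -= alpha * grad                        # descent step

p = sigmoid(X @ w)
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Because the loss is convex, this procedure converges toward the global minimizer regardless of the starting point.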
Week 8
Tue Apr. 02
Feature Maps and Regularization
We reintroduce feature maps as a method for learning nonlinear decision boundaries, and add regularization to the empirical risk minimization problem in order to control the complexity of our learned models.
Learning Objectives: Theory; Experimentation
Reading: No reading today; please think hard about your project pitches!
Notes: Lecture notes; Live notes (Google Colab)
Warmup: Project Pitches
Assignments: Mid-semester reflection due today. ACTUAL REAL DUE DATE: Project proposal due 4/9
Thu Apr. 04
Linear Regression
We introduce linear regression through the framework of convex empirical risk minimization.
Learning Objectives: Theory
Reading: No additional reading, but you may need to open up your linear algebra textbook in order to complete the warmup.
Notes: Linear Regression; Live notes (Google Colab)
Warmup: Eigenvalues and Linear Systems
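Linear regression is the one empirical risk minimization problem we can solve in closed form: minimizing the squared error leads to the normal equations, a linear system of the kind the warmup practices. A self-contained sketch on synthetic data (the true weights and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
# design matrix: an intercept column plus one feature
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
true_w = np.array([2.0, -3.0])
y = X @ true_w + rng.normal(0, 0.1, n)     # noisy linear data

# argmin_w ||Xw - y||^2 satisfies the normal equations X^T X w = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With enough data and small noise, `w_hat` recovers the true coefficients closely.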
Week 9
Tue Apr. 09
Vectorization and Feature Engineering
We illustrate the interplay of vectorization and feature engineering on image data.
Learning Objectives: Experimentation; Implementation
Reading: Image Kernels Explained Visually by Victor Powell
Notes: Vectorization and Feature Engineering; Live notes (Google Colab)
Warmup: Kernel Convolution
Assignments: Blog Post: Sparse Kernel Machines
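To accompany the kernel convolution warmup, here is a by-hand sketch of 2D convolution: slide a small kernel over the image and record weighted sums (no padding, so the output shrinks). The loop-based implementation is deliberately naive; vectorizing it is exactly the kind of exercise this day is about.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid (no-padding) 2D convolution via explicit loops."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # weighted sum of the window under the kernel
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                    # identity kernel: picks the center pixel
result = convolve2d(image, identity)    # equals the 3x3 interior of the image
```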
Thu Apr. 11
Kernel Methods
We introduce kernel methods as an alternative approach to the problem of fitting nonlinear models to data.
Learning Objectives: Theory
Reading: Classification and K-Nearest Neighbours by Hiroshi Shimodaira, for a course at the University of Edinburgh
Notes: Kernel Methods; Live notes (Google Colab)
Warmup: Introducing Kernel Regression
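As a preview of the kernel regression warmup, here is a sketch of kernel ridge regression with an RBF kernel: rather than engineering features, predictions are built from kernel similarities to the training points. The bandwidth, ridge penalty, and sine-wave data are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise exp(-gamma * ||a - b||^2) between rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(9)
X = np.linspace(-3, 3, 60)[:, None]
y = np.sin(X).ravel() + rng.normal(0, 0.05, 60)   # noisy nonlinear data

lam = 1e-3                                        # ridge penalty (assumed)
K = rbf_kernel(X, X)
a = np.linalg.solve(K + lam * np.eye(60), y)      # dual coefficients
y_hat = K @ a                                     # fitted values

mse = np.mean((y_hat - y) ** 2)
```

Note that the model never forms nonlinear features explicitly; all the nonlinearity lives in the kernel.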
Week 10
Tue Apr. 16
The Problem of Features and Deep Learning
We motivate deep learning as an approach to the problem of learning complex nonlinear features in data.
Learning Objectives: Theory; Implementation
Notes: The Problem of Features and Deep Learning; Live notes (Google Colab)
Warmup: Project Check In
Thu Apr. 18
Modern Optimization
We briefly introduce two concepts in optimization that have enabled large-scale deep learning: stochastic first-order optimization techniques and automatic differentiation.
Learning Objectives: Theory
Notes: Modern Optimization for Deep Learning; Live notes (Google Colab)
Warmup: Introducing Stochastic Gradient Descent
Assignments: Blog Post: The Adam Algorithm for Optimization
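The core of stochastic gradient descent is that each step uses a small random minibatch rather than the full data set. A self-contained sketch on a least-squares problem (the learning rate, batch size, and synthetic data are assumptions, not course-specified values):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
X = rng.normal(0, 1, (n, 2))
true_w = np.array([1.5, -0.5])
y = X @ true_w + rng.normal(0, 0.1, n)

w = np.zeros(2)
alpha, batch_size = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, n, batch_size)          # sample a random minibatch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size  # minibatch gradient of MSE
    w -= alpha * grad                             # noisy descent step
```

Each step costs a fraction of a full-gradient step, which is what makes this approach viable at deep-learning scale; Adam (the blog post topic) refines it with adaptive per-parameter step sizes.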
Week 11
Tue Apr. 23
Deep Image Classification
We return to the image classification problem, using deep learning and large-scale optimization to learn convolutional kernels as part of the training process.
Learning Objectives: Implementation
Reading: Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning
Notes: Deep Image Classification; Live notes (Google Colab)
Warmup: Project Check In
Thu Apr. 25
Deeper Image Classification
We continue working on an extended practical case study of deep learning for image classification.
Learning Objectives: Implementation
Reading: Convolutional Neural Networks from MIT's course 6.036: Introduction to Machine Learning
Notes: Deep Image Classification; Live notes (Google Colab)
Warmup: What Needs to Be Learned?
Week 12
Tue Apr. 30
Text Classification
We briefly study the use of word embeddings for text classification.
Learning Objectives: Theory; Implementation; Experimentation; Social Responsibility
Reading: Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (sections 1, 4, 5)
Notes: Text Classification and Word Embedding; Live notes (Google Colab)
Warmup: Project Check In
Assignments: Blog Post: Deep Music Genre Classification
Thu May 02
Unsupervised Learning and Autoencoders
We introduce unsupervised learning through the framework of autoencoders.
Learning Objectives: Theory; Implementation
Reading: K-Means Clustering from PDSH
Notes: Unsupervised Learning and Autoencoders; Live notes (Google Colab)
Warmup: Compression factor of K-means
Assignments: ACTUAL REAL DUE DATE: End-of-Course Reflection due 5/15
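To go with the K-means reading, here is a numpy sketch of Lloyd's algorithm on synthetic blobs, including the compression view from the warmup: after clustering, each point can be represented by a cluster id plus a small codebook of centroids. The data, k, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(13)
# three blobs that k-means should recover
X = np.vstack([rng.normal(c, 0.3, (100, 2)) for c in (-2.0, 0.0, 2.0)])

k = 3
centroids = X[rng.choice(len(X), k, replace=False)]   # init from data points
for _ in range(50):
    # assignment step: each point joins its nearest centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # update step: move each centroid to the mean of its points
    centroids = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(k)
    ])

# mean distance from each point to its centroid: the distortion paid
# for compressing each point down to a cluster id plus the codebook
d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
within = d.min(axis=1).mean()
```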
Week 13
Tue May 07
Neural Autoencoders
We use neural autoencoders to learn low-dimensional structure in more complex data sets.
Learning Objectives: Theory; Implementation
Notes: Neural Autoencoders; Live notes (Google Colab)
Warmup: Image Embedding By Hand
Thu May 09
Project Presentations
Your project presentation is a group presentation of at most 4 minutes, in which you'll present your project's accomplishments to the class.
Wrapping Up
The final day to submit all course assignments, including blog posts, project assignments, and final reflections, is May 15th at 11:59pm.
During the final exam period, you’ll meet with me 1-1 for about 15 minutes. The purpose of this meeting is to help us both reflect on your time in the course and agree on a final grade.
© Phil Chodrow, 2024
References
Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2023. Fairness and Machine Learning: Limitations and Opportunities. Cambridge, Massachusetts: The MIT Press.
Vanderplas, Jacob T. 2016. Python Data Science Handbook: Essential Tools for Working with Data. First edition. Sebastopol, CA: O’Reilly Media, Inc.