Schedule
- Readings in normal font should be completed and annotated ahead of lecture.
- Readings in italic provide optional additional depth on the material.
- Assignments are listed on the day when I suggest you begin working on them.
Week 1
M Feb. 13
We discuss how the course works and begin our discussion of classification and auditing.
Learning Objectives: Getting Oriented
Reading: Course syllabus; Collaboration; Why I Don't Grade by Jesse Stommel; Daumé 1.1-1.5
Notes: Welcome slides
Warmup: Set up your software.
Assignments: No really, set up your software.
W Feb. 15
Classification: The Perceptron
We study the perceptron algorithm, a historical method that serves as the foundation for many modern classifiers.
Learning Objectives: Theory, Implementation
Reading: Daumé 4.1-4.5 and 4.7; Introduction to Numpy from The Python Data Science Handbook by Jake VanderPlas; Linear algebra with Numpy; Hardt and Recht, p. 33-41 (if you need to see a definition of a function gradient, see Daumé p. 93)
Notes: Lecture notes
Warmup: Perceptron
Assignments: Blog post: perceptron
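As a preview of what the implementation involves, here is a minimal NumPy sketch of the perceptron update (labels in {-1, +1}, with a constant feature appended for the bias). The function name and signature are illustrative only, not the interface required for the blog post.

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Learn a linear classifier with the perceptron update.

    X: (n, p) feature matrix (include a constant column for a bias term).
    y: labels in {-1, +1}.
    Whenever a point is misclassified, nudge the weights toward y_i * x_i.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified (or on the boundary)
                w = w + yi * xi           # the perceptron update
                mistakes += 1
        if mistakes == 0:                 # converged: every point classified
            break
    return w
```

On linearly separable data this loop is guaranteed to terminate; on non-separable data it runs until `max_epochs`, which is one motivation for the convex relaxations discussed next week.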
Week 2
M Feb. 20
Convex Linear Models and Logistic Regression
We discuss the modeling choices necessary to make the empirical risk minimization problem for linear classifiers tractable. In doing so, we discuss convex functions and some of their properties that are relevant for optimization. Finally, we introduce logistic regression as an example of a convex linear classifier.
Learning Objectives: Theory, Implementation
Reading: Daumé 2.1-2.7; Daumé 7.1-7.3; Hardt and Recht, p. 70-77
Notes: Lecture notes
Warmup: Convexity
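To make the convexity discussion concrete, here is a short sketch of the logistic (cross-entropy) loss, which is a convex function of the weight vector and therefore amenable to the optimization methods we study next. The function names are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y):
    """Empirical risk for logistic regression with labels y in {0, 1}.

    Unlike the 0-1 loss, this is a convex, differentiable function of w,
    which is what makes empirical risk minimization tractable.
    """
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

At `w = 0` every prediction is 0.5, so the loss is exactly `log 2` regardless of the data, a handy sanity check for implementations.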
W Feb. 22
Optimization via Gradient Descent
We discuss standard mathematical methods for empirical risk minimization, including gradient descent and stochastic gradient descent. We also recontextualize the perceptron algorithm as stochastic subgradient descent for a linear classifier with a specific loss function.
Learning Objectives: Theory, Implementation
Reading: Daumé 7.4-7.6; Deisenroth, Faisal, and Ong, p. 225-233
Notes: Lecture notes
Warmup: Gradient Descent
Assignments: Blog post: gradient descent
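The core loop of gradient descent fits in a few lines. This is an illustrative sketch of the vanilla method, not the version you will build for the blog post.

```python
import numpy as np

def gradient_descent(grad, w0, alpha=0.1, max_iter=1000, tol=1e-8):
    """Minimize a differentiable function by repeatedly stepping in the
    direction of the negative gradient.

    grad: function returning the gradient at w.
    alpha: step size (learning rate).
    Stops when an iteration moves w by less than tol.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        w_new = w - alpha * grad(w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```

For example, on the convex function f(w) = ||w - c||², whose gradient is 2(w - c), the iterates converge to c for any sufficiently small step size.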
Week 3
M Feb. 27
Features, Regularization, and Nonlinear Decision Boundaries
We learn how to use feature maps to help our convex linear classifiers learn nonlinear patterns. We also introduce the problem of overfitting, along with feature selection and regularization as methods for addressing it.
Learning Objectives: Theory, Implementation, Navigation, Experimentation
Reading: Introducing Scikit-Learn; Hyperparameters and Model Validation; Feature Engineering
Notes: Lecture notes; Live version
Warmup: Gradient Descent Again
Assignments: ACTUAL REAL DUE DATE: Reflective Goal-Setting due 2/27
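One of the simplest feature maps is polynomial: replace a scalar feature x with the vector (x, x², …, x^d), and a linear classifier on the new features can draw a nonlinear boundary in the original space. A sketch (the helper name is made up; scikit-learn's `PolynomialFeatures` is the practical tool):

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-d feature x to the columns (x, x^2, ..., x^degree).

    A linear model trained on these transformed features corresponds to a
    polynomial (hence nonlinear) decision boundary in the original space.
    Higher degrees give more flexibility, and more risk of overfitting,
    which is where regularization comes in.
    """
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** d for d in range(1, degree + 1)])
```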
W Mar. 01
Classification in Practice
We work through a complete modeling workflow for the Titanic survival data set. Along the way, we work with data frames and discuss cross-validation.
Learning Objectives: Navigation, Experimentation
Reading: Daumé Chapter 2 (you may find it useful to review Chapter 1 as well); Data Manipulation with Pandas (focus on the sections up to and including "Aggregation and Grouping")
Notes: Lecture notes; Live version
Warmup: Overfitting and the Scientific Method
Assignments: Blog post: kernel logistic regression OR Blog post: penguins
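For intuition, here is a from-scratch sketch of the idea behind k-fold cross-validation: hold out each fold in turn, train on the rest, and average the held-out scores. In practice we use scikit-learn's `cross_val_score`; this toy version (with hypothetical `fit` and `score` callables) just shows the mechanics.

```python
import numpy as np

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal folds."""
    return np.array_split(np.arange(n), k)

def cross_val_score(fit, score, X, y, k=5):
    """Estimate generalization performance by k-fold cross-validation.

    fit(X_train, y_train) -> model; score(model, X_test, y_test) -> float.
    Each fold serves once as held-out test data.
    """
    scores = []
    for fold in k_fold_indices(len(X), k):
        mask = np.ones(len(X), dtype=bool)
        mask[fold] = False                      # train on everything else
        model = fit(X[mask], y[mask])
        scores.append(score(model, X[fold], y[fold]))
    return np.mean(scores)
```

(Real implementations also shuffle and stratify the folds; this sketch omits that.)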
Week 4
M Mar. 06
Beyond Convex Linear Classifiers
We discuss several examples of other classifiers at a high level, including some that are nonlinear or nonconvex.
Learning Objectives: Navigation
Reading: NA
Notes: Lecture notes; Live version
W Mar. 08
Linear Regression
We introduce linear regression, another convex linear model suitable for predicting real numbers instead of class labels.
Learning Objectives: Theory, Implementation
Reading: NA
Notes: Lecture notes; Live version
Assignments: Blog post: Linear regression
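Unlike most classifiers, linear regression admits a direct solution: the least-squares problem min_w ||Xw - y||² can be solved in one call. A minimal sketch using NumPy's `lstsq` (illustrative; the blog post may ask for a different approach, such as gradient descent):

```python
import numpy as np

def least_squares(X, y):
    """Solve min_w ||Xw - y||^2 for the weight vector w.

    lstsq solves the normal equations in a numerically stable way;
    include a constant column in X if you want an intercept term.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```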
Week 5
M Mar. 13
Introduction to Bias and Fairness
TBD
Learning Objectives: Social Responsibility, Experimentation
Reading: Machine Bias by Julia Angwin et al. for ProPublica; Fair prediction with disparate impact by Alexandra Chouldechova, Sections 1 and 2; Inherent trade-offs in the fair determination of risk scores by Jon Kleinberg et al., pages 1-5
Notes: Lecture notes; Live version
Warmup: Balancing Classification Rates
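Audits like ProPublica's COMPAS analysis compare error rates across groups, for example the false positive rate P(predicted positive | actually negative) within each group. A toy sketch of that computation (the helper name is hypothetical):

```python
import numpy as np

def group_fpr(y_true, y_pred, group):
    """False positive rate within each group.

    y_true, y_pred: arrays of 0/1 labels and predictions.
    group: array of group identifiers, one per individual.
    Returns {group: P(y_pred = 1 | y_true = 0, group)}.
    """
    rates = {}
    for g in np.unique(group):
        negatives = (group == g) & (y_true == 0)   # truly-negative members of g
        rates[g] = np.mean(y_pred[negatives]) if negatives.any() else float("nan")
    return rates
```

Large gaps between these per-group rates are one (contested) operationalization of unfairness; the Chouldechova and Kleinberg readings show why several such rates cannot generally be equalized at once.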
W Mar. 15
Critical Perspectives
We discuss limitations of the quantitative approach to studying discrimination, as well as critical perspectives on the role that automated decision systems play in surveilling and controlling marginalized individuals.
Learning Objectives: Social Responsibility, Experimentation
Reading: The Limits of the Quantitative Approach to Discrimination, speech by Arvind Narayanan; "The Digital Poorhouse" by Virginia Eubanks for Harper's Magazine
Notes: TBD
Warmup: Limits of the Quantitative Approach
Assignments: Blog post: Limits of quantitative methods OR Blog post: Auditing allocative bias
Week 6
M Mar. 27
Vectorization
We discuss some ways in which complex objects like images, and especially text, can be represented as numerical vectors for machine learning algorithms.
Learning Objectives: Navigation, Experimentation
Reading: Murphy, Chapter 1 (this is not related to vectorization; it's for you to get oriented on some possible project ideas, and don't worry about any math you don't understand); Course project description
Notes: Lecture notes; Live version
Warmup: Pitch a Project Idea
Assignments: ACTUAL REAL DUE DATE: Mid-semester reflection due 4/05
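The simplest text vectorization is the bag-of-words encoding: one column per vocabulary word, one count per document. This sketch mimics what scikit-learn's `CountVectorizer` does, with far fewer options (no tokenization beyond whitespace splitting, no stop words, no weighting):

```python
import numpy as np

def term_frequency_matrix(docs):
    """Represent each document as a vector of word counts.

    Builds a shared vocabulary from all documents, then counts how often
    each vocabulary word appears in each document. Word order is discarded,
    which is the defining simplification of bag-of-words.
    """
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: j for j, word in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, doc in enumerate(docs):
        for word in doc.split():
            X[i, index[word]] += 1
    return X, vocab
```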
W Mar. 29
Introducing Unsupervised Learning: Topic Modeling
We begin to discuss unsupervised learning, with topic modeling as our initial example.
Learning Objectives: Theory, Navigation, Experimentation
Reading: Principal Component Analysis from the Python Data Science Handbook
Notes: Lecture notes; Live version
Warmup: Vectorization Brainstorm
Assignments: ACTUAL REAL DUE DATE: Project Proposal due 4/07
Week 7
M Apr. 03
Clustering Data
We continue our discussion of unsupervised learning with two methods for clustering sets of data.
Learning Objectives: Theory, Navigation, Experimentation
Reading: K-Means Clustering from the Python Data Science Handbook
Notes: Lecture notes; Live version
Warmup: K-Means Compression
Assignments: Blog post: Unsupervised learning with linear algebra (however, using this time to complete a previous blog post is also highly recommended)
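For reference, the k-means algorithm from the reading alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points (Lloyd's algorithm). A teaching sketch, not a replacement for `sklearn.cluster.KMeans`:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=None):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # assignment step: nearest centroid for each point
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid moves to the mean of its points
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged
            break
        centroids = new
    return centroids, labels
```

Each iteration can only decrease the within-cluster sum of squares, so the loop always terminates, though possibly at a local optimum, which is why practical implementations restart from several random initializations.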
W Apr. 05
Introducing Deep Learning
We begin our discussion of deep learning with a quick theoretical motivation and a first glance at the PyTorch package.
Learning Objectives: Theory, Navigation
Reading: Lecture 1, Introduction from Chinmay Hegde's course on deep learning at NYU
Notes: Lecture notes; Live version
Warmup: Introducing Tensors
Week 8
M Apr. 10
Optimization For Deep Learning
We begin a discussion of the training process for neural networks, which requires efficient computation of gradients via backpropagation and efficient variations of gradient descent.
Learning Objectives: Theory, Implementation
Reading: Lecture 2, Neural Nets from Chinmay Hegde's course on deep learning at NYU
Notes: Lecture notes; Live version
Warmup: Efficient Differentiation
Assignments: Blog post: Optimization with Adam (however, using this time to complete a previous blog post is also highly recommended)
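The Adam optimizer in the assignment augments gradient descent with per-coordinate step sizes derived from exponential moving averages of the gradient (m) and its square (v), each with a bias correction. A dependency-free sketch using NumPy in place of PyTorch (illustrative; follow the assignment's own specification for the blog post):

```python
import numpy as np

def adam(grad, w0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, n_iter=500):
    """Minimize a function given its gradient, using Adam (Kingma & Ba, 2014)."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)                       # moving average of gradients
    v = np.zeros_like(w)                       # moving average of squared gradients
    for t in range(1, n_iter + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)           # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w
```

Note that on the very first iteration the update reduces to a step of size `alpha` in the direction of `-sign(g)`, which is what makes Adam's behavior relatively insensitive to the scale of the gradients.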
W Apr. 12
Convolutional Neural Networks
We discuss methods for image classification using neural networks and introduce convolutional layers.
Learning Objectives: Theory, Experimentation
Reading: Convolutional Neural Networks from MIT's course 6.036; A Comprehensive Guide to Convolutional Neural Networks by Sumit Saha on Towards Data Science has some good visuals
Notes: Lecture notes; Live version
Warmup: Convolutional Kernels
Assignments: ACTUAL REAL DUE DATE: Engaging with Timnit Gebru Part 1 due 4/19
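The core operation in a convolutional layer is a sliding weighted sum: the kernel is placed at each position of the image and multiplied elementwise with the patch underneath. A minimal NumPy sketch of "valid"-mode 2-d cross-correlation (what deep learning libraries compute under the name "convolution", minus learned weights, bias, and channels):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` and take a weighted sum at each position.

    'Valid' mode: the output shrinks by (kernel size - 1) in each dimension,
    since the kernel must fit entirely inside the image.
    """
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out
```

Different kernels pick out different local patterns (edges, blobs, textures); in a convolutional network, the kernel entries are learned rather than hand-designed.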
Week 9
M Apr. 17
More on Image Classification
We continue our discussion of image classification with convolutional neural networks.
Learning Objectives: Experimentation, Navigation
Reading: Convolutional Neural Networks from MIT's course 6.036; A Comprehensive Guide to Convolutional Neural Networks by Sumit Saha on Towards Data Science has some good visuals
Notes: Lecture notes; Live version
Warmup: Project Check-In
W Apr. 19
Some Practical Techniques in Image Classification
We discuss data augmentation and transfer learning, two helpful techniques in image classification. We also highlight some of the messy challenges involved in managing complex data for classification tasks with PyTorch.
Learning Objectives: Theory, Experimentation
Notes: Lecture notes; Live version
Warmup: How Much Needs To Be Learned?
Week 10
M Apr. 24
Dr. Timnit Gebru on Computer Vision and "Artificial General Intelligence"
We speak with Dr. Timnit Gebru about her recent work on computer vision and ideology in artificial general intelligence.
Learning Objectives: Social Responsibility
Warmup: Project Check-In
W Apr. 26
Text Classification and Word Embeddings
We begin our study of text classification and the use of word embeddings for efficient text vectorization.
Learning Objectives: Theory, Experimentation
Reading: Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (sections 1, 4, 5)
Notes: Lecture notes; Live version
Warmup: Word embedding
Assignments: Blog post: deep music classification
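Word embeddings are usually compared by direction rather than magnitude, using the cosine of the angle between vectors. A small sketch with a made-up toy vocabulary (real embeddings like word2vec live in hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def most_similar(word, embeddings):
    """Return the other word whose vector is closest in cosine similarity."""
    target = embeddings[word]
    scores = {w: cosine_similarity(target, v)
              for w, v in embeddings.items() if w != word}
    return max(scores, key=scores.get)
```

Nearest-neighbor queries like this are how the famous "man : king :: woman : ?" analogies in the Mikolov et al. reading are evaluated.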
Week 11
M May. 01
Word Embeddings
We continue our study of text classification by training a classifier and examining word embeddings.
Learning Objectives: Theory, Experimentation, Social Responsibility
Reading: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings; Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor
Notes: Lecture notes; Live version
Warmup: Project Check-In
W May. 03
Text Generation and Recurrent Neural Networks
We use recurrent neural networks to generate synthetic text with several realistic attributes.
Learning Objectives: Theory, Implementation, Navigation
Reading: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
Notes: Lecture notes; Live version
Warmup: "Realistic" text
Week 12
M May. 08
Reflection and Feedback
We look back on our time in the course, reflect on the responsibilities of data scientists in society, and give feedback on the course.
Learning Objectives: Theory, Social Responsibility
Reading: Millions of black people affected by racial bias in health-care algorithms by Heidi Ledford for Nature; (Optional) Dissecting racial bias in an algorithm used to manage the health of populations by Obermeyer et al. in Science
Warmup: Concept Mind Map
W May. 10
Final Project Presentations
We present our final projects in CSCI 0451!
Learning Objectives: Project
Finals Period
During the reading and final exam period, you’ll meet with me 1-1 for about 15 minutes. The purpose of this meeting is to help us both reflect on your time in the course and agree on a final grade.
Due Dates
It’s best to submit any work you wish to use to demonstrate your learning by the time of our final meeting. However, I will accept and assess work submitted through the last day of the final exam period.
© Phil Chodrow, 2023