Schedule

Week 1

M
Feb. 13
We discuss how the course works and begin our discussion of classification and auditing.
Learning Objectives
Getting Oriented
Reading
Course syllabus
Collaboration
Why I Don't Grade by Jesse Stommel
Daumé 1.1-1.5
Notes
Welcome slides
Warmup
Set up your software.
Assignments
No really, set up your software.
W
Feb. 15
Classification: The Perceptron
We study the perceptron algorithm, a historical method that serves as the foundation for many modern classifiers.
Learning Objectives
Theory
Implementation
Reading
Daumé 4.1-4.5, 4.7
Introduction to Numpy from The Python Data Science Handbook by Jake VanderPlas
Linear algebra with Numpy
Hardt and Recht, p. 33-41 (if you need to see a definition of a function gradient, see Daumé p. 93)
Notes
Lecture notes
Warmup
Perceptron
Assignments
Blog post: perceptron
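For a concrete preview of the algorithm covered in this session, here is a minimal numpy sketch of the perceptron; the function name, the toy data, and the convention that labels lie in {-1, +1} are illustrative assumptions rather than course specifications.
```python
import numpy as np

def perceptron(X, y, max_iter=1000):
    """Minimal perceptron. X has shape (n, p); y has entries in {-1, +1}."""
    X_ = np.append(X, np.ones((X.shape[0], 1)), axis=1)  # constant feature absorbs the bias
    w = np.zeros(X_.shape[1])
    for _ in range(max_iter):
        mistakes = 0
        for x_i, y_i in zip(X_, y):
            if y_i * np.dot(w, x_i) <= 0:  # point misclassified (or on the boundary)
                w += y_i * x_i             # the perceptron update
                mistakes += 1
        if mistakes == 0:                  # a full pass with no mistakes: done
            break
    return w

# toy usage on linearly separable data
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
X_ = np.append(X, np.ones((4, 1)), axis=1)
print(np.sign(X_ @ w))  # should match y
```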

Week 2

M
Feb. 20
Convex Linear Models and Logistic Regression
We discuss the modeling choices necessary to make the empirical risk minimization problem for linear classifiers tractable. In doing so we discuss convex functions and some of their properties that are relevant for optimization. Finally, we introduce logistic regression as an example of a convex linear classifier.
Learning Objectives
Theory
Implementation
Reading
Daumé 2.1-2.7
Daumé 7.1-7.3
Hardt and Recht, p. 70-77
Notes
Lecture notes
Warmup
Convexity
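To make the connection between convexity and logistic regression concrete, here is a small numpy sketch of the logistic empirical risk together with a numerical check of midpoint convexity; the synthetic data and the {0, 1} label convention are illustrative assumptions.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_empirical_risk(w, X, y):
    """Average logistic loss of a linear model with weights w; labels y in {0, 1}.
    This objective is convex in w, which is what makes training tractable."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# midpoint convexity check: a convex function lies below its chords,
# so f((w1 + w2) / 2) <= (f(w1) + f(w2)) / 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w1, w2 = rng.normal(size=3), rng.normal(size=3)
mid = logistic_empirical_risk((w1 + w2) / 2, X, y)
avg = (logistic_empirical_risk(w1, X, y) + logistic_empirical_risk(w2, X, y)) / 2
print(mid <= avg)  # True
```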
W
Feb. 22
Optimization via Gradient Descent
We discuss standard mathematical methods for empirical risk minimization, including gradient descent and stochastic gradient descent. We also recontextualize the perceptron algorithm as stochastic subgradient descent for a linear classifier with a specific loss function.
Learning Objectives
Theory
Implementation
Reading
Daumé 7.4-7.6
Deisenroth, Faisal, and Ong, p. 225-233
Notes
Lecture notes
Warmup
Gradient Descent
Assignments
Blog post: gradient descent
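Here is a minimal numpy sketch of gradient descent applied to the logistic empirical risk from the previous session; the step size, the synthetic data, and the {0, 1} label convention are illustrative assumptions. Stochastic gradient descent differs only in which points contribute to each gradient estimate, as noted in the comments.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def grad(w, X, y):
    """Gradient of the average logistic loss (labels in {0, 1})."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def gradient_descent(X, y, alpha=0.5, max_iter=1000):
    """Plain gradient descent: repeatedly step against the gradient of the risk."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w -= alpha * grad(w, X, y)
    return w

# stochastic gradient descent would use the gradient on a single random point
# (or a small batch) at each step instead of the full data set
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)
w_hat = gradient_descent(X, y)
print(np.mean((sigmoid(X @ w_hat) > 0.5) == (y == 1)))  # training accuracy
```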

Week 3

M
Feb. 27
Features, Regularization, and Nonlinear Decision Boundaries
We learn how to use feature maps to help our convex linear classifiers learn nonlinear patterns. We also introduce the problem of overfitting and introduce feature selection and regularization as methods for addressing this problem.
Learning Objectives
Theory
Implementation
Navigation
Experimentation
Reading
Introducing Scikit-Learn
Hyperparameters and Model Validation
Feature Engineering
Notes
Lecture notes
Live version
Warmup
Gradient Descent Again
Assignments
ACTUAL REAL DUE DATE: Reflective Goal-Setting due 2/27
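As a small illustration of this session's ideas in scikit-learn, the sketch below combines a polynomial feature map with a regularized logistic regression in a pipeline; the data set (make_moons), the polynomial degree, and the regularization strength C=1.0 are illustrative choices, not course requirements.
```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# a degree-3 polynomial feature map lets a linear classifier draw a nonlinear
# decision boundary; C is the inverse strength of the L2 regularization,
# so smaller C means more regularization
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=3), LogisticRegression(C=1.0))
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```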
W
Mar. 01
Classification in Practice
We work through a complete modeling workflow for the Titanic survival data set. Along the way, we work with data frames and discuss cross-validation.
Learning Objectives
Navigation
Experimentation
Reading
Daumé Chapter 2. You may find it useful to review Chapter 1 as well.
Data Manipulation with Pandas (Focus on the sections up to and including "Aggregation and Grouping")
Notes
Lecture notes
Live version
Warmup
Overfitting and the Scientific Method
Assignments
Blog post: kernel logistic regression
OR
Blog post: penguins
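The sketch below illustrates cross-validation on a pandas data frame; the tiny hand-made data frame is a hypothetical stand-in for the Titanic data, and the choice of five folds is illustrative.
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# a small hypothetical data frame standing in for the Titanic passenger data
df = pd.DataFrame({
    "age":      [22, 38, 26, 35, 28, 2, 27, 14, 54, 20],
    "fare":     [7.3, 71.3, 7.9, 53.1, 8.5, 21.1, 11.1, 30.1, 51.9, 8.1],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
})
X = df[["age", "fare"]]
y = df["survived"]

# 5-fold cross-validation: fit on 4/5 of the rows, score on the held-out 1/5,
# and repeat so that every row is held out exactly once
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean())
```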

Week 4

M
Mar. 06
Beyond Convex Linear Classifiers
We discuss several examples of other classifiers at a high level, including some that are nonlinear or nonconvex.
Learning Objectives
Navigation
Reading
NA
Notes
Lecture notes
Live version
W
Mar. 08
Linear Regression
We introduce linear regression, another convex linear model suitable for predicting real numbers instead of class labels.
Learning Objectives
Theory
Implementation
Reading
NA
Notes
Lecture notes
Live version
Assignments
Blog post: Linear regression
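Here is a minimal numpy sketch of least-squares linear regression solved via the normal equations; the synthetic data and the decision to fold the intercept in as a constant column are illustrative assumptions.
```python
import numpy as np

# least-squares linear regression via the normal equations:
# minimizing ||Xw - y||^2 leads to solving (X^T X) w = X^T y
rng = np.random.default_rng(0)
x = rng.uniform(size=50)
X = np.column_stack([x, np.ones(50)])       # one feature plus an intercept column
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=50)  # noisy linear data

w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)                                # should be close to [2.0, -1.0]
```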

Week 5

M
Mar. 13
Introduction to Bias and Fairness
TBD
Learning Objectives
Social Responsibility
Experimentation
Reading
Machine Bias by Julia Angwin et al. for ProPublica.
Fair prediction with disparate impact by Alexandra Chouldechova, Sections 1 and 2.
Inherent trade-offs in the fair determination of risk scores by Jon Kleinberg et al., pages 1-5.
Notes
Lecture notes
Live version
Warmup
Balancing Classification Rates
W
Mar. 15
Critical Perspectives
We discuss limitations of the quantitative approach to studying discrimination, as well as critical perspectives on the role that automated decision systems play in surveilling and controlling marginalized individuals.
Learning Objectives
Social Responsibility
Experimentation
Reading
The Limits of the Quantitative Approach to Discrimination, speech by Arvind Narayanan
"The Digital Poorhouse" by Virginia Eubanks for Harper's Magazine
Notes
TBD
Warmup
Limits of the Quantitative Approach
Assignments
Blog post: Limits of quantitative methods
OR
Blog post: Auditing allocative bias

Week 6

M
Mar. 27
Vectorization
We discuss some of the ways in which complex objects like images, and especially text, can be represented as numerical vectors for machine learning algorithms. A small vectorization sketch appears at the end of this entry.
Learning Objectives
Navigation
Experimentation
Reading
Murphy, Chapter 1. This is not related to vectorization; it's for you to get oriented on some possible project ideas. Don't worry about any math you don't understand.
Course project description
Notes
Lecture notes
Live version
Warmup
Pitch a Project Idea
Assignments
ACTUAL REAL DUE DATE: Mid-semester reflection due 4/05
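As a small illustration of text vectorization, the sketch below uses scikit-learn's CountVectorizer to turn a few toy documents into word-count vectors; the example documents are made up.
```python
from sklearn.feature_extraction.text import CountVectorizer

# term-frequency ("bag of words") vectorization: each document becomes a
# vector counting how often each vocabulary word appears in it
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs and more dogs",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)          # sparse matrix with one row per document
print(vec.get_feature_names_out())   # the learned vocabulary
print(X.toarray())
```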
W
Mar. 29
Introducing Unsupervised Learning: Topic Modeling
We begin to discuss unsupervised learning, with topic modeling as our initial example.
Learning Objectives
Theory
Navigation
Experimentation
Reading
Principal Component Analysis from the Python Data Science Handbook
Notes
Lecture notes
Live version
Warmup
Vectorization Brainstorm
Assignments
ACTUAL REAL DUE DATE: Project Proposal due 4/07
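This entry does not pin down a specific topic-modeling algorithm, so the sketch below uses nonnegative matrix factorization from scikit-learn as one common choice; the toy corpus and the number of topics are illustrative assumptions.
```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the team won the game after a late goal",
    "officials announced the election results",
    "voters went to the polls for the election",
    "the coach praised the team after the game",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                  # document-term count matrix

model = NMF(n_components=2, random_state=0)  # two "topics"
W = model.fit_transform(X)                   # how much each document uses each topic
H = model.components_                        # how much each topic uses each word

words = vec.get_feature_names_out()
for topic in H:
    print([words[i] for i in topic.argsort()[::-1][:3]])  # top words per topic
```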

Week 7

M
Apr. 03
Clustering Data
We continue our discussion of unsupervised learning with two methods for clustering sets of data.
Learning Objectives
Theory
Navigation
Experimentation
Reading
K-Means Clustering from the Python Data Science Handbook
Notes
Lecture notes
Live version
Warmup
K-Means Compression
Assignments
Blog post: Unsupervised learning with linear algebra (however, using this time to complete a previous blog post is also highly recommended)
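Here is a minimal scikit-learn sketch of k-means, one standard clustering method; the blob data set and the choice of three clusters are illustrative assumptions.
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# k-means alternates between assigning points to the nearest centroid and
# moving each centroid to the mean of its assigned points
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the three learned centroids
print(km.labels_[:10])       # cluster assignments of the first ten points
```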
W
Apr. 05
Introducing Deep Learning
We begin our discussion of deep learning with a quick theoretical motivation and a first glance at the PyTorch package.
Learning Objectives
Theory
Navigation
Reading
Lecture 1, Introduction from Chinmay Hegde's course on deep learning at NYU
Notes
Lecture notes
Live version
Warmup
Introducing Tensors
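As a first glance at PyTorch to accompany this session, the sketch below builds a tensor with requires_grad=True and lets autograd compute a gradient; the shapes and the toy loss are illustrative.
```python
import torch

# tensors are PyTorch's basic data structure; unlike numpy arrays they can
# track the operations applied to them so that gradients come "for free"
X = torch.randn(5, 3)                      # a batch of 5 feature vectors
w = torch.randn(3, 1, requires_grad=True)  # parameters we want gradients for

loss = ((X @ w) ** 2).mean()
loss.backward()                            # autograd computes d(loss)/dw
print(w.grad)                              # same shape as w
```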

Week 8

M
Apr. 10
Optimization For Deep Learning
We begin a discussion of the training process for neural networks, which requires efficient computation of gradients via backpropagation and efficient variations of gradient descent.
Learning Objectives
Theory
Implementation
Reading
Lecture 2, Neural Nets from Chinmay Hegde's course on deep learning at NYU
Notes
Lecture notes
Live version
Warmup
Efficient Differentiation
Assignments
Blog post: Optimization with Adam (however, using this time to complete a previous blog post is also highly recommended)
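The sketch below is a minimal PyTorch training loop in which backward() performs backpropagation and torch.optim.Adam supplies an efficient variant of gradient descent; the synthetic regression data, learning rate, and number of steps are illustrative assumptions.
```python
import torch

# a tiny training loop: loss.backward() runs backpropagation to compute
# gradients, and the optimizer applies an adaptive gradient-descent update
X = torch.randn(100, 3)
y = X @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(3, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()              # backpropagation
    opt.step()                   # parameter update
print(loss.item())               # should approach the noise level
```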
W
Apr. 12
Convolutional Neural Networks
We discuss methods for image classification using neural networks and introduce convolutional layers.
Learning Objectives
Theory
Experimentation
Reading
Convolutional Neural Networks from MIT's course 6.036.
A Comprehensive Guide to Convolutional Neural Networks by Sumit Saha on Towards Data Science has some good visuals.
Notes
Lecture notes
Live version
Warmup
Convolutional Kernels
Assignments
ACTUAL REAL DUE DATE: Engaging with Timnit Gebru, Part 1 due 4/19
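Here is a minimal PyTorch sketch of a classifier with a convolutional layer; the layer sizes and the assumption of 28x28 grayscale inputs are illustrative, not a specification of anything used in class.
```python
import torch
from torch import nn

# a minimal convolutional classifier for 28x28 grayscale images
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer: 8 learned 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                            # downsample 28x28 feature maps to 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # scores for 10 classes
)
x = torch.randn(4, 1, 28, 28)                   # a fake batch of 4 images
print(model(x).shape)                           # torch.Size([4, 10])
```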

Week 9

M
Apr. 17
More on Image Classification
We continue our discussion of image classification with convolutional neural networks.
Learning Objectives
Experimentation
Navigation
Reading
Convolutional Neural Networks from MIT's course 6.036.
A Comprehensive Guide to Convolutional Neural Networks by Sumit Saha on Towards Data Science has some good visuals.
Notes
Lecture notes
Live version
Warmup
Project Check-In
W
Apr. 19
Some Practical Techniques in Image Classification
We discuss data augmentation and transfer learning, two helpful techniques in image classification. We also highlight some of the messy challenges involved in managing complex data for classification tasks in PyTorch. A brief sketch of both techniques appears at the end of this entry.
Learning Objectives
Theory
Experimentation
Notes
Lecture notes
Live version
Warmup
How Much Needs To Be Learned?
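The sketch below illustrates both techniques, assuming torchvision (not named in this entry) and a recent enough version to support the weights argument; the ResNet18 backbone and the two-class head are illustrative choices.
```python
import torch
from torchvision import models, transforms

# data augmentation: random transformations applied to each training image
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# transfer learning: start from a network pretrained on ImageNet, freeze its
# feature-extracting layers, and train only a new final layer for our task
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # new head for a 2-class problem
```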

Week 10

M
Apr. 24
Dr. Timnit Gebru on Computer Vision and "Artificial General Intelligence"
We speak with Dr. Timnit Gebru about her recent work on computer vision and ideology in artificial general intelligence.
Learning Objectives
Social Responsibility
Warmup
Project Check-In
W
Apr. 26
Text Classification and Word Embeddings
We begin our study of text classification and the use of word embeddings for efficient text vectorization.
Learning Objectives
Theory
Experimentation
Reading
Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (sections 1, 4, 5)
Notes
Lecture notes
Live version
Warmup
Word embedding
Assignments
Blog post: deep music classification
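Here is a minimal PyTorch sketch of a word embedding layer and a pooled document representation; the vocabulary size, embedding dimension, and token indices are illustrative assumptions.
```python
import torch
from torch import nn

# a word embedding layer maps integer word indices to dense vectors, so a
# document can be represented by (for example) the average of its word vectors
vocab_size, embedding_dim = 5000, 50
embedding = nn.Embedding(vocab_size, embedding_dim)

tokens = torch.tensor([4, 17, 230, 3, 88, 912])  # a "sentence" of six word indices
vectors = embedding(tokens)                      # shape (6, 50)
doc_vector = vectors.mean(dim=0)                 # pooled representation for a classifier
print(doc_vector.shape)                          # torch.Size([50])
```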

Week 11

M
May. 01
Word Embeddings
We continue our study of text classification by training a classifier and examining word embeddings.
Learning Objectives
Theory
Experimentation
Social Responsibility
Reading
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Can you explain how orthogonal projections can help reduce bias in word embeddings?
Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor
Notes
Lecture notes
Live version
Warmup
Project Check-In
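The first reading above asks how orthogonal projections can help reduce bias in word embeddings; the sketch below shows the core projection step, with made-up three-dimensional vectors standing in for real embeddings.
```python
import numpy as np

def remove_component(v, b):
    """Remove from v its orthogonal projection onto the direction b."""
    b = b / np.linalg.norm(b)
    return v - np.dot(v, b) * b

# hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions
he = np.array([0.6, 0.1, 0.3])
she = np.array([0.1, 0.6, 0.3])
bias_direction = he - she                 # a crude estimate of a gender direction

word = np.array([0.5, 0.2, 0.4])          # e.g. an occupation word
debiased = remove_component(word, bias_direction)
print(np.dot(debiased, bias_direction))   # ~0: no remaining component along the bias direction
```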
W
May. 03
Text Generation and Recurrent Neural Networks
We use recurrent neural networks to generate synthetic text with several realistic attributes.
Learning Objectives
Theory
Implementation
Navigation
Reading
The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
Notes
Lecture notes
Live version
Warmup
"Realistic" text

Week 12

M
May. 08
Reflection and Feedback
We look back on our time in the course, reflect on the responsibilities of data scientists in society, and give feedback on the course.
Learning Objectives
Theory
Social Responsibility
Reading
Millions of black people affected by racial bias in health-care algorithms by Heidi Ledford for Nature
(Optional) Dissecting racial bias in an algorithm used to manage the health of populations by Obermeyer et al. in Science.
Warmup
Concept Mind Map
W
May. 10
Final Project Presentations
We present our final projects in CSCI 0451!
Learning Objectives
Project

Finals Period

During the reading and final exam period, you’ll meet with me 1-1 for about 15 minutes. The purpose of this meeting is to help us both reflect on your time in the course and agree on a final grade.

Due Dates

It’s best to submit all of the work you wish to use to demonstrate your learning by the time of our final meeting. However, I will accept and assess work submitted by the last day of the final exam period.



© Phil Chodrow, 2023