Schedule

Week 1

M
Feb. 13
We discuss how the course works and begin our discussion of classification and auditing.
Learning Objectives
Getting Oriented
Reading
Course syllabus
Collaboration
Why I Don't Grade by Jesse Stommel
Daumé 1.1-1.5
Notes
Welcome slides
Warmup
Set up your software.
Assignments
No really, set up your software.
W
Feb. 15
Classification: The Perceptron
We study the perceptron algorithm, a historical method that serves as the foundation for many modern classifiers.
Learning Objectives
Theory
Implementation
Reading
Daumé 4.1-4.5, 4.7
Introduction to Numpy from The Python Data Science Handbook by Jake VanderPlas
Linear algebra with Numpy
Hardt and Recht, p. 33-41 (if you need to see a definition of a function gradient, see Daumé p. 93)
Notes
Lecture notes
Warmup
Perceptron
Assignments
Blog post: perceptron
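For a concrete preview of the algorithm covered in this session, here is a minimal numpy sketch of the perceptron; the function name, the toy data, and the convention that labels lie in {-1, +1} are illustrative assumptions rather than course specifications.
```python
import numpy as np

def perceptron(X, y, max_iter=1000):
    """Minimal perceptron. X has shape (n, p); y has entries in {-1, +1}."""
    X_ = np.append(X, np.ones((X.shape[0], 1)), axis=1)  # constant feature absorbs the bias
    w = np.zeros(X_.shape[1])
    for _ in range(max_iter):
        mistakes = 0
        for x_i, y_i in zip(X_, y):
            if y_i * np.dot(w, x_i) <= 0:  # point misclassified (or on the boundary)
                w += y_i * x_i             # the perceptron update
                mistakes += 1
        if mistakes == 0:                  # a full pass with no mistakes: done
            break
    return w

# toy usage on linearly separable data
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
X_ = np.append(X, np.ones((4, 1)), axis=1)
print(np.sign(X_ @ w))  # should match y
```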

Week 2

M
Feb. 20
Convex Linear Models and Logistic Regression
We discuss the modeling choices necessary to make the empirical risk minimization problem for linear classifiers tractable. In doing so we discuss convex functions and some of their properties that are relevant for optimization. Finally, we introduce logistic regression as an example of a convex linear classifier.
Learning Objectives
Theory
Implementation
Reading
Daumé 2.1-2.7
Daumé 7.1-7.3
Hardt and Recht, p. 70-77
Notes
Lecture notes
Warmup
Convexity
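To make the connection between convexity and logistic regression concrete, here is a small numpy sketch of the logistic empirical risk together with a numerical check of midpoint convexity; the synthetic data and the {0, 1} label convention are illustrative assumptions.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_empirical_risk(w, X, y):
    """Average logistic loss of a linear model with weights w; labels y in {0, 1}.
    This objective is convex in w, which is what makes training tractable."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# midpoint convexity check: a convex function lies below its chords,
# so f((w1 + w2) / 2) <= (f(w1) + f(w2)) / 2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w1, w2 = rng.normal(size=3), rng.normal(size=3)
mid = logistic_empirical_risk((w1 + w2) / 2, X, y)
avg = (logistic_empirical_risk(w1, X, y) + logistic_empirical_risk(w2, X, y)) / 2
print(mid <= avg)  # True
```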
W
Feb. 22
Optimization via Gradient Descent
We discuss standard mathematical methods for empirical risk minimization, including gradient descent and stochastic gradient descent. We also recontextualize the perceptron algorithm as stochastic subgradient descent for a linear classifier with a specific loss function.
Learning Objectives
Theory
Implementation
Reading
Daumé 7.4-7.6
Deisenroth, Faisal, and Ong, p. 225-233
Notes
Lecture notes
Warmup
Gradient Descent
Assignments
Blog post: gradient descent
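Here is a minimal numpy sketch of gradient descent applied to the logistic empirical risk from the previous session; the step size, the synthetic data, and the {0, 1} label convention are illustrative assumptions. Stochastic gradient descent differs only in which points contribute to each gradient estimate, as noted in the comments.
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def grad(w, X, y):
    """Gradient of the average logistic loss (labels in {0, 1})."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def gradient_descent(X, y, alpha=0.5, max_iter=1000):
    """Plain gradient descent: repeatedly step against the gradient of the risk."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w -= alpha * grad(w, X, y)
    return w

# stochastic gradient descent would use the gradient on a single random point
# (or a small batch) at each step instead of the full data set
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)
w_hat = gradient_descent(X, y)
print(np.mean((sigmoid(X @ w_hat) > 0.5) == (y == 1)))  # training accuracy
```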

Week 3

M
Feb. 27
Features, Regularization, and Nonlinear Decision Boundaries
We learn how to use feature maps to help our convex linear classifiers learn nonlinear patterns. We also introduce the problem of overfitting and introduce feature selection and regularization as methods for addressing this problem.
Learning Objectives
Theory
Implementation
Navigation
Experimentation
Reading
Introducing Scikit-Learn
Hyperparameters and Model Validation
Feature Engineering
Notes
Lecture notes
Live version
Warmup
Gradient Descent Again
Assignments
ACTUAL REAL DUE DATE: Reflective Goal-Setting due 2/27
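As a small illustration of this session's ideas in scikit-learn, the sketch below combines a polynomial feature map with a regularized logistic regression in a pipeline; the data set (make_moons), the polynomial degree, and the regularization strength C=1.0 are illustrative choices, not course requirements.
```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# a degree-3 polynomial feature map lets a linear classifier draw a nonlinear
# decision boundary; C is the inverse strength of the L2 regularization,
# so smaller C means more regularization
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
model = make_pipeline(PolynomialFeatures(degree=3), LogisticRegression(C=1.0))
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```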
W
Mar. 01
Classification in Practice
We work through a complete modeling workflow for the Titanic survival data set. Along the way, we work with data frames and discuss cross-validation.
Learning Objectives
Navigation
Experimentation
Reading
Daumé Chapter 2. You may find it useful to review Chapter 1 as well.
Data Manipulation with Pandas (Focus on the sections up to and including "Aggregation and Grouping")
Notes
Lecture notes
Live version
Warmup
Overfitting and the Scientific Method
Assignments
Blog post: kernel logistic regression
OR
Blog post: penguins
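The sketch below illustrates cross-validation on a pandas data frame; the tiny hand-made data frame is a hypothetical stand-in for the Titanic data, and the choice of five folds is illustrative.
```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# a small hypothetical data frame standing in for the Titanic passenger data
df = pd.DataFrame({
    "age":      [22, 38, 26, 35, 28, 2, 27, 14, 54, 20],
    "fare":     [7.3, 71.3, 7.9, 53.1, 8.5, 21.1, 11.1, 30.1, 51.9, 8.1],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
})
X = df[["age", "fare"]]
y = df["survived"]

# 5-fold cross-validation: fit on 4/5 of the rows, score on the held-out 1/5,
# and repeat so that every row is held out exactly once
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean())
```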

Week 4

M
Mar. 06
Beyond Convex Linear Classifiers
We discuss several examples of other classifiers at a high level, including some that are nonlinear or nonconvex.
Learning Objectives
Navigation
Reading
NA
Notes
Lecture notes
Live version
W
Mar. 08
Linear Regression
We introduce linear regression, another convex linear model suitable for predicting real numbers instead of class labels.
Learning Objectives
Theory
Implementation
Reading
NA
Notes
Lecture notes
Live version
Assignments
Blog post: Linear regression
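Here is a minimal numpy sketch of least-squares linear regression solved via the normal equations; the synthetic data and the decision to fold the intercept in as a constant column are illustrative assumptions.
```python
import numpy as np

# least-squares linear regression via the normal equations:
# minimizing ||Xw - y||^2 leads to solving (X^T X) w = X^T y
rng = np.random.default_rng(0)
x = rng.uniform(size=50)
X = np.column_stack([x, np.ones(50)])       # one feature plus an intercept column
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=50)  # noisy linear data

w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)                                # should be close to [2.0, -1.0]
```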

Week 5

M
Mar. 13
Introduction to Bias and Fairness
TBD
Learning Objectives
Social Responsibility
Experimentation
Reading
Machine Bias by Julia Angwin et al. for ProPublica.
Fair prediction with disparate impact by Alexandra Chouldechova, Sections 1 and 2.
Inherent trade-offs in the fair determination of risk scores by Jon Kleinberg et al., pages 1-5.
Notes
Lecture notes
Live version
Warmup
Balancing Classification Rates
W
Mar. 15
Critical Perspectives
We discuss limitations of the quantitative approach to studying discrimination, as well as critical perspectives on the role that automated decision systems play in surveilling and controlling marginalized individuals.
Learning Objectives
Social Responsibility
Experimentation
Reading
The Limits of the Quantitative Approach to Discrimination, speech by Arvind Narayanan
"The Digital Poorhouse" by Virginia Eubanks for Harper's Magazine
Notes
TBD
Warmup
Limits of the Quantitative Approach
Assignments
Blog post: Limits of quantitative methods
OR
Blog post: Auditing allocative bias

Week 6

M
Mar. 27
Vectorization
We discuss some of the ways in which complex objects like images, and especially text, can be represented as numerical vectors for machine learning algorithms. A small vectorization sketch appears at the end of this entry.
Learning Objectives
Navigation
Experimentation
Reading
Murphy, Chapter 1. This is not related to vectorization; it's for you to get oriented on some possible project ideas. Don't worry about any math you don't understand.
Course project description
Notes
Lecture notes
Live version
Warmup
Pitch a Project Idea
Assignments
ACTUAL REAL DUE DATE: Mid-semester reflection due 4/05
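As a small illustration of text vectorization, the sketch below uses scikit-learn's CountVectorizer to turn a few toy documents into word-count vectors; the example documents are made up.
```python
from sklearn.feature_extraction.text import CountVectorizer

# term-frequency ("bag of words") vectorization: each document becomes a
# vector counting how often each vocabulary word appears in it
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs and more dogs",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)          # sparse matrix with one row per document
print(vec.get_feature_names_out())   # the learned vocabulary
print(X.toarray())
```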
W
Mar. 29
Introducing Unsupervised Learning: Topic Modeling
We begin to discuss unsupervised learning, with topic modeling as our initial example.
Learning Objectives
Theory
Navigation
Experimentation
Reading
Principal Component Analysis from the Python Data Science Handbook
Notes
Lecture notes
Live version
Warmup
Vectorization Brainstorm
Assignments
ACTUAL REAL DUE DATE: Project Proposal due 4/07
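This entry does not pin down a specific topic-modeling algorithm, so the sketch below uses nonnegative matrix factorization from scikit-learn as one common choice; the toy corpus and the number of topics are illustrative assumptions.
```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the team won the game after a late goal",
    "officials announced the election results",
    "voters went to the polls for the election",
    "the coach praised the team after the game",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                  # document-term count matrix

model = NMF(n_components=2, random_state=0)  # two "topics"
W = model.fit_transform(X)                   # how much each document uses each topic
H = model.components_                        # how much each topic uses each word

words = vec.get_feature_names_out()
for topic in H:
    print([words[i] for i in topic.argsort()[::-1][:3]])  # top words per topic
```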

Week 7

M
Apr. 03
Clustering Data
We continue our discussion of unsupervised learning with two methods for clustering sets of data.
Learning Objectives
Theory
Navigation
Experimentation
Reading
K-Means Clustering from the Python Data Science Handbook
Notes
Lecture notes
Live version
Warmup
K-Means Compression
Assignments
Blog post: Unsupervised learning with linear algebra (however, using this time to complete a previous blog post is also highly recommended)
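Here is a minimal scikit-learn sketch of k-means, one standard clustering method; the blob data set and the choice of three clusters are illustrative assumptions.
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# k-means alternates between assigning points to the nearest centroid and
# moving each centroid to the mean of its assigned points
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the three learned centroids
print(km.labels_[:10])       # cluster assignments of the first ten points
```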
W
Apr. 05
Introducing Deep Learning
We begin our discussion of deep learning with a quick theoretical motivation and a first glance at the PyTorch package.
Learning Objectives
Theory
Navigation
Reading
Lecture 1, Introduction from Chinmay Hegde's course on deep learning at NYU
Notes
Lecture notes
Live version
Warmup
Introducing Tensors
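As a first glance at PyTorch to accompany this session, the sketch below builds a tensor with requires_grad=True and lets autograd compute a gradient; the shapes and the toy loss are illustrative.
```python
import torch

# tensors are PyTorch's basic data structure; unlike numpy arrays they can
# track the operations applied to them so that gradients come "for free"
X = torch.randn(5, 3)                      # a batch of 5 feature vectors
w = torch.randn(3, 1, requires_grad=True)  # parameters we want gradients for

loss = ((X @ w) ** 2).mean()
loss.backward()                            # autograd computes d(loss)/dw
print(w.grad)                              # same shape as w
```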

Week 8

M
Apr. 10
Optimization For Deep Learning
We begin a discussion of the training process for neural networks, which requires efficient computation of gradients via backpropagation and efficient variations of gradient descent.
Learning Objectives
Theory
Implementation
Reading
Lecture 2, Neural Nets from Chinmay Hegde's course on deep learning at NYU
Notes
Lecture notes
Live version
Warmup
Efficient Differentiation
Assignments
Blog post: Optimization with Adam (however, using this time to complete a previous blog post is also highly recommended)
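The sketch below is a minimal PyTorch training loop in which backward() performs backpropagation and torch.optim.Adam supplies an efficient variant of gradient descent; the synthetic regression data, learning rate, and number of steps are illustrative assumptions.
```python
import torch

# a tiny training loop: loss.backward() runs backpropagation to compute
# gradients, and the optimizer applies an adaptive gradient-descent update
X = torch.randn(100, 3)
y = X @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1 * torch.randn(100, 1)

model = torch.nn.Linear(3, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()              # backpropagation
    opt.step()                   # parameter update
print(loss.item())               # should approach the noise level
```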
W
Apr. 12
Convolutional Neural Networks
We discuss methods for image classification using neural networks and introduce convolutional layers.
Learning Objectives
Theory
Experimentation
Reading
Convolutional Neural Networks from MIT's course 6.036.
A Comprehensive Guide to Convolutional Neural Networks by Sumit Saha on Towards Data Science has some good visuals.
Notes
Lecture notes
Live version
Warmup
Convolutional Kernels
Assignments
ACTUAL REAL DUE DATE: Engaging with Timnit Gebru, Part 1 due 4/19
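Here is a minimal PyTorch sketch of a classifier with a convolutional layer; the layer sizes and the assumption of 28x28 grayscale inputs are illustrative, not a specification of anything used in class.
```python
import torch
from torch import nn

# a minimal convolutional classifier for 28x28 grayscale images
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer: 8 learned 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                            # downsample 28x28 feature maps to 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # scores for 10 classes
)
x = torch.randn(4, 1, 28, 28)                   # a fake batch of 4 images
print(model(x).shape)                           # torch.Size([4, 10])
```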

Week 9

M
Apr. 17
More on Image Classification
We continue our discussion of image classification with convolutional neural networks.
Learning Objectives
Experimentation
Navigation
Reading
Convolutional Neural Networks from MIT's course 6.036.
A Comprehensive Guide to Convolutional Neural Networks by Sumit Saha on Towards Data Science has some good visuals.
Notes
Lecture notes
Live version
Warmup
Project Check-In
W
Apr. 19
Some Practical Techniques in Image Classification
We discuss data augmentation and transfer learning, two helpful techniques in image classification. We also highlight some of the messy challenges involved in managing complex data for classification tasks in PyTorch. A brief sketch of both techniques appears at the end of this entry.
Learning Objectives
Theory
Experimentation
Notes
Lecture notes
Live version
Warmup
How Much Needs To Be Learned?
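The sketch below illustrates both techniques, assuming torchvision (not named in this entry) and a recent enough version to support the weights argument; the ResNet18 backbone and the two-class head are illustrative choices.
```python
import torch
from torchvision import models, transforms

# data augmentation: random transformations applied to each training image
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])

# transfer learning: start from a network pretrained on ImageNet, freeze its
# feature-extracting layers, and train only a new final layer for our task
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # new head for a 2-class problem
```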

Week 10

M
Apr. 24
Dr. Timnit Gebru on Computer Vision and "Artificial General Intelligence"
We speak with Dr. Timnit Gebru about her recent work on computer vision and ideology in artificial general intelligence.
Learning Objectives
Social Responsibility
Warmup
Project Check-In
W
Apr. 26
Text Classification and Word Embeddings
We begin our study of text classification and the use of word embeddings for efficient text vectorization.
Learning Objectives
Theory
Experimentation
Reading
Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (sections 1, 4, 5)
Notes
Lecture notes
Live version
Warmup
Word embedding
Assignments
Blog post: deep music classification
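Here is a minimal PyTorch sketch of a word embedding layer and a pooled document representation; the vocabulary size, embedding dimension, and token indices are illustrative assumptions.
```python
import torch
from torch import nn

# a word embedding layer maps integer word indices to dense vectors, so a
# document can be represented by (for example) the average of its word vectors
vocab_size, embedding_dim = 5000, 50
embedding = nn.Embedding(vocab_size, embedding_dim)

tokens = torch.tensor([4, 17, 230, 3, 88, 912])  # a "sentence" of six word indices
vectors = embedding(tokens)                      # shape (6, 50)
doc_vector = vectors.mean(dim=0)                 # pooled representation for a classifier
print(doc_vector.shape)                          # torch.Size([50])
```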

Week 11

M
May. 01
Word Embeddings
We continue our study of text classification by training a classifier and examining word embeddings.
Learning Objectives
Theory
Experimentation
Social Responsibility
Reading
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Can you explain how orthogonal projections can help reduce bias in word embeddings?
Fair Is Better than Sensational: Man Is to Doctor as Woman Is to Doctor
Notes
Lecture notes
Live version
Warmup
Project Check-In
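The first reading above asks how orthogonal projections can help reduce bias in word embeddings; the sketch below shows the core projection step, with made-up three-dimensional vectors standing in for real embeddings.
```python
import numpy as np

def remove_component(v, b):
    """Remove from v its orthogonal projection onto the direction b."""
    b = b / np.linalg.norm(b)
    return v - np.dot(v, b) * b

# hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions
he = np.array([0.6, 0.1, 0.3])
she = np.array([0.1, 0.6, 0.3])
bias_direction = he - she                 # a crude estimate of a gender direction

word = np.array([0.5, 0.2, 0.4])          # e.g. an occupation word
debiased = remove_component(word, bias_direction)
print(np.dot(debiased, bias_direction))   # ~0: no remaining component along the bias direction
```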
W
May. 03
Text Generation and Recurrent Neural Networks
We use recurrent neural networks to generate synthetic text with several realistic attributes.
Learning Objectives
Theory
Implementation
Navigation
Reading
The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
Notes
Lecture notes
Live version
Warmup
"Realistic" text

Week 12

M
May. 08
Reflection and Feedback
We look back on our time in the course, reflect on the responsibilities of data scientists in society, and give feedback on the course.
Learning Objectives
Theory
Social Responsibility
Reading
Millions of black people affected by racial bias in health-care algorithms by Heidi Ledford for Nature
(Optional) Dissecting racial bias in an algorithm used to manage the health of populations by Obermeyer et al. in Science.
Warmup
Concept Mind Map
W
May. 10
Final Project Presentations
We present our final projects in CSCI 0451!
Learning Objectives
Project

Finals Period

During the reading and final exam period, you’ll meet with me 1-1 for about 15 minutes. The purpose of this meeting is to help us both reflect on your time in the course and agree on a final grade.

Due Dates

It’s best to submit all of the work you wish to use to demonstrate your learning by the time of our final meeting. However, I will accept and assess work submitted by the last day of the final exam period.



© Phil Chodrow, 2023