Implementing Logistic Regression
2024-04-04
\[ \newcommand{\R}{\mathbb{R}} \newcommand{\vx}{\mathbf{x}} \newcommand{\vy}{\mathbf{y}} \newcommand{\mX}{\mathbf{X}} \newcommand{\vw}{\mathbf{w}} \newcommand{\bracket}[1]{\langle #1 \rangle} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\paren}[1]{\left( #1 \right)} \]
The expectations for all blog posts apply!
Introduction
Recently, we introduced the gradient descent algorithm for solving the empirical risk minimization problem. We also calculated the gradient of the loss function for logistic regression.
In this blog post you will:
- Implement gradient descent for logistic regression in an object-oriented paradigm.
- Implement a key variant of gradient descent with momentum in order to achieve faster convergence.
- Perform experiments to test your implementations.
Part A: Implement Logistic Regression
Getting Started
- Please create a new blog post. In addition to the usual `.ipynb` file, please also create a script called `logistic.py` in the same directory. This is the file in which you will implement logistic regression.
- Then, in your `.ipynb` notebook file, place the following in a Python code block underneath your metadata block:

```python
%load_ext autoreload
%autoreload 2
from logistic import LogisticRegression, GradientDescentOptimizer
```

If you complete the optional implementation of Newton's Method, you'll also need to import `NewtonOptimizer`.
Implement LinearModel and LogisticRegression()
If you haven't already, implement the methods of the `LinearModel` class as described in this warmup. Then, define a new class called `LogisticRegression` which inherits from `LinearModel`. This class should have two methods:

- `LogisticRegression.loss(X, y)` should compute the empirical risk \(L(\mathbf{w})\) using the logistic loss function \[ \begin{aligned} L(\mathbf{w}) = \frac{1}{n} \sum_{i = 1}^n \left[-y_i \log \sigma(s_i) - (1-y_i)\log (1-\sigma(s_i))\right] \end{aligned} \] Here, \(s_i = \bracket{\vw, \vx_i}\) is the score of the \(i\)th data point and \(\sigma(s) = \frac{1}{1 + e^{-s}}\) is the sigmoid function. The weight vector \(\mathbf{w}\) used for this calculation should be stored as an instance variable of the class.
- `LogisticRegression.grad(X, y)` should compute the gradient of the empirical risk \(L(\mathbf{w})\). You can use the formula for the gradient supplied in the lecture notes on gradient descent.
For an M, you can implement `LogisticRegression.grad` using a `for`-loop. For an E, your solution should involve no explicit loops. While working on a loop-free solution, you might find it useful at some point to convert a tensor `v` with shape `(n,)` into a tensor `v_` with shape `(n, 1)`. The code `v_ = v[:, None]` will perform this conversion for you.
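Here is a minimal sketch of what a loop-free implementation could look like. This is not the only correct structure: it assumes that the `LinearModel` class from the warmup lives in the same `logistic.py` file, stores the weight vector as `self.w`, and provides a `score(X)` method returning the vector of scores \(s_i = \bracket{\vw, \vx_i}\).

```python
import torch

class LogisticRegression(LinearModel):

    def loss(self, X, y):
        # scores s_i = <w, x_i>; assumes LinearModel.score computes X @ self.w
        s = self.score(X)
        sigma = torch.sigmoid(s)
        # logistic loss averaged over all n points, with no explicit loops
        return torch.mean(-y * torch.log(sigma) - (1 - y) * torch.log(1 - sigma))

    def grad(self, X, y):
        s = self.score(X)
        # reshape (sigma(s_i) - y_i) from shape (n,) to (n, 1) so it broadcasts
        # against the rows of X
        v_ = (torch.sigmoid(s) - y)[:, None]
        # gradient (1/n) * sum_i (sigma(s_i) - y_i) x_i, again with no loops
        return torch.mean(v_ * X, dim = 0)
```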
Implement GradientDescentOptimizer
Next, implement a GradientDescentOptimizer
class. For this project, we are going to implement gradient descent with momentum, also known as Spicy Gradient Descent. Let \(\mathbf{w}_k\) be the estimate of the weight vector at algorithmic step \(k\). Gradient descent with momentum performs the update \[
\begin{aligned}
\mathbf{w}_{k+1} \gets \mathbf{w}_k - \alpha \nabla L(\mathbf{w}_k) + \beta(\mathbf{w}_k - \mathbf{w}_{k-1})
\end{aligned}
\tag{1}\]
Here, \(\alpha\) and \(\beta\) are two learning rate parameters. When \(\beta = 0\) we have “regular” gradient descent. In practice, a choice of \(\beta \approx 0.9\) or so is common. Implementing the momentum term isn’t too complex – you’ll just need to create a new instance variable to hold the previous value of \(\mathbf{w}\) as well as the current one.
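A possible skeleton for the optimizer is shown below. It assumes that the model object exposes its weight vector as `model.w` and that calling `model.grad(X, y)` (which internally calls `score`) handles any lazy initialization of the weights; adapt the details to your own `LinearModel` implementation.

```python
class GradientDescentOptimizer:

    def __init__(self, model):
        self.model = model
        self.prev_w = None   # holds w_{k-1} for the momentum term

    def step(self, X, y, alpha = 0.1, beta = 0.9):
        # compute the gradient first; in many implementations this also
        # lazily initializes the model's weight vector via score()
        grad = self.model.grad(X, y)
        w = self.model.w

        # on the very first step there is no previous iterate, so the
        # momentum term is zero (as it also is whenever beta = 0)
        momentum = 0.0 if self.prev_w is None else beta * (w - self.prev_w)

        # Equation (1): w_{k+1} = w_k - alpha * grad L(w_k) + beta * (w_k - w_{k-1})
        self.model.w = w - alpha * grad + momentum
        self.prev_w = w
```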
Part B: Experiments
Experimental Data
Here is some code to generate data for a classification problem. You can control the number of points by adjusting `n_points`, the number of features by adjusting `p_dims`, and the difficulty of the classification problem by adjusting the `noise` (higher noise is a harder problem).
```python
import torch

def classification_data(n_points = 300, noise = 0.2, p_dims = 2):
    # labels: first half of the points get label 0, second half get label 1
    y = torch.arange(n_points) >= int(n_points/2)
    y = 1.0*y
    # features: the label plus Gaussian noise in each of the p_dims dimensions
    X = y[:, None] + torch.normal(0.0, noise, size = (n_points, p_dims))
    # append a constant column of 1s
    X = torch.cat((X, torch.ones((X.shape[0], 1))), 1)
    return X, y

X, y = classification_data(noise = 0.5)
```
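If you'd like to sanity-check the generated data before fitting anything, a quick scatter plot of the two feature columns (for the default `p_dims = 2` case) might look something like the sketch below; the plotting details are just illustrative.

```python
import matplotlib.pyplot as plt

# scatter the two feature columns, colored by label; the third column of X
# is the constant feature and is not plotted
plt.scatter(X[:, 0], X[:, 1], c = y)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
```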
How to Train Your Model
Once you’ve correctly implemented logistic regression and gradient descent, you can do a gradient descent loop like this:
```python
LR = LogisticRegression()
opt = GradientDescentOptimizer(LR)

for _ in range(100):
    # add other stuff to e.g. keep track of the loss over time.
    opt.step(X, y, alpha = 0.1, beta = 0.9)
```
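For example, one way to act on the comment above and keep track of the loss over time is sketched here; it assumes that `loss` returns a scalar tensor and that matplotlib is available for plotting.

```python
import matplotlib.pyplot as plt

LR = LogisticRegression()
opt = GradientDescentOptimizer(LR)

loss_history = []
for _ in range(100):
    opt.step(X, y, alpha = 0.1, beta = 0.9)
    loss_history.append(LR.loss(X, y).item())   # record the loss after each update

plt.plot(loss_history)
plt.xlabel("Iteration")
plt.ylabel("Empirical risk")
```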
Experiments
Please perform experiments, with careful written explanations, that demonstrate the following statements:
- Vanilla gradient descent: When the number of features \(p_{\mathrm{dim}} = 2\), when \(\alpha\) is sufficiently small and \(\beta = 0\), gradient descent for logistic regression converges to a weight vector \(\mathbf{w}\) that looks visually correct (plot the decision boundary with the data). Furthermore, the loss decreases monotonically (plot the loss over iterations).
- This is a good experiment to use to assess whether your implementation in Part A has bugs.
- Benefits of momentum: On the same data, gradient descent with momentum (e.g. \(\beta = 0.9\)) can converge to the correct weight vector in fewer iterations than vanilla gradient descent (with \(\beta = 0\)). Plot the loss over iterations for each method. You may need to experiment with the data and choice of \(\alpha\) in order to observe speedups due to momentum.
- Overfitting: Generate some data where `p_dims > n_points`. For example, `p_dims = 100` and `n_points = 50`. Do this twice with the exact same parameters. Call the first dataset `X_train, y_train` and the second dataset `X_test, y_test`. Then, do an experiment in which you fit a logistic regression model to the data `X_train, y_train` and obtain 100% accuracy on this training data. What is the accuracy on the test data? One possible way to set this experiment up is sketched after this list.
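As a hypothetical sketch of how this last experiment could be wired together (the `accuracy` helper, the value of `noise`, and the choices of `alpha`, `beta`, and iteration count are illustrative assumptions, not requirements; the `score` method is assumed from the warmup):

```python
# two datasets generated with the exact same parameters
X_train, y_train = classification_data(n_points = 50, p_dims = 100, noise = 0.5)
X_test,  y_test  = classification_data(n_points = 50, p_dims = 100, noise = 0.5)

LR = LogisticRegression()
opt = GradientDescentOptimizer(LR)
for _ in range(1000):
    opt.step(X_train, y_train, alpha = 0.1, beta = 0.9)

def accuracy(model, X, y):
    # predict label 1 whenever the score <w, x_i> is positive, then compute
    # the fraction of predictions that agree with the true labels
    y_hat = 1.0 * (model.score(X) > 0)
    return (y_hat == y).float().mean().item()

print(f"training accuracy: {accuracy(LR, X_train, y_train)}")
print(f"testing accuracy:  {accuracy(LR, X_test, y_test)}")
```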
Part C: Writing
Please:
- Please include informative comments throughout your source code and a thorough docstring for each method in `LogisticRegression` and `GradientDescentOptimizer`.
- Please add careful expository writing throughout your blog post. You should describe each experiment and what it is intended to illustrate. You should also ensure that all your plots are legible and have appropriate axis labels and legends.
- At the beginning of your blog post, please place a link to your source code `logistic.py` on GitHub. After this link, please write an abstract paragraph describing the topic of your post and giving a brief overview of the experiments you performed.
- At the conclusion of your blog post, please write a discussion paragraph reminding the reader of what you did and what you learned while doing it.
© Phil Chodrow, 2024