Implementing the Perceptron Algorithm

2024-03-28

The expectations for all blog posts apply!

Introduction

In this blog post, you will complete an implementation of the perceptron algorithm and test it in several experiments.

Part A: Implement Perceptron

Getting Started

  1. Please create a new blog post. In addition to the usual .ipynb file, please also create a script called perceptron.py in the same directory. This is the file in which you will implement the perceptron algorithm.
  2. Then, in your .ipynb notebook file, place the following in a Python code block underneath your metadata block:
%load_ext autoreload
%autoreload 2
from perceptron import Perceptron, PerceptronOptimizer

The two commands beginning with % cause your notebook to re-read the contents of your perceptron.py file automatically, so changes you make in that file are immediately reflected when you run cells in your blog notebook.

Implement the Perceptron Algorithm

Now it is time to implement the perceptron algorithm. I am going to guide you through doing this in a relatively specific way that will help us generalize when we work with more complex algorithms. Your implementation of the perceptron algorithm should be contained in the file perceptron.py. It is not part of your blog post, but you will link to it later.

LinearModel and Perceptron.loss()

If you haven’t already, implement the methods of the LinearModel class as described in this warmup. Please also implement Perceptron.loss().
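
If it helps to see the overall shape, here is a minimal sketch of what perceptron.py might look like at this stage. Treat it as one reading of the warmup, not the required interface: the method names score() and predict(), the random initialization of w, and the convention that labels are coded as -1 and +1 are all assumptions that your own implementation may handle differently.

import torch

class LinearModel:

    def __init__(self):
        self.w = None

    def score(self, X):
        # initialize a random weight vector the first time score() is called
        if self.w is None:
            self.w = torch.rand((X.size(1),))
        # s_i = <w, x_i> for every row x_i of X
        return X @ self.w

    def predict(self, X):
        # predicted label is the sign of the score
        return torch.sign(self.score(X))


class Perceptron(LinearModel):

    def loss(self, X, y):
        # misclassification rate: fraction of points with s_i * y_i < 0
        return (self.score(X) * y < 0).float().mean()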

PerceptronOptimizer.step()

Now implement the step() method of PerceptronOptimizer. In this method, you should assume that Perceptron.grad() correctly returns the “update” part of the perceptron update; you’ll implement Perceptron.grad() soon. In step(), do two things:

  1. Call self.model.loss(). We’ll see why when we get to different classes of models.
  2. Then, call self.model.grad() and add the result to self.model.w.
Recall that PerceptronOptimizer’s __init__ method saves a Perceptron object to the instance variable self.model.
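
One possible shape for the optimizer is sketched below. It assumes that step() receives a single feature row X and its label y; the exact signature is up to you.

class PerceptronOptimizer:

    def __init__(self, model):
        # save the Perceptron instance so that step() can update its weights
        self.model = model

    def step(self, X, y):
        # 1. compute the loss (the value is not used here, but later
        #    optimizers will need this call)
        self.model.loss(X, y)
        # 2. add the perceptron increment returned by grad() to the weights
        self.model.w += self.model.grad(X, y)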

Perceptron.grad()

Finally, it is time to implement Perceptron.grad(). This is where you turn the math of the perceptron update into code. In this part, you may assume that the argument X of Perceptron.grad() is a single row of the feature matrix.

Inside Perceptron.grad(), you should do two things (one possible implementation is sketched after this list):

  1. Compute the score \(s_i = \langle \mathbf{w}, \mathbf{x}_i\rangle\).
  2. Return the vector \(\mathbb{1}\left[s_i y_{i} < 0 \right] y_{i} \mathbf{x}_{i}\).
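
The sketch below is one way to express this without loops; it assumes X arrives as a 1 × p row of the feature matrix and that the label y is coded as -1 or +1.

class Perceptron(LinearModel):

    def loss(self, X, y):
        # misclassification rate, as sketched earlier
        return (self.score(X) * y < 0).float().mean()

    def grad(self, X, y):
        # X is assumed to be a single 1 x p row x_i; y is its label in {-1, 1}
        s = self.score(X)                  # s_i = <w, x_i>
        misclassified = 1.0 * (s * y < 0)  # indicator 1[s_i y_i < 0]
        # the update 1[s_i y_i < 0] y_i x_i, flattened to match the shape of w
        return (misclassified * y * X).reshape(-1)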

Check Your Implementation

Your code is probably working when you can run the “minimal training loop” code from this section of the notes and eventually achieve loss = 0 on linearly separable data. You should do this in your .ipynb file as part of your blog post. You can use the functions supplied in those notes to generate and visualize the data (when the data is 2d).
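
For reference, a loop along the following lines is one way to run this check. The exact “minimal training loop” is in the linked notes; the data generation here is only a stand-in (two well-separated Gaussian blobs with labels in {-1, 1}), since you should use the helper functions supplied in the notes.

import torch
from perceptron import Perceptron, PerceptronOptimizer

torch.manual_seed(1234)

# stand-in data: two well-separated 2d blobs with labels in {-1, 1}
n = 100
X = torch.cat((torch.randn(n // 2, 2) + 3.0, torch.randn(n // 2, 2) - 3.0))
y = torch.cat((torch.ones(n // 2), -torch.ones(n // 2)))

p = Perceptron()
opt = PerceptronOptimizer(p)

loss_history = []
for _ in range(1000):                    # safety cap on the number of iterations
    loss = p.loss(X, y).item()           # track progress on the full data set
    loss_history.append(loss)
    if loss == 0.0:                      # a separating line has been found
        break
    i = torch.randint(n, size=(1,))      # pick one random point
    opt.step(X[i, :], y[i])              # perceptron update using that point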

Code Quality

An excellent solution will be no more than 30 lines of code in total (across all three classes) and will contain no loops.

Part B: Experiments

Once you have completed your implementation and checked that it appears to be working on a simple example, please perform the following experiments and construct the requested visualizations.

Please perform experiments (with visualizations) that illustrate the following claims:

  1. Using 2d data like the data in the example above, if the data is linearly separable then the perceptron algorithm converges to a weight vector \(\mathbf{w}\) describing a separating line (provided that the maximum number of iterations is large enough).
    • Please show visualizations of the data, the separating line, and the evolution of the loss function during training.
    • You are encouraged but not required to create a plot like this one from the lecture notes. Note that the code that generates this figure is supplied in a popout; you are free to use it with attribution.
  2. For 2d data, when the data is not linearly separable, the perceptron algorithm will not settle on a final value of \(\mathbf{w}\), but will instead run until the maximum number of iterations is reached, without achieving perfect accuracy.
    • Please show visualizations of the data, the decision boundary in the final iteration, and the evolution of the loss over training.
    • You will need to set a maximum number of iterations in your training loop in order to ensure that the algorithm terminates. 1000 iterations is plenty. One possible setup for this experiment is sketched after this list.
  3. The perceptron algorithm is also able to work in more than 2 dimensions! Show an example of running your algorithm on data with at least 5 features. You don’t need to visualize the data or the separating line, but you should still show the evolution of the loss over the training period. Include a comment on whether you believe that the data is linearly separable, based on your observation of the loss.
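
As noted in the second item above, here is one possible setup for the non-separable experiment. The overlapping blobs are a stand-in for whatever data generator you prefer, and the same loop structure adapts to the other experiments if you change the data and record whatever quantities you want to plot.

import torch
from perceptron import Perceptron, PerceptronOptimizer

torch.manual_seed(1234)

# overlapping blobs: no line classifies these points perfectly
n = 100
X = torch.cat((torch.randn(n // 2, 2) + 0.5, torch.randn(n // 2, 2) - 0.5))
y = torch.cat((torch.ones(n // 2), -torch.ones(n // 2)))

p = Perceptron()
opt = PerceptronOptimizer(p)

loss_history = []
for _ in range(1000):                        # maximum number of iterations
    loss_history.append(p.loss(X, y).item())
    i = torch.randint(n, size=(1,))
    opt.step(X[i, :], y[i])

# loss_history records the evolution of the loss (it should never reach 0),
# and p.w describes the decision boundary at the final iteration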

Part C (Optional): Minibatch Perceptron

This part is optional if you are satisfied with an M on this blog post. It is required for an E.

The mini-batch perceptron algorithm computes an update using \(k\) points at once, rather than a single point. In each step \(t\):

  1. Pick \(k\) random indices \(i_1,\ldots,i_k\).
  2. Perform the update

\[ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + \frac{\alpha}{k}\sum_{\ell = 1}^k \mathbb{1}\left[\langle \mathbf{w}^{(t)}, \mathbf{x}_{i_\ell} \rangle y_{i_\ell} < 0 \right] y_{i_\ell} \mathbf{x}_{i_\ell} \tag{1}\]

Equation 1 corresponds to computing \(k\) perceptron increments simultaneously and averaging them before modifying \(\mathbf{w}\). The hyperparameter \(\alpha\) is a learning rate that determines how quickly the weight vector \(\mathbf{w}\) changes in each iteration. The learning rate can be tuned in order to achieve good results.

To implement mini-batch updating, modify your Perceptron.grad() method so that it accepts a submatrix of the feature matrix \(\mathbf{X}\). You should assume that this submatrix has size \(k \times p\). It is possible to write this implementation in two lines and with no for-loops if you are careful. You’ll then need to modify your main training loop so that a random submatrix of the feature matrix is passed to PerceptronOptimizer.step().
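
For example, the single-row grad() sketched earlier generalizes to something like the following. It assumes X is a \(k \times p\) submatrix, y holds the matching \(k\) labels in {-1, 1}, and the learning rate \(\alpha\) is applied elsewhere (for instance in step() or in the training loop); those are design choices, not requirements.

class Perceptron(LinearModel):

    def loss(self, X, y):
        # misclassification rate, unchanged
        return (self.score(X) * y < 0).float().mean()

    def grad(self, X, y):
        # X is now assumed to be a k x p submatrix; y holds the k labels in {-1, 1}
        s = self.score(X)                      # k scores s_i = <w, x_i>
        misclassified = 1.0 * (s * y < 0)      # k indicators 1[s_i y_i < 0]
        # average of the k increments: (1/k) * sum 1[s_i y_i < 0] y_i x_i
        return (misclassified * y) @ X / X.size(0)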

Hint: You can get a random submatrix of the feature matrix X and target vector y like this:

import torch

k = 5
ix = torch.randperm(X.size(0))[:k]  # k distinct random indices
print(X[ix, :])
print(y[ix])
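
Building on that hint, the modified training loop might look roughly like the sketch below. Passing a learning-rate argument alpha to step() assumes you extend step() to scale the update by it; that is one design choice among several, and the data generation is again a stand-in for your own.

import torch
from perceptron import Perceptron, PerceptronOptimizer

torch.manual_seed(1234)

# data as in Part B: two 2d blobs with labels in {-1, 1}
n = 100
X = torch.cat((torch.randn(n // 2, 2) + 3.0, torch.randn(n // 2, 2) - 3.0))
y = torch.cat((torch.ones(n // 2), -torch.ones(n // 2)))

k = 10          # batch size
alpha = 0.1     # learning rate
p = Perceptron()
opt = PerceptronOptimizer(p)

loss_history = []
for _ in range(1000):
    loss = p.loss(X, y).item()
    loss_history.append(loss)
    if loss == 0.0:
        break
    ix = torch.randperm(X.size(0))[:k]       # random batch of k indices
    opt.step(X[ix, :], y[ix], alpha=alpha)   # assumes step() scales the update by alpha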

Minibatch Perceptron Experiments

Please perform experiments and create visualizations to demonstrate the following:

  1. When k = 1, minibatch perceptron performs similarly to regular perceptron.
  2. When k = 10, minibatch perceptron can still find a separating line in 2d.
  3. When k = n (that is, the batch size is the size of the entire data set), minibatch perceptron can converge even when the data is not linearly separable, provided that the learning rate \(\alpha\) is small enough.

Part D: Writing

In crafting your blog post, please include the following components:

  • At the very top of your blog post, a link to your source code (perceptron.py) on your GitHub repo.
  • A brief walk-through of your implementation of Perceptron.grad() and an explanation of how it correctly implements the math of the perceptron update. It is not necessary to walk the reader through every single aspect of your solution class.
  • Full code and English descriptions for all the experiments you perform.
  • After your experiments, please address the following question:

What is the runtime complexity of a single iteration of the perceptron algorithm? Does the runtime complexity of a single iteration depend on the number of data points \(n\)? What about the number of features \(p\)? If you implemented minibatch perceptron, what is the runtime complexity of a single iteration of the minibatch perceptron algorithm?

Finally, please add an introductory “abstract” paragraph describing the high-level purpose of the blog post and a concluding summary paragraph in which you reflect on your findings.



© Phil Chodrow, 2024