Math Self-Assessment

2023-02-19

Instructions

This is an ungraded self-assessment. The purpose is for you to assess your comfort with some of the key mathematical ideas we will draw on in this course. Please do the following:

  1. Print out this webpage or save it as a PDF.
  2. Read each problem.
  3. On paper or a tablet, draw a little face beside each one to indicate your level of comfort:
    • 😊: you are confident and can do this problem without needing to look anything up.
    • 😑: you believe you could figure this problem out, but you would need to consult a text first.
    • 😓: you feel stuck and aren’t sure how to begin this problem.
  4. Recommended but optional: actually do the problems.
  5. Submit your version of the assignment with little faces on Canvas.

It is ok if some of this content is new to you. See how far you can get and make a note of what you may still need to learn or practice. We will not need most of these ideas until Week 5 or so, so you have time to brush up on any topics for which you feel uncomfortable.

  • This resource from Stanford’s CS246 contains most of the linear algebra that you’ll need for the course.

Part A: Single Variable Differentiation

The logistic sigmoid \(\sigma\) is the function \(\sigma: \mathbb{R} \rightarrow \mathbb{R}\) given by the formula \[ \begin{aligned} \sigma(w) = \frac{1}{1 + e^{-w}}\;. \end{aligned} \]

A.1

Make a sketch of this function.










A.2

Compute \(\frac{d\sigma(w_0)}{dw}\).

\(\frac{d\sigma(w_0)}{dw}\) is the first derivative of \(\sigma\) evaluated at the point \(w_0\).










A.3

Using your computation of \(\frac{d\sigma(w_0)}{dw}\) from the previous part, check that the following formula holds. You should do so by computing both sides of the equation and verifying that they are equal.

\[ \begin{aligned} \frac{d\sigma(w_0)}{dw} = \sigma(w_0)\left(1 - \sigma(w_0)\right)\;. \end{aligned} \]










A.4

Compute \(\frac{d^2\sigma(w_0)}{dw^2}\). It is recommended to use your results from the previous two parts.

\(\frac{d^2\sigma(w_0)}{dw^2}\) is the second derivative of \(\sigma\) evaluated at the point \(w_0\).










A.5

Provided that the function in question has sufficiently many derivatives, the Taylor expansion can be used to approximate that function near a point. For example, we might find it useful to estimate \(\sigma(w_0 + \delta)\), where \(\delta\) is a small number.

Compute an estimate of \(\sigma(w_0 + \delta)\) using the second-order Taylor expansion of \(\sigma\), assuming that \(\delta\) is sufficiently small.










A.6

The error in this approximation has scaling \(O(g(\delta))\) for some function \(g\). Please say as much as you are able about the function \(g\).










Part B: Multivariable Differentiation

Let \(f:\mathbb{R}^p \rightarrow \mathbb{R}\) be a function that accepts a vector \(\mathbf{w} \in \mathbb{R}^p\) as its argument and returns a scalar \(y \in \mathbb{R}\). Let’s write \(\mathbf{w} = (w_1,\ldots,w_p)^T\).

Partial Derivative

The partial derivative of \(f\) at point \(\mathbf{w}\) with respect to component \(w_i\) is

\[ \begin{aligned} \frac{\partial f}{\partial w_i}(\mathbf{w}) = \lim_{h \rightarrow 0} \frac{f(\mathbf{w} + h \mathbf{e}_i) - f(\mathbf{w})}{h}\;, \end{aligned} \]

provided that this limit exists. Here, \(\mathbf{e}_i\) is the \(i\)th standard basis vector.

There is an easy “recipe” for computing partial derivatives: treat \(f\) like a single-variable function in which every variable except \(w_i\) is held constant, then take the usual single-variable derivative with respect to \(w_i\).

Example

Let \(f:\mathbb{R}^3 \rightarrow \mathbb{R}\) be given by the formula

\[ \begin{aligned} f(w_1, w_2, w_3) = \sin(w_1 w_2) + w_1 + e^{2w_3}\;. \end{aligned} \]

The partial derivatives of \(f\) evaluated at a generic point \((w_1,w_2,w_3)^T\) can be obtained by holding all variables except one constant, and then applying standard rules from single-variable calculus:

\[ \begin{aligned} \frac{\partial f}{\partial w_1} &= w_2 \cos(w_1 w_2) + 1 \\ \frac{\partial f}{\partial w_2} &= w_1 \cos(w_1 w_2)\\ \frac{\partial f}{\partial w_3} &= 2e^{w_3}\;. \end{aligned} \]

Gradient

The vector of all partial derivatives of a function \(f:\mathbb{R}^p \rightarrow \mathbb{R}\) at a point \(\mathbf{w}\) is called the gradient \(\nabla f (\mathbf{w})\):

\[ \begin{aligned} \nabla f(\mathbf{w}) = \left(\frac{\partial f(\mathbf{w})}{\partial w_1}, \frac{\partial f(\mathbf{w})}{\partial w_2},\ldots,\frac{\partial f(\mathbf{w})}{\partial w_p} \right)^T\;. \end{aligned} \]

B.1

Let \(f:\mathbb{R}^3 \rightarrow \mathbb{R}\) be given by the formula

\[ \begin{aligned} f(w_1, w_2, w_3) = w_1^2 + e^{w_1w_2} + \cos(3w_3)\;. \end{aligned} \]

Compute \(\nabla f(w_1,w_2,w_3)\).










B.2

It might seem strange to use the variable \(\mathbf{w}\) as the variable of differentiation while treating \(\mathbf{x}\) as a constant. There will be a reason for it when we get to algorithms!

Let \(\mathbf{x}, \mathbf{w} \in \mathbb{R}^p\), and let \(\langle \mathbf{w}, \mathbf{x} \rangle = \sum_{i = 1}^p x_iw_i\) be the Euclidean inner product between \(\mathbf{w}\) and \(\mathbf{x}\). Let \(f_{\mathbf{x}}(\mathbf{w}) = \langle \mathbf{w}, \mathbf{x} \rangle\). Here, we are treating \(f_{\mathbf{x}}(\mathbf{w})\) as a function of \(\mathbf{w}\), holding \(\mathbf{x}\) constant.

The operation \(\langle \mathbf{w},\mathbf{x}\rangle\) is also called the dot product and is sometimes written \(\mathbf{w}\cdot \mathbf{x}\) or \(\mathbf{w}^T\mathbf{x}\).

Let \(\nabla_\mathbf{w}f_{\mathbf{x}}(\mathbf{w})\) be the gradient with respect to the variables in \(\mathbf{w}\), holding \(\mathbf{x}\) constant. Write a careful calculation showing that \(\nabla_\mathbf{w}f_{\mathbf{x}}(\mathbf{w}) = \mathbf{x}\).










B.3

Let \(f_\mathbf{x}(\mathbf{w}) = \sigma(\langle \mathbf{w}, \mathbf{x}\rangle)\). Compute \(\nabla_\mathbf{w}f_{\mathbf{x}}(\mathbf{w})\). You are likely to find the results of previous parts to be helpful.

Hint: chain rule.










B.4

Let \(\lVert \mathbf{w} \rVert_2 = \sqrt{\sum_{i = 1}^p w_i^2}\) be the Euclidean norm of \(\mathbf{w}\). Let \(f(\mathbf{w}) = \lVert \mathbf{w} \rVert_2\). Compute \(\nabla_\mathbf{w} f(\mathbf{w})\). Express your answer in terms of the vector \(\mathbf{x}\), but not any of the individual entries \(x_i\).










B.5

Mixed Partial Derivatives

Let \(f:\mathbb{R}^p \rightarrow \mathbb{R}\) be a function. The mixed partial derivative \(\frac{\partial^2 f(\mathbf{w})}{\partial w_i \partial w_j}\) is defined as

\[ \begin{aligned} \frac{\partial^2 f(\mathbf{w})}{\partial w_i \partial w_j} = \frac{\partial}{\partial w_j}\frac{\partial f(\mathbf{w})}{\partial w_i }\;. \end{aligned} \]

That is, to compute a mixed partial derivative, first compute \(\frac{\partial f(\mathbf{w})}{\partial w_i }\) and then differentiate that with respect to \(w_j\).

Compute \(\frac{\partial^2 f(\mathbf{w})}{\partial w_i \partial w_j}\) for the function \(f_\mathbf{x}(\mathbf{w}) = \sigma(\langle \mathbf{x}, \mathbf{w}\rangle)\), assuming that \(1 \leq i,j \leq p\). Please consider two cases: \(i \neq j\) and \(i = j\).










Part C: Computing Probabilities From Contingency Tables

Suppose that a randomly-selected population of 1000 people is tested for COVID-19 using a rapid test (which is occasionally wrong), as well as a very thorough lab test which we’ll imagine is always right. The results are summarized in the following table:

COVID+ COVID-
Rapid + 102 31
Rapid - 65 802

We are going to use these results to estimate the reliability of the rapid test.

C.1

The accuracy of the rapid test is the fraction of the time in which the rapid test “got the right answer.” What is the accuracy of the rapid test according to these results?










C.2

The true positive rate is the proportion of all COVID+ people who received a Rapid+ result. What is the true positive rate of the rapid test?










C.3

The false positive rate is the proportion of all COVID- people who received a Rapid+ result. What is the false positive rate of the rapid test?










C.4

The true negative rate is the proportion of all COVID- people who received a Rapid- result. The false negative rate is the proportion of all COVID+ people who received a Rapid- result. What are the true and false negative rates of the rapid test?










C.5

Suppose that you took the rapid test and received a positive result. What is your estimate of the probability that you are truly COVID+?










C.6

Suppose that you took this rapid test and received a negative result. What is your estimate of the probability that you are truly COVID+?












© Phil Chodrow, 2024