Newton's Method

Newton’s Method for Optimization

In this section we will understand Newton’s Method for solving optimization problems in Machine Learning.


📘

Newton’s Method:
It is a second-order iterative gradient based optimization technique known for its extremely fast convergence.
When close to optimum, it achieves quadratic convergence, better than gradient descent’s linear convergence.

Algorithm:

  1. Start at a random point \(x_k\).
  2. Compute the slope at \(x_k, ~i.e,~ f'(x_k)\).
  3. Compute the curvature at \(x_k, ~i.e,~ f''(x_k)\).
  4. Draw a parabola at \(x_k\) that locally approximates the function.
  5. Jump directly to the minimum of that parabola; that’s the next step. Note: So, instead of walking down the slope step by step (gradient descent), we are jumping straight to the point where the curve bends downwards towards the bottom.
\[ x_{k+1} = x_k - \frac{f\prime(x_k)}{f\prime\prime(x_k)} \\[10pt] \text{ step size = } \frac{1}{f\prime\prime(x_k)} \\[10pt] f\prime\prime(x_k) : \text{ tells curvature of the function at } x_k \\[10pt] x_{new} = x_{old} - (\nabla^2 f(x_{old})^{-1} \nabla f(x_{old}) \\[10pt] \nabla^2 f(x_{old}): Hessian \]




For example:

  1. Find the minima of \(f(x) = x^2 - 4x + 5\) To find the minima, lets calulate the first derivative and equate to zero.
    \(f'(x) = 2x - 4 = 0 \)
    \( => x^* = 2 \)

    \(f''(x) = 2 >0 \) => minima is at \(x^* = 2\)

Now, we will solve this using Newton’s Method.
Let’s start at x = 0.

\[ x_{new} = x_{old} - \frac{f\prime(x_{old})}{f\prime\prime(x_{old}} \\[10pt] => x_{new} = 0 - \frac{2*0 -4}{2} = 0-(-2) \\[10pt] => x_{new} = 2 \]

Hence, we can see that using Newton’s Method we can get to the minima \(x^* = 2\) in just 1 step.



End of Section