C⁴E: Computation For Economists

Climb Every Mountain (Sound of Music, 1965): Gradient and Hessian Based Optimization

Steepest Ascent

NR 10.6

Algorithm 15. Steepest Ascent

Given $f(x)$, a starting value $x^0\in \Re^n$, and a tolerance $\epsilon_1>0$:

Initialize. Set $t=0$.
→Compute gradient. $\nabla\,f_t = \nabla\,f(x^t)$. Strong convergence: if $\|\nabla\,f_t\| \mathop{=_{\epsilon_1}} 0$, then stop.
Line optimization.

$\lambda^\star = {\arg\max}_\lambda f\left(x^t + \lambda \nabla\,f^{\,\prime}_t \right).$

Set $x^{t+1} = x^t + \lambda^\star\nabla\,f^{\,\prime}_t$.

Increment $t$ and go back to →.
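A minimal Python sketch of Algorithm 15, assuming vectors are plain lists. The exact line optimization is replaced by a golden-section search over the bracket $[0,2]$; that bracket is an assumption that suits the quadratic example (its line optimum always lies in $[1/3,1]$) and is not part of the algorithm. The names `golden_max` and `steepest_ascent` are hypothetical.

```python
import math


def golden_max(phi, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal phi on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0           # 1/golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc > fd:                             # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:                                   # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return (a + b) / 2.0


def steepest_ascent(f, grad, x, eps=1e-8, max_iter=10_000, lam_hi=2.0):
    """Algorithm 15: step along the gradient, choosing the step by line search."""
    x = list(x)
    for t in range(max_iter):
        g = grad(x)                                       # gradient at x^t
        if math.sqrt(sum(gi * gi for gi in g)) < eps:     # strong convergence
            return x, t

        def phi(lam):                                     # f along the gradient
            return f([xi + lam * gi for xi, gi in zip(x, g)])

        lam = golden_max(phi, 0.0, lam_hi)                # line optimization
        x = [xi + lam * gi for xi, gi in zip(x, g)]       # x^{t+1}
    return x, max_iter


def f(x):
    """First example: -x0^2 + x0*x1 - x1^2."""
    return -x[0] ** 2 + x[0] * x[1] - x[1] ** 2


def grad(x):
    return [-2 * x[0] + x[1], x[0] - 2 * x[1]]


x1, _ = steepest_ascent(f, grad, [2.0, 1.0], max_iter=1)   # one step only
xstar, iters = steepest_ascent(f, grad, [2.0, 1.0])
print(x1, xstar, iters)
```

On this quadratic the gradient norm halves with every step, so the sketch needs a few dozen iterations to drive it below $10^{-8}$ even though the problem has only two variables.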

Two Examples

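First, a concave quadratic:

$f(x_0,x_1) = -x_0^2 + x_0x_1 - x_1^2,$

$\nabla\,f = (\,-2x_0+x_1\,,\,x_0 - 2x_1\,).$

The maximum is at $x^\star = (0,0)$. Start at $x^0 = (2,1)$, where $\nabla\,f^0 = (-3,0)$. Along the gradient, $f(x^0+\lambda(\nabla\,f^0)^{\prime}) = f(2-3\lambda,1) = -(2-3\lambda)^2+(2-3\lambda)-1$, which is maximized at $\lambda^\star = 0.5$, so

$x^1 = x^0+(0.5)(\nabla\,f^0)^{\prime} = (0.5,1).$

At $x^1$ the gradient is $(0,-1.5)$, orthogonal to the previous direction. This always happens with an exact line optimization (the derivative of $f$ along the line is zero at $\lambda^\star$), so steepest ascent zigzags toward $x^\star$ instead of heading straight for it.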
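The first iterations of steepest ascent on this quadratic can be computed exactly. The Python sketch below uses rational arithmetic and relies on $f$ being quadratic, $f(x)=\tfrac12 x^TAx$ with $A$ the constant Hessian, so that the line optimum has the closed form $\lambda^\star = -\nabla f\,\nabla f^{\prime}/(\nabla f\,A\,\nabla f^{\prime})$; that formula does not carry over to non-quadratic functions. The helper names are hypothetical.

```python
from fractions import Fraction as Fr

A = [[Fr(-2), Fr(1)], [Fr(1), Fr(-2)]]     # Hessian of -x0^2 + x0*x1 - x1^2


def grad(x):
    """Gradient of the quadratic, A x, as a pair of Fractions."""
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def exact_step(x):
    """One steepest-ascent step using the exact line optimum for a quadratic."""
    g = grad(x)
    Ag = grad(g)                               # grad is linear, so this is A g
    gg = g[0] * g[0] + g[1] * g[1]
    gAg = g[0] * Ag[0] + g[1] * Ag[1]          # negative because f is concave
    lam = gg / -gAg                            # solves d f(x + lam g)/d lam = 0
    return (x[0] + lam * g[0], x[1] + lam * g[1]), lam


x = (Fr(2), Fr(1))
path, lams = [x], []
for _ in range(4):
    x, lam = exact_step(x)
    path.append(x)
    lams.append(lam)

for p in path:
    print(p, grad(p))
```

Every step has $\lambda^\star=1/2$, successive gradients are orthogonal, and every two steps the iterate shrinks to exactly one quarter of its previous value ($x^2=x^0/4$, $x^4=x^0/16$).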

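The second example is the (negative of the) Rosenbrock function, a standard test problem:

$f(x_0,x_1)=-\left(\,1-x_0\,\right)^{2}-100\left(\,x_1-x_0^2\,\right)^{2}.\tag{Rosenbrock}$

Its maximum is at $x^\star =(1,1)$, where $f=0$. The maximum sits at the end of a long, narrow, curved ridge along $x_1=x_0^2$, which steepest ascent climbs very slowly. The program 28-rosenbrock.ox maximizes this function.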
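A short Python check of where the maximum is, using only the formula above: the analytic gradient vanishes at $(1,1)$ but not at $(0,0)$, and the Hessian at $(1,1)$ is negative definite. The function names are hypothetical.

```python
import math


def rosen(x):
    """Negative Rosenbrock function (to be maximized)."""
    return -(1 - x[0]) ** 2 - 100 * (x[1] - x[0] ** 2) ** 2


def rosen_grad(x):
    """Analytic gradient of the negative Rosenbrock function."""
    return [2 * (1 - x[0]) + 400 * x[0] * (x[1] - x[0] ** 2),
            -200 * (x[1] - x[0] ** 2)]


def rosen_hess(x):
    """Analytic Hessian of the negative Rosenbrock function."""
    return [[-2 + 400 * x[1] - 1200 * x[0] ** 2, 400 * x[0]],
            [400 * x[0], -200.0]]


H = rosen_hess([1.0, 1.0])
tr = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
# Eigenvalues of a symmetric 2x2 matrix from its trace and determinant
disc = math.sqrt(tr * tr / 4 - det)
eig = (tr / 2 - disc, tr / 2 + disc)
print(rosen([1.0, 1.0]), rosen_grad([1.0, 1.0]), rosen_grad([0.0, 0.0]))
print(H, eig, eig[0] / eig[1])
```

The two eigenvalues are roughly $-1002$ and $-0.4$: the ridge is about 2,500 times more sharply curved across than along its length, which is why steps along the gradient keep bouncing between its walls.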

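Newton's Method

Newton's method uses the Hessian, as well as the gradient, to choose the search direction. Let $\nabla\,f_0$ (a row vector) and $Hf_0$ denote the gradient and the Hessian of $f(x)$ at the point $x_0$. A second-order Taylor expansion of $f(x)$ around $x_0$ is

${f(x) \approx f(\,x_0\,) + \nabla\,f_0\left(\,x-x_0\,\right) + {1\over 2}\left(\,x-x_0\,\right)^T \left[ Hf_0\right] \left(\,x-x_0\,\right),}\tag{Taylor}\label{Taylor}$

where $^T$ denotes the transpose. If $Hf_0$ is negative definite, the quadratic on the right is maximized where its gradient, $\nabla\,f_0^T + Hf_0\left(x-x_0\right)$, equals zero:

$x^\star = x_0 - \left[Hf_0\right]^{-1} \nabla\,f_0^T.\tag{QuadMax}\label{QuadMax}$

If $f(x)$ is itself quadratic, this $x^\star$ is the exact maximum, reached in a single step. Otherwise $x^\star$ only maximizes the approximation to $f(x)$, so Newton's method repeats the step from the new point.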
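As a worked check, take the quadratic example $f(x_0,x_1)=-x_0^2+x_0x_1-x_1^2$ started at the point $(2,1)$. (QuadMax) lands on the maximum in one step:

$Hf = \begin{pmatrix}-2 & 1\\ 1 & -2\end{pmatrix},\qquad \left[Hf\right]^{-1} = -{1\over 3}\begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix},\qquad \nabla\,f = (-3,\,0),$

$x^\star = \begin{pmatrix}2\\1\end{pmatrix} - \left[Hf\right]^{-1}\begin{pmatrix}-3\\0\end{pmatrix} = \begin{pmatrix}2\\1\end{pmatrix} - \begin{pmatrix}2\\1\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$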

Algorithm 16. Newton Iteration

Given $f(x)$, a starting value $x_0\in \Re^n$, and a tolerance $\epsilon_1>0$:

Set $t=0$.
→Compute $\nabla\,f_t$. Strong convergence: if $\|\,\nabla\,f_t\,\|\ \mathop{=_{\epsilon_1}}\ 0$, then stop.
Compute $H_t = Hf(\,x_t\,)$.
Solve $H_t\,s_t = -\nabla\,f_t^T$ for the direction $s_t$.
Perform line optimization on $\lambda$:

$\lambda^\star_t = {\arg\max}_\lambda f\left(x_t + \lambda s_t\right)$

Set $x_{t+1} = x_t + \lambda^\star_t s_t$.

Increment $t$ and go back to →.
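A minimal Python sketch of Algorithm 16 for two variables, applied to the objective in 27-maxnewton.ox ($\beta=0.9$ and a coefficient of 0.5 on $x_0x_1$) with analytic derivatives. Two simplifications are assumptions of this sketch, not part of the algorithm: the line optimization is replaced by step halving (start at $\lambda=1$ and halve until $f$ does not fall), and the Newton system is solved by Cramer's rule, which only works for $n=2$. The function names are hypothetical.

```python
import math

BETA, ALPHA = 0.9, 0.5   # beta and the x0*x1 coefficient used in 27-maxnewton.ox


def f(x):
    """Objective of 27-maxnewton.ox; -inf outside the domain of the log."""
    s = 1 + x[0] + x[1]
    if s <= 0:
        return -math.inf
    return -x[0] ** 2 + ALPHA * x[0] * x[1] - x[1] ** 2 + BETA * math.log(s)


def grad(x):
    """Analytic gradient of f."""
    c = BETA / (1 + x[0] + x[1])
    return [-2 * x[0] + ALPHA * x[1] + c, ALPHA * x[0] - 2 * x[1] + c]


def hess(x):
    """Analytic Hessian of f."""
    c = BETA / (1 + x[0] + x[1]) ** 2
    return [[-2 - c, ALPHA - c], [ALPHA - c, -2 - c]]


def solve2(H, b):
    """Solve the 2x2 system H s = b by Cramer's rule."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [(b[0] * H[1][1] - H[0][1] * b[1]) / det,
            (H[0][0] * b[1] - b[0] * H[1][0]) / det]


def newton(f, grad, hess, x, eps=1e-8, max_iter=50):
    """Algorithm 16, with step halving standing in for the line optimization."""
    x = list(x)
    for t in range(max_iter):
        g = grad(x)
        if math.hypot(*g) < eps:                     # strong convergence
            return x, t
        s = solve2(hess(x), [-gi for gi in g])       # solve H_t s_t = -grad f_t'
        lam, fx = 1.0, f(x)
        # halve lambda until f does not fall (or lambda becomes negligible)
        while f([xi + lam * si for xi, si in zip(x, s)]) < fx and lam > 1e-12:
            lam /= 2
        x = [xi + lam * si for xi, si in zip(x, s)]
    return x, max_iter


xstar, iters = newton(f, grad, hess, [2.0, 1.0])
print(xstar, iters)
```

Because this objective is strictly concave and symmetric in $x_0$ and $x_1$, its maximum lies on the diagonal at $x_0=x_1=(-1.5+\sqrt{13.05})/6\approx 0.352$ (set the common gradient component $-1.5a+0.9/(1+2a)$ to zero), which gives an easy check on the sketch.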

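Using the Hessian at each iteration "curls" the direction away from steepest ascent. But $H$ has two drawbacks. It is costly: each iteration needs all $n(n+1)/2$ distinct second derivatives, which are often computed numerically. And it can be misleading: far from the maximum $H$ need not be negative definite, and then the Newton direction need not point uphill.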
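To see how $H$ can mislead, here is a small Python check on an illustrative non-concave variant of the first example, $-x_0^2+3x_0x_1-x_1^2$, whose constant Hessian $\begin{pmatrix}-2&3\\3&-2\end{pmatrix}$ is indefinite. The computation uses exact fractions; the function names are hypothetical.

```python
from fractions import Fraction as Fr


def f(x):
    """Non-concave quadratic -x0^2 + 3 x0 x1 - x1^2 (saddle point at the origin)."""
    return -x[0] ** 2 + 3 * x[0] * x[1] - x[1] ** 2


def grad(x):
    return (-2 * x[0] + 3 * x[1], 3 * x[0] - 2 * x[1])


H = ((Fr(-2), Fr(3)), (Fr(3), Fr(-2)))            # constant Hessian
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]       # -5 < 0, so H is indefinite

x = (Fr(2), Fr(1))
g = grad(x)
b = (-g[0], -g[1])
# Newton direction: solve H s = -g by Cramer's rule
s = ((b[0] * H[1][1] - H[0][1] * b[1]) / det,
     (H[0][0] * b[1] - b[0] * H[1][0]) / det)
slope = g[0] * s[0] + g[1] * s[1]                  # directional derivative of f along s
x_new = (x[0] + s[0], x[1] + s[1])
print(s, slope, f(x), f(x_new))
```

At $(2,1)$ the Newton direction is $(-2,-1)$, its inner product with the gradient is $-2$ (so it points downhill), and the full step jumps to the saddle point $(0,0)$, lowering $f$ from 1 to 0.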

Quasi-Newton Methods

NR 10.7

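Quasi-Newton methods avoid computing the Hessian. Instead they build up an approximation to it from the changes in the gradient observed from one iteration to the next. The most widely used updating formula is BFGS (Broyden-Fletcher-Goldfarb-Shanno).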

Algorithm 17. BFGS Updating

Given $f(x)$, a starting value $x_0\in \Re^n$, and a tolerance $\epsilon_1>0$:

Initialize. Compute $\nabla\, f_0$. Set $t=0$ and $H_0=-I$ (or some other symmetric negative definite matrix). With $H_0=-I$ the first direction is $s_0=\nabla\,f_0^T$, the steepest-ascent direction.
→ Strong convergence: if $\|\,\nabla\,f_t\,\|\ \mathop{=_{\epsilon_1}}\ 0$, then stop.
Solve $H_t s_t = -\nabla\,f_t^T$.
Perform line optimization on $\lambda$:

$\lambda^\star_t = {\arg\max}_\lambda f\left(x_t + \lambda s_t\right)$

Set $x_{t+1} = x_t + \lambda^\star_t s_t$.

Update $H$ (according to BFGS). Compute $\nabla\,f_{t+1}=\nabla\,f(x_{t+1})$, $z=x_{t+1}-x_t$, and $y= \left(\nabla\, f_{t+1}-\nabla\, f_t\right)^T$. Then

${H_{t+1} = H_t - { (H_t z)(H_t z)^T\over z^TH_t z} + {yy^T \over y^T z}.}\tag{BFGS}\label{BFGS}$

Increment $t$ and go back to →.
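A minimal Python sketch of BFGS updating on the quadratic first example, starting from $H_0=-I$ so that the first direction is the steepest-ascent direction. It assumes $f$ is quadratic, so the line optimization has the exact closed form $\lambda^\star = -\nabla f_t\,s_t/(s_t^T A s_t)$ with $A$ the constant Hessian, and it uses rational arithmetic so the results are exact. The helper names are hypothetical and the closed-form step applies only to quadratics.

```python
from fractions import Fraction as Fr

A = [[Fr(-2), Fr(1)], [Fr(1), Fr(-2)]]      # true Hessian of -x0^2 + x0*x1 - x1^2


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def grad(x):
    return matvec(A, x)                      # gradient of the quadratic (as a column)


def solve2(M, b):
    """Solve the 2x2 system M s = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def bfgs_update(H, z, y):
    """(BFGS): H - (Hz)(Hz)'/(z'Hz) + yy'/(y'z)."""
    Hz = matvec(H, z)
    zHz, yz = dot(z, Hz), dot(y, z)
    return [[H[i][j] - Hz[i] * Hz[j] / zHz + y[i] * y[j] / yz for j in range(2)]
            for i in range(2)]


x = [Fr(2), Fr(1)]
H = [[Fr(-1), Fr(0)], [Fr(0), Fr(-1)]]        # H_0 = -I
g = grad(x)
iters = 0
while dot(g, g) != 0:                          # exact arithmetic: stop at a zero gradient
    s = solve2(H, [-g[0], -g[1]])             # H_t s_t = -grad f_t'
    lam = -dot(g, s) / dot(s, matvec(A, s))   # exact line optimum for a quadratic
    x_new = [x[0] + lam * s[0], x[1] + lam * s[1]]
    g_new = grad(x_new)
    z = [x_new[0] - x[0], x_new[1] - x[1]]
    y = [g_new[0] - g[0], g_new[1] - g[1]]
    H = bfgs_update(H, z, y)
    x, g = x_new, g_new
    iters += 1
print(iters, x, H)
```

In this example the sketch reaches $(0,0)$ after $n=2$ iterations, and the final approximation equals the true Hessian $A$. Multiplying (BFGS) on the right by $z$ shows that every update satisfies the secant condition $H_{t+1}z=y$.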

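$H_t$ is only an approximation to $Hf(x_t)$, built from gradient information alone. When $f$ is a concave quadratic in $n$ variables and each line optimization is exact, BFGS reaches the maximum in at most $n$ iterations. In general, however, the final matrix $H_T$ is not equal to $Hf(x^\star)$, so the $H$ produced by BFGS should never be used in place of $Hf(x^\star)$ (for example, to compute standard errors). Compute the Hessian at the final value instead.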

Gradient-Based Methods in Ox

#import "maximize"

    MaxNewton(func, ax, af, aH, UseNH);

        Argument        Explanation
        ------------------------------------------------------------------------------------
        func            function that computes the objective (see format below)
        ax              address of starting vector, where final values will be placed
        aH              address to place final Hessian (can be 0 if not needed)
        UseNH           0=func computes Hessian
                        1=Use numerical Hessians computed in MaxNewton
        ------------------------------------------------------------------------------------

    Returns an integer Convergence Code:

          Value    Label            Explanation
        ------------------------------------------------------------------------------------------------
            0      MAX_CONV         Strong convergence: both convergence tests were passed
            1      MAX_WEAK_CONV    Weak convergence (no improvement in line search): step length
                                        too small, but one test passed
            2      MAX_MAXIT        No convergence (maximum number of iterations reached)
            3      MAX_LINE_FAIL    No convergence (no improvement in line search): step length
                                        too small and convergence test not passed
            4      MAX_FUNC_FAIL    No convergence (initial function evaluation failed)
            5      MAX_NOCONV       No convergence: probably not yet attempted to maximize
        ------------------------------------------------------------------------------------------------

The objective function passed to MaxNewton() must have this format:

    myobjective(x, af, agrad, aHess);

        Argument        Explanation
        ------------------------------------------------------------------------------------
        x               parameter vector
        af              address of where to put f(x)
        agrad           address to place gradient
                        0 = gradient not needed
        aHess           address to place Hessian
                        0 = Hessian not needed
        ------------------------------------------------------------------------------------

    Returns
    1: function computed successfully at x
    0: function evaluation failed at x

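The maximize package also provides Num1Derivative() and Num2Derivative(), which compute numerical first derivatives (the gradient) and numerical second derivatives (the Hessian) of a function written in this format.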

…

Notice that MaxNewton gives you the option of using numerical Hessians, but your function must supply the gradient when asked for it. That is, if the third argument agrad is not an int, it is an address, and the algorithm wants the gradient computed as well as the function value. You can use Num1Derivative() rather than coding the analytic derivatives. So if you want to send your function to MaxNewton without coding analytic derivatives, add a line like this to your function:

    if (!isint(agrad)) Num1Derivative(myobjective,x,agrad);

Recursion

This is an example of a recursive function call, because myobjective() is called within another call to myobjective(): Num1Derivative() evaluates myobjective() at nearby points to approximate the gradient. The recursion stops after one level because those inner calls do not ask for the gradient (agrad is 0), so the if test fails. The language keeps the nested calls straight with the "runtime stack": each call to a function is placed on the stack, which records where the program is and which functions are still executing.
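The same pattern can be mimicked in Python to watch the recursion happen. This sketch is a stand-in, not Ox's Num1Derivative(): it assumes plain central differences with a step of $10^{-5}\max(1,|x_i|)$, passes one-element lists in place of Ox addresses, and uses None where Ox passes 0. The names num1_derivative, myobjective, and calls are hypothetical.

```python
def num1_derivative(func, x, agrad):
    """Central-difference gradient of func at x, stored in agrad[0]."""
    g = []
    for i in range(len(x)):
        h = 1e-5 * max(1.0, abs(x[i]))
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = [0.0], [0.0]
        # Inner calls ask only for f (agrad=None), so they do not recurse again.
        func(xp, fp, None, None)
        func(xm, fm, None, None)
        g.append((fp[0] - fm[0]) / (2 * h))
    agrad[0] = g
    return 1


calls = {"depth": 0, "max_depth": 0}          # tracks how deeply calls are nested


def myobjective(x, af, agrad, aHess):
    calls["depth"] += 1
    calls["max_depth"] = max(calls["max_depth"], calls["depth"])
    af[0] = -x[0] ** 2 + x[0] * x[1] - x[1] ** 2
    if agrad is not None:                     # Ox: if (!isint(agrad))
        num1_derivative(myobjective, x, agrad)
    calls["depth"] -= 1
    return 1


af, ag = [0.0], [None]
myobjective([2.0, 1.0], af, ag, None)
print(af[0], ag[0], calls["max_depth"])
```

The outer call asks for the gradient, the inner calls made by num1_derivative do not, so the nesting never goes deeper than two calls.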

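The program 27-maxnewton.ox maximizes a function with MaxNewton(), using Num1Derivative() for the gradient. As listed, its objective is $-x_0^2+0.5x_0x_1-x_1^2+\beta\log(1+x_0+x_1)$, a variation used in the exercises. The output shown after the listing was produced with the simpler objective $-x_0^2+x_0x_1-x_1^2$ (coefficient 1 on $x_0x_1$ and beta=0): its value at the starting point $(2,1)$ is $-3$, its gradient there is $(-3,0)$, and Newton's method reaches the maximum at $(0,0)$, up to numerical error, in one iteration.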

27-maxnewton.ox
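    #include "oxstd.h"
    #import "maximize"

    const decl beta = 0.9;

    // Objective in the format required by MaxNewton(): f(x) goes in af[0];
    // the gradient is computed numerically only when ag is an address.
    f(x,af,ag,aH) {
        af[0] = -x[0]^2 + 0.5*x[0]*x[1] - x[1]^2 + beta*log(1+x[0]+x[1]);
        if (!isint(ag)) Num1Derivative(f,x,ag);
        return 1;
        }

    main() {
        decl xp = <2;1>, fv;
        MaxControl(-1,1);   // -1: keep the default iteration limit; 1: print every iteration
        // last argument 1: MaxNewton computes numerical Hessians (f does not fill aH)
        println("Conv. Code=",MaxNewton(f,&xp,&fv,0,1));
        }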

    Starting values
    parameters
           2.0000       1.0000
    gradients
          -3.0000      0.00000
    Initial function =                  -3

    Position after 1 Newton iterations
    Status: Strong convergence
    parameters
      2.9771e-005  1.9326e-005
    gradients
     -4.0216e-005 -8.8816e-006
    function value = -7.53864179238e-010
    Conv. Code=0

Exercises

Apply both steepest ascent and Newton's method to maximize (a) the Rosenbrock function and (b) a simple quadratic function in two variables. Compare the rates of convergence.
Consider the function $g(x_0,x_1) = -x_0^2 + \alpha x_0 x_1 - x_1^2.$ We have already seen how this function behaves when the parameter $\alpha$ is set to 1. Derive the 2×2 Hessian and show that the function is strictly concave as long as $|\alpha|<2$, but that for $|\alpha| > 2$ it no longer has a maximum (at $|\alpha|=2$ a maximum exists but is not unique).
Save 27-maxnewton.ox to a new file. Set beta=0 and add a new parameter alpha so that the function is the same as $g()$ in the previous question (alpha=0.5 in the listed code). Now set $\alpha = 1.999$ and then $\alpha = 2.0001$. Run the program to see how each value affects the performance of Newton's method.
Modify the code below to maximize the Rosenbrock function with both BFGS and Newton, starting from the same point. Compare the number of function evaluations each requires to find the maximum.

28-rosenbrock.ox
    #include "oxstd.h"
    #import "maximize"

    decl fevcnt;    // counts function evaluations

    // Negative Rosenbrock function in the format required by MaxNewton()/MaxBFGS()
    rb(x,aF,aS,aH) {
        aF[0] = -(1-x[0])^2 - 100*(x[1]-x[0]^2)^2;
        if (!isint(aS)) Num1Derivative(rb,x,aS);   // numerical gradient when asked for
        if (!isint(aH)) aH[0] = -unit(2);          // -I, not the true Hessian
        ++fevcnt;
        return 1;
        }

    main() {
        decl z = <1.5;-1.5>,frb,ccode;
        rb(z,&frb,0,0);                            // f at the starting value
        MaxControl(-1,1);
        fevcnt = 0;
    //  ccode=MaxNewton(rb,&z,&frb,0,1);  // Newton (numerical Hessian)
        ccode=MaxBFGS(rb,&z,&frb,0,1);    // BFGS
        println("x*=",z,"f(x*)=",frb,"\n convergence = ",ccode
            ," #of eval: ",fevcnt);
        }


Algorithm 15. Steepest Ascent

Exhibit 40. Steepest Descent in Two Dimensions

Algorithm 16. Newton Iteration

Exhibit 41. Newton versus Steepest Descent

Exhibit 42. Newton and Bad Starting Values

Algorithm 17. BFGS Updating

Exercises
