1. Newton-Raphson and Broyden
    NR 9.4, 9.6; Judd II.5.5
  1. To Solve Non-Linear: Solve Linear Over and Over
  2. Given $g : \Re^n \to \Re^n$, a square non-linear system of equations. The methods we cover assume that the second derivatives of the system are continuous. We wish to find an $x^\star\in \Re^n$ that satisfies $$g\left(x^\star\right)= \overrightarrow{0}.\tag{NLS1}\label{NLS1}$$ A vector that satisfies \eqref{NLS1} is called a root of the system. Since the system $g$ will be computed numerically and the input $x$ is stored as a vector of floating point reals, it is very likely that we cannot satisfy \eqref{NLS1} exactly. So, we will be satisfied if we are "close enough."

    Newton-Raphson is a centuries-old algorithm to find a root of a non-linear system which uses the Jacobian of the system. It "pretends" that the system is linear in order to move from one guess of the root to another. A linear system has a constant Jacobian, but a non-linear one does not, so each time the algorithm iterates it recomputes the Jacobian and solves another linear system. To understand Newton-Raphson, start with the situation when the system really is linear. That means we can write $g(x)$ as: $$g(x) = Ax+b,\nonumber$$ where $A$ is a square matrix and $b$ is an $n\times 1$ vector of constants. Of course, if a linear system has a solution it is closed-form: $x^\star = -A^{-1}b$. That is, if we can compute $A^{-1}$, then we can find the $x^\star$ that is the unique root of the system.

    Given that $A^{-1}$ is available, an iterative or sequential solution is not needed to solve a linear system, but we can still think of the solution as one step of an iteration, which helps to show how non-linear methods build on linear ones. That is, suppose we started at a guess $x_0$. The value of the system at that point is $g(x_0) = Ax_0 + b$. To get to $x^\star$ from there we can write: $$x^\star = x_1 \equiv x_0 - A^{-1}(b+Ax_0) = x_0 - A^{-1}g(x_0).\nonumber$$ In other words, we find the root by subtracting a vector from the initial guess $x_0$: we take a step from $x_0$ to get a new value $x_1$, and the size of the step is such that we wipe out the difference $g(x_0)$. Reminder: it is typically inefficient to actually compute $A^{-1}$. Instead, solve the linear system $$A s = -g(x_0)\nonumber$$ for the step $s$ and set $x_1 = x_0 + s.$

    Exhibit 38. Linear System as an Iteration
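
    To make the "iteration" view concrete, here is a tiny Ox sketch with hypothetical values for $A$, $b$, and $x_0$ (it uses invert() for brevity; per the reminder above, production code would solve the linear system instead of forming the inverse):

        #include "oxstd.h"

        main() {
            decl A = <2, 1; 1, 3>, b = <-3; -5>, x0 = <0; 0>;  // hypothetical linear system
            decl g0 = A * x0 + b;                              // g(x0)
            decl s = -invert(A) * g0;                          // step solving A s = -g(x0)
            println("x* = ", x0 + s);                          // one step lands on the root
            println("check g(x*) = ", A * (x0 + s) + b);       // should be (numerically) zero
            }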

    When $g$ is non-linear, we use the same iterative formula, treating $g$ as (locally) linear. The (local) analogue of $A$ is then $Jg(x)$, the Jacobian of the system evaluated at $x$.

    Algorithm 12. Newton-Raphson Iteration

      Given $g(x)$ and $x_0\in \Re^n$, set $t=0$. Attempt to find a solution to
      $$g(x^\star) = \overrightarrow{0}.\nonumber$$
    1. Compute $g_t = g(x_t)$.
      Strong convergence: if $\|g_t\| \mathop{=_{\epsilon_1}} 0$, then stop; $x_t$ is an approximate root. (Here $\mathop{=_{\epsilon_1}}$ means equal within tolerance $\epsilon_1$, as implemented by Ox's fuzzy-equal comparison functions.)
    2. Compute the Jacobian at the current value, $J_t = Jg(x_t)$.
    3. Solve $J_ts_t = -g_t$ for $s_t$.
      Set $x_{t+1} = x_t + s_t$.
      Lack of convergence: if $J_t$ is (nearly) singular or $x_{t+1}\mathop{=_{\epsilon_2}} x_t$, then stop and report failure.
    4. Increment $t$ and go back to step 1.

    Exhibit 39. Newton-Raphson Iteration
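
    The following is a minimal Ox sketch of Algorithm 12, assuming the user supplies the system g and its Jacobian jg as functions; mynewton, eps1, eps2, and maxit are hypothetical names, and invert() again stands in for a proper linear solve:

        #include "oxstd.h"

        // Sketch of Algorithm 12 (illustration only, not the solvenle implementation).
        // g(x) returns the n x 1 system value; jg(x) returns the n x n Jacobian.
        mynewton(const g, const jg, const x0, const eps1, const eps2, const maxit) {
            decl t, xt = x0, gt, Jt, st;
            for (t = 0; t < maxit; ++t) {
                gt = g(xt);
                if (double(sqrt(gt' * gt)) < eps1)
                    return xt;                  // strong convergence: xt is the root
                Jt = jg(xt);
                st = -invert(Jt) * gt;          // solve Jt st = -gt for the step
                if (double(sqrt(st' * st)) < eps2)
                    break;                      // x_{t+1} ~= x_t: no progress
                xt += st;
            }
            return .NaN;                        // lack of convergence
            }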

    For well-behaved $g(x)$, Newton-Raphson convergence is quadratic in $t$. That is, near the root the error at each step is proportional to the square of the error at the previous step, $\|x_{t+1}-x^\star\| \le C\|x_t-x^\star\|^2$, so the number of correct digits roughly doubles with each iteration.

    It is not always feasible to program an analytic $J$. If a numerical $J$ is used (e.g., computed by central differences), then the number of evaluations of $g$ required for each step is $1+2n$. If the cost of calculating $g(x)$ is linear in $n$, then the computational cost of each step is quadratic in the size of the system $n$.
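
    As an illustration of where the $1+2n$ count comes from, here is a hypothetical central-difference Jacobian in Ox ($2n$ evaluations of $g$ below, plus the one evaluation of $g(x_t)$ done elsewhere in the step); numjac and its arguments are illustrative names, not part of any package:

        #include "oxstd.h"

        // Central-difference approximation to the n x n Jacobian of g at x.
        // g(x) must return the n x 1 value of the system; h is the step size.
        numjac(const g, const x, const h) {
            decl n = rows(x), J = zeros(n, n), j, e;
            for (j = 0; j < n; ++j) {
                e = zeros(n, 1);
                e[j] = h;
                J[][j] = (g(x + e) - g(x - e)) / (2 * h);  // column j: dg/dx_j
            }
            return J;
            }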

    Although Newton-Raphson converges quickly, there is no guarantee that any given Newton-Raphson step will be an improvement or that the iteration will ever converge. The behaviour depends on $Jg$, which may be badly behaved far from a root: it may be nearly singular, and the steps may diverge. An alternative discussed next is to use information from past evaluations to build up an approximation to $J_t$ without computing it. Finally, there is no systematic way to use Newton-Raphson to find all roots. Methods that attempt that harder task are not discussed here.

    An example where Newton-Raphson fails to work well in one dimension (Judd, p. 153, eq. 5.2.4) is the $1\times 1$ system $$g(x) = x^{1/3}e^{-x^2}.\nonumber$$
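    To see why, compute the Newton update for this $g$ in closed form (the $e^{-x^2}$ factors cancel in $g/g'$): $$x_{t+1} = x_t - {g(x_t)\over g'(x_t)} = x_t - {3x_t\over 1-6x_t^2} = -\,{2x_t\left(1+3x_t^2\right)\over 1-6x_t^2} \approx -2x_t \hbox{ for small } x_t.\nonumber$$ So even arbitrarily close to the root $x^\star = 0$, each iteration roughly doubles the distance from the root and flips its sign: the iterates oscillate and diverge.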
  3. Broyden (Approximate Jacobian).
  4. NR 9.7, page 389
    Broyden's method is a decades-old algorithm for solving a non-linear system that avoids computing the Jacobian of the system at each iteration. Instead, it builds up an approximation to the Jacobian as the iterations proceed.

    Algorithm 13. Broyden's Method

    Given $g(x)$ and $x_0\in \Re^n$.
    1. Initialize. Set $t=0$; compute $g_0 = g(x_0)$; set $J_0$ to a (well-conditioned) $n\times n$ matrix. Typically, $J_0 = I$.
    2. Check for strong convergence: if $\|g_t\|\mathop{=_{\epsilon_1}}0$ then stop; $x_t$ is an approximate root.
      Otherwise, solve $J_t s_t = -g_t$ for $s_t$. Set $x_{t+1} = x_t + s_t$. Check for non-convergence: if $\|x_{t+1}-x_t\| \mathop{=_{\epsilon_2}}0$, possibly reset $J_t$ to $J_0$, or simply stop and report failure.
    3. Update the Jacobian. Compute $g_{t+1}=g(x_{t+1})$ and let $y=g_{t+1}-g_t$. Then $$J_{t+1} = J_t +{(y-J_t s_t)s_t^T\over \|s_t\|^2}.\nonumber$$
    4. Increment $t$ and return to step 2.

    The key to the algorithm is the updating formula for the Jacobian. It is a good formula because it can be shown that the approximation gets closer and closer to the actual Jacobian as the iterations proceed. Broyden's method is more robust to the initial value $x_0$ than Newton-Raphson, but there is still no guarantee of convergence. If convergence fails, restarting the algorithm at the final values sometimes allows further progress.
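
    For concreteness, here is a minimal Ox sketch of Algorithm 13 in the same hypothetical style as the Newton-Raphson sketch above (an illustration, not the solvenle implementation):

        #include "oxstd.h"

        // Sketch of Algorithm 13 (Broyden). g(x) returns the n x 1 system value.
        mybroyden(const g, const x0, const eps1, const eps2, const maxit) {
            decl n = rows(x0), t, xt = x0, Jt = unit(n), gt = g(x0), gnew, st, y;
            for (t = 0; t < maxit; ++t) {
                if (double(sqrt(gt' * gt)) < eps1)
                    return xt;                  // strong convergence
                st = -invert(Jt) * gt;          // solve Jt st = -gt
                if (double(sqrt(st' * st)) < eps2)
                    break;                      // no progress (could reset Jt = unit(n))
                xt += st;
                gnew = g(xt);
                y = gnew - gt;
                Jt += (y - Jt * st) * st' / double(st' * st);  // Broyden update
                gt = gnew;
            }
            return .NaN;                        // lack of convergence
            }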

  5. Non-linear Systems in Ox
  6. The standard Ox package solvenle implements both Newton-Raphson and Broyden's method. To use it, your program must import the package. Both algorithms are available through one function:
    #import "solvenle";
    
        SolveNLE(func, ax, ...)
    
            Argument        Explanation
            ------------------------------------------------------------------------------------
            func            function evaluating the nonlinear system in the form below
            ax              address of n x 1 matrix with starting values x0;
                            the final values x* are placed in this location
            ...             several optional arguments to control the algorithm (see below)
    
    
    The two-argument version of a call to SolveNLE is the basic version; optional arguments can be sent to control the algorithm. In other words, you code the system of equations as a function, which SolveNLE calls repeatedly. The second argument must be the address of a vector (so use &) that contains the starting vector $x_0$. When SolveNLE is finished, it puts the final value back in the location sent, so the starting values you send are written over with the final values.
        SolveNLE(func, ax, iMode, fJac)
    
            Argument     Explanation
            ------------------------------------------------------------------------------------
            iMode        integer code to choose the algorithm
                         -1 (default)
                             Newton-Raphson with numerical Jacobian if n < 80
                             modified Newton-Raphson for n >= 80
                          0  Newton-Raphson using ANALYTIC Jacobian computed by fJac()
                          1  Newton-Raphson using numerical Jacobian computed by SolveNLE()
                          2  Broyden's method
                          3  modified Newton-Raphson that avoids storing the n x n Jacobian matrix
    
            fJac          If iMode==0 then fJac() has to return the Jacobian matrix
                          This would be useful if you have coded the analytic form of the Jacobian
                          and computing numerical Jacobians is expensive.
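
    For example, based on the table above, selecting Broyden's method for a system coded in a (hypothetical) function myg would look like:

        ccode = SolveNLE(myg, &x, 2);    // iMode 2: Broyden's method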
    
    The supplied func argument should have the following format:
    	func(ag, x)
    
            Argument        Explanation
            ------------------------------------------------------------------------------------
            ag              address for where to put the n x 1 vector value of g(x)
            x               n x 1 vector of x values
    
    	Returns
            1: successful,
            0: function evaluation failed
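
    For instance, a two-equation system $g_1 = x_1^2 + x_2 - 3$, $g_2 = x_1 - x_2$ (a made-up example) would be coded in this format as:

        mysys(ag, x) {
            ag[0] = (x[0]^2 + x[1] - 3) | (x[0] - x[1]);  // n x 1 value of g(x)
            return 1;                                     // evaluation succeeded
            }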
    
  7. Example
  8. Suppose $n=1$, a one-dimensional system, and $$g(x) = \log x + x^2.\nonumber$$ We know at least one root must exist because $g(x)\to -\infty$ as $x\to 0^+$ and $g(1) = 1$, so somewhere between 0 and 1 the function crosses the x-axis. A reasonable starting value is $x_0=1$. Since $g$ is undefined for $x\le 0$, we want to tell the algorithm to stay away from those values.

    Code
    25-nle-example.ox
    #include "oxstd.h"
    #import "solvenle"

    // g(x) = log(x) + x^2; evaluation fails for x <= 0, where log is undefined
    myg(fv, x) {
        if (x <= 0) return FALSE;       // report failure to SolveNLE
        fv[0] = log(x) + x^2;
        println("%13.8f", x ~ fv[0]);   // trace each evaluation
        return TRUE;
        }

    main() {
        decl x = <1.0>, ccode;
        ccode = SolveNLE(myg, &x);      // x holds x0 on entry, x* on exit
        println("Convergence code: ", ccode, "\nSolution: ", x);
        }
    
    Output:
    Convergence code: 0
    Solution:
          0.65292
    

Exercises