1. Linear Systems
  1. Love Triangles
  2. Before moving to a general approach to solving a linear system, let's think about a special case that remains almost as easy as the scalar case. Consider this matrix: $$A = \l(\matrix{ 1 & 0 & 0 \cr 2 & 3 &0\cr 4 & 5 & 6\cr}\r).\nonumber$$ Now form a linear system with it: $$\l(\matrix{ 1 & 0 & 0 \cr 2 & 3 &0\cr 4 & 5 & 6\cr}\r)\l(\matrix{x\cr y\cr z}\r) = \l(\matrix{-1 \cr 4 \cr 0}\r).\nonumber$$ Visually we can see an easy way to solve this problem: The first equation (first row) only depends on $x$ so it is just a scalar system: $x = -1/1 = -1$. Next, once we have determined $x$, we use it in the second row, which now becomes a scalar equation in $y$: $2(-1) + 3y = 4$ or $y = 6/3 = 2$. Now we can move on to the third row and solve for $z$: $4(-1)+5(2) + 6z = 0$ or $z = (0+4-10)/6 = -1.$
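
    Here is a quick Ox check of this hand computation (a minimal fragment):

        decl A = <1,0,0; 2,3,0; 4,5,6>, b = <-1; 4; 0>, x = zeros(3,1);
        x[0] = b[0] / A[0][0];                                   // x = -1
        x[1] = (b[1] - A[1][0]*x[0]) / A[1][1];                  // y =  2
        x[2] = (b[2] - A[2][0]*x[0] - A[2][1]*x[1]) / A[2][2];   // z = -1
        println(x');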

    So we have extended the simple scalar case of dividing the constant by the coefficient to solve an $N$-dimensional system, as long as it is triangular. That is, the algorithm subtracts (weighted) already-solved values from the current constant term and then divides by the current diagonal element. When will this process fail? Under the same condition as the scalar case: when it calls for dividing by 0. The divisors are the diagonal elements of the matrix, so if any of the $a_{ii}$ elements are 0 the triangular system is singular. If they are all different from 0 the algorithm will work, the system can be inverted, and the determinant of $A$ is not zero.

    In fact, you might remember that if $A$ is lower triangular its determinant is simply the product of its diagonal elements: $$A \hbox{ triangular } \quad \rightarrow\quad |A| = \prod_{n=1}^N a_{nn}.\nonumber$$ Remember that we are talking about computing the solution with numbers stored as floating point reals (FPRs). If any $a_{ii}=0$ exactly then we have a symbolically singular triangular matrix. But if $a_{ii}$ is not exactly zero but is a FPR close enough to 0.0, the division produces overflow: a result too big to store as a FPR. We could test for near 0 using Ox's feq(A[n][n],0.0), and if this is true the algorithm can stop and report a (numerically) 0 determinant. (The real algorithm can do better than just stopping.)

    We can generalize what we see in the 3x3 example above.

    Algorithm 3. Solving a Lower Triangular System.

    Suppose $A$ is an $N\times N$ lower triangular matrix and we wish to solve $$Ax = b\nonumber$$ for $x$.
    This requires a sequence of forward substitutions $$\matrix{& x_1 &= b_1/a_{11} \cr \downarrow&x_2 &= {b_2-a_{21}x_1 \over a_{22}} \cr \downarrow & \vdots &\vdots \cr \downarrow& x_N &= {b_N-S_N \over a_{NN} }\cr}\nonumber$$
    At row $n$, the term $$S_n = \sum_{i=1}^{n-1} a_{ni}x_{i}\nonumber$$ is the sum of values of earlier rows substituted into the current $n^{th}$ equation. When $n=1$ the sum is simply $S_1=0$.
    The elements of $x$ are computed by looping over rows of $A$, and at each row the sum $S_n$ loops over the columns of the current row up to (but not including) column $n$.
    This algorithm includes a nested loop if we express it in terms of scalar operations. The inner loop might be masked by vector operations in a language like Ox, but those operations are implemented by loops in the C code that runs Ox.

    The same outer loop over rows can compute the determinant of $A$. Initialize it as $D=1$. Then for each $n$ check whether $a_{nn}$ is numerically zero. If not, update $D *= a_{nn}$ and solve for $x_{n}$ using the expression above. Suppose one of the diagonal elements is numerically $0$. A simple function could set the determinant $D=0$ and stop solving, telling the calling program that the system is singular. More sophisticated algorithms (like the built-in Ox routines) can do more than just quit.
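
    Here is a minimal Ox sketch of Algorithm 3 with the determinant update folded into the same loop. The function name SolveLower is ours for illustration, not a built-in Ox routine:

        #include <oxstd.h>

        /* Forward substitution for a lower triangular A, tracking |A|.
           A minimal sketch: returns the solution, or .NaN if A is
           numerically singular. */
        SolveLower(const mA, const vb) {
            decl N = rows(mA);
            decl x = zeros(N, 1), D = 1.0, n, S;
            for (n = 0; n < N; ++n) {
                if (feq(mA[n][n], 0.0)) {        // diagonal numerically 0
                    println("numerically singular: |A| = 0");
                    return .NaN;
                    }
                D *= mA[n][n];                   // running determinant
                S = 0.0;
                if (n > 0)                       // S_n: already-solved terms
                    S = mA[n][0:n-1] * x[0:n-1];
                x[n] = (vb[n] - S) / mA[n][n];
                }
            println("|A| = ", D);
            return x;
            }
        main() {
            // the 3x3 example from above: should print x = -1, 2, -1
            println("x = ", SolveLower(<1,0,0; 2,3,0; 4,5,6>, <-1;4;0>));
            }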

    To solve a triangular system requires $N$ divide operations and $N(N-1)/2$ multiplies of real numbers. It also requires $N(N-1)/2$ subtractions, but subtraction takes much less computing time than multiplicative operations; most of the cost of algebraic operations is spent multiplying and dividing. Counting all of these operations, a total of $N+N(N-1)/2+N(N-1)/2 = N^{2}$ FLOPS are required to solve a triangular system.

    On the other hand, suppose $A$ is upper triangular. The same idea works in reverse: start with the last row (equation) and solve backwards. Then solving for $x$ only requires a sequence of back substitutions and $N^{2}$ operations. If $A$ factors into the product of two triangular matrices, and the two matrices are already known, then solving the system requires $2N^{2}$ operations.

    Instead of starting with the first row/equation, begin with the last row and solve for $x_N = b_{N} / a_{NN}$. Then move up one row, substituting the solved value for $x_N$ into equation $N-1$, which becomes a scalar equation in $x_{N-1}$. Keep going until $n=1$ is completed.

    Algorithm 4. Solving Upper Triangular Systems.

    Suppose $A$ is an $N\times N$ upper triangular matrix and we wish to solve $$Ax = b\nonumber$$ for $x$.
    This requires a sequence of back substitutions $$\l(\matrix{a_{11}&a_{12}&\ldots&a_{1N}\cr 0&a_{22}&\ldots&a_{2N}\cr \vdots&&\ddots&\vdots\cr 0&\ldots&0&a_{NN}\cr}\r)x = b \quad\Rightarrow\quad \matrix{& x_1 &=& {b_1-\sum_{i=2}^{N} a_{1i}x_{i}\over a_{11} }\cr \uparrow &\vdots &&\vdots \cr \uparrow& x_{N-1} &=&{b_{N-1}-a_{N-1,N}x_N \over a_{N-1,N-1}} \cr \uparrow& x_N &=& b_N/a_{NN}. \cr }\nonumber$$
    To sum up, solving triangular linear systems is not conceptually difficult, and code to do so takes a few lines in a matrix-oriented language like Ox.
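
    For example, here is a minimal Ox sketch of Algorithm 4. As before, the function name SolveUpper is ours, not a built-in routine:

        /* Back substitution for an upper triangular A: a minimal sketch. */
        SolveUpper(const mA, const vb) {
            decl N = rows(mA);
            decl x = zeros(N, 1), n, S;
            for (n = N-1; n >= 0; --n) {         // start with the last equation
                S = 0.0;
                if (n < N-1)                     // terms solved in rows below n
                    S = mA[n][n+1:N-1] * x[n+1:N-1];
                x[n] = (vb[n] - S) / mA[n][n];   // assumes a_nn not numerically 0
                }
            return x;
            }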
  3. Beethoven in his grave … Decomposing
  4. So we have simple algorithms to solve any triangular system. Now suppose $A = LU$, where $L$ is lower $\triangle$ and $U$ is upper $\triangle$. Then $Ax = L(Ux) = b$, so defining $y = Ux$ we can solve the system in two triangular steps: $$Ly = b \quad\Rightarrow\quad y = L^{-1}b \qquad Ux = y \quad\Rightarrow\quad x = U^{-1}y = U^{-1}L^{-1}b\nonumber$$ Now, to consider the general problem, consider first the trivial case when $N=1$ (one equation in one unknown). Then we have scalars $Ax = b$ and therefore $x = b/A$. It should be clear that any non-zero scalar $A$ can be written as the product of two non-zero numbers, call them $L$ and $U$. Indeed, in the scalar case we can simply normalize $L=1$ and define $U=A$. So when $N=1$ there is always a decomposition of $A$ into two triangular "matrices". If, on the other hand, $A=0$, then $U=0$ and it is not possible to factor the matrix and then solve, since this would involve dividing by 0.

    Once we let $N$ be greater than 1 it is not so obvious that $A$ can be factored into triangular matrices or how to go about doing it. However, the idea that elements of $L$ can still be normalized might be intuitively clear. There are $N(N+1)/2$ potentially non-zero elements in a triangular matrix. Two such matrices have a total of $N(N+1) = N^2+N$ elements, but the original $A$ matrix has only $N^2$ elements. (For $N=3$: the two triangles hold $6+6=12$ values, but $A$ has only 9.) So we have $N$ more degrees of freedom (free values) than we need for the factorization. One way to remove this excess freedom is to normalize $N$ elements of the two triangles. Following the scalar example, we can normalize the diagonal of $L$ to consist of 1s. Then together there are exactly $N^2$ free elements in $L$ and $U$ to match the elements of $A$.

    Fortunately, for the case of $N>1$ it has been known for decades that if $A$ is non-singular then we can indeed find matrices $L$ and $U$ for it. We will not discuss how the algorithm for computing $L$ and $U$ works (it is tedious), but let's be clear about what it does.

    Algorithm 5. L-U Decomposition & solving $Ax=b$ & Inversion

    NR 2.3; Judd II.3
    If $|A|\ne 0$ then:
    1. $\exists L, U: A=LU$ and $L$ is lower $\triangle$, $U$ is upper $\triangle$, and $L$ has 1s on its diagonal.
    2. Since the diagonal of $L$ is normalized $|A| = |LU| = |L|\,|U| = 1\times |U| = \prod_{i=1}^{N} u_{ii}.$
    3. The matrices $L$ and $U$ for $A$ can be found numerically using an algorithm that fails only if $|A|=0$. This will happen in the process at some row $i$ when $u_{ii}$ is too close to 0. At that point the L-U decomposition will stop because it has discovered that the determinant of $A$ is 0. Thus, instead of computing the determinant in order to solve a system of equations (as taught in math econ courses), computer algorithms compute the determinant as a by-product of solving the system.
    4. The solution is numerically more accurate if permutations of the rows of $A$ are allowed. That is, a permutation matrix $P$ is the identity matrix $I$ with some of its rows swapped. The L-U algorithm will then produce 3 matrices such that $A = PLU$. However, note that Ox implements the permutation differently, as described next.

    In Ox's implementation of this algorithm, the job of telling the user how the rows of $A$ are rearranged is done with a vector of integers. The earlier section on fancy indexing explained how a vector can be used to index a matrix. For example, suppose B is a 2x5 matrix. Then B[<1;0>][] is the 2x5 matrix with the rows of B swapped. That syntax is how Ox reports back the permutation, as illustrated below.
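
    For instance, a minimal illustration of this indexing (the values in B are arbitrary):

        decl B = <0,1,2,3,4; 5,6,7,8,9>;    // a 2x5 matrix
        println(B[<1;0>][]);                // B with its two rows swapped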

    The built-in Ox function for carrying out the LU Decomposition is:

    declu(A, aL, aU, aP)
    
            Argument        Explanation
            ------------------------------------------------------------------------------------
            A               the (square) matrix to decompose
            aL              an address (pointer) where to put L
            aU              an address (pointer) where to put U
            aP              an address (pointer) where to put the permutation information P
    
            On output
            P is a 2xN matrix of integers.  The first row is the order of rows of LU to get back to the original rows of A.
                That is, A = (L*U)[ P[0][] ][] 
    
      Returns
        1:  if no error occurred, and the L, U, P matrices are reliable.
        2:  if the decomposition could be unreliable (because A is near to singular)
        0:  if the LU decomposition failed and the matrix A is (numerically) singular.
                       
    
    How does a program using declu() know if the decomposition was successful? By looking at the return value of declu(). If your program has called declu(A,&L,&U,&P) and it returns a positive value, then we can now solve a system of the form $Ax = b$ with forward- and back-substitution, which Ox's solvelu() will do for you:
        decl L, U, P, ok, A, b, x;

        // compute or load A and b

        ok = declu(A, &L, &U, &P);
        if (ok>0) {                  // 1: reliable; 2: possibly unreliable
            x = solvelu(L,U,P,b);
            println(" x = ",x);
            }
        else println("A is singular, cannot solve for x");
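
    To see the reported convention in action, here is a minimal check that A = (L*U)[P[0][]][] holds after a successful decomposition (the matrix is arbitrary):

        decl A2 = <0,2,1; 1,1,0; 2,0,3>, L2, U2, P2;
        if (declu(A2, &L2, &U2, &P2) > 0)
            println("max |A2 - (L2*U2)[P2[0][]][]| = ",
                    max(fabs(A2 - (L2*U2)[P2[0][]][])));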
    

    One key implication of solving linear systems using decompositions is that it does not require computing $A^{-1}$, although if $A^{-1}$ is already available it can be used to do the job. Since the L-U decomposition depends only on $A$, it can be reused to solve for different $b$'s. In fact, suppose you have 5 different $b$ vectors for which to solve $Ax=b$. Then you can put the column vectors into a matrix $B$. The back/forward substitution process can be used to solve the matrix equation $AX=B$. Each column of $X$ will be the solution vector for the corresponding column of $B$.
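
    A minimal sketch of this reuse, with 5 random right-hand sides stored as the columns of a matrix (here assuming solvelu() accepts a matrix of right-hand sides, solving column by column as the text describes):

        decl A3 = <4,1,0; 1,3,1; 0,1,2>, B3 = rann(3,5), L3, U3, P3, X3;
        if (declu(A3, &L3, &U3, &P3) > 0) {
            X3 = solvelu(L3, U3, P3, B3);   // one decomposition, five solutions
            println("max residual: ", max(fabs(A3*X3 - B3)));
            }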

    Suppose $A$ is $N\times N$. Now think of applying the idea in the previous point with $B = I$, the identity matrix of order $N$. We can get $A^{-1}$ by solving $AX = I$. So solving a single linear system by first computing $A^{-1}$ involves $N-1$ unnecessary solutions. In symbolic linear algebra we imagine finding a matrix inverse in order to solve a system of equations, but a computer typically finds the inverse by solving $N$ linear systems.
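
    A minimal sketch of this idea, using unit(N) for the identity and comparing the result with Ox's built-in invert():

        decl A4 = <4,1,0; 1,3,1; 0,1,2>, L4, U4, P4, Ainv;
        if (declu(A4, &L4, &U4, &P4) > 0) {
            Ainv = solvelu(L4, U4, P4, unit(rows(A4)));   // solve AX = I
            println("difference from invert(): ", max(fabs(Ainv - invert(A4))));
            }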

    There are many cases when $A^{-1}$ is needed for its own sake, or when you have to solve for many vectors $b$ (more than $N$ different ones). Then it makes sense to compute $A^{-1}$ and then use it to solve the linear systems: computing $A^{-1}b$ is quicker than one round of back/forward solving. So use L-U to compute $A^{-1}$, then store it for other calculations or apply it to many $b$ vectors. Built-in Ox routines will call declu() and solvelu() for you:
    invert(A)
    
        Returns
            inverse of A,
            0 if decomposition failed.
    
    invert(A,aLogDet,aSign)
    
            Argument     Explanation
            ------------------------------------------------------------------------------------
            aLogDet      address to put log(|det(A)|), the logarithm of the absolute value of the determinant
            aSign        address to put a sign code for |A|:
                                0: singular
                            -1,-2: negative determinant
                            +1,+2: positive determinant
    
                            either -2 or +2 indicate the result is unreliable.
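
    For example, a minimal sketch that recovers the determinant from the log form, using the sign codes above:

        decl A5 = <2,1; 1,2>, Ainv5, logdet, sgn, detA;
        Ainv5 = invert(A5, &logdet, &sgn);
        if (sgn == 0)
            println("A5 is singular");
        else {
            detA = exp(logdet);
            if (sgn < 0) detA = -detA;      // sign code carries the sign of |A|
            println("|A5| = ", detA);       // should print 3
            if (sgn == 2 || sgn == -2)
                println("warning: near-singular, result may be unreliable");
            }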
    

    Algorithm 6. A General Algorithm to Solve $Ax=b$ & Inversion of a Matrix

    1. Use the L-U decomposition to compute $P$, $L$ and $U$ for $A$. If this fails then $A$ is not invertible; report the failure or handle the situation some other way.
    2. Rearrange the rows of $b$ to undo the permutation: $c = P^{-1}b = P'b$ (the inverse of a permutation matrix is its transpose).
    3. Forward solve $Ly = c$ for $y$.
    4. Backward solve $Ux = y$ for $x$.
  5. Non-square and Singular Systems: Singular Value Decomposition
  6. Besides the L-U decomposition there are other procedures for square matrices (the QR decomposition and the Cholesky decomposition for symmetric matrices discussed elsewhere). When it comes to non-square matrices other algorithms also become important. So suppose $B$ is a $N\times M$ matrix and $N\ge M$. We know that if $N\ne M$ the inverse of $B$ is not defined and the square triangular L-U decomposition is not defined. However, there are notions of "generalized inverses" of non-square matrices which we do not discuss. And there is an important decomposition method we will define, the SVD or singular value decomposition: $$B = UWV'.\tag{SVD}\label{SVD}$$ Here $U$ is an $N\times M$ matrix (the same dimensions as $B$), $W$ is a square $M\times M$ diagonal matrix, and $V$ is a $M\times M$ matrix (notice the transpose in the equation). Since $W$ is diagonal it has $M$ elements on the diagonal, denoted $w_i.$ These are known as the singular values of $B.$ Further, the matrices $U$ and $V$ are normalized so that $$U'U = V'V = I_M.$$ This says that the columns of $U$ can be constructed to be orthogonal to each other, and the same for the columns of $V$. In the special case of $B$ square ($N=M$) we could write its inverse as $$B^{-1} = V\hbox{diag}\pmatrix{1/w_1 & 1/w_2 & \cdots & 1/w_N}U'.$$ This relies on the fact that the inverse of an orthogonal matrix is its transpose ($UU'=I$). This matrix will exist as long as none of the singular values are exactly 0. Otherwise it is not defined, because forming the diagonal matrix would require dividing by 0. So having computed the SVD of $B$ you can look at the $w$'s and determine how "close" to singular it is.
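
    Ox provides decsvd() for this decomposition. A minimal sketch, assuming decsvd(B, &U, &w, &V) fills $U$, the vector of singular values, and $V$, and returns 0 on success (check the Ox documentation for the exact conventions):

        decl B6 = rann(6,3), U6, w6, V6;
        if (decsvd(B6, &U6, &w6, &V6) == 0) {
            println("singular values: ", w6);
            println("min(w)/max(w): ", min(w6)/max(w6));  // near 0: near-singular
            }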

    When $N>M$ you can use the SVD to compute the "least squares" solution to $Bx = b.$ That is, there are more equations than unknowns and you want to find an $x$ vector that is as close as possible to solving the system: $$x = V\hbox{diag}\pmatrix{1/w_1 & 1/w_2 & \cdots & 1/w_M}U'b.$$ You can see directly that if it is a square system this gets back to $x = B^{-1}b.$ Otherwise, this $x$ minimizes the Euclidean distance between $Bx$ and $b.$ In regression analysis this is another way to write the OLS estimates: $b \equiv y,$ $\hat\beta^{ols} \equiv x,$ and the SVD components are computed from the regressor matrix $X$, so that $V\hbox{diag}\pmatrix{1/w_1 & \cdots & 1/w_M}U'$ equals $(X'X)^{-1}X'.$
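
    Continuing with decsvd(): a minimal sketch of the least-squares solution on hypothetical random regression data, compared with the usual OLS formula:

        decl X7 = rann(100,3), y7 = rann(100,1), U7, w7, V7, beta;
        if (decsvd(X7, &U7, &w7, &V7) == 0) {
            beta = V7 * diag(1 ./ w7) * U7' * y7;   // least-squares solution
            println(beta ~ invert(X7'X7) * X7'y7);  // the two columns should match
            }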

Exercises