
  1. Linear Systems
  1. Love Triangles
  2. Before moving to a general approach to solving a linear system, let's think about a special case that remains almost as easy as the scalar case. Consider this matrix: $A=\begin{pmatrix}1&0&0\\2&3&0\\4&5&6\end{pmatrix}$. Now form a linear system with it: $\begin{pmatrix}1&0&0\\2&3&0\\4&5&6\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}1\\-4\\0\end{pmatrix}$. Visually we can see an easy way to solve this problem: the first equation (first row) only depends on $x$, so it is just a scalar system: $x=1/1=1$. Next, once we have determined $x$, we use it in the second row, which now becomes a scalar equation in $y$: $2(1)+3y=-4$, or $y=-6/3=-2$. Now we can move on to the third row and solve for $z$: $4(1)+5(-2)+6z=0$, or $z=(0-4+10)/6=1$.
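    As a quick check of the arithmetic above, here is the same forward substitution written out in Python (the course code is in Ox; Python is used here only to verify the numbers):

```python
# Verify the 3x3 forward-substitution example from the text.
# A is the lower triangular matrix and b the right-hand side given above.
A = [[1, 0, 0],
     [2, 3, 0],
     [4, 5, 6]]
b = [1, -4, 0]

x = b[0] / A[0][0]                                  # row 1: 1/1 = 1
y = (b[1] - A[1][0] * x) / A[1][1]                  # row 2: (-4 - 2)/3 = -2
z = (b[2] - A[2][0] * x - A[2][1] * y) / A[2][2]    # row 3: (0 - 4 + 10)/6 = 1

print(x, y, z)  # -> 1.0 -2.0 1.0
```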

    So we have extended the simple scalar case of dividing the constant by the coefficient to solve an $N$-dimensional system, as long as the system is triangular. That is, the algorithm subtracts (weighted) values already solved from the current constant term and then divides by the current diagonal element. When will this process fail? Under the same condition as the scalar system: when it calls for dividing by 0. The divisors are the diagonal elements of the matrix, so if any of the $a_{ii}$ elements are 0 the triangular system is singular. If they are all different from 0 the algorithm will work, the system can be inverted, and the determinant of $A$ is not zero.

    In fact, you might remember that if $A$ is lower triangular its determinant is simply the product of its diagonal elements: $A$ triangular $\implies |A|=\prod_{n=1}^{N} a_{nn}$. Remember that we are talking about computing the solution with numbers stored as floating point reals (FPRs). If any $a_{ii}=0$ exactly then we have a symbolically singular triangular matrix. But if $a_{ii}$ is not exactly zero yet is a FPR close enough to 0.0, we get overflow: a result too big to store given the precision of the FPRs. We could test for near 0 using Ox's feq(A[n][n],0.0), and if this is true the algorithm can stop and report a (numerically) 0 determinant. (The real algorithm can do better than just stopping.)

    We can generalize what we see in the $3\times 3$ example above.

    Algorithm 3. Solving a Lower Triangular System.

    Suppose $A$ is an $N\times N$ lower triangular matrix and we wish to solve $Ax=b$ for $x$.
    This requires a sequence of forward substitutions: $$x_1=\frac{b_1}{a_{11}},\qquad x_2=\frac{b_2-a_{21}x_1}{a_{22}},\qquad \ldots,\qquad x_N=\frac{b_N-S_N}{a_{NN}}.$$
    At row $n$, the term $S_n=\sum_{i=1}^{n-1} a_{ni}x_i$ is the sum of values from earlier rows substituted into the current $n$th equation. When $n=1$ the sum is simply $S_1=0$.
    The elements of $x$ are computed by looping over rows of $A$, and at each row the sum $S_n$ loops over columns of the current row up to $n$.
    This algorithm includes a nested loop if we express it in terms of scalar operations. The inner loop might be masked by vector operations in a language like Ox, but those operations are implemented by loops in the C code that runs Ox.

    The same outer loop over rows can compute the determinant of $A$. Initialize it as $D=1$. Then for each $n$ check whether $a_{nn}$ is numerically zero. If not, update $D=D\,a_{nn}$ and solve for $x_n$ using the expression above. Suppose one of the diagonal elements is numerically 0. A simple function could simply set the determinant $D=0$ and stop solving, telling the program that called the function that the system is singular. More sophisticated algorithms (like the built-in Ox routines) can do something other than just quit.
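    The loop just described can be sketched as follows, in Python rather than Ox; solve_lower and EPS are illustrative names, with EPS standing in for Ox's feq() near-zero test:

```python
EPS = 1e-12  # tolerance standing in for Ox's feq() near-zero test

def solve_lower(A, b):
    """Return (x, D) for lower triangular A, or (None, 0.0) if a pivot is numerically zero."""
    N = len(A)
    x = [0.0] * N
    D = 1.0                                 # running determinant
    for n in range(N):
        if abs(A[n][n]) < EPS:              # numerically zero diagonal element
            return None, 0.0                # report a singular system
        D *= A[n][n]                        # update D = D * a_nn
        S = sum(A[n][i] * x[i] for i in range(n))   # S_n: earlier values substituted in
        x[n] = (b[n] - S) / A[n][n]
    return x, D

A = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
b = [1, -4, 0]
x, D = solve_lower(A, b)
print(x, D)   # -> [1.0, -2.0, 1.0] 18.0
```

    Note that the determinant comes out as a by-product: $D = 1\cdot 3\cdot 6 = 18$.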

    To solve a triangular system requires $N$ divisions and $N(N-1)/2$ multiplications of real numbers. It also requires $N(N-1)/2$ subtractions, but subtraction takes much less computing time than multiplicative operations; most of the cost of algebraic operations is spent multiplying and dividing. Counting everything, a total of $N + N(N-1) = N^2$ FLOPS are required to solve a triangular system.
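    A short script can verify these counts by tallying the operations row by row (Python, with an illustrative op_counts helper):

```python
# Tally the operations in a forward solve: N divisions, N(N-1)/2
# multiplications, N(N-1)/2 subtractions, for N + N(N-1) = N^2 in total.
def op_counts(N):
    divs = mults = subs = 0
    for n in range(N):      # row n+1 of the forward solve
        divs += 1           # one division by the diagonal element
        mults += n          # one multiplication per already-solved unknown
        subs += n           # one subtraction per term folded into S_n
    return divs, mults, subs

d, m, s = op_counts(5)
print(d, m, s, d + m + s)   # -> 5 10 10 25, and 25 = 5**2
```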

    On the other hand, suppose $A$ is upper triangular. The same idea works in reverse: start with the last row (equation) and solve backwards. Solving for $x$ then requires only a sequence of back substitutions and, again, $N^2$ operations. So if $A$ factors into the product of two triangular matrices, and the two matrices are already known, solving the system requires $2N^2$ operations.

    Instead of starting with the first row/equation, begin with the last row and solve for $x_N=b_N/a_{NN}$. Then move up one row, substituting the solved value of $x_N$ into equation $N-1$. Keep going until $n=1$ is completed.

    Algorithm 4. Solving Upper Triangular Systems.

    Suppose $A$ is an $N\times N$ upper triangular matrix and we wish to solve $Ax=b$ for $x$.
    This requires a sequence of back substitutions: $$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1N}\\ 0 & a_{22} & \cdots & a_{2N}\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & 0 & a_{NN} \end{pmatrix} x = b \qquad\Longrightarrow\qquad \begin{aligned} x_N &= b_N/a_{NN}\\ x_{N-1} &= \frac{b_{N-1}-a_{N-1,N}\,x_N}{a_{N-1,N-1}}\\ &\;\;\vdots\\ x_1 &= \frac{b_1-\sum_{i=2}^{N} a_{1i}x_i}{a_{11}}. \end{aligned}$$
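    Algorithm 4 can be sketched in Python as (solve_upper is an illustrative name, not part of Ox):

```python
# Back substitution for an upper triangular system Ax = b:
# start at the last row and move up, as described in the text.
def solve_upper(A, b):
    N = len(A)
    x = [0.0] * N
    for n in range(N - 1, -1, -1):          # last row first, then move up
        S = sum(A[n][i] * x[i] for i in range(n + 1, N))
        x[n] = (b[n] - S) / A[n][n]
    return x

A = [[2, 1, 1],
     [0, 3, 2],
     [0, 0, 4]]
b = [6, 10, 8]
print(solve_upper(A, b))   # -> [1.0, 2.0, 2.0]
```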
    To sum up, solving triangular linear systems is not conceptually difficult, and code to do so takes a few lines in a matrix-oriented language like Ox.
  3. Beethoven in his grave … Decomposing
  4. So we have simple algorithms to solve any triangular system. Now suppose $A=LU$, where $L$ is lower triangular and $U$ is upper triangular. Then $Ax=b$ can be solved in two triangular stages: $$Ly=b \Rightarrow y=L^{-1}b, \qquad Ux=y \Rightarrow x=U^{-1}y=U^{-1}L^{-1}b.$$ Now, to consider the general problem, consider first the trivial case when $N=1$ (one equation in one unknown). Then we have scalars $Ax=b$ and therefore $x=b/A$. It should be clear that any non-zero scalar $A$ can be written as the product of two non-zero numbers, call them $L$ and $U$. Indeed, in the scalar case we can simply normalize $L=1$ and define $U=A$. So when $N=1$ there is a decomposition of $A$ into two triangular "matrices". If, on the other hand, $A=0$, then $U=0$ and it is not possible to factor the matrix and then solve, since this would involve dividing by 0.

    Once we let $N$ be greater than 1 it is not so obvious that $A$ can be factored into triangular matrices, or how to go about doing it. However, the idea that elements of $L$ can still be normalized might be intuitively clear. There are $N(N+1)/2$ potentially non-zero elements in a triangular matrix. Two such matrices have a total of $N(N+1)=N^2+N$ elements, but the original $A$ matrix has only $N^2$ elements. So we have $N$ more degrees of freedom (free values) than we need for the factorization. One way to reduce this freedom is to normalize $N$ elements of the two triangles. Following the scalar example, we can normalize the diagonal of $L$ to consist of 1s. Then together there are exactly $N^2$ free elements in $L$ and $U$, just matching the elements of $A$.

    Fortunately, for the case of $N>1$ it has been known for decades that if $A$ is non-singular then we can indeed find matrices $L$ and $U$ for it. We will not discuss how the algorithm for computing $L$ and $U$ works (it's tedious), but let's be clear about what it does.

    Algorithm 5. L-U Decomposition & solving Ax=b & Inversion

    NR 2.3; Judd II.3
    If $|A|\neq 0$ then:
    1. There exist $L,U$ such that $A=LU$, where $L$ is lower triangular, $U$ is upper triangular, and $L$ has 1s on its diagonal.
    2. Since the diagonal of $L$ is normalized, $|A|=|LU|=|L|\,|U|=1\times|U|=\prod_{i=1}^{N}u_{ii}$.
    3. The matrices $L$ and $U$ for $A$ can be found numerically using an algorithm that fails only if $|A|=0$. This will happen in the process at some row $i$ when $u_{ii}$ is too close to 0. At that point the L-U decomposition will stop because it has discovered that the determinant of $A$ is 0. Thus, instead of computing the determinant in order to solve a system of equations (as taught in math econ courses), computer algorithms compute the determinant as a by-product of solving the system.
    4. The solution is numerically more accurate if permutations of the rows of $A$ are allowed. That is, a permutation matrix $P$ is the identity matrix $I$ with some of its rows swapped. The L-U algorithm will then produce 3 matrices such that $A=PLU$. However, note that Ox implements this differently.

    In Ox's implementation of this algorithm, the job of telling the user how the rows of A are rearranged is done with a vector of integers. The earlier section on fancy indexing explained how a vector can be used to index a matrix. For example, suppose B is a 2x5 matrix. Then B[<1;0>][] is the 2x5 matrix with the rows of B swapped. That syntax is how Ox reports back the permutation.
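    For readers without Ox at hand, the same row-permutation indexing can be sketched in plain Python, where a list of row indices stands in for Ox's vector index:

```python
# Plain-Python analogue of Ox's B[<1;0>][] row indexing: a permutation
# vector reorders the rows of a matrix (here B is 2x5, as in the text).
B = [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]]
perm = [1, 0]                        # Ox: <1;0>
swapped = [B[i] for i in perm]       # rows of B in permuted order
print(swapped)  # -> [[5, 6, 7, 8, 9], [0, 1, 2, 3, 4]]
```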

    The built-in Ox function for carrying out the LU Decomposition is:

    declu(A, aL, aU, aP)
    
            Argument        Explanation
            ------------------------------------------------------------------------------------
            A               the (square) matrix to decompose
            aL              an address (pointer) where to put L
            aU              an address (pointer) where to put U
            aP              an address (pointer) where to put a permutation matrix P,
    
            On output
            P contains a 2xN matrix containing integers.  The first row is the order of rows of LU to get back to the original rows of A.
                That is, A = (L*U)[ P[0][] ][] 
    
      Returns
        1:  if no error occurred, and the L, U, P matrices are reliable.
        2:  if the decomposition could be unreliable (because A is near to singular)
        0:  if the LU decomposition failed and the matrix A is (numerically) singular.
                       
    
    How does a program using declu() know if the decomposition was successful? By looking at the return value of declu(). If your program has called declu(A,&L,&U,&P) and it returns a 1, then we can solve a system of the form Ax=b with back- and forward-substitution, which Ox's solvelu() will do for you:
        decl L, U, P, ok, A, b, x;
    
        // compute or load A and b
    
        ok = declu(A, &L, &U, &P);
        if (ok>0) {
            x = solvelu(L,U,P,b);
            println(" x = ",x);
            }
        else println("A is singular, cannot solve for x");
    

    One key implication of solving linear systems using decompositions is that doing so does not require computing $A^{-1}$, although if $A^{-1}$ is already available it can be used to do the job. Since $L$ and $U$ depend only on $A$, the decomposition can be reused to solve for different $b$'s. In fact, suppose you have 5 different $b$ vectors for which to solve $Ax=b$. Then you can put the column vectors into a matrix $B$. The back/forward substitution process can be used to solve the matrix equation $AX=B$: each column of $X$ will be the solution vector for the corresponding column of $B$.

    Suppose $A$ is $N\times N$. Now think of applying the idea in the previous point with $B=I$, the identity matrix of order $N$. We can get $A^{-1}$ by solving $AX=I$. So solving a single linear system by first computing $A^{-1}$ involves $N-1$ unnecessary solutions. In symbolic linear algebra we imagine finding a matrix inverse to solve a system of equations, but a computer would typically find the inverse by solving $N$ linear systems.
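    A sketch of this idea in Python, using a lower triangular $A$ so that a single forward solve handles each column (for a general $A$ the full L-U machinery would be used; solve_lower is an illustrative helper):

```python
# Invert A by solving A x = e_j for each unit vector e_j; the solutions
# are the columns of A^{-1}. Here A is lower triangular, so one forward
# solve per column suffices.
def solve_lower(A, b):
    N = len(A)
    x = [0.0] * N
    for n in range(N):
        S = sum(A[n][i] * x[i] for i in range(n))
        x[n] = (b[n] - S) / A[n][n]
    return x

A = [[2, 0],
     [1, 4]]
N = len(A)
# Column j of the inverse solves A x = e_j.
cols = [solve_lower(A, [1.0 if i == j else 0.0 for i in range(N)])
        for j in range(N)]
Ainv = [[cols[j][i] for j in range(N)] for i in range(N)]  # columns -> matrix
print(Ainv)   # -> [[0.5, 0.0], [-0.125, 0.25]]
```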

    There are many cases when $A^{-1}$ is needed for its own sake. Or you may have to solve for many vectors $b$ (more than $N$ different ones). Then it makes sense to compute $A^{-1}$ and then use it to solve the linear systems: multiplying $A^{-1}b$ is quicker than one back/forward solve. So use L-U to compute $A^{-1}$, then store it for other calculations or use it for many $b$ vectors. Built-in Ox routines will call declu() and solvelu() for you:
    invert(A)
    
        Returns
            inverse of A,
            0 if decomposition failed.
    
    invert(A,aLogDet,aSign)
    
            Argument     Explanation
            ------------------------------------------------------------------------------------
            aLogDet      address to put log( |det(A)| ), the logarithm of absolute value of |A|
            aSign        address to put a sign code for |A|:
                                0: singular
                            -1,-2: negative determinant
                            +1,+2: positive determinant
    
                            either -2 or +2 indicate the result is unreliable.
    

    Algorithm 6. A General Algorithm to Solve Ax=b & Inversion of a Matrix

    1. Use the L-U decomposition to compute P, L and U for A. If this fails then A is not invertible; report the failure or handle this situation some other way.
    2. Forward solve Ly=b for y.
    3. Backward solve Uz=y for z.
    4. Rearrange rows: x=Pz.
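    The steps of Algorithm 6 can be sketched compactly in Python, assuming partial pivoting tracked with a permutation vector (lu_solve is an illustrative name, not Ox's implementation, and it returns None on a numerically singular matrix):

```python
# Sketch of Algorithm 6: factor A with row pivoting into L (unit diagonal)
# and U, then forward- and back-solve, undoing the permutation on b.
def lu_solve(A, b, eps=1e-12):
    N = len(A)
    U = [row[:] for row in A]                # working copy, becomes U
    L = [[0.0] * N for _ in range(N)]
    perm = list(range(N))                    # row-permutation bookkeeping
    for k in range(N):
        p = max(range(k, N), key=lambda r: abs(U[r][k]))   # pivot row
        if abs(U[p][k]) < eps:
            return None                      # numerically singular
        if p != k:                           # swap rows k and p everywhere
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]
            perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0                        # normalized diagonal of L
        for r in range(k + 1, N):
            m = U[r][k] / U[k][k]
            L[r][k] = m
            for c in range(k, N):
                U[r][c] -= m * U[k][c]
    pb = [b[i] for i in perm]                # permuted right-hand side
    y = [0.0] * N                            # forward solve L y = pb
    for n in range(N):
        y[n] = pb[n] - sum(L[n][i] * y[i] for i in range(n))
    x = [0.0] * N                            # back solve U x = y
    for n in range(N - 1, -1, -1):
        x[n] = (y[n] - sum(U[n][i] * x[i] for i in range(n + 1, N))) / U[n][n]
    return x

A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
b = [4, 10, 24]
print(lu_solve(A, b))   # solution is [1, 1, 1] up to rounding
```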
  5. Non-square and Singular Systems: Singular Value Decomposition
  6. Besides the L-U decomposition there are other procedures for square matrices (the QR decomposition and, for symmetric matrices, the Cholesky decomposition, discussed elsewhere). When it comes to non-square matrices other algorithms also become important. So suppose $B$ is an $N\times M$ matrix with $N\ge M$. We know that if $N\neq M$ the inverse of $B$ is not defined, and the square triangular L-U decomposition is not defined either. However, there are notions of "generalized inverses" of non-square matrices, which we do not discuss. And there is an important decomposition method we will define, the SVD or singular value decomposition: $$B=UWV'.$$ Here $U$ is an $N\times M$ matrix (same dimensions as $B$), $W$ is a square $M\times M$ diagonal matrix, and $V$ is an $M\times M$ matrix (notice the transpose in the equation). Since $W$ is diagonal it has $M$ elements on the diagonal, denoted $w_i$. These are known as the singular values of $B$. Further, the matrices $U$ and $V$ are normalized so that $U'U=V'V=I_M$. This says that the columns of $U$ can be constructed to be orthogonal to each other; the same for the columns of $V$. In the special case of a square $B$ ($N=M$) we could write its inverse as $$B^{-1}=V\,\mathrm{diag}(1/w_1,1/w_2,\ldots,1/w_N)\,U'.$$ This relies on the fact that the inverse of an orthogonal matrix is its transpose ($U'U=I$). This matrix will exist as long as none of the singular values are exactly 0; otherwise it is not defined because the diagonal matrix has an infinite value. So having computed the SVD of $B$ you can look at the $w$s and determine how "close" to singular it is.

    When $N>M$ you can use the SVD to compute the "least squares" solution to $Bx=b$. That is, there are more equations than unknowns and you want to find an $x$ vector that comes as close as possible to solving the system: $$x=V\,\mathrm{diag}(1/w_1,1/w_2,\ldots,1/w_M)\,U'b.$$ You can see directly that in a square system this gets back to $x=B^{-1}b$. Otherwise, this $x$ minimizes the Euclidean distance between $Bx$ and $b$. In regression analysis this is another way to write the OLS estimates: $b$ plays the role of $y$, $x$ plays the role of $\hat\beta_{ols}$, and the SVD components are derived from $X$, so that $V\,\mathrm{diag}(1/w_i)\,U'$ equates to $(X'X)^{-1}X'$.

Exercises