Don't Fence Me In (Cole Porter, 1930s): Constraining Parameter Values to Open Intervals
Many optimization problems in economics are constrained, meaning that the parameters cannot take on just any real value. Some constraints come from economic concepts, such as budget constraints, technology frontiers, etc. Others come from logical and mathematical necessities. For example, probabilities must be numbers between 0 and 1; if you maximize an objective that involves choosing a probability and the result lies outside this range, it is meaningless. Consider the problem: $${\max}_{0 \lt x \lt 1} \quad 2\ln (x) + 3\ln(1-x).\nonumber$$ This chooses $x$ over the open interval $(0,1)$. If we try to evaluate the objective for $x\le 0$ or $x\ge 1$ we are trying to take the log of a non-positive number, which is undefined. The problem with unconstrained methods is that they do not know this and may try to evaluate the objective at values outside the well-defined range. However, there is a trick that can be pulled on unconstrained optimization algorithms. First, a little bit of calculus. Remember the chain rule: if $x=h(y)$, then $${d \over dy} f\left(h(y)\right) = f'\left(h(y)\right)\,h'(y).\nonumber$$ If $h$ maps all of $\Re$ onto the constrained interval, then maximizing $f(h(y))$ over $y$ is an unconstrained problem, and the chain rule supplies the derivatives the algorithm needs. Table 12 lists transformations for common constraints; the inverse column converts a feasible starting value $x_0$ for $x$ into a starting value $y_0$ for $y$.
Table 12. Transformations for Constraints on Parameters
Concept | Constraint on $x$ | $h$ Transformation | Inverse |
---|---|---|---|
Positive | $x>0$ | $x=e^y$ | $y_0 = \ln x_0$ |
Negative | $x<0$ | $x=-e^y$ | $y_0 = \ln(-x_0)$ |
Probability | $0\lt x \lt 1$ | $x={e^y\over 1+e^y}$ | $y_0 = \ln\left(\,x_0 / (1-x_0)\,\right)$ |
Bounded | $L\lt x\lt U$ | $x=L+(U-L){e^y \over 1+e^y}$ | $y_0=\ln\left(\,(x_0-L) / (U-x_0)\,\right)$ |
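As a sketch of the trick, here is a minimal Python version (the document's own routines are in Ox; Python and scipy are used here purely for illustration) that solves the example problem above by optimizing over the transformed variable $y$:

```python
import numpy as np
from scipy.optimize import minimize

# Objective from the text: max 2*ln(x) + 3*ln(1-x) over 0 < x < 1.
# Trick: substitute x = e^y / (1+e^y), so every real y maps into (0,1),
# then hand the problem in y to an unconstrained quasi-Newton routine.
def neg_objective(y):
    x = np.exp(y) / (1.0 + np.exp(y))   # probability transform from Table 12
    return -(2.0 * np.log(x) + 3.0 * np.log(1.0 - x))

x0 = 0.5                                # feasible starting value for x
y0 = np.log(x0 / (1.0 - x0))            # inverse transform: y0 = ln(x0/(1-x0))
res = minimize(neg_objective, y0, method="BFGS")
x_star = np.exp(res.x[0]) / (1.0 + np.exp(res.x[0]))
# The first-order condition 2/x = 3/(1-x) gives x* = 2/5 = 0.4.
print(x_star)
```

BFGS never sees the constraint: it wanders freely over $y\in\Re$, yet every trial value of $x$ it implies is strictly inside $(0,1)$.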
Constrained Optimization
Transformation of variables can handle many kinds of constrained optimization problems, but there are important problems for which it is not possible to specify an open interval ahead of time. Recall that we solve constrained optimization problems by forming a Lagrangian objective $${\cal L}(x,\lambda) = f(x) - \lambda \circ g(x).\tag{L}\label{L}$$ This is more general than you are probably used to. First, $f(x)$ is the objective and $x$ is the choice vector, just as before. The function $g(x)$ is a system of equations that represent constraints on the choice of $x$. You are perhaps used to imposing a budget constraint: $$p_0 x_0 + p_1 x_1 = m.\nonumber$$ The choice of $x$ has to be affordable. To work in our notation we rewrite this as an equation that the choice vector should set to 0. That is, the budget constraint is satisfied if expenditure equals income: $$p_0 x_0 + p_1 x_1 - m = 0.\tag{Budget1}\label{Budget1}$$ So in this case $g(x) = p_0 x_0 + p_1 x_1 - m$ and we have to maximize $f(x)$ subject to $g(x)=\overrightarrow{0}$, where $\overrightarrow{0}$ is a vector of 0s. What is different is that we may have multiple constraints, so that $g(x)$ is a system of $r$ equations in $n$ unknowns. If we have as many constraints as choices we will not have any choice left! So $r \lt n$ is the usual problem. A single budget constraint is the case where $r=1$ and $g(x)$ is linear in $x$. You may also recall that $\lambda$ is the Lagrange multiplier on the constraint. We need a multiplier on each constraint, so in general $\lambda$ is an $r\times 1$ vector of multipliers, and $\lambda \circ g(x)$ is the inner product of the two vectors, $\lambda \circ g(x) = \sum_{i=1}^r \lambda_i g_i(x)$. We now go farther than you are used to by allowing for inequality constraints.
$$h(x)\,\ge\,\overrightarrow{0}.\tag{Ineq}\label{Ineq}$$ In other words, the values of the equations in the system $h(x)$ are required to be non-negative. The set of choice vectors that satisfies this condition is the feasible region. Of course, budget constraints can be treated as inequality constraints too, but Lagrangian problems are much easier when we know that the solution will hold with equality. The reason to allow both equality and inequality constraints is as follows: with multiple constraints, and possibly objectives that are not everywhere increasing, we may want to be inside the feasible region, not on the edge where $h_i(x)=0$. The most general problem is then one that combines both equality and inequality constraints.

Definition 3. A general Kuhn-Tucker Optimization Problem
$${\cal L}(x,\lambda) = f(x) - \lambda \circ g(x) - \gamma \circ h(x).\tag{Kuhn}\label{Kuhn}$$
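As a concrete check of the equality-constrained case, this short Python sketch (scipy instead of the document's Ox tools; the objective $f(x)=2\ln x_0+3\ln x_1$ and the numbers $p=(1,2)$, $m=10$ are made up for illustration) solves the first-order conditions of (L) together with (Budget1) as a nonlinear system:

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical budget example: f(x) = 2*ln(x0) + 3*ln(x1),
# constraint g(x) = p0*x0 + p1*x1 - m = 0 with p = (1, 2) and m = 10.
p0, p1, m = 1.0, 2.0, 10.0

def kkt(z):
    x0, x1, lam = z
    return [2.0 / x0 - lam * p0,        # dL/dx0 = f_x0 - lam * g_x0 = 0
            3.0 / x1 - lam * p1,        # dL/dx1 = f_x1 - lam * g_x1 = 0
            p0 * x0 + p1 * x1 - m]      # g(x) = 0

x0_s, x1_s, lam_s = fsolve(kkt, [1.0, 1.0, 1.0])
print(x0_s, x1_s, lam_s)   # analytic solution: x0 = 4, x1 = 3, lam = 0.5
```

The multiplier $\lambda^\star=0.5$ is the familiar shadow price: the marginal value of relaxing the budget by one unit of income.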
Inequality constraints are more complicated to handle than equality constraints. Suppose there is only one inequality constraint, so $h(x)$ is a single equation. Suppose that at some point $x^o$ we have $h(x^o) = 3$, so the constraint is not binding at $x^o$. Near $x^o$, $h(x)$ remains positive, and improvements in $f(x)$ can be pursued as if the problem were unconstrained by $h$. If it turns out that $x^\star = x^o$ then there is no marginal cost to imposing $h$. This means that the correct penalty in the Lagrangian should be 0. But we cannot know that ahead of time.
On the other hand, suppose $h(x^d)=0$, so $h$ is binding at $x^d$. The objective may improve in a direction from $x^d$ in which $h(x)$ turns negative. If $x^\star=x^d$ then the penalty on $h$ reflects this marginal cost, just as $\lambda$ does for an equality constraint.
So at the solution either the penalty or the value of the constraint is zero. This is called complementary slackness. But which constraints bind cannot be determined ahead of time, and since each constraint is either binding or slack, there are many combinations to consider: with $p$ inequality constraints there are $2^p$ different sets of possibly active constraints.
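The combinatorial nature of the active set can be seen by brute force. The following Python sketch (a made-up two-constraint example, using scipy rather than the document's Ox routines) enumerates all $2^2=4$ candidate active sets, solves each subproblem with the "on" constraints imposed as equalities, and keeps the best feasible candidate:

```python
import itertools
import numpy as np
from scipy.optimize import minimize

# Made-up example: max f(x) = -(x0-2)^2 - (x1-2)^2
# subject to h1(x) = 1.5 - x0 >= 0 and h2(x) = 2 - x0 - x1 >= 0.
f = lambda x: -(x[0] - 2.0) ** 2 - (x[1] - 2.0) ** 2
cons = [lambda x: 1.5 - x[0], lambda x: 2.0 - x[0] - x[1]]

best = None
for active in itertools.product([False, True], repeat=len(cons)):
    # Impose the "active" constraints with equality and ignore the rest.
    eq = [{"type": "eq", "fun": c} for c, on in zip(cons, active) if on]
    r = minimize(lambda x: -f(x), [0.0, 0.0], method="SLSQP", constraints=eq)
    feasible = r.success and all(c(r.x) >= -1e-8 for c in cons)
    if feasible and (best is None or f(r.x) > f(best)):
        best = r.x
print(best)  # best feasible candidate: x = (1, 1); h2 binds, h1 is slack
```

Enumeration works for two constraints but is hopeless for large $p$, which is why the algorithm below revises a guess of the active set iteratively instead.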
Algorithm 19. Kuhn-Tucker Programming
- (S0) Initialize ${\cal A}_0$, the first guess of the active constraint set. Set $t=0$.
- (S1) Set $\gamma^\star_j=0$ for $j\in {\cal A}_t^c$. Solve for $x^\star_t$ and the remaining elements of $\gamma^\star$ as a Lagrangian problem imposing the constraints $j \in {\cal A}_t$. Compute the value of all constraints at $x^\star_t$, $h(x^\star_t)$.
- (S2) If $j\in {\cal A}_t^c$ and $h^j(x^\star_t)\le 0$, move $j$ into ${\cal A}_{t+1}$ and now treat $\gamma_j$ as free. If $j\in {\cal A}_t$ and $\gamma^\star_j\le 0$, move $j$ out of ${\cal A}_{t+1}$ and set $\gamma_j=0$.
- (S3) If ${\cal A}_t = {\cal A}_{t+1}$ then (IEC), (CS) and (POS) are all satisfied at $x^\star_t$ and $\gamma^\star$. Stop iterating. (EXIT this recipe) Otherwise, one or more constraints were misclassified. This may make $x^\star_t$ either infeasible or suboptimal, and subsequent changes may affect other constraints that were conditionally correct. Increment $t$ and return to S1.
Sequential Quadratic Programming
Judd 4.7 The idea of Sequential Quadratic Programming (SQP) is an extension of sorts of Newton's method for unconstrained optimization. Recall that we can think of Newton's method as optimizing a quadratic approximation to the objective using the gradient and Hessian information. SQP takes a quadratic approximation to the Lagrangian to account for the penalties, and linear approximations to the constraints themselves. So it maximizes a quadratic objective subject to linear constraints (both equality and inequality, if applicable). Special methods can solve this kind of linear-quadratic problem in polynomial time. First, suppose the algorithm is currently at some point $x^t$. At the same time we have to specify current values of the multiplier vectors, $\lambda^t$ and $\gamma^t$. Some of the inequality multipliers might be 0, meaning those constraints are currently inactive. The Hessian of the Lagrangian is $$H_{xx}{\cal L}(x,\lambda,\gamma) = \left[ {\partial^2 {\cal L} \over \partial x_i \partial x_j }\right] = Hf(x) - H[\lambda\circ g(x)] - H[\gamma\circ h(x)].\tag{HL}\label{HL}$$ This is an $n\times n$ matrix ($n$ being the number of parameters maximized over). Although the constraints are systems of equations, they are aggregated into a single number by the inner product with the multipliers, so the penalties themselves have symmetric Hessian matrices. The multiplier vectors $\lambda$ and $\gamma$ enter the matrix through the penalties, but we differentiate only with respect to the choice variables, not the multipliers. The constraint systems $g$ and $h$ have Jacobian matrices, $Jg(x)$ and $Jh(x)$, which are $r\times n$ and $p\times n$ respectively, where $p$ is the number of inequality constraints. Then the quadratic objective with linearized constraints can be written in terms of the step $s$ away from $x^t$: $$\eqalign{ {\max}_{s\in \Re^n} \quad \nabla f(x^t)'s + {1\over 2}\, s' &H_{xx}{\cal L}\left(x^t,\lambda^t,\gamma^t\right) s\cr g(x^t) + &Jg(x^t)\,s = \overrightarrow{0}&(LQ)\cr h(x^t) + &Jh(x^t)\,s \ge \overrightarrow{0}\cr }\tag{SQP}\label{SQP}$$
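With equality constraints only, the subproblem in (SQP) reduces to one linear solve. The Python sketch below (a made-up quadratic objective with one linear constraint; the document's own implementation is MaxSQP in Ox) computes a single step by applying Newton's method to the first-order conditions of ${\cal L}=f-\lambda\circ g$. Because this example is already linear-quadratic, the approximation is exact and one step lands on the solution:

```python
import numpy as np

# One SQP step on a made-up equality-constrained problem:
#   max f(x) = -0.5*(x0^2 + x1^2)   s.t.   g(x) = x0 + x1 - 2 = 0.
x, lam = np.array([3.0, 0.0]), 0.0
H     = -np.eye(2)                  # H_xx L = Hf(x); Hg = 0 since g is linear
Jg    = np.array([[1.0, 1.0]])      # r x n = 1 x 2 Jacobian of g
gradf = -x                          # gradient of f at x
gval  = np.array([x.sum() - 2.0])   # g(x)

# First-order conditions: grad f(x) - lam * Jg' = 0  and  g(x) = 0.
F  = np.concatenate([gradf - lam * Jg.ravel(), gval])
JF = np.block([[H, -Jg.T], [Jg, np.zeros((1, 1))]])
d  = np.linalg.solve(JF, -F)        # Newton step: (s, change in lam)
x_new, lam_new = x + d[:2], lam + d[2]
print(x_new, lam_new)               # x = (1, 1), lam = -1: g(x)=0 and FOCs hold
```

On a genuinely nonlinear problem this solve is repeated at each $x^t$, which is exactly the iteration described next.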
Algorithm 20. SQP Iteration
- Pick initial guesses $x^0$, $\lambda^0$, and $\gamma^0$. Set $t=0$.
- At iteration $t$, compute $H_{xx}{\cal L}$, $Jg$ and $Jh$ to form the linear-quadratic approximation in equation (LQ). That problem has closed-form solutions which determine a step, $s^\star$, and a new set of multipliers. If $\| s^\star \| \mathop{=_\epsilon} 0$ then stop (strong convergence).
- Update $x^{t+1}=x^t+s^\star$. Then update $\lambda^{t+1}$ and $\gamma^{t+1}$. Increment $t=t+1$ and repeat the previous step.
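In practice one rarely hand-rolls this loop; existing SQP codes handle the step-length control and active-set bookkeeping. As an illustration, SciPy's SLSQP routine (itself an SQP implementation) solves a made-up inequality-constrained problem here, standing in for what MaxSQP does in Ox:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up problem: max f(x) = -(x0-2)^2 - (x1-2)^2
# subject to h(x) = 1 - x0^2 - x1^2 >= 0 (stay inside the unit circle).
f = lambda x: -(x[0] - 2.0) ** 2 - (x[1] - 2.0) ** 2
h = {"type": "ineq", "fun": lambda x: 1.0 - x[0] ** 2 - x[1] ** 2}

# SLSQP minimizes, so hand it -f; it iterates LQ subproblems internally.
res = minimize(lambda x: -f(x), [0.0, 0.0], method="SLSQP", constraints=[h])
print(res.x)   # constraint binds: x* = (1/sqrt(2), 1/sqrt(2)) ~ (0.707, 0.707)
```

The unconstrained optimum $(2,2)$ is infeasible, so the SQP iterates are pushed to the boundary of the circle closest to it.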
```
#import "maxsqp"
MaxSQP(func, ax, af, aH, useNH, ineq, eq, Lo, Hi)
```

Argument | Explanation |
---|---|
func | objective (in same form as other Max routines require) |
ax | address of starting values and where to place final values |
af | address to place final objective value |
aH | 0 or address of where to place final Hessian |
useNH | 1=use numerical first derivatives; 0=func provides them when asked |
ineq | h(x), system of INequality constraints, in form asked by SolveNLE |
eq | g(x), system of Equality constraints, in form asked by SolveNLE |
Lo | empty matrix or nx1 vector of lower bounds on x |
Hi | empty matrix or nx1 vector of upper bounds on x |

Returns a convergence code like MaxBFGS.