1. Optimization
  1. Overview
  2. NR 10.0
    We focus on methods for unconstrained optimization, that is, finding the extreme values of a function. Optimization includes both minimization and maximization. These notes consider maximization only, because any minimization problem can be turned into a maximization problem by multiplying the objective by negative one.
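    The negation trick can be checked directly. Below is a minimal sketch (the function `f` is a made-up example, not from the notes): the input that minimizes $f$ is exactly the input that maximizes $-f$.

```python
# Hypothetical example objective: minimized at x = 3.
def f(x):
    return (x - 3) ** 2

# Negating turns the minimization problem into a maximization problem.
def neg_f(x):
    return -f(x)

# Check on a coarse grid: the minimizer of f equals the maximizer of -f.
grid = [i / 10 for i in range(0, 61)]
x_min = min(grid, key=f)       # minimizes f
x_max = max(grid, key=neg_f)   # maximizes -f
assert x_min == x_max == 3.0
```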

    The function to maximize is called the objective. It must be a real-valued function. Of course, functions that produce other kinds of output, such as vectors or even other functions, can be studied and are useful. But it does not make sense to maximize them, because maximization involves finding the largest output value, and that presumes single-number outputs.

    In general a function could take any kind of input, but we will focus on functions of a vector of real numbers. The number of arguments, i.e. the length of the input vector, is $n$. This vector and the real values in it might be called arguments or choices or inputs. So formally maximization begins with an objective, $f : \Re^n \to \Re$. Let $x \in \Re^n$ be a vector of choices and $x_i$ the $i^{th}$ element of the vector. Then maximization is written as $$f^\star \quad \equiv\quad {\max}_{x}\ f(x).\tag{Max}\label{Max}$$ The right side should be read as: find a value of the vector $x$ that maximizes the objective. Notice that $f^\star$ is the maximized value of the objective. The argmax is a fancy term for an argument that does the maximizing. To be really fancy we write
    $f(x^\star) = f^\star$  ⟺  $x^\star \in {\arg\max} f(x)$.
    Note we use $\in$ instead of $=$ because there can be more than one optimal value. In general, $\arg\max$ is a set of input vectors that produce the maximal value $f^\star$.
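    A small illustration of the distinction (the objective below is an assumed example): on a finite grid, $f^\star$ is a single number, while the argmax is a *set* of inputs attaining it. Here the set has two elements because $f$ is symmetric.

```python
# Hypothetical symmetric objective: maximized at both x = -1 and x = 1.
def f(x):
    return 1 - (x ** 2 - 1) ** 2

grid = [i / 2 for i in range(-4, 5)]   # -2.0, -1.5, ..., 2.0

# f_star is the maximal *value*; arg_max is the *set* of maximizers.
f_star = max(f(x) for x in grid)
arg_max = {x for x in grid if f(x) == f_star}

print(f_star)            # 1.0
print(sorted(arg_max))   # [-1.0, 1.0]
```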

    Numerical optimization focuses on sequential optimization techniques. Before that is explained, let's consider non-sequential optimization. For example, suppose $n=1$, a one-dimensional problem, and let $f(x)$ be smooth (all derivatives exist). Then, as emphasized in micro theory classes around the world, the optimal value $x^\star$ would solve: $$f^{\,\prime}(x^\star)\ =\ 0.\qquad\qquad(FOC)$$ If (FOC) can be solved explicitly using algebra, we say it has a closed-form solution. If so, then we do not need an algorithm to maximize $f(x)$. Except for some well-behaved examples (like quadratics), we cannot find closed-form solutions to the maximization problem.
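    The quadratic case can be made concrete. For $f(x) = -ax^2 + bx + c$ with $a > 0$, the FOC is $-2ax + b = 0$, giving the closed-form solution $x^\star = b/(2a)$; no algorithm is needed. A hedged sketch (the coefficients are made-up examples):

```python
# Hypothetical quadratic objective: f(x) = -a*x**2 + b*x + c with a > 0.
a, b, c = 2.0, 8.0, 1.0

def f(x):
    return -a * x ** 2 + b * x + c

# FOC: f'(x) = -2*a*x + b = 0  =>  x_star = b / (2*a), in closed form.
x_star = b / (2 * a)

# Sanity check: f is no larger at nearby points, so x_star is a maximum.
eps = 1e-4
assert f(x_star) >= f(x_star + eps) and f(x_star) >= f(x_star - eps)
print(x_star)  # 2.0
```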

    Equation (FOC) might suggest that we simply use a non-linear system solver to find the solution. This can certainly work and is recommended in some cases, but there are reasons not to maximize this way in other cases.

    Sequential optimization is a kind of algorithm. Let's call one of these algorithms $V$. A sequential optimizer takes (at least) two inputs: the objective $f()$ and an initial guess or starting vector, $x^0$. The optimizer can be thought of as a function that produces a guess or approximation to $\arg\max$: $$x^\star_V\left(x^0\right)\quad = \quad V\left( f(), x^0 \right) \quad\approx\quad \arg\max_x\ f(x).\nonumber$$ Let's decode that statement. It says we will look at algorithms that attempt to find the maximum of a function. The algorithm produces some output vector, $x^\star_V$. We may not know that this is indeed the maximum of the function, and in many cases these algorithms will not necessarily find an optimum. It is important to be cautious about concluding that the algorithm worked, so we should consider its output an approximation to the true maximum.
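    One way to realize the abstract optimizer $V(f(), x^0)$ in practice is SciPy's `minimize` routine (used here as an illustration; the notes do not name a specific library). Since `minimize` minimizes, we hand it $-f$. The objective below is a made-up smooth example whose maximum is at $(1, -2)$.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical smooth objective: concave, maximized at x = (1, -2).
def f(x):
    return -(x[0] - 1) ** 2 - (x[1] + 2) ** 2

x0 = np.zeros(2)                        # the initial guess x^0
result = minimize(lambda x: -f(x), x0)  # plays the role of V(f(), x^0)

x_star_V = result.x                     # the approximation to the argmax
print(np.round(x_star_V, 4))           # close to [ 1., -2.]
```

    As the notes caution, `x_star_V` is only an approximation: the routine reports convergence diagnostics (`result.success`, `result.message`) precisely because it may not have found an optimum.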

  3. Local vs. Global, Continuous vs. Discontinuous
    In basic economics we often make assumptions that guarantee that a maximum of the objective (1) exists and (2) is unique. For example, if utility is strictly concave and increasing and the budget is linear, then a unique utility-maximizing consumption bundle exists. However, we often want to maximize an objective with an unknown shape (or curvature), so existence and uniqueness are not guaranteed. Most sequential algorithms only work to find a local maximum. This means $x^\star$ maximizes $f(x)$ over values of $x$ near $x^\star$, which is technically called a neighbourhood of $x^\star$. A global maximum is a local maximum in which the neighbourhood is all possible values of $x$, or everywhere. Obviously a local maximum may not be the global maximum.

    How well a sequential optimization algorithm works depends on what objective it is sent to maximize and on the initial value $x^0$. No algorithm exists that reliably finds a global maximum if there are several local maxima of the objective. Most of the algorithms we study make no attempt to distinguish a local maximum from a global one. So unless you know a priori (i.e. before you start the algorithm) that the objective's only local maximum is the global one, you should be cautious about the output of the algorithm.
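    The dependence on $x^0$ is easy to demonstrate with plain gradient ascent (an illustrative algorithm, not one the notes have introduced yet) on a made-up quartic with two local maxima, only one of which is global. Different starting points land on different local maxima.

```python
# Hypothetical objective with two local maxima; only the right one is global.
def f(x):
    return -x ** 4 + 4 * x ** 2 + x

def f_prime(x):
    return -4 * x ** 3 + 8 * x + 1

def ascend(x0, step=0.02, iters=2000):
    """Gradient ascent: repeatedly step uphill from the starting point x0."""
    x = x0
    for _ in range(iters):
        x += step * f_prime(x)
    return x

left = ascend(-1.0)   # converges to the left local maximum (roughly -1.35)
right = ascend(1.0)   # converges to the global maximum (roughly 1.47)
print(f(left) < f(right))   # True: the left start missed the global max
```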

    If a strictly concave and increasing function is an easy objective, what is a hard one? In general, discontinuous functions cannot be maximized reliably. Continuity is a technical condition that says the objective does not jump around. If you know your objective has such jumps, then it is very likely that any algorithm you rely on will fail, meaning that the output $x^\star_V$ may not be anywhere close to the true maximum $x^\star$.

Exercises