- Set Up
Consider two problems. First, find an $x^\star$ that solves a system of equations:
$$ g(x^\star) = \overrightarrow{0}.\tag{NLSg}\label{NLSg}$$
Here, $\overrightarrow{0}$ is a vector of zeros. This is sometimes called finding the root of the system $g(x)$. The second problem is to find an $x^\star$ that maximizes $f(x)$, or
$$x^\star\ =\ {\arg\max}\ f(x).\tag{OPTf}\label{OPTf}$$
These are closely related but not identical tasks. For example, we know that if $x^\star$ maximizes $f(x)$ in \eqref{OPTf} and $f(x)$ is smooth enough (i.e. it has continuous partial derivatives) then
$$g\left(x^\star\right)\ \equiv\ \nabla\,f\left(x^\star\right)^{\,\prime}\ =\ \overrightarrow{0}.\nonumber$$
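As a quick illustration (this particular function is only an example, not one used elsewhere in these notes), take the concave quadratic $f(x) = -(x_1-1)^2 - (x_2+2)^2$. Then
$$\nabla\,f\left(x\right)^{\,\prime}\ =\ \begin{pmatrix} -2(x_1-1) \\ -2(x_2+2) \end{pmatrix}\ =\ \overrightarrow{0}\quad\Longrightarrow\quad x^\star = (1,\,-2)^{\,\prime}.\nonumber$$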
So maximizing a function can be thought of as solving for the zeros of its gradient. However, algorithms for \eqref{NLSg} and \eqref{OPTf} are not quite the same. One way to think about why is that the system in \eqref{NLSg} could be anything, while a vector of partial derivatives is a very special kind of system. So algorithms designed to maximize a function can be more efficient than general-purpose algorithms built to solve arbitrary, possibly badly behaved systems.
If $f(\cdot)$ or $g(\cdot)$ has a convenient form, then we can solve the problem algebraically. This would be a closed-form solution. However, problems that admit closed-form solutions are very special; most interesting problems involve systems or objectives that do not have one. What we discuss here are algorithms that solve \eqref{NLSg} and \eqref{OPTf} sequentially. That means: start with an initial (or starting) value $x_0$ and use the calculated value of the system/objective to move closer to the solution $x^\star$. Typically these algorithms are sequential in the sense that the first step produces a new guess $x_1$, which is hopefully better than $x_0$ but is unlikely to equal $x^\star$. Then apply the procedure again to $x_1$ to get another guess $x_2$. Keep doing this until the latest guess, $x_t$, is close enough to $x^\star$ for our purposes.
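To fix ideas, here is a minimal sketch of this kind of sequential updating, written as a one-dimensional Newton-type root finder for a system like \eqref{NLSg}. The function `newton_root`, the test problem, the starting value, and the tolerance are all illustrative choices, not anything prescribed by these notes.

```python
def newton_root(g, g_prime, x0, tol=1e-8, max_iter=100):
    """Sequentially update a guess x_t until g(x_t) is (numerically) zero."""
    x = x0
    for _ in range(max_iter):
        x_new = x - g(x) / g_prime(x)   # one Newton-type step for a scalar system
        if abs(x_new - x) < tol:        # stop once successive guesses barely move
            return x_new
        x = x_new
    return x                            # return the last guess if the iteration cap is hit

# Illustrative use: solve g(x) = x**3 - 2 = 0, whose root is 2**(1/3)
print(newton_root(lambda x: x**3 - 2, lambda x: 3 * x**2, x0=1.0))
```

The same template covers \eqref{OPTf} if we take $g$ to be the gradient of $f$; what changes across algorithms is how the next guess is computed and how "close enough" is judged.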