- Set Up
NR 9.0
Consider two problems. First, find an $x^\star$ that solves a system of equations:
\begin{equation}
g\left(x^\star\right) \;=\; \overrightarrow{0}. \label{ROOTg}
\end{equation}
Here, $\overrightarrow{0}$ is a vector of zeros. This is sometimes called finding the root of the system $g(x)$. The second problem is to find an $x^\star$ that maximizes $f(x)$, or
\begin{equation}
x^\star \;=\; \underset{x}{\arg\max}\; f(x). \label{OPTf}
\end{equation}
These are closely related but not identical tasks. For example, we know that if $x^\star$ maximizes $f(x)$ in \eqref{OPTf} and $f(x)$ is smooth enough (i.e., it has continuous partial derivatives), then
\begin{equation}
g\left(x^\star\right) \;\equiv\; \nabla f\left(x^\star\right)^{\prime} \;=\; \overrightarrow{0}. \nonumber
\end{equation}
So maximizing a function can be thought of as solving for the zeros of the gradient. However, algorithms to solve \eqref{ROOTg} and \eqref{OPTf} are not quite the same. One way to think about why is that the system in \eqref{ROOTg} could be anything, while partial derivatives are a very special kind of system. So algorithms built to maximize a function can be more efficient than general-purpose algorithms designed to handle arbitrary, possibly badly behaved systems.
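For concreteness, consider a simple quadratic example (chosen here purely for illustration): if $f(x) = -(x_1-2)^2 - (x_2+1)^2$, then the gradient system is
\begin{equation}
g(x) \;=\; \nabla f(x)^{\prime} \;=\;
\begin{pmatrix} -2\,(x_1 - 2) \\ -2\,(x_2 + 1) \end{pmatrix}
\;=\; \overrightarrow{0}
\quad\Longrightarrow\quad x^\star = (2,\,-1)^{\prime}. \nonumber
\end{equation}
Here the gradient system is linear and can be solved by hand; the algorithms discussed next are for cases where it cannot.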
If $f(\cdot)$ or $g(\cdot)$ has a convenient form, then we can solve the problem algebraically. This is a closed-form solution. However, problems that admit closed-form solutions are very special; most interesting problems involve systems or objectives that do not have one. What we discuss here are algorithms that solve \eqref{ROOTg} and \eqref{OPTf} iteratively. That is, start with an initial value $x_0$ and use the calculated value of the system/objective to move closer to the solution $x^\star$. These algorithms are sequential in the sense that the first step produces a new guess $x_1$, which is hopefully better than the initial guess but is unlikely to equal $x^\star$. Then apply the procedure again to $x_1$ to get another guess $x_2$. Keep doing this until the latest guess, $x_t$, is close enough to $x^\star$ for our purposes.
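To make the template concrete, here is a minimal sketch in Python. The stopping rule and the Newton-type update inside the loop are illustrative assumptions rather than a specific algorithm from these notes; the point is only the structure: start at $x_0$, update, and stop when the latest guess is close enough.

```python
# Generic iterative root-finder: start from a guess x0 and update until
# g(x) is close enough to zero. The Newton-type step is one illustrative
# choice of update rule; other algorithms swap in a different step.

def solve_iteratively(g, g_prime, x0, tol=1e-8, max_iter=100):
    x = x0
    for t in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:            # close enough to a root for our purposes
            return x, t
        x = x - gx / g_prime(x)      # move from x_t to the next guess x_{t+1}
    return x, max_iter               # give up after max_iter steps

# Example: maximize f(x) = -(x - 2)^2 by finding the root of g(x) = f'(x).
g = lambda x: -2.0 * (x - 2.0)       # g(x) = f'(x)
g_prime = lambda x: -2.0             # g'(x) = f''(x)
x_star, steps = solve_iteratively(g, g_prime, x0=10.0)
print(x_star, steps)                 # approximately 2.0, reached in one step
```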