A Quick Start: FiveO by Example
FiveO treats the problems of optimization and system solving (root finding) as nearly the same. This document provides examples of both.
The first line says this Ox code relies on FiveO. It could instead #import niqlow, since niqlow includes FiveO.
struct Rosenbrock : BlackBox { … }
The next five lines declare the class of the objective, named Rosenbrock. This is a BlackBox objective, so we derive Rosenbrock from that predefined class. The objective has two parameters, x and y, declared as data members of the Rosenbrock class. Next, the constructor, Rosenbrock(), is declared: a method with the same name as the class, which is called when new Rosenbrock() is executed. Your objective should always define a constructor because some things must be initialized by it. The method that codes \(f()\) is declared and must have the name vfunc(). The name must match because your code replaces a default function (also known as a virtual method). The v means that it can return a vector of values that are summed to compute the objective:
$$f() = \sum_{j=0}^{M-1} vfunc()[j],$$
where \(M\) is the length of the vector returned.
Summation is the default aggregation method, but others exist or can be defined, such as sum of logs or product. Notice that the function your code provides takes no arguments. How does it know what parameter values to evaluate the function at? The current values of the parameters are passed through the members of the class described below.
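The aggregation step can be illustrated with a short sketch. This is plain Python standing in for the idea, not the FiveO API; LINEAR is the default aggregator named later in this document, while the other two names here are illustrative stand-ins:

```python
import math

def vfunc():
    # stand-in for the user's method: a vector of terms, no arguments
    return [0.25, 0.5, 2.0]

def aggregate(v, method="LINEAR"):
    """Combine the vector from vfunc() into the scalar objective f()."""
    if method == "LINEAR":            # default: f() = sum of the terms
        return sum(v)
    if method == "LOGLINEAR":         # sum of logs
        return sum(math.log(t) for t in v)
    if method == "MULTIPLICATIVE":    # product of the terms
        p = 1.0
        for t in v:
            p *= t
        return p
    raise ValueError("unknown aggregator: " + method)

print(aggregate(vfunc()))                  # 2.75
print(aggregate(vfunc(), "LOGLINEAR"))     # ln(0.25)+ln(0.5)+ln(2.0)
```

The point is that vfunc() never needs to change when the aggregation rule does; only the aggregator is swapped.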
Rosenbrock ::Rosenbrock () { … }
This method initializes the problem (it is the constructor for the class). First, the objective must call the constructor for the parent class, in this case BlackBox(), which takes a string used to label output. Next the parameters of the objective are created. In this problem both \(x\) and \(y\) can take on any real value; FiveO designates this as a Free parameter. Every parameter can be given a label, and an initial value can be passed when the parameter is created. The values passed here, 0.5 and -0.8, are default starting values that can be used to completely reset the problem. Under some conditions this value will be the actual starting value for the parameter during optimization (x0), but it can be replaced with a different starting value before optimization begins. So it is not necessary to change this code to change the starting values; the hard-coded values are a fail-safe. FiveO must know about the parameters, so they are added to the objective with Parameters(). These three statements to define the parameters and add them can be collapsed into one:
Parameters(x = new Free("x",0.5),y = new Free("y",-.8));
Often you want to start optimizing at a parameter vector that is read from a file or is passed to the algorithm during execution. You do this by using Encode() or Load(). However, if we want to start at hard-coded values 0.5 and -0.8 there is no need to do this. Use of these functions can be seen in other examples.
Rosenbrock ::vfunc() {…}
This codes the calculation of the objective itself. The current values of the parameters are not passed through a vector to be decoded by the user's code (as is typically done with generic optimizers and user-defined objectives). Instead, FiveO always places the current values in the members x and y before calling vfunc(). Every Parameter object has a member v, which is where FiveO puts the value. So x.v is the value of the parameter object x and y.v is the value of y. The code uses the niqlow-defined function CV(), which retrieves the value of v for objects passed to it. The advantage of CV(y) over y.v is that it retrieves the value regardless of whether its argument is a Parameter, an Ox double, an Ox integer, or even a function that returns the value. This means that y can be changed from a constant, to a Parameter to maximize over, to a function of other parameters, all without changing vfunc().
The code for the problem is included in the program. By separating Rosenbrock from the main program, the function can be used in different programs: an example of the software design principle "code once, use often."
main(){…}
Any Ox program must contain a main() procedure, which is where execution starts. The objective is created and stored in the variable obj, and the algorithm used to optimize it in alg. These are objects of classes, so they are assigned the return value of a new statement. The algorithms in FiveO require the objective to be optimized to be sent as the argument of the creator function. Multiple objectives can be defined, and each can have algorithms associated with it. But once created, each algorithm object is associated with the objective sent to it.
A difference with DDP: in DDP only a single DP problem is defined at a time, because DDP relies heavily on static members to save storage. FiveO, in contrast, avoids static variables, so multiple objectives and multiple algorithms can be created without the memory conflicts that static allocation would create.
Volume of output generated
Many different objects defined in niqlow have a Volume data member that can be set separately to control the amount of output each produces. Here, the volume of both objects is set to QUIET, which is one step above the default level of SILENT. See NoiseLevels for the different values.
Iterate
To apply an optimization algorithm to an objective, simply call its Iterate() method. Different algorithms take different arguments to Iterate(), but the defaults usually suffice, so a bare Iterate() often works as desired. The parameters of an algorithm can also be Tuned before iterating.
From the directory that contains the program, it can be run from a command line as:
First, both Ox and niqlow print some version and licensing output. Next, FiveO prints a warning because the program never explicitly set the length of the vector vfunc() returns. Since this simple objective returns a single number, the default size of 1 is appropriate. Many components of niqlow produce log files that track output, errors, etc. They write to files with different names based on the algorithm or the name the user gave the objective. However, all log files from one execution of a program include the same timestamp as part of the file name, so the log files that go together can be identified. Because the volume of output was set to QUIET, the Nelder-Mead algorithm simply prints a message when it starts and when it ends. In between, the objective itself keeps track of whether the computed value \(f(\psi)\) has improved. Again, because its volume is also QUIET it only prints improvements, not every function evaluation. Improvements are marked with *.
Log file
The log file for Nelder-Mead (given the volume level) has more detail than the direct-to-screen output:
Source: ../examples/output/ExampleA.txt
It shows the final parameter values are as expected. The free parameter for x has gone from 1.0 to 2.0, and the free parameter for y from 1.0 to about -1.25. What this means for the real parameters is stated in the final report once the simplex converges: the actual or structural parameters have converged to within 1.0E-5 of the true optimal values x* = y* = 1.0.
Each time FiveO finds an improvement in the objective it writes the result to a file (and reports it on the screen unless Volume=SILENT). This is designed for slow and difficult optimization problems, in which one often kills the program before convergence and restarts it later. Here is the file produced at the end of the process.
The top part of the file is printed using Ox's %v format, which ensures the values can be read back in accurately using Load(). The best parameter vector starts on line 4 and continues until > is reached; in this case it is a 2×1 vector. The rest of the file is for information. In particular, the structural vector is printed again with labels, but notice with fewer digits.
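The precision issue is not specific to Ox. A quick Python analogue: a short display format drops digits and does not round-trip, while a full-precision format (the role %v plays here) does:

```python
x = 2.0/3.0
pretty = "%.6g" % x      # display format: easy to read, digits dropped
full   = repr(x)         # full-precision format, analogous to Ox's %v
print(pretty, full)
assert float(full) == x      # round-trips exactly
assert float(pretty) != x    # the short form does not
```

This is why the machine-readable top of the file uses %v while the labeled report below it can afford fewer digits.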
The program could start at the values in the optobj file by adding obj->Load() before the iteration begins. The default argument of 0 tells Load() to look in the file created by optimization. If you copy the file to Joe.optobj and edit the starting values, your program can start there with obj->Load("Joe");.
Sequential use of algorithms
Next we see that BFGS agrees. It does not iterate because the norm of the gradient of f(x) is already close enough to 0 to declare convergence immediately. It terminates without changing the parameters.
The logit model has a log-likelihood of the form:
$$\ln L(\hat\beta; y, X) = \sum_{i=1}^N \ln { e^{y_ix_i\beta} \over 1 + e^{x_i\beta}}.$$
Here \(y\) is a \(N\times 1\) vector of 0s and 1s and \(X\) is a \(N\times k\) matrix of explanatory variables. The Stata web-available data set called lbw
was modified to include hard-coded dummy variables for race (the Stata example uses its i.race factor notation to generate the indicators on the fly).
The data were then resaved to a Stata data set which can be read by Ox's loadmat() command.
The problem is again a BlackBox objective. A logit model requires a \(y\) vector and an \(X\) matrix, which are passed to the creator and stored in the corresponding data members. The parameters to optimize over are a vector of coefficients. The creator uses the Coefficients class to create a block of parameters rather than individual parameters as in the Rosenbrock example. The block is stored in beta, and it gets the number of parameters from the number of columns of \(X\).
Each observation has its own log-likelihood value. Logit could sum the individual contributions itself, but that would preclude use of formulas that rely on the gradient matrix. So instead vfunc() returns the vector of log-likelihood values and FiveO aggregates them. (This is a case of simple LINEAR aggregation, the default; other aggregators can be chosen using SetAggregation().) However, it is necessary for Logit to set the length of the vector it will return. That is done by the statement NvfuncTerms = rows(y);
The objective vfunc() is quite simple in this case. It retrieves the current parameter vector using CV() to compute \(X\beta\). Then it uses niqlow's built-in FLogit() function to compute the vector of values \(F = e^{X\beta} / (1+e^{X\beta})\). Then it returns the log of \(F\) or \(1-F\), depending on the corresponding value of \(y\).
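As a check on the arithmetic, here is the same per-observation computation sketched in plain Python. FLogit here is hand-written (a stand-in for niqlow's function), and the three observations and coefficients are made up for illustration:

```python
import math

def FLogit(z):
    """Logistic cdf e^z/(1+e^z) for each element; stand-in for niqlow's FLogit()."""
    return [1.0/(1.0 + math.exp(-zi)) for zi in z]

def vfunc(beta, y, X):
    """Vector of per-observation log-likelihood terms, one per row of X."""
    xb = [sum(b*xij for b, xij in zip(beta, xi)) for xi in X]
    F  = FLogit(xb)
    return [math.log(Fi if yi == 1 else 1.0 - Fi) for Fi, yi in zip(F, y)]

# made-up data: last column of X is the constant, as in the example
y    = [1, 0, 1]
X    = [[0.5, 1.0], [1.5, 1.0], [-0.2, 1.0]]
beta = [0.3, -0.1]
terms = vfunc(beta, y, X)
print(sum(terms))   # the aggregated log-likelihood (always negative)
```

Unlike the FiveO version, this sketch passes beta explicitly; in FiveO the current coefficient values arrive through the parameter block before vfunc() is called.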
The main program reads the data matrix and sends \(y\) and \(X\) to the Logit() creator. It adds a column of 1s to the end of the data (since Stata reports the constant term as the last parameter). It sends the objective to the BHHH algorithm, which relies on the gradient of the vector of log-likelihoods.
We can confirm that the BHHH algorithm produces the same log-likelihood as Stata to three decimal digits. The standard errors it produces are slightly different than Stata, which uses the inverse of the Hessian rather than the OPG approach from BHHH.
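The difference between the two standard-error formulas is easy to see in a model simple enough to differentiate by hand. For an i.i.d. exponential sample with density \(\lambda e^{-\lambda x}\), the per-observation score at \(\lambda\) is \(1/\lambda - x_i\) and the per-observation Hessian is \(-1/\lambda^2\); at the MLE \(\hat\lambda = 1/\bar x\) the OPG (BHHH-style) and inverse-Hessian estimates use the same scores but generally differ in finite samples. A sketch with made-up data, not FiveO code:

```python
import math

x = [0.4, 1.3, 0.7, 2.1, 0.9]        # made-up sample
lam = len(x)/sum(x)                  # exponential MLE: 1/sample-mean

scores  = [1.0/lam - xi for xi in x] # per-observation scores at the MLE
opg     = sum(s*s for s in scores)   # outer product of gradients (scalar here)
hessian = -len(x)/lam**2             # summed Hessian of the log-likelihood

se_opg  = math.sqrt(1.0/opg)         # BHHH-style standard error
se_hess = math.sqrt(-1.0/hessian)    # inverse-(negative-)Hessian standard error
print(se_opg, se_hess)               # the two estimates differ
```

Asymptotically the two information estimates agree at the truth, which is why BHHH is a valid shortcut; in a finite sample like Stata's lbw data the resulting standard errors are close but not identical.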
The code refers to classes and functions defined in FiveO, so import the module.
struct SysExample : System { … }
As with an objective to optimize, the user creates the system of equations to solve as a class derived from a built-in class, in this case System. System is itself derived from (a special case of) Objective, so this is very similar to the example above. Parameters to choose are stored in a member of the class (here x). Your class has to have a creator method (same name as the class itself) because it must call the creator for the parent class. If a class Child is derived from Parent, then the expression new Child() calls the routine Child::Child() as part of creating the new object. If Child::Child() is not defined, nothing is called. So if Parent::Parent() exists (and in FiveO it does and must be called), then the child class has to have a creator defined, and one thing (perhaps the only thing) it does is call Parent(). The derived class could write this as Parent::Parent(), but that is not necessary: it inherits all the methods of the parent class, which are considered part of it, so the prefix Parent:: is not needed.
The system of equations is computed and returned by a method that must be called vfunc(). As discussed above, vfunc() is always expected to return a vector, even when the class is an objective rather than a system of equations. The difference with System (and with most system-solving algorithms) is that the vector is not aggregated into a real number.
SysExample::SysExample(N){…}
The creator takes the length of the vector as an argument (in the example N=8, but in the source Ox example it is 99). It calls the parent creator, which in the case of a system must be sent the size of the system. In this case it also defines the parameters as a vector of Coefficients, meaning a vector of Free parameters like regression coefficients. Note that the system need not use vectors: like the Rosenbrock example above, it could give each parameter a different name and store each in a different member of the class. Parameters are added to the objective with Parameters(). Encode() can only be called after all parameters have been added. It can take an argument, but with no arguments each parameter is set to its hard-coded value. When x was defined, each element was initialized to 0.
SysExample::vfunc() {… }
This method computes and returns the system of equations as a function of the current values of the parameters added to the model. As discussed above, since parameters are objects, the current value is held in the v member (along with other information) and can be retrieved directly or through the more general CV() routine. Since the formula for the system refers to the x vector three times, it is a little more reliable and efficient to get and store the current value once rather than access it three times. The formula is copied from the Ox example and uses the built-in lag0() function.
To solve the system, an object of the class must be created.
Then an appropriate Algorithm object is created and applied to the system. In this example the main() routine is in the same file as the system definition. The system object is created, as are two algorithms appropriate for solving a System; in particular, they are derived from NonLinearSystem, which is analogous to the GradientBased family of optimizers. The variables alg1 and alg2 store these methods. As with optimization, the objective is sent as the only argument to the algorithm that will work on it. Multiple algorithms and systems (including nested ones) can be created, but sending the system to the algorithm at creation ensures each algorithm works on exactly one system. (The same system can, however, be sent to multiple algorithms, so one system can have one or more methods.)
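What a NonLinearSystem algorithm does on each iteration can be sketched with a plain Newton step on a small, made-up two-equation system (numerical Jacobian, 2×2 solve by Cramer's rule). This is an illustration of the idea, not the FiveO implementation:

```python
def F(v):
    # made-up two-equation system with a root at (1, 2)
    x, y = v
    return [x*x + y - 3.0, x + y*y - 5.0]

def jacobian(F, v, h=1e-7):
    """Forward-difference numerical Jacobian of F at v."""
    f0 = F(v)
    J = [[0.0]*len(v) for _ in f0]
    for j in range(len(v)):
        vp = list(v)
        vp[j] += h
        fj = F(vp)
        for i in range(len(f0)):
            J[i][j] = (fj[i] - f0[i])/h
    return J

def newton(F, v, tol=1e-10, itmax=50):
    """Newton iteration: repeatedly solve J d = -F(v) and step."""
    for _ in range(itmax):
        f = F(v)
        if max(abs(fi) for fi in f) < tol:
            break
        J = jacobian(F, v)
        det = J[0][0]*J[1][1] - J[0][1]*J[1][0]   # 2x2 Cramer solve
        d0 = (-f[0]*J[1][1] + f[1]*J[0][1])/det
        d1 = (-f[1]*J[0][0] + f[0]*J[1][0])/det
        v = [v[0] + d0, v[1] + d1]
    return v

root = newton(F, [2.0, 1.0])
print(root)    # approximately [1.0, 2.0]
```

The final parameter values are left wherever the iteration stops, which mirrors the behavior of Iterate() described below: a subsequent call would start from root, not from the original guess.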
ToggleParameterConstraint() is explained elsewhere and is optional, but seems to improve performance in this case.
Iterate()
To apply the algorithm to its objective, call NonLinearSystem::Iterate(). The example shows that after iterating, the parameter values are left at their final values, and subsequent iterations start there. This can be changed by calling Encode() with a new starting vector, or (as in this example) ReInitialize() to return to the hard-coded values.