
 TechAppendix

Notation and formulas.

This document provides the mathematical description of the fundamentals of DDP. This is the first document on the technical side. For further help on using and extending your model see also:

Methods for how to solve the model
Data for how to simulate data and read external data in for estimation

This is useful for someone who already knows a great deal about dynamic programming and has some familiarity with Ox or similar languages. If you want a much more basic introduction, start back at the beginner's guide. You can also start with GetStarted for a demonstration of coding and return here.

    Contents
  1. Notation and Conventions
  2. Elements of DP in DDP
  3. Extensions and Specializations
  4. DDP terminology versus other surveys and methods articles
  5. Designing and solving MyModel

  1. Notation and Conventions
    1. Fonts, case and alphabets
    2. Examples
      The discount factor:        \(\delta\)
      Single action variable:     \(a\)
      Vector of actions:          \(\alpha = (a_0,a_1,\dots)\)
      Number of values:           \(a\).N
      Size of a vector:           \(\alpha\).N
      Distinct vectors            \(\alpha\).D
      Endogenous state space:     \(\Theta\)
      Utility:                    \(U()\)
      

      Lower case Greek (e.g. \(\alpha\), \(\beta\)) denotes
Vectors of discrete variables (e.g. the exogenous state vector \(\epsilon\)) or scalar parameters (quantities with a continuous range, e.g. the discount factor \(\delta\)).
      Lower case Roman (a, b, …) denotes
      Individual discrete variables (i.e. action a) or variable properties of an object (i.e. current value a.v). A generic element of a vector will usually use the Roman letter corresponding to the vector's Greek name and without subscript. A subscript is used when ordering is important.
      Upper case Greek (e.g. \(A\), \(\Theta\)) denotes
Parameter vectors or spaces (sets of discrete vectors).
      Upper case Roman (A, B, C, …) denotes
Functions, often written with empty brackets and arguments suppressed (e.g. U()), or fixed properties of an object (e.g. a.N).

    3. Objects and the property operator .
    4. In mathematical notation, \(a_N\) might denote the number of values \(a\) takes on. This works well when \(a\) has only one property. In DDP, a variable like \(a\) has several properties so using subscripts and superscripts to indicate them can become clumsy. And, properties can have properties. Instead of using \(a_N\), the notation a.N is the property N associated with the object a. The binary . operator is how properties (members) are accessed in Ox.

      o.p retrieves from the object o the aspect or property p.

      The Quantity class represents discrete variables and continuous parameters. The usual notions of variables and parameters in mathematical models are derived from the base type Quantity. Quantities are either discrete (actions and states) or continuous (parameters). Quantity objects are typically added to one or more lists of the model using routines described below.

      Ranges: 0 … m.
      Although it is natural to start counting from 1, starting from 0 has some advantages. It is common in C-like languages such as Ox. Typically in DDP, 0 is the lowest and first value a discrete variable takes on. Starting at 0 means a variable n that takes on N values has a range \(n=0,1,\dots,N-1\). This is the default range for discrete variables.

    5. Special Functions and Operators
    6. Unary decrement operator ‾ (postfix)

      Counting N values starting from 0 means the last possible value is N-1, a bit of notational clutter that can be confusing. To avoid the clutter define the unary decrement operator ‾ as J‾ ≡ J - 1. For example, 5‾ = 4. So a variable n with n.N values would have the range n = 0, … ,(n.N)‾.

      ′ : next value(s) of a state variable (postfix)
      If s is a state variable then values it does or can take on next period are denoted s′.

      The indicator function I{}.
      $$I\{x\} = \cases{ 1 & x is TRUE\cr 0 & otherwise.\cr}$$

      Cartesian product of discrete sets, \(\times\) is the matrix of all possible vectors of the sets.
      Let a0 and a1 be two discrete variables that each take on values 0 and 1. Then

      \(a_0 \in \{ 0, 1 \}\), \(a_1 \in \{ 0, 1 \}\), and \(a_0 \times a_1\) equals
                     0         0
                     1         0
                     0         1
                     1         1
      

    7. OOP Lingo
    8. OOP has its own jargon. It is used here in an effort to describe the code accurately, but it may be obscure to people unfamiliar with OOP. In Ox, a class and a struct are both a class as usually defined.

      What's the difference? The difference between struct and class is simply whether elements of an object are by default directly accessible from the outside (i.e. public) or not (private): yes in a struct, no in a class. DDP is designed for convenience not reliability, so every class is declared struct, but the term class is used in this documentation.

      A class is a bundle of data and functions to operate on the data. The data are called members of the class and the functions are called methods, although Ox documentation also refers to these as data members and function members, respectively. Multiple copies of a class can be created while a program runs. Each copy is called an object or instance of the class. The key is that the methods work with the data of the object without needing to pass the data to it as with non-OOP languages or constructs.

      Members and methods of a class are either static or automatic. This distinction is extremely important in the design of DDP. Static members/methods are shared by all objects of a class, whereas automatic members/methods are specific to the instance. DDP conserves memory by storing as much information in static variables as possible. The word automatic never appears in code; it is implicit: if the tag static does not appear in the declaration of an item then it is automatic.
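
      A minimal sketch of the distinction (the Person class below is purely illustrative, not part of DDP):

      #include <oxstd.h>
      struct Person {
          static decl count;       // static: one copy shared by every Person object
          decl name;               // automatic (no static tag): one copy per object
          }
      main() {
          decl p1 = new Person(), p2 = new Person();
          p1.name = "A";  p2.name = "B";   // automatic members differ across objects
          Person::count = 2;               // the static member is addressed through the class
          }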

    9. MyModel, DPPparent and other placeholder names
    10. Geek talk: MyModel and MyCode are metasyntactic variables like foo.

      A user builds a DDP model by adding components to it: states and actions and the functions related to them. DDP cannot know how many items will be added of each type. How can the model be ready to store whatever the user chooses? The answer: the user of DDP constructs a model as a class derived from one of the built-in models, called a DDP for Derived Dynamic Program. The user's class inherits the built-in properties of its base model. So DDP can solve the model and produce output for it even though it does not know the details of the user's model until the program starts executing.

      DDP models are derived from the Bellman class, which in turn is derived from the base DP class. If the user wants to call their model MyModel, they would write something like this:

      class MyModel : DPparent {
          ⋮
          }
      
      For example, if you want to model the choice over makes of car and include extreme-value shocks in the value of choices you would declare a class like this:
      class Make : ExtremeValue {
          ⋮
          }
      
      So the generic MyModel is specifically Make and DPparent is specifically ExtremeValue.

      The Bellman-derived class that MyModel is based on is called DPparent in these notes. MyModel:: is prefixed to items that the user provides or customizes. DPparent:: is prefixed to items related to the parent. Other predefined or default items either have no prefix or are prefixed by DP::. The convention of using MyModel avoids having to write repeatedly "the user's version of DP::x"; instead, MyModel::x suffices.

      MyCode

      A user must define elements of MyModel in Ox, and they must write an Ox program that executes some tasks in the proper order. For example, the code must always call the Initialize() method for DPparent before adding things to the model. The Ox code that executes these tasks is collectively called MyCode in these notes. Another way to think of it: MyModel is a translation of the pen-and-paper aspects of your model, and MyCode is the set of instructions to implement the model, solve it and use it.

    11. Some Properties of DDP Objects
    12. The interpretation of a property can depend on the kind of object on the left side of ".". Here are some of the key properties of objects in DDP. Note that the association of these variable names to a property is a convention in DDP. It is not inherent in Ox, and there may be exceptions even within DDP.

      N : cardinality.
      A discrete object (such as states and actions) has a cardinality/range. d.N is the number of distinct values d takes on, and generically these are the range 0,1, … ,(d.N)‾. A vector, such as \(\alpha\), has a size N, which denotes the length of the vector.

      D : dimensionality
      A vector x of discrete variables has a length x.N. But it creates a space of possible values (the Cartesian product) equal to the product of the individual variable cardinalities. So x.D is the size of the Cartesian space of a vector x. $$x.D \equiv {\prod}_{i= 0}^{x.N-1} x_i.N.$$ A space in DDP, say \(\Theta\), is usually a subset of the space of possible vectors. The number of points in a space is then \(\Theta\).D.
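
      For example, if \(x = (x_0,x_1)\) with \(x_0.N = 2\) and \(x_1.N = 3\), then x.N = 2 (two variables) while x.D = 2×3 = 6 (six possible vectors).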

      Rows and columns of matrices
      Since matrices are typically representing a vector space, A.D is the number of rows and A.N is the number of columns.

      i : position.

      Many objects appear in a vector or a list. The object's position in the list is i. So \(a_i\) is a name for the ith action variable in \(\alpha\). We can write \(a_i\).i = i. This redundancy turns out to be very important in some cases.

      v : current value.

      In math, a usually means the value of the variable a. This notation is inadequate when variables have multiple properties and when the notation is meant to reflect some elements of the computer program. Instead, the current value of a variable or vector is the property v. So if a is a discrete variable a.v is the value of a, which can only be one of the values 0 … (a.N)‾.

      An important function in niqlow is CV() which will return the current value of objects sent to it. So typically MyCode does not need to reference .v directly. This is explained in detail below.

      .actual : the actual values

      MyModel may need discrete values to correspond to another set of values (possibly not even integer values). Which values are mapped to may depend on parameters that are changing between solutions of the model.

      If MyModel creates an action or state variable x from a derived class then it can also provide an Update() routine to reset and store the vector of actual values.

      The default is that x.actual = 0 … (x.N)‾, the same as the range of x.v.

    13. Accessing Values of Quantity Objects
    14. MyCode can get the current value of an object simply using x.v (and the actual value as x.actual[x.v]), but this is not recommended. MyCode should never modify these values because DDP ensures they have the correct values at all times.

      One reason to avoid direct reference to .v is that MyCode will change as it develops. For example, early on some quantity x may not be an action or state variable, but simply a fixed number. So MyModel can access its current value as simply x. However, as the model takes shape x may be changed to a variable. But now x is a complicated object, not a number, and the user would have to go through MyCode changing x to x.v. It is also possible to want to change a number into a function, x(), that computes and returns a value.

      DDP provides functions to access current and actual values that will work even when the container changes from a simple variable to a Quantity object or function.

      CV()

      CV(x) is a routine in niqlow that you can send almost anything to, and it will return the value of the object. It examines the argument x, and if it is a Quantity object with an element v then CV(x) will return it. That is, CV(x) = x.v.

      What makes it very useful is that it will also return the value of other things that are not Quantity objects. If you send a real number (called a double in Ox) as x then CV(x) simply returns the value. If you send a function to CV() then it will call the function and return its value. Thus, you do not need to change your code as some concept changes during programming if you use CV(x).
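
      A minimal sketch of the three cases (the import line and the helper curwage() are illustrative only, not part of MyModel):

      #import "DDP"                        // the niqlow DDP library (name/path may differ)
      curwage() { return 9.75; }           // hypothetical function returning a value
      main() {
          decl w = 9.75,                               // a plain number
               d = new ActionVariable("choice",2);     // a Quantity object
          println( CV(w) );         // a double: returns 9.75 unchanged
          println( CV(curwage) );   // a function: calls curwage() and returns its value
          println( CV(d) );         // a Quantity: returns d.v, the current value of d
          }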

      AV()

      Discrete quantities like state variables always take on values from 0 to some upper bound N-1. However, in the model each of those discrete values may map into different values in the model. The property actual contains these model-relevant values. AV(x) acts like CV() but it returns x.actual[x.v].

      By default AV(x) ≡ CV(x) unless x has distinct actual values. This is handled by calling x->myAV() if it exists. Discrete quantities have different myAV() functions, but the default is that myAV() returns actual[v]. That is, the current value of the object is used as an index into a vector of actual values.

      Actual values are reset when the object's Update() is called. Variables are updated before value iteration methods begin so the actual values can depend on estimated parameters.

      In this way x->Update() is only called once for each variable on each model solution to reset .actual. If .actual were not a vector, x->Update() would have to be called every time x.v changed.
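
      For instance, suppose a choice takes on the internal values 0, 1, 2 but represents hours of work equal to 0, 20 and 40. A rough sketch of the idea (the Hours class and the exact Update() interface shown here are illustrative only):

      class Hours : ActionVariable {        // hypothetical derived action variable
          Update();
          }
      Hours::Update() {
          actual = <0; 20; 40>;             // model-relevant values for v = 0, 1, 2
          }
      // At a point where the current value is hrs.v = 2:
      //    CV(hrs) returns 2               (the internal index)
      //    AV(hrs) returns 40              (= actual[v], the model value)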

  2. Elements of DP in DDP
  3. First, the simplest and most general notation is introduced to describe a DDP. However, code that simply matched the general form of a DP model would quickly overwhelm memory or computing capacity. DDP saves memory and calculation by letting MyModel categorize elements and specialize the environment.

    1. Definition of a DP Model
    2. Primitives: basic elements of a complete DP model.
      1. \(\alpha \in A\): The (finite) set of possible actions
      2. \(\theta \in \Theta\): The (discrete) state space
      3. \(P(\theta^{\,\prime}\,|\,\alpha,\theta)\): Conditional transition to the next state, including any notion of a time dimension.
      4. \(U(\alpha,\theta)\): Utility/return/payoff of an action conditional on the state
      5. \(\zeta_\alpha\): shock added to the value of a choice in order to smooth choice probabilities.
      6. \(E\left[\sum_{t=0,\dots}\delta^t\,U(\alpha_t,\theta_t)\right]\): additively separable objective with foresight; \(\delta\) is the discount factor.

      Aspects of the DP solution.
      1. \(v(\alpha,\theta):\quad\) Value of current choice given future optimal choices.
      2. \(V(\theta):\quad\) Value of arriving at state \(\theta\), accounting for current optimal choice
      3. \(EV(\theta^{\,\prime}|\alpha,\theta):\quad\) Expected value entering next period (after integrating out optimal values across IID random terms)
      4. \(P^\star(\alpha|\theta):\quad\) Conditional choice probabilities (after integrating out IID random terms).

      Bellman's Equation: Solution of a DP

      Bellman's equation is really a combination of four equations that jointly relate the aspects above to the primitive elements of the model.

      $$\eqalign{ v(\alpha,\theta)\quad &\equiv \quad U(\alpha,\theta) + \delta EV\left(\theta^{\,\prime}|\alpha,\theta\right),\quad \forall \alpha\in A, \theta \in \Theta\cr EV(\theta^{\,\prime}|\alpha,\theta)\quad &= \quad \sum_{\theta^{\,\prime}\in\Theta} P(\theta^{\,\prime};\alpha,\theta) V(\theta^{\,\prime})\cr V\left(\theta\right)\quad &\equiv\quad \max_{\alpha\in A}\quad v(\alpha,\theta)\cr P^\star\left(\alpha,\theta\right)\quad &\equiv\quad Prob\left[\ \alpha \in \arg\max_{\alpha\in A}\quad v(\alpha,\theta )\ \right]\cr}$$

    3. Encoding Abstract Elements
    4. This section shows how each of the elements of the general DP framework is represented in DDP. The user's code (known here as MyCode) will build the model up dynamically as the code executes. Then, when all the elements of the model have been defined, the code will call DPparent::CreateSpaces(), which will construct the action set and state space. After this, MyCode can solve the model and use it with tools described elsewhere.

      Framework
      The shell for the model.
      class MyModel : DPparent {
           // declare static members to hold action and state variables objects
           // declare required and optional methods
          static Initialize();
          }
      ⋮
      MyModel::Initialize() {
          DPparent::Initialize(new MyModel(),…);
             // define actions and states (create objects)
             // add them to the model
          DPparent::CreateSpaces(…);
          ⋮
          }
      

      Notes
      Elements of the model are added between the call to DPparent::Initialize() and DPparent::CreateSpaces().
      There is no requirement that MyModel provide an Initialize() function. However, it is convenient to do this so that all the model creation steps occur together within a (static) method, and that method has direct access to elements of MyModel and, through inheritance, all the methods and data members within DDP.
      [Subtle/Non-Intuitive Alert!] Every DP model sends a new copy of itself to the parent Initialize() method. What this means and why it is done is not easy to explain at this point. See below.
      The … inside () means that some other required arguments need to be sent or optional arguments can be sent, depending on the parent of MyModel.

      1. Action Variables
      2. MyCode builds the action \(\alpha\) by adding action variables to it using Actions().

        At a minimum, an ActionVariable is defined by its Label and the number of distinct values it takes on, N. DDP tracks values of a as 0 … N‾.

        Example
        Define a binary choice variable d and add it to \(\alpha\).
        class MyModel : DPparent {
            ⋮
            static decl d;                         // NEW
            ⋮
            static Initialize();
            }
        ⋮
        MyModel::Initialize() {
            DPparent::Initialize(new MyModel(),…);
            ⋮
            d = new ActionVariable("choice",2);   // NEW
            Actions(d);                           // NEW
            ⋮
            CreateSpaces();
            }
        

        See Full Action Variable and Action Vector Documentation.

      3. State Variables and Blocks
      4. MyCode builds the state space by adding state variables and state blocks to MyModel.

        As with \(\alpha\), the state of the DP model \(\theta\) is built up by adding state variables to it. In the basic notation above, \(\theta\) is simply a point in a set, but in DDP it will be a vector of individual state variables. Unlike action variables, state variables evolve, and how they evolve affects what needs to be stored and computed for them. The transition \(P(\theta^\prime; \alpha,\theta)\) emerges from the individual transitions of the state variables added to the state.

        In DDP, state variables are classified as either autonomous or coevolving depending on how they enter the state transition Ρ().

        Autonomous Variables
        If s is autonomous, then its transition is independent of all other transitions, and the transitions of all other variables are independent of s. This means the s transition enters the overall Ρ() independently. The transition for s can still depend on the current action and current state. The transition is specified by making the state variable an instance (object) of one of the built-in autonomous state variable classes and adding it to MyModel.

        Coevolving Variables and StateBlocks
        If the transition of a variable depends on the transition of one or more other states then it is coevolving and must be placed in a StateBlock. If a variable is coevolving then its StateBlock is responsible for determining the transitions. A block is itself independent (autonomous) of all other autonomous state variables and blocks.

        See State Variable and Block Documentation.

        Example
        Define a state variable m that takes on the value of action variable d chosen last period.
        class MyModel : DPparent {
            ⋮
            static decl d;
            static decl m;                                //NEW
            ⋮
            static Initialize();
            }
        ⋮
        MyModel::Initialize() {
            DPparent::Initialize(new MyModel(),…);
            ⋮
            d = new ActionVariable("choice",2);
            Actions(d);
            m = new LaggedAction("prevd",d);             //NEW
            EndogenousStates(m);                         //NEW
            ⋮
            CreateSpaces();
            }
        

      5. Transitions and the Clock
      6. Ρ() is generated automatically by the state variables and blocks added to the state vectors.
        MyModel always includes a time-keeping state block, the clock.

        The state space \(\Theta\) is built up by DDP from the state variables added to the model.
        Each state variable or block q is an object of a class that defines all aspects of its behaviour, including its interaction with actions and other state variables as well as its transition from the current state \(q\) to the state next period, \(q^{\prime}\). Therefore, the overall transition \(P(\theta^{\,\prime};\alpha,\theta)\) is constructed by DDP rather than specified independently of the state and action spaces.

        Today and Tomorrow
        The abstract DDP model defined above has an implicit concept of today, the current state \(\theta\), and of tomorrow, the next state \(\theta^{ \prime}\).

        But since, at its solution, Bellman's equation holds at each point in \(\Theta\), there is no need to further specify timing within the model. However, if a monotonic time variable is an element of the state vector, then Bellman's equation can be solved sequentially backwards in time, reducing temporary storage and computational requirements.

        Because timing is key to efficient solution algorithms, DDP always has the concept of a model clock. The literature includes many models that have generalized notions of time that still exhibit monotonicity. To account for these notions the clock is always a StateBlock with at least two state variables in it.

        Example
        Set the clock to a finite horizon of 40 periods.
        class MyModel : DPparent {
            ⋮
            static decl d;
            static decl m;
            ⋮
            static Initialize();
            }
        ⋮
        MyModel::Initialize() {
            DPparent::Initialize(new MyModel(),…);
            SetClock(NormalAging,40);                //NEW
            ⋮
            d = new ActionVariable("choice",2);
            Actions(d);
            m = new LaggedAction("prevd",d);
            EndogenousStates(m);
            ⋮
            CreateSpaces();
            }
        

      7. Utility
      8. MyModel::Utility() should return utility as a vector, one element for each row of the feasible action matrix at the current state.

        MyModel::Utility()
        The one period utility/return/payoff, \(U(\alpha;\epsilon,\eta,\theta)\). It must be called Utility() because it replaces a virtual method, DP::U. It returns the utility as a vector: one element for each row of the feasible action matrix θ.A. You might expect MyModel::Utility() would require arguments to pass the value of state variables. However, using the object-oriented approach to representing the model means that the values are available because DDP will set the values of members of MyModel before calling U().

        Example
        Define the Utility to equal an indicator for whether the current action equals last period's action.
        class MyModel : DPparent {
            ⋮
            static decl d;
            static decl m;
            Utility();                          //NEW
            ⋮
            static Initialize();
            }
        ⋮
        MyModel::Utility() {                   //NEW
            return CV(d) .== CV(m);            //NEW
            }                                  //NEW
        ⋮
        MyModel::Initialize() {
            DPparent::Initialize(new MyModel(),…);
            SetClock(NormalAging,40);
            ⋮
            d = new ActionVariable("choice",2);
            Actions(d);
            m = new LaggedAction("prevd",d);
            EndogenousStates(m);
            ⋮
            CreateSpaces();
            }
        

      9. Foresight
      10. MyModel should set the discount factor using SetDelta().

        \(\delta\in [0,1)\): the discount factor.

        The default value is DP::delta = 0.95. The discount factor is either a fixed real value or a Parameter, which allows it to depend on outside variables and/or to be estimated within a nested solution algorithm. SetDelta() can be used to set the value, passing either a real number or a Parameter. Unlike some other elements of the model, the discount factor can be set or changed after CreateSpaces() has been called.

        Example
        Set δ, the discount factor, to 0.99.
        class MyModel : DPparent {
            ⋮
            static decl d;
            static decl m;
            Utility();
            ⋮
            static Initialize();
            }
        MyModel::Utility() {
            return CV(d) .== CV(m);
            }
        ⋮
        MyModel::Initialize() {
            DPparent::Initialize(new MyModel(),…);
            SetClock(NormalAging,40);
            ⋮
            d = new ActionVariable("choice",2);
            Actions(d);
            m = new LaggedAction("prevd",d);
            EndogenousStates(m);
            ⋮
            CreateSpaces();
            SetDelta(0.99);                            // NEW
            ⋮
            }
        

  4. Extensions and Specializations
  5. The DDP code so far captures the basic elements of the general DP framework. Solving the model to derive Bellman's equation is discussed separately. This section adds more structure to the generic framework and shows how to build this structure into MyModel.

    1. Five State Vectors
    2. Sort state variables by their role in the transition Ρ().

      The generic state space \(\theta\in \Theta\) generalizes to \((\zeta,\epsilon,\eta,\theta,\gamma) \in (Z,E,H,\Theta,\Gamma)\)

      Following the literature, special or restricted state variables are placed in different state vectors. The single vector \(\theta\) is replaced by multiple vectors holding specially behaved state variables: \(\eta\), \(\epsilon\), \(\theta\), or \(\gamma\). The most general kinds of state variables are placed in a vector still denoted \(\theta\).

      Segregating state variables into different vectors can reduce memory and computing, since only the information required for restricted state variables is stored. These distinctions matter for how DDP solves MyModel, but from the point of view of MyModel a state variable is just a state variable regardless of which category it is placed in. A generic state variable that is not associated with a particular vector is denoted s. (A code sketch of adding variables to each vector follows the list below.)

      Placeholder state variables
      Any of the state vectors may be empty in MyModel except \(\theta\), which always contains a Clock.
      If a vector is empty in MyModel then DDP places in it a special Fixed state variable that takes on only the value 0. This has no effect on the size of the state space but greatly simplifies the internal coding of algorithms.
      The placeholders do appear in output, so they take up some space on the screen.

      The 5 State Vectors

      1. \(\theta\): Endogenous state vector
      2. A generic element of \(\theta\) is denoted q. Anything that MyCode places in the other state vectors could be in \(\theta\) (but not vice versa). State variables are added to \(\theta\) by sending them to the function EndogenousStates(). As in any DDP model, \(\theta\) is a semi-Markov process in which the transition to \(\theta^{\,\prime}\) depends potentially on all current state variables and the action \(\alpha\).

        The other state vectors separate out variables which evolve in a simpler or more restrictive way. \(\theta\) always contains a single StateBlock derived from Clock. See Clock Block below.

      3. \(\eta\): Semi-Exogenous state vector
      4. A generic element of η is denoted h. η is a place for restricted state variables whose transition probabilities are independent of all other variables: any element of η is an IID process. It is semi-exogenous because its current value can influence the transition of the endogenous state variables in \(\theta\). Semi-exogenous states are added to η using SemiExogenousStates().

      5. \(\epsilon\): Exogenous state vector
      6. A generic element of \(\epsilon\) is denoted e. \(\epsilon\) can include only (fully) exogenous state variables and is thus more specialized than a semi-exogenous variable. These variables are not only IID but they cannot directly influence the transition of any other state of the system. Elements of \(\epsilon\) satisfy Rust's conditional independence property. States are added to \(\epsilon\) using ExogenousStates().

      7. \(\gamma\): Grouping (random and fixed effect) state vector
      8. A generic element of \(\gamma\) is denoted \(g\). States are added to \(\gamma\) using GroupVariables(). A grouping variable is equivalent to either a random effect or a fixed effect in a panel model. Either way, it does not vary within the life of an agent following the DP, but from our point of view it is random for an agent. Because grouping variables do not vary within a solution it is wasteful to create space for each of their values and the other states in \(\theta\). Instead, DDP re-solves the model for each value of \(\gamma\). It stores differing choice probabilities across random effects for a given fixed effect value, so that integration across random effects can be carried out.

        Warning: if random effect elements of \(\gamma\) affect the transition \(P()\), you have to change the UpdateTime in your model. Because \(\gamma\) is fixed during a solution, it is often not listed as an argument of U() and endogenous outcomes such as \(V(\theta)\). However, MyModel can treat elements of \(\gamma\) like other states.

      9. \(\zeta\): Continuous value shock vector
      10. Continuous states are placed in \(\zeta\) and can only affect U() not \(P()\).

        The most specialized random elements are those that are IID, do not influence the transition of other state variables, and enter utility as an additively separable shock. These are often random variables with infinite support that smooth choice probabilities. Elements of \(\zeta\) are not really stored. Rather, the distribution of \(\zeta\) affects the specification of Bellman's equation, specifically the expression for \(EV(\theta^{\,\prime})\). If any such shocks are in the model, then a solution method must be available to deal with them, as explained below in Solution Method. Any continuous shocks \(\zeta\) are not included in \(U()\) as the user codes it. Instead, the distribution of continuous shocks is accounted for by the algorithm to compute \(EV(\theta^{\,\prime})\).
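
      A rough sketch of placing variables in the different vectors, extending the running example (the IID shock class MyIIDShock and the group-variable class shown are illustrative; check the variable documentation for the classes your model actually needs):

      MyModel::Initialize() {
          DPparent::Initialize(new MyModel(),…);
          SetClock(NormalAging,40);
          d   = new ActionVariable("choice",2);
          Actions(d);
          m   = new LaggedAction("prevd",d);
          EndogenousStates(m);                  // enters theta
          eps = new MyIIDShock("eps",5);        // hypothetical IID state variable class
          ExogenousStates(eps);                 // enters epsilon
          sex = new FixedEffect("sex",2);       // group variable
          GroupVariables(sex);                  // enters gamma
          CreateSpaces();
          }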

    3. Feasible Actions at a state: \(\theta\).A
    4. MyModel can account for limits on choice conditional on the endogenous state.

      The generic action space \(\alpha \in A\) generalizes to \(\alpha \in A(\theta)\).

      The matrix of possible actions, Α, was defined above as the Cartesian product of the ranges of all action variables. However, in many cases MyModel may rule out certain actions as not logically possible at a particular state. Or some actions are ruled infeasible for convenience to avoid calculations that are relatively unimportant to the overall goal of the model.

      Feasibility is a property MyModel imposes on \(\alpha\). The model rules out some actions given the interpretation of \(\alpha\). In dynamic programming, the set of feasible actions can depend on the current state, \(\theta\). In typical math notation it would be natural to write this as \(A(\theta)\), where \(A()\) is now a matrix-valued function of the state. Instead, write feasibility as a property of the state:

      The set of feasible actions at \(\theta\) is a matrix-valued property: \(\forall \theta \in \Theta\), \(\theta.A \subseteq A\). DDP does not allow exogenous states to affect the choice set. So MyModel must assign a variable that affects feasible actions to \(\theta\) even if its transition would otherwise qualify for exogenous or semi-exogenous status.

      A different way to handle infeasible choices is to have MyModel::U() return numeric -∞ as the utility for any infeasible \(\alpha\). This option is always open for use in MyModel, but it does not reduce the size of the static optimization problem and is not as close to the standard notation.

      By default all possible actions are feasible at all \(\theta\) because the built-in FeasibleActions() specifies this. If MyModel does not say otherwise, \(\theta.A \equiv A\) for all endogenous states. MyModel can restrict choices by providing a replacement for the virtual method Bellman::FeasibleActions(). MyModel::FeasibleActions returns a column vector which indicates that \(\alpha\) is feasible or not:

      FeasibleActions() returns a vector of length A.D containing I{A_i ∈ θ.A}, i = 0 … (A.D)‾.
      
      The default method returns a vector of 1s equal in size to the rows of the unrestricted action space. This means that MyModel can define feasibility without knowing everything about the model. Indeed, another user may be deriving their model from yours, adding additional choice variables that you did not anticipate. Even so, your feasibility conditions can still be imposed regardless of the presence of other columns and rows of A.
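
      A minimal sketch, reusing the binary action d and lagged choice m from the earlier examples (the economic restriction itself is made up): choosing d = 1 is ruled out whenever it was already chosen last period.

      MyModel::FeasibleActions() {
          // a column of 0s and 1s, one element per row of the full action matrix:
          // rows with d = 1 are infeasible when the lagged choice m equals 1
          return 1 - (CV(d) .== 1) .* (CV(m)==1);
          }

      As with Utility(), the replacement FeasibleActions() must also be declared in the MyModel class declaration.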

      Typically there is a small number of different feasible sets relative to the size of the state space. In this case, storing a matrix at each \(\theta\) is wasteful. So DDP stores a list (OxArray) of different feasible sets. Rather than storing \(\theta\).A it only stores an index \(\theta\).j into a list of feasible sets. In DDP, the list of feasible matrices is simply A. And the index \(\theta\).j into the list at a state is Aind. MyModel accesses the current feasible matrix as Alpha::A[Aind]. The first matrix, Alpha::A[0] is always the possible matrix A. If MyModel does not specify feasible actions, then Aind = 0.

      Note: the elements of the A list are the actual value of actions and are updated at the start of each value solve. If an action variable does not have its own Update() routine defined then the actual values are simply the default range 0 … (a.N)‾.

      In DDP the key functions, U() and Ρ() act on a single point in the state space at a time. So the current value of a state variable is placed in the .v property of the Ox variable representing it. On the other hand, both U() and Ρ() are 'vectorized' in actions: they must operate on the whole feasible matrix at once.

      Action variables have the .v property, but it is not used for them. Their current values are in a column of the \(\theta\).A matrix, which is A[Aind] in the code. The previously described functions \(CV()\) and \(AV()\) work with action variables as well, returning the column of the current/actual action matrix at \(\theta\).

      For example, consider a model that has two choices: work hours and whether to volunteer or not. Then at some state \(\theta\).A may look like this, along with the return value of CV(work).
           A[Aind]     |
      work      vol    |  CV(work)
       -------------------------
       0         0      |    0
       1         0      |    1
       2         0      |    2
       0         1      |    0
       1         1      |    1
       2         1      |    2
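
      A sketch of a Utility() for this two-choice example (the wage and the cost of volunteering are made-up constants; work and vol are assumed to be static members holding the two ActionVariables):

      MyModel::Utility() {
          decl wage = 10.0, volcost = 2.0;
          // one value per row of A[Aind]: the columns of the matrix above enter via CV()
          return wage*CV(work) - volcost*CV(vol);
          }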
      

    5. Terminal States and Terminal Values of State Variables
    6. MyModel can make values of a state variable terminal by calling MakeTerminal().

      Some dynamic programs end if and when certain states are encountered. Such states are called terminal states. There are three features of a terminal state:

      1. There is no more choice from that period on. Often a model is set up so that the program terminates when a particular action is taken. That is coded in DDP as a transition to a next state that is terminal.
      2. Transitions from terminal states are undefined (or ignored).
      3. The value of a terminal state is exogenous to the model and not solved as part of Bellman's equation. In DDP the terminal value is returned as the utility of the state (not a function of the non-existent choice).

      If a value of a state variable makes a state terminal it is called a terminal value. DDP considers termination a property of value(s) of an endogenous state variable which the whole state \(\theta\) inherits.

         q.T is a subset of the possible values of q that terminate decision making.
      
      By default, q.T = ∅ for built-in state variables. MyModel makes values terminal by applying q->MakeTerminal() to them. Only endogenous state variables, those in \(\theta\), can have terminal values, since other state vectors are either IID or invariant.

      The set of terminal states is defined as \(\overline{\Theta}\). A state is terminal if any of the endogenous state variables currently equal a terminal value. $$\theta.T\quad =\quad I\{ \hbox{for some }k, \theta.v_k \in q_k.T \}.$$ The convention in DDP is that at a terminal state there is no choice, and MyModel must provide a value for the state via U(). Because of this convention, MyModel::FeasibleActions() is not called at a terminal state. Instead, for \(\theta \in \overline{\Theta}\), \(\theta\).A is automatically equal to the first row of \(A\).

      Let \(\overline{V}(\theta)\), for \(\theta \in \overline{\Theta}\) be the exogenous value of arriving at a terminal state \(\theta\). MyModel::Utility() returns this value. DDP sets it as \(V(\theta)\) directly.
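
      For example, to end decision making once the action d = 1 has been chosen, the lagged choice m can be given a terminal value (the argument form of MakeTerminal() here is illustrative; see the state variable documentation for the exact signature):

      m = new LaggedAction("prevd",d);
      m->MakeTerminal(1);             // the value m = 1 now terminates the program
      EndogenousStates(m);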

    7. Trimming/Pruning the State Space \(\Theta\) of Unreachable States
    8. MyModel can trim the state space by providing a MyModel::Reachable() routine. It returns TRUE for states that can be reached from initial conditions and FALSE otherwise.

      Following the notion of possible versus feasible actions above, the Cartesian product of all possible values of the endogenous state variables is defined as the possible state space: $$ \Omega \quad\equiv\quad \prod_{q_k \in\theta} \{ 0, \dots, q_k.N-1 \}$$ The current value of a state is always equal to some row in \(\Omega\): \(\theta.v \in \Omega\).

      Just because a state is possible (\(\theta\in\Omega\)) does not mean the DP can ever get there. Of course, if the DP were solved and then started at exactly \(\theta\) it would be reached. But finite horizon dynamic programs must have initial conditions specified, and those initial conditions can mean that some states in \(\Omega\) will never be reached. If they cannot be reached then they do not need to be stored and included in the solution to Bellman's equation.

      So in some models, especially those with a finite horizon, Ω contains many endogenous states that cannot be reached from possible initial states of the user's situation.

      The term Reachable
      The term reachable is used for two reasons.
      First, in DDP the term feasible applies to action sets, not states.
      Second, reachable emphasizes the fact that it depends on what initial states are assumed, not on logical inconsistency.

      As with feasibility of actions, reachability of a state is not a mechanical property. Rather it depends on the model and how it will be used. Since, in DDP, an endogenous state \(\theta\) is not just a vector of numbers but rather an object with many properties attached to it, it is important for efficiency that DDP only create and process objects for reachable states.

      The property \(\theta\).R equals 1 if MyModel specifies that \(\theta\) is reachable. Otherwise \(\theta\).R = 0. The state space \(\Theta\) is the set of reachable states within the set of all possible states. It emerges from the property \(\theta\).R of each logically possible state: $$\Theta\quad \equiv\quad \bigl\{ \theta \in \Omega\ :\ \theta.R = 1 \bigr\}$$

      Storage Details: Note that R is a conceptual property only. During computation objects are only created for states with \(\theta\).R = 1. So the actual test is whether a possible state contains an object or just the number 0 as a placeholder. Storing a 0 for possible but unreachable states still requires allocation of a single oxvalue, but it avoids storage of the vectors and matrices associated with each reachable state (such as the utility vector, the optimal choice probability matrix, etc.).

      As shown above, MyModel must call DPparent::CreateSpaces(), which sets up the list of feasible action matrices and creates the state space \(\Theta\). It must traverse (loop over) the possible state space Ω at least once. It is during this traversal that it is determined whether the point \(\theta\) is reachable or not. CreateSpaces() uses two methods to determine reachability: inherent reachability of included state variables and the user-supplied routine that asserts reachability or not.

    9. Auto Trimming of \(\Theta\) in Finite Horizon Models
    10. Some state variables generate inherently unreachable states. Above it was emphasized that reachability depends on the whole model, but including some kinds of state variables in your model will generate unreachable states in non-stationary environments (such as a finite decision horizon).

      As an example, a Counter state variable counts how many times an action or state has occurred in the past. If the counter starts at 0 in a finite horizon model then the only reachable states are ones in which this state variable's current value is less than or equal to the value of the clock, t.

      This is true regardless of other state variables added to the model, which actions are feasible, etc. It depends solely on the clock type, initial conditions and the presence of this variable in the model.

      So some trimming of the state space can be done automatically based on state variables in the endogenous vector and the model's Clock block. By design, state variable objects do not know directly about the model's clock, because all the discrete variable classes are defined and used by the DP class.

      So checking for reachability requires sending the clock to the state variable. The StateVariable class includes a virtual method named IsReachable() which takes a single argument that will be the model's clock block. The default function returns TRUE. That is, by default, a state variable says no point in the state space is unreachable. However, some kinds of state variables that do generate unreachable states (such as a Counter) supply a replacement copy of IsReachable().

      At each possible state in the space Ω, CreateSpaces() will call IsReachable() for each state variable in the endogenous state vector \(\theta\). If any of them returns FALSE that point is marked as unreachable. (This can be turned off when creating the state variable using the optional Prune argument.)

      If no state variables claim the current state is inherently unreachable then the user-supplied method is called, which can mark states unreachable for reasons not inherent to the clock and state variables themselves.

    11. Providing a user-defined Reachable() method.
    12. MyCode must first call DPparent::Initialize(), then add variables to the model, then call CreateSpaces(). An object of MyModel is sent to DP::Initialize(), which will clone it for each reachable point in the state space. As with Bellman::FeasibleActions(), which must have that name because it is a virtual function, this function must have the name Reachable().

      The base Bellman class has a method Reachable() which simply returns TRUE. So if MyModel does not provide its own version of Reachable(), all possible states are asserted as reachable. However, remember that some state variables may have inherently unreachable states in finite horizon models, and these conditions will be checked regardless of whether MyModel provides its own Reachable().

      If MyModel provides a replacement for the virtual Reachable(), it must return TRUE or FALSE depending on whether the current values of state variables constitute a reachable state. Inside CreateSpaces(), during the loop over the possible state space \(\Omega\), MyModel::Reachable() is called if no state variables assert inherent unreachability of their current value.

      MyModel::Reachable() indicates \(\theta\) is reachable by returning TRUE. Otherwise it should return FALSE to indicate that something about the current values of the endogenous state variables makes this point unreachable. So Reachable() returns 1 if \(\theta.R = 1\) and 0 if \(\theta.R = 0\).
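
      A minimal sketch, continuing the running example with the NormalAging clock and the lagged choice m: if m starts at 0, then m = 1 cannot occur in the first period. The name I::t is assumed here to hold the current value of the clock; substitute whatever your version of niqlow provides.

      MyModel::Reachable() {
          // m = 1 records that d = 1 was chosen last period, which is impossible at t = 0
          return !( CV(m)==1 && I::t==0 );
          }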

    13. Avoid non-static members of MyModel.
    14. Recall that MyModel is a class derived from some DDP class, denoted DPparent. A DDP is designed to represent both the overall model and an endogenous state \(\theta\). DDP creates a copy (an object) of MyModel for each reachable \(\theta\). It places them on a list named Theta (a static global variable).

      To conserve memory, only a limited number of variables (aka data members) are specific to each object for different \(\theta\)'s. These are what Ox calls automatic variables. Most data members for MyModel are static members. They are shared by all objects of type MyModel. These are properties of the overall model.

      To conserve space, the Ox variables (members) in MyModel that hold actions and states should be declared static (see code above). Otherwise, if variables are automatic, new storage for them is created at each point in \(\Theta\) even though DDP processes one \(\theta\) at a time. By storing elements as static and then updating their current value (the .v property) storage for large state spaces is reduced dramatically.

    15. Multiple Dynamic Programs: The Group Space \(\Gamma\)
    16. MyModel can require several solutions to a DP model that differ only by shifts in \(U()\) or \(P()\).

      Group variables are like random or fixed effects in econometrics. They are fixed and non-random from an agent's point of view, but from our point of view they vary across agents.

      Group variables are not involved in the creation of \(\Theta\), which is reused for the solution of the model for each \(\gamma\). Instead, the group space \(\Gamma\) is created from the Cartesian product of all possible values of the group variables. $$\Gamma\quad\equiv\quad\prod_{k=0,\dots,\gamma.N-1}\ \bigl\{\,0,\dots,\,g_k.N-1\,\bigr\}$$

      Example
      There are two group variables, γ = (g,d), where g is gender, so (g.N)‾ = 1, and d is degree status, so (d.N)‾ = 2 (no high school degree, high school, some college). Then Γ = {0,1}×{0,1,2} =
      0 0
      1 0
      0 1
      1 1
      0 2
      1 2
      
      Each group has a probability $$P_g(\gamma)\quad=\quad \prod_{k=0,\dots,\gamma.N-1}\ p(g_k.v)$$ This assumes group variables are independent of each other. You can use fixed and random effect blocks to include correlated groups.
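
      A sketch of adding the two group variables of the example (FixedEffect is assumed here as the class name for an observed group variable; check the variable documentation for the exact creator arguments):

      gen = new FixedEffect("gender",2);     // g: 2 values
      deg = new FixedEffect("degree",3);     // d: 3 values
      GroupVariables(gen,deg);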

    17. Choice Probabilities: \(P^\star\)
    18. DDP solves the DP model for each group vector \(\gamma \in \Gamma\). At each \(\theta \in \Theta\), \(P^\star( \alpha\ ;\ \epsilon, \eta, \theta, \gamma )\) is stored as a matrix in \(\alpha\) and \(\epsilon \times \eta\), and as a list (OxArray) over \(\gamma\).

  6. DDP terminology versus other surveys and methods articles
    1. Most contributions to the DDP literature adopt some idiosyncratic notation or terminology, and the current document is no exception. Here is a translation of the current notation into that used in surveys and papers that have become standard. Features that are similar are not listed, and neither are elements of alternative notation that are somehow more general than those used here. The DDP notation is followed by → and the alternative notation in italics. A brief explanation and/or possible reason why the DDP notation is preferable is then given in [ ].

      A key reason the notation differs here is because it is used to describe a framework for designing a DDP and solving it efficiently. Most other notation is used to describe a specific model or to describe models generally without reference to restrictions that can be used for efficiency.

    2. Aguirregabiria & Mira (JoE 2010)
    3. Actions:
      α   →   (a) [Here, vectorized actions do not retain dimensions of choice]
      A.D   →   J
      a   →   a_{it}
      Clock:
      t ∈ θ   →   t as a subscript. [Here, subscripts are used for other properties and t is in the state already]
      Discrete States:
      θ   →   x_{it}. [Here, allow Greek vectors to be distinguished from Roman variables]
      ε and η   →   empty vectors [Here, discrete exogenous state variables save space and computation]
      Transitions:
      Ρ(θ′|α,θ)   →   f_x(x_{i,t+1}|a_{it},x_{it}) [Here, distinguish upper case functions from lower case vectors and variables]
      Continuous States:
      ζ   →   (ε_{it}), always with size A.D [Here, ζ may have lower dimension than the action space.]
      utility is always additively separable (their assumption AS).
      Bellman:
      EV(θ)   →   V(x_{it}) [mnemonic for Expected Value]
      Ρ*(α|θ)   →   P(a|x,θ) [Distinguished from primitive Ρ() even when arguments are suppressed.]
      Miscellaneous
      δ   →   β [Here, mnemonic for discount factor.]

    4. Keane, Todd & Wolpin (Handbook of Labor Economics 2011)
    5. Actions;
      α = (a_0 … a_{(α.N)‾})   →   (d^{00…1} d^{10…0} … d^{11…1})
      α.D   →   the length of the superscript;
      d.N = 2, and ∑ d = 1.
      Discrete States:
      θ   →   Ω‾_{it}
      ε and η   →   empty vectors
      Continuous States:
      ζ  →  S(Ω‾_{it}) [loses self-standing vector status]
      Bellman:
      Choice Probabilities: Ρ*(α|θ)   →   Pr( d=1 | Ω‾_{it})
      v(α|η,ε,θ,γ)   →   V

  7. Designing and writing MyModel
    1. Overview
    2. Before programming, state MyModel in the notation used above (which combines a standard mathematical statement of DP with some peculiarities of DDP).
      Determine the timing in your model (stationary, finite horizon, etc.).

      List the action variables in \(\alpha = (a_0,a_1,\dots)\).
      Many papers vectorize the action variables into one long choice. But DDP makes it easy to separate different dimensions of choice so that each variable is meaningful.
      List and classify state variables and state blocks.
      For each state variable/block, either find a predefined class that it matches (up to the value of parameters to the creator routine for the class) or derive a new transition for it. If two or more variables are conditionally correlated (coevolving), place them in a state block and derive a transition. Specify the dependency of the transition on the current values of other state variables and actions.
      Based on these dependencies assign each state variable to \(\epsilon\), \(\eta\) and \(\theta\).
      Terminal values: if values of endogenous states end decision-making
      Express as a value or set of values of variables in θ that terminate decisions. Use the method MakeTerminal() to mark values as terminating. This implicitly defines \(\overline{\Theta}\)
      Reachability if not every endogenous state can be reached
      Express θ.R as a logical/boolean value depending on the values of variables in θ. This implicitly defines \(\Theta\)
      Fixed group variables in γ
      For each group variable, find a predefined random effect variable that matches it or derive a new type of group variable.
      Constraints on choice: if not every action can be taken at every state
      Express feasible actions \(\theta\).A, as a logical/boolean indicator for whether \(\alpha\) is feasible (in \(\theta\).A) depending on values of variables in \(\theta\).
      \(\zeta\) : if continuous variables enter choice value
      Specify the distribution of \(\zeta\) and the solution method associated with it.
      Utility \(U()\)
      Expressed as a vector of values, one number for each row of \(\theta\).A (each value of α). Values depend on current values of state variables in all the vectors, but not ζ unless the model will be solved with reservation values.

    3. Steps
      1. Choose the DDP struct that MyModel is derived from, referred to as DPparent.
      2. See Methods for the solving and smoothing methods available.

      3. Write the declaration and definition of MyModel and any other derived elements needed in the model.
      4. See Variables to create custom state variables, state blocks and action variables. Decide if you want to use the #import or #include approach to using MyModel in an Ox program.
         #import "MyModel"
        
        requires two separate files: MyModel.h and MyModel.ox
         #include "MyModel.ox"
        
        requires one file MyModel.ox which includes what would be in the header and ox file. You can have a separate MyModel.h file, but you may need to use conditional define directives to avoid multiple inclusions.

        The Header Material (to go in a file such as MyModel.h). For each derived element, write a struct declaration.

        The .ox Material. For each derived element, define the required and optional methods required of MyModel and its components. Put this material in MyModel.ox.

      5. Write an Ox program that includes or imports the definitions of the elements, then builds up the model and solves it.
    4. Steps the program should execute in building the model
      1. Call DPparent::Initialize() for the base of MyModel
      2. Set the model clock with SetClock().
      3. Some DDPs require a new clock variable be created first and sent to Initialize(). For these methods you do not call SetClock(); it will be called by Initialize().
      4. Create new instances for the action variables and the state variables in the model.
      5. Add the action and state variables to the model using DP methods such as Actions()
      6. Call CreateSpaces(), sending it a static routine that indicates which states are reachable and whether \(\Theta\) should be traversed with a loop or a list

      7. Items above are done once while the program runs. They can be repeated only after calling Delete(), which disposes of the elements of the previous model. Items below can be done repeatedly during the life of the program once the steps above are done.

      8. Set parameters of the model, including the discount factor \(\delta\).

    Author:
    © 2011-2020 Christopher Ferrall