Rust (1987) studies the dynamic decision making under uncertainty made by Harold Zurcher to replace bus engines. In the decades since, the model has been applied, extended, and used as an example multiple times. This paper resolves some discrepancies in how data were transformed in the original and subsequent archives. Using a package that standardizes computation of estimated dynamic programming it replicates the 12 original maximum likelihood estimates and the six main hypothesis tests of whether Zurcher's decisions were myopic or not. The discrepancy in the data processing results in modest differences in estimates and log-likelihoods, but the p-values are essentially the same because the differences are very similar across values of the discount factor in the tests.
Gauss code and data released in the 1990s are still available (Rust 2000), and several papers that have replicated elements of the original analysis. In particular Larsen et al. (2012) report replications of some original results using Matlab code while testing the validity of assuming the extreme value distribution for choice errors.
Augirreibiria and Mira (2002) use the same data and provide code for their pseudo MLE procedure but do not report replications of Rust's results. Su and Judd (2012) conduct Monte Carlo experiments on their method using Rust's point estimates but do not replicate the estimates nor use the original data. The software package ruspy
(Blesch 2019) implements both the original and Su and Judd solution techniques. Müller and Reich (2021) use the original data and estimate the original model but use a different technique and do not report agreement with the original maximum likelihood estimation.
This paper's modest goal is to replicate MLE estimates in Rust (1987) in order to demonstrate the software package niqlow
described in Ferrall (2022). The replication is based not on bespoke code but rather high-level statements in niqlow
.
In particular, a main statistical inference reported in the paper is a likelihood ratio test of the null hypothesis of myopic behavior in replacing bus engines, which corresponds to estimates setting the discount factor at 0, versus a forward-looking model with the discount factor fixed at 0.9999. That specification requires the nested fixed point solution developed in the model. This replication reproduces both the MLE estimates of the parameter values and the results of these hypothesis tests.
Despite a 35 year gap in computing hardware and software, and a fundamentally different approach to coding the model, the solutions of the DP model are identical for a given set of parameter values. The MLE estimates do differ with the original values because of the difference in the transformation of the original raw data mentioned above. Using the data files in Larsen et al. (2012) an error in the discretization of the data was resolved. As with their analysis, re-estimation with the original data encoding results in identical estimates and test statistics. Using the data produced by the corrected encoding procedure results in different estimates. However, the shifts in likelihood are similar in the myopic and forward-looking versions of the models so test statistics are essentially the same.
niqlow
and differs from the original paper. Table 1 provides shows the difference in notation which is followed so details can be compared directly with other work implemented in niqlow
. In the model, bus maintenance is a stationary environment with a binary decision to replace an engine ($d=1$) each month. This resets mileage $x$ on the engine to $x=0.$ For estimation, the odometer readings are reduced to two different number of discrete possibilities, $N=90$ and $N=175.$ The transition next month takes on one of J values:
$$x^\prime = (1-d)x + j,\ Prob(j=i) = \theta_{3i},\ i=0,\dots,J-1.\label{Trans}$$
The number of feasible jumps required to explain the data was $J=3$ for $N=90$ and $J=5$ for the $N=175.$ Other elements of the model are the monthly discount factor $\delta,$ utility $U(),$ and an implicit vector of additive extreme value shocks to smooth conditional choice probability (CCP).
Panel B of Table 1 summarizes the model using the niqlow
framework. A new class named Zurcher,
is derived from the pre-defined Rust
class. The state variable $x$ is an object of the Renewal
class, following the term Rust used to describe the process. This objected-oriented approach makes it possible to build specialized and extended versions of existing models replacing only the code or features that differ.
A. Crosswalk with Original Notation | |||||
---|---|---|---|---|---|
Term | Original | niqlow | Notes | ||
Discount factor | $\beta$ | $\delta$ | $\delta \in \{0,0.9999\}$ | ||
Mileage bin | $x$ | $x$ | $N=90$ or $175$ | ||
Decision | $i$ | $d$ | Replace or not | ||
Value shock | $\epsilon(i)$ | $\zeta_\alpha$ | Additive Extreme Value | ||
Innovation | $j$ | $j$ | $x^\prime = (1-d)x + j.$ | ||
Parameter vector | $\th$ | $\psi$ | |||
Time index | $t$ | $s$ | For observed outcomes | ||
CCP | $P(i|x,\th)$ | $P^{\star}(\alpha;\th)$ | |||
Transition | $p(x_{t+1}|x_t,i,\theta_3)$ | $P(\th^\prime;\alpha,\th)$ | State-to-state with optimal choice | ||
B. niqlow Definition of the Model | |||||
Element | Value | Category / Details | |||
Clock | $t$ | Ergodic | |||
Value shock | $\zeta$ | ExtremeValue | |||
Actions | $\al$ | $=$ | $(d),$ | Binary Choice | |
States: | $\th$ | $=$ | $(x)$ | Renewal with probabilities $\theta_3$ | |
Utility | $U\l(\al,\th\ \r)$ | $=$ | $-\pmatrix{ \th_1x/1000 \cr RC}$ | | |
Parameters | $\psi$ | $=$ | $\l(\delta,RC,\theta_1,\theta_3\r)$ |
Before turning to the ML estimation procedure, consider output at one of the estimated models in the original paper. Figure 3 in the paper compares $P^\star(1;x),$ the replacement probability, for discount factors of $\delta=0$ and $\delta=0.9999$. This does not involve the data. It is a check on the recoding of the Newton-Kantorovich algorithm, described in Rust (1988), to solve the fixed point in the value function along with the logit specification for the CCPs. 1 displays the original figure and the replication. The additional information in the original figure is the empirical hazard rate. The replication figure is shown alone on the right and is overlayed on the original scanned figure demonstrating they coincide exactly given the scan's resolution.
The transition or innovation $j = x_{s+1}-x_s(1-d_s)$ is constructed from the path of the discretized data $x_s$. The transitions are IID. Both $x$ and $d$ are observed so the transition probabilities $\theta_3$ form a multinomial problem not involving optimal replacement decisions (Rust's first stage estimation). 2 and 3 report the estimates of $\theta_3$ (fractions of transitions $j$) for bus groups 1-3 and group 4 from the original paper for 90 and 175 bins, respectively. The estimated transitions are compared with the values from the replication data.
It is in these transitions that a small discrepancy occurs. Both the original Gauss code and the Matlab code used in Larsen et al. (2012) compute the discrete $x_s$ using the ceiling function when it should be the floor function. For months without a bus replacement this has no effect because when $d_s=0$ the transition $j= x_{s+1}-x_{s}$ is the same regardless if both values are increased by one. However, in replacement months this results in $ j = x_{s+1}$ and this had minimum value 1 instead of 0. (It is not surprising that in months when an engine is replaced the added mileage is within the lowest mileage category which never occurred in the incorrect coding). The first stage transition probabilities are therefore incorrect which affects the second state estimates through $\theta_3$.
These differences are reported in final column labeled $Δ$ in 2 and 3. In the 4 different samples the multinomial likelihood on the transition probabilities differs by roughly 1% between the two codings of the discrete data. Since transition probabilities enter into the problem solved by the agent when determining conditional choices to replace an engine these differences will shift all MLE estimates.
A. Column 1 (Groups 1-3) | |||||
---|---|---|---|---|---|
Replicated | Original | Δ | |||
j | N | % | N | % | |
0 | 1189 | .3077 | 1162 | .3007 | +27 |
1 | 2635 | .6819 | 2662 | .6889 | -27 |
2 | 40 | .0104 | 40 | .0104 | 0 |
Obs | 3864 | 3864 | 0 | ||
lnL | -2592.90 | -2570.97 | 21.93 | ||
B. Column 2 (Group 4) | |||||
Replicated | Original | Δ | |||
j | N | % | N | % | |
0 | 1715 | 0.3996 | 1682 | .3919 | +33 |
1 | 2522 | 0.5876 | 2555 | .5953 | -33 |
2 | 55 | 0.0128 | 55 | .0128 | 0 |
Obs | 4292 | 4292 | 0 | ||
lnL | -3153.83 | -3140.57 | 13.26 | ||
Source of Original: author's calculation of Rust (1987) Table IX using data from Larsen et al. (2012), which match the original. Replicated: Author's calculation from the same data but correcting $j$ after replacements. The third bus group pools A and B and is not reported here. |
Table 3 starts with the same continuous data but uses 175 discrete bins instead of 90. Since the width of the bins is smaller, the maximum jump increases from 2 to 5 bins. Comparing the replication to the original, the differences in likelihoods and cell percentages are somewhat larger than the corresponding values in Table 2. Multiplying the percentage differences by the number observations yields roughly 23 bus-month values are misclassified in each panel.
A. Column 1 (Groups 1-3) | |||||
---|---|---|---|---|---|
Replicated | Original | Δ | |||
j | N | % | N | % | |
0 | 385 | .0976 | 362 | .0937 | +23 |
1 | 1710 | .4425 | 1729 | .4475 | -19 |
2 | 1719 | .4449 | 1723 | .4459 | -4 |
3 | 49 | .0127 | 49 | .0127 | 0 |
4+ | 1 | .0003 | 1 | .0003 | 0 |
Obs | 3864 | 3864 | 0 | ||
lnL | -3896.51 | -3861.00 | 35.51 | ||
B. Column 2 (Group 4) | |||||
Replicated | Original | Δ | |||
j | N | % | N | % | |
0 | 539 | .1256 | 511 | .1191 | +28 |
1 | 2449 | .5706 | 2472 | .5760 | -23 |
2 | 1225 | .2854 | 1230 | .2866 | -5 |
3 | 70 | .0163 | 70 | .0163 | 0 |
4+ | 9 | .0021 | 9 | .0021 | 0 |
Obs | 4292 | 4292 | 0 | ||
lnL | -4371.93 | -4331.72 | 40.21 | ||
See notes to 2, except sources are Table VI and Table X in the original article. |
niqlow
panel data are read in and matched to the states and actions present in the model. This kind of data is automatically categorized as a Full Information because no actions or states are hidden.
A sequence of outcomes for a single bus over $\hat{T}+1$ months is denoted $\{Y\}_{s=0}^{\hat T}$, where $Y_s = \{\al_s,\th_s\}$ is simply the decision and state vectors combined. In niqlow
these are in general vectors, but in this model they reduce to two scalars $(d_s,x_s).$ The likelihood of a single sequence has the form:
$$L\l(\ \hat\psi \, ; \,\{Y\}\ \r)\ =\ {\prod}_{s=0}^{\hat{T}}\quad
\l\{ P^{\,\star}(\al_s; \th_s ) \l[ P\l( \th_{s+1}; \al_s, \th_s\r)\r]^{I\{s\lt \hat{T}\}}\r\}.\label{Lpath}$$
That is, the contribution in observed month $s$ is the CCP for the observed decision times the probability of the observed transition to the state next period. The transition is not observed in the last observed month for a bus. This function is generated automatically by niqlow
from the model and the data. The general form has been specialized to this model for clarity. As in all "structural estimation" the observed data are inserted into the theoretical probabilities produced by the fixed point in the value function. In this model the transition probability is simply the multinomial parameter $\theta_{3j}.$
Equation (14.5) in Rust (1987) differs in two ways from \eqref{Lpath}. Translating terms and counting from 0 instead of 1, that equation would be written $$L^{1987}\l(\ \hat\psi \, ; \,\{Y\}\ \r)\ = \ {\prod}_{s=1}^{\hat{T}}\quad \l\{ P^{\,\star}(\al_s; \th_s )\l[ P\l(\th_{s}; \al_{s-1}, \th_{s-1}\r)\r]\r\}.\label{R14-5}$$ That is, \eqref{R14-5} removes the first choice probability at $s=0.$ The effect is coincidentally negligible. The estimated probability of not replacing an engine in most first observed months is very close to 1.0, because engines are only replaced many months into service. Thus near the maximum the difference between the log-likelihoods \eqref{Lpath} and \eqref{R14-5} is $\approx log(1) = 0.0$.
The second difference is also inconsequential for this case. \eqref{Lpath} multiplies the choice probability at $s$ by the transition to the current state $x_s$ not the transition from current state to the next state. This incorrect pairing is irrelevant since they are interchangeable in \eqref{Lpath}. However, for models with permanent or transitory unobserved states and actions they are not interchangeable. The likelihood should add across unobserved states as well as multiply across time so the conditional choice must be multiplied by the correct transition before summing as in \eqref{Lpath}, which niqlow
does automatically for such models.
niqlow
replicates the solution of the DP problem at estimated parameter values, and noting the discrepancy between the original and replicated samples, now consider second-stage estimation of the structural parameters. In niqlow
this is accomplished using built-in tools applied to the model and the data that apply to any DP model of a general class. Originally the model was estimated in two stages to reduce computation. The replication code does the same using niqlow
's ability to mark estimated parameters as either "transition only" or "utility only." However, only the third-stage estimates, when all parameters are estimated jointly, are reported below.
The ML estimates were reported in original Table IX ($N=90$) and Table X ($N=175$). These models share the linear cost of bus maintenance already shown, and three different bus type samples (Groups 1-3, Group 4, and Groups 1-4 combined). The original estimates and the replications for the first two groups are reported in Tables 3 and 4. The values reported are "third stage" estimates in which all parameters are estimated including the first stage transition probabilities. Standard errors are also reported. The original procedure used BHHH and semi-closed forms for the derivatives of the value function with respect to structural parameters (derived in Rust 1988.) The replication uses BHHH and numerical central step derivatives. For $\delta=0.9999$ the convergence in the stage 3 estimates is weak: the norm of gradients remain large but progress ends. This combines large differences in the value function (which enter exponents) and precision in the convergence in the Newton-Kantorovich.
The likelihood values are within 1% of the original values. This magnitude is consistent with the exact match with choice probabilities for the same parameter values in Figure 1 and the coding discrepancy in the values of $j.$ Also note that the differences in log-likelihoods is approximately the same for $\delta=0$ as for $\delta=0.9999.$ In the former case convergence to the fix point is immediate, so any differences in tolerances or other aspects of the Newton-Kantorovich implementation have no effect on that case. Standard errors are all similar in magnitude between the original and replicated values.
Column 1 Groups 1-3 | Column 2 Group 4 | Column 3 Groups 1-4 | ||||
---|---|---|---|---|---|---|
A.$\delta=0.9999$. | ||||||
Param. | Orig. | Repl. | Orig. | Repl. | Orig. | Repl. |
$\theta_{30}$ | 0.30100 (0.0075) | 0.3077 (0.0068) | 0.39190 (0.0075) | 0.3996 (0.0089) | 0.3489 (0.0052) | 0.3561 (0.0040) |
$\theta_{31}$ | 0.6884 (0.0074) | 0.6819 (0.0066) | 0.5953 (0.0075) | 0.5876 (0.0095) | 0.6394 (0.0053) | 0.6323 (0.0041) |
RC | 11.727 (2.6020) | 11.7270 (2.7377) | 10.075 (1.5820) | 10.0750 (1.6512) | 9.7558 (1.2270) | 9.7558 (1.2361) |
$\theta_1$ | 4.8259 (1.7920) | 4.8259 (1.8380) | 2.293 (0.6390) | 2.2930 (0.6594) | 2.6275 (0.6180) | 2.6275 (0.5782) |
$\ln L$ | -2708.37 | -2725.28 | -3304.16 | -3317.41 | -6055.25 | -6086.07 |
B.$\delta=0$. | ||||||
Param. | Orig. | Repl. | Orig. | Repl. | Orig. | Repl. |
$\theta_{30}$ | 0.30100 (0.0074) | 0.3077 (0.0068) | 0.3919 (0.0075) | 0.3996 (0.0089) | 0.3489 (0.0052) | 0.3561 (0.0040) |
$\theta_{31}$ | 0.68840 (0.0075) | 0.6819 (0.0066) | 0.5953 (0.0075) | 0.5876 (0.0095) | 0.6394 (0.0053) | 0.6323 (0.0041) |
RC | 8.2985 (1.0417) | 8.2985 (1.1257) | 7.6538 (0.7197) | 7.6538 (0.8080) | 7.3055 (0.5067) | 7.3055 (0.5459) |
$\theta_1$ | 109.903 (26.163) | 109.9031 (27.2148) | 71.5133 (13.778) | 71.5133 (15.0571) | 70.2769 (10.750) | 70.2769 (10.5554) |
$\ln L$ | -2710.75 | -2727.66 | -3306.03 | -3319.31 | -6061.64 | -6092.54 |
C. Likelihood ratio test for null of $\delta=0.$ (Myopia Test) | ||||||
LR | 4.76 | 4.76 | 3.74 | 3.79 | 12.78 | 12.94 |
p-value | .029 | .029 | .053 | .052 | .0004 | .0003 |
Source of Original: Rust (1987), Table IX and author's calculation using Newton-Kantorovich and Newton/BHHH optimization. Standard errors reported in parentheses. |
Column 1 Groups 1-X | Column 2 Group 4 | Column 3 Groups 1-4 | ||||
---|---|---|---|---|---|---|
A.$\delta=0.9999$. | ||||||
Param. | Orig. | Repl. | Orig. | Repl. | Orig. | Repl. |
$\theta_{30}$ | 0.0937 (0.005) | 0.0996 (0.0025) | 0.1191 (0.005) | 0.1256 (0.0055) | 0.1071 (0.0034) | 0.1133 (0.0024) |
$\theta_{31}$ | 0.4475 (0.008) | 0.4426 (0.0074) | 0.5762 (0.008) | 0.5706 (0.0072) | 0.5152 (0.0055) | 0.5099 (0.0036) |
$\theta_{30}$ | 0.4459 (0.008) | 0.4449 (0.0077) | 0.2868 (0.007) | 0.2854 (0.0059) | 0.3621 (0.0053) | 0.3609 (0.0028) |
$\theta_{31}$ | 0.0127 (0.002) | 0.0127 (0.0018) | 0.0158 (0.002) | 0.0163 (0.0017) | 0.0143 (0.0013) | 0.0146 (0.0012) |
RC | 11.7257 (2.597) | 11.7361 (2.7486) | 10.896 (1.581) | 10.0982 (1.6645) | 9.7687 (1.226 ) | 9.7802 (1.2251) |
$\theta_1$ | 2.4569 (0.912) | 2.4520 (0.9459) | 1.1732 (0.327) | 1.1702 (0.3355) | 1.3428 (0.315 ) | 1.3410 (0.2933) |
$\ln L$ | -3993.99 | -4029.11 | -4495.14 | -4535.65 | -8607.89 | -8683.84 |
B.$\delta=0$. | ||||||
Param. | Orig. | Repl. | Orig. | Repl. | Orig. | Repl. |
$\theta_{30}$ | 0.0937 (0.005) | 0.0996 (0.0026) | 0.1191 (0.005) | 0.1256 (0.0055) | 0.1071 (0.003) | 0.1133 (0.0024) |
$\theta_{31}$ | 0.4475 (0.008) | 0.4425 (0.0074) | 0.5762 (0.008) | 0.5706 (0.0073) | 0.5152 (0.006) | 0.5099 (0.0036) |
$\theta_{30}$ | 0.4459 (0.008) | 0.4449 (0.0077) | 0.2868 (0.007) | 0.2854 (0.0059) | 0.3621 (0.005) | 0.3610 (0.0028) |
$\theta_{31}$ | 0.0127 (0.002) | 0.0127 (0.0018) | 0.0158 (0.002) | 0.0163 (0.0017) | 0.0143 (0.014) | 0.0146 (0.0012) |
RC | 8.2969 (1.048) | 8.3071 (1.1364) | 7.6423 (0.720) | 7.6483 (0.8451) | 7.3113 (0.507) | 7.3213 (0.5537) |
$\theta_1$ | 56.1657 (13.421) | 56.2896 (14.0936) | 36.6692 (7.068) | 36.7245 (7.9807) | 36.0175 (5.515) | 36.1160 (5.4576) |
$\ln L$ | -3996.35 | -4031.50 | -4497.00 | -4537.52 | -8614.24 | -8690.27 |
C. Likelihood ratio test for null of $\delta=0.$ (Myopia Test) | ||||||
LR | 4.72 | 4.77 | 3.72 | 3.76 | 12.7 | 12.84 |
p-value | .030 | .029 | .054 | .053 | .0004 | .0003 |
See Table 4 except source of original is Rust (1987), Table IX. |
One way to aggregate the overall effect of the data discrepancies and their effect on the estimates is to recompute the likelihood ratio test of the myopia restriction (Panel B versus Panel A in each table, one degree of freedom). The results are reported in Panel C of the tables. The p-values are essentially the same between the original and replicated results. The the myopic and forward-looking versions share the same transition data ($j$) and the effects on the distribution of $j$ is almost a constant shift in log-likelihood resulting in nearly identical LR statistics. This is likely because mileage $x$ and subsequent costs make modest changes month-to-month relative to the large reset when replacement occurs. Shifting the distribution of $j$ has little effect on the replacement dynamics.
Perhaps more important is the fact that the original model and estimation procedure have been independently replicated within a modern open-source platform. This reduces the cost of replication existing empirical DP results, and it makes modifying existing models for both teaching and research purposes substantially easier.
https://sites.google.com/view/victoraguirregabiriaswebsite/computer-code
, (accessed November 2020).
Blesch, Maximilian 2019. Ruspy Software Package, https://ruspy.readthedocs.io/en/latest/index.html
, accessed January 2022.
Ferrall, Christopher 2022. "Object Oriented (Dynamic) Programming: Closing the 'Structural' Estimation Coding Gap," Computational Economics, https://doi.org/10.1007/s10614-022-10280-4.
Larsen, Bradley J. , Florian Oswald, Gregor Reich, and Dan Wunderli 2012. "A test of the extreme value type I assumption in the bus engine replacement model,"
Economics Letters, 116, 2, 213-216.
Müller, Philipp and Gregor Reich 2021. "Harold Zurcher and Paul Volcker: The Protocol of a Meeting that Never Took Place," SSRN Working Paper, October, http://dx.doi.org/10.2139/ssrn.3303999.
niqlow
Documentation. https://ferrall.github.io/niqlow/
, accessed March 2022.
Rust Data Repository. Open Source Economics, https://github.com/OpenSourceEconomics/rust-data
, accessed March 2022.
Su, Che-Lin and Kenneth L. Judd 2012. "Constrained Optimization Approaches To Estimation Of Structural Models," Econometrica, 80, 5, 2213-2230.
Rust, John 1987. "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher", Econometrica 55, 5 (September), 999-1033.
----- 1988. "Maximum Likelihood Estimation Of Discrete Control Processes," SIAM J. Control and Optimization, 26, 5, 1006-24.
----- 2000. "Nested Fixed Point Algorithm Documentation Manual," version 6, https://editorialexpress.com/jrust/nfxp.pdf
(accessed November 2020).
niqlow
version 5, available at https://github.com/ferrall/niqlow/releases/latest
examples/replications
. The relevant code files are:RustEmet1987.h RustEmet1987.ox documentation RustEmet1987mle.h RustEmet1987mle.ox documentation
RustEmet1987readdata2022.ox
from the original ASCII files. Data sets used in the replication are:
RustEmet1987_type1_col1_ALL.dta RustEmet1987_type1_col2_ALL.dta RustEmet1987_type1_col3_ALL.dta
ceil()
are contained in corresponding "type2" files.examples/output
in files of the form R_{N}_{C}_{D}.txt
where $N$ is either 0 (for 90) or 1 (for 175),
C is column 0, 1 or 2, and D is either 0 ($\delta=0.9999$) or 1 ($\delta=0$).