Was Harold Zurcher Myopic After All?
Replicating Rust's Engine Replacement Estimates
 
 
Christopher Ferrall*
Queen's University, Kingston, Canada
ferrallc@queensu.ca
 
May, 2023 [Current]
 
Abstract
Rust (1987) studies the dynamic decision making under uncertainty made by Harold Zurcher to replace bus engines. In the decades since, the model has been applied, extended, and used as an example multiple times. This paper resolves some discrepancies in how data were transformed in the original and subsequent archives. Using a package that standardizes computation of estimated dynamic programming it replicates the 12 original maximum likelihood estimates and the six main hypothesis tests of whether Zurcher's decisions were myopic or not. The discrepancy in the data processing results in modest differences in estimates and log-likelihoods, but the p-values are essentially the same because the differences are very similar across values of the discount factor in the tests.
PDF
Read Here

Introduction

The Rust (1987) model of bus engine replacement is a founding application of empirical dynamic programming (DP). Using data from a city bus garage, the paper studies the decision to replace a bus's engine (or not) during monthly maintenance and inspection. The model continues to be used for teaching because it is simple and can be adapted and extended to other decisions.

Gauss code and data released in the 1990s are still available (Rust 2000), and several papers that have replicated elements of the original analysis. In particular Larsen et al. (2012) report replications of some original results using Matlab code while testing the validity of assuming the extreme value distribution for choice errors. Augirreibiria and Mira (2002) use the same data and provide code for their pseudo MLE procedure but do not report replications of Rust's results. Su and Judd (2012) conduct Monte Carlo experiments on their method using Rust's point estimates but do not replicate the estimates nor use the original data. The software package ruspy (Blesch 2019) implements both the original and Su and Judd solution techniques. Müller and Reich (2021) use the original data and estimate the original model but use a different technique and do not report agreement with the original maximum likelihood estimation.

This paper's modest goal is to replicate MLE estimates in Rust (1987) in order to demonstrate the software package niqlow described in Ferrall (2022). The replication is based not on bespoke code but rather high-level statements in niqlow. In particular, a main statistical inference reported in the paper is a likelihood ratio test of the null hypothesis of myopic behavior in replacing bus engines, which corresponds to estimates setting the discount factor at 0, versus a forward-looking model with the discount factor fixed at 0.9999. That specification requires the nested fixed point solution developed in the model. This replication reproduces both the MLE estimates of the parameter values and the results of these hypothesis tests.

Despite a 35 year gap in computing hardware and software, and a fundamentally different approach to coding the model, the solutions of the DP model are identical for a given set of parameter values. The MLE estimates do differ with the original values because of the difference in the transformation of the original raw data mentioned above. Using the data files in Larsen et al. (2012) an error in the discretization of the data was resolved. As with their analysis, re-estimation with the original data encoding results in identical estimates and test statistics. Using the data produced by the corrected encoding procedure results in different estimates. However, the shifts in likelihood are similar in the myopic and forward-looking versions of the models so test statistics are essentially the same.

Model and Its Solution

The terminology here follows conventions in niqlow and differs from the original paper. Table 1 provides shows the difference in notation which is followed so details can be compared directly with other work implemented in niqlow. In the model, bus maintenance is a stationary environment with a binary decision to replace an engine ($d=1$) each month. This resets mileage $x$ on the engine to $x=0.$ For estimation, the odometer readings are reduced to two different number of discrete possibilities, $N=90$ and $N=175.$ The transition next month takes on one of J values: $$x^\prime = (1-d)x + j,\ Prob(j=i) = \theta_{3i},\ i=0,\dots,J-1.\label{Trans}$$ The number of feasible jumps required to explain the data was $J=3$ for $N=90$ and $J=5$ for the $N=175.$ Other elements of the model are the monthly discount factor $\delta,$ utility $U(),$ and an implicit vector of additive extreme value shocks to smooth conditional choice probability (CCP).

Panel B of Table 1 summarizes the model using the niqlow framework. A new class named Zurcher, is derived from the pre-defined Rust class. The state variable $x$ is an object of the Renewal class, following the term Rust used to describe the process. This objected-oriented approach makes it possible to build specialized and extended versions of existing models replacing only the code or features that differ.

Rust (1987) Model Summary
A. Crosswalk with Original Notation
TermOriginalniqlowNotes
Discount factor$\beta$$\delta$$\delta \in \{0,0.9999\}$
Mileage bin$x$$x$$N=90$ or $175$
Decision$i$$d$Replace or not
Value shock$\epsilon(i)$$\zeta_\alpha$Additive Extreme Value
Innovation$j$$j$ $x^\prime = (1-d)x + j.$
Parameter vector$\th$$\psi$
Time index$t$$s$For observed outcomes
CCP$P(i|x,\th)$$P^{\star}(\alpha;\th)$
Transition$p(x_{t+1}|x_t,i,\theta_3)$$P(\th^\prime;\alpha,\th)$State-to-state with optimal choice
B. niqlow Definition of the Model
ElementValueCategory / Details
Clock$t$Ergodic
Value shock$\zeta$ ExtremeValue
Actions$\al$$=$ $(d),$Binary Choice
States:$\th$$=$$(x)$Renewal with probabilities $\theta_3$
Utility$U\l(\al,\th\ \r)$$=$ $-\pmatrix{ \th_1x/1000 \cr RC}$
Parameters $\psi$$=$ $\l(\delta,RC,\theta_1,\theta_3\r)$

Choice Probability Replication

Before turning to the ML estimation procedure, consider output at one of the estimated models in the original paper. Figure 3 in the paper compares $P^\star(1;x),$ the replacement probability, for discount factors of $\delta=0$ and $\delta=0.9999$. This does not involve the data. It is a check on the recoding of the Newton-Kantorovich algorithm, described in Rust (1988), to solve the fixed point in the value function along with the logit specification for the CCPs. 1 displays the original figure and the replication. The additional information in the original figure is the empirical hazard rate. The replication figure is shown alone on the right and is overlayed on the original scanned figure demonstrating they coincide exactly given the scan's resolution.

 
Replicating Rust Figure 3

Left: original printed Figure 3 digitized with the replicated replacement probability curves overlayed (after matching the axes scales). Right: the replication shown alone.

Data and Estimates Replication

Observed Mileage Transitions

The original article estimates the linear cost specification on the two mileage bin counts described above and 3 different engine groups. The odometer was not reset when an engine was replaced so the raw data is not consistent with the model. For example, an odometer reading of 320,500 miles in month $s$ after a bus replacement at 310,300 miles would be changed to 10,200 miles. This is converted to a discrete value which depends on $N$ and the maximum mileage of 450,000. For $N=90$ it would be changed to $x_s=2$ because it is the third discretized bin of size 5,000 miles. Resetting and discretizing mileage is done by the same Gauss program that estimated the model. Attempts to run the 20-year-old Gauss code and to access formatted files were unsuccessful, so it was not possible to look at the coded mileage data directly. However, the code and generated data files from Larsen et al. (2012) replicates the original code and agrees with it in terms of summary statistics. Another modern source of the continuous and discretized data is the Rust Data repository at OpenSourceEconomics. All versions use the recorded mileage at replacement to determine the month it occurred. The month of replacement is separately recorded and in some cases there is a one month discrepancy between this information.

The transition or innovation $j = x_{s+1}-x_s(1-d_s)$ is constructed from the path of the discretized data $x_s$. The transitions are IID. Both $x$ and $d$ are observed so the transition probabilities $\theta_3$ form a multinomial problem not involving optimal replacement decisions (Rust's first stage estimation). 2 and 3 report the estimates of $\theta_3$ (fractions of transitions $j$) for bus groups 1-3 and group 4 from the original paper for 90 and 175 bins, respectively. The estimated transitions are compared with the values from the replication data.

It is in these transitions that a small discrepancy occurs. Both the original Gauss code and the Matlab code used in Larsen et al. (2012) compute the discrete $x_s$ using the ceiling function when it should be the floor function. For months without a bus replacement this has no effect because when $d_s=0$ the transition $j= x_{s+1}-x_{s}$ is the same regardless if both values are increased by one. However, in replacement months this results in $ j = x_{s+1}$ and this had minimum value 1 instead of 0. (It is not surprising that in months when an engine is replaced the added mileage is within the lowest mileage category which never occurred in the incorrect coding). The first stage transition probabilities are therefore incorrect which affects the second state estimates through $\theta_3$.

These differences are reported in final column labeled $Δ$ in 2 and 3. In the 4 different samples the multinomial likelihood on the transition probabilities differs by roughly 1% between the two codings of the discrete data. Since transition probabilities enter into the problem solved by the agent when determining conditional choices to replace an engine these differences will shift all MLE estimates.

Reconstructing Discretized Mileage Data ($N=90$ bins)
A. Column 1 (Groups 1-3)
ReplicatedOriginalΔ
j N%N%
0 1189.30771162.3007+27
1 2635.68192662.6889-27
2 40.010440.01040
Obs386438640
lnL-2592.90-2570.97 21.93
B. Column 2 (Group 4)
ReplicatedOriginalΔ
j N%N%
0 17150.3996 1682.3919+33
1 25220.58762555.5953-33
2 55 0.0128 55.01280
Obs429242920
lnL-3153.83-3140.5713.26
Source of Original: author's calculation of Rust (1987) Table IX using data from Larsen et al. (2012), which match the original. Replicated: Author's calculation from the same data but correcting $j$ after replacements. The third bus group pools A and B and is not reported here.

Table 3 starts with the same continuous data but uses 175 discrete bins instead of 90. Since the width of the bins is smaller, the maximum jump increases from 2 to 5 bins. Comparing the replication to the original, the differences in likelihoods and cell percentages are somewhat larger than the corresponding values in Table 2. Multiplying the percentage differences by the number observations yields roughly 23 bus-month values are misclassified in each panel.

Reconstructing Discretized Mileage Data ($N=175$ bins)
A. Column 1 (Groups 1-3)
ReplicatedOriginalΔ
j N%N%
0 385 .0976 362 .0937 +23
1 1710 .4425 1729 .4475 -19
2 1719 .4449 1723 .4459 -4
3 49 .0127 49 .0127 0
4+ 1 .0003 1 .00030
Obs 3864 3864 0
lnL -3896.51 -3861.0035.51
B. Column 2 (Group 4)
ReplicatedOriginalΔ
j N%N%
0 539 .1256 511 .1191 +28
1 2449 .5706 2472 .5760 -23
2 1225 .2854 1230 .2866 -5
3 70 .0163 70 .0163 0
4+ 9 .0021 9 .00210
Obs 42924292 0
lnL -4371.93-4331.72 40.21
See notes to 2, except sources are Table VI and Table X in the original article.

Likelihood

The estimated parameter vector is denoted $\hat\psi.$ Let a bus-month observation be denoted $Y\equiv \l(\matrix{d & x}\r).$ The full state and action vectors of the Rust model are both observed. Only the $\zeta$ shock is unobserved since it is integrated out to smooth choice probabilities. In niqlow panel data are read in and matched to the states and actions present in the model. This kind of data is automatically categorized as a Full Information because no actions or states are hidden.

A sequence of outcomes for a single bus over $\hat{T}+1$ months is denoted $\{Y\}_{s=0}^{\hat T}$, where $Y_s = \{\al_s,\th_s\}$ is simply the decision and state vectors combined. In niqlow these are in general vectors, but in this model they reduce to two scalars $(d_s,x_s).$ The likelihood of a single sequence has the form: $$L\l(\ \hat\psi \, ; \,\{Y\}\ \r)\ =\ {\prod}_{s=0}^{\hat{T}}\quad \l\{ P^{\,\star}(\al_s; \th_s ) \l[ P\l( \th_{s+1}; \al_s, \th_s\r)\r]^{I\{s\lt \hat{T}\}}\r\}.\label{Lpath}$$

That is, the contribution in observed month $s$ is the CCP for the observed decision times the probability of the observed transition to the state next period. The transition is not observed in the last observed month for a bus. This function is generated automatically by niqlow from the model and the data. The general form has been specialized to this model for clarity. As in all "structural estimation" the observed data are inserted into the theoretical probabilities produced by the fixed point in the value function. In this model the transition probability is simply the multinomial parameter $\theta_{3j}.$

Equation (14.5) in Rust (1987) differs in two ways from \eqref{Lpath}. Translating terms and counting from 0 instead of 1, that equation would be written $$L^{1987}\l(\ \hat\psi \, ; \,\{Y\}\ \r)\ = \ {\prod}_{s=1}^{\hat{T}}\quad \l\{ P^{\,\star}(\al_s; \th_s )\l[ P\l(\th_{s}; \al_{s-1}, \th_{s-1}\r)\r]\r\}.\label{R14-5}$$ That is, \eqref{R14-5} removes the first choice probability at $s=0.$ The effect is coincidentally negligible. The estimated probability of not replacing an engine in most first observed months is very close to 1.0, because engines are only replaced many months into service. Thus near the maximum the difference between the log-likelihoods \eqref{Lpath} and \eqref{R14-5} is $\approx log(1) = 0.0$.

The second difference is also inconsequential for this case. \eqref{Lpath} multiplies the choice probability at $s$ by the transition to the current state $x_s$ not the transition from current state to the next state. This incorrect pairing is irrelevant since they are interchangeable in \eqref{Lpath}. However, for models with permanent or transitory unobserved states and actions they are not interchangeable. The likelihood should add across unobserved states as well as multiply across time so the conditional choice must be multiplied by the correct transition before summing as in \eqref{Lpath}, which niqlow does automatically for such models.

Estimation

Having verified from 1 that niqlow replicates the solution of the DP problem at estimated parameter values, and noting the discrepancy between the original and replicated samples, now consider second-stage estimation of the structural parameters. In niqlow this is accomplished using built-in tools applied to the model and the data that apply to any DP model of a general class. Originally the model was estimated in two stages to reduce computation. The replication code does the same using niqlow's ability to mark estimated parameters as either "transition only" or "utility only." However, only the third-stage estimates, when all parameters are estimated jointly, are reported below.

The ML estimates were reported in original Table IX ($N=90$) and Table X ($N=175$). These models share the linear cost of bus maintenance already shown, and three different bus type samples (Groups 1-3, Group 4, and Groups 1-4 combined). The original estimates and the replications for the first two groups are reported in Tables 3 and 4. The values reported are "third stage" estimates in which all parameters are estimated including the first stage transition probabilities. Standard errors are also reported. The original procedure used BHHH and semi-closed forms for the derivatives of the value function with respect to structural parameters (derived in Rust 1988.) The replication uses BHHH and numerical central step derivatives. For $\delta=0.9999$ the convergence in the stage 3 estimates is weak: the norm of gradients remain large but progress ends. This combines large differences in the value function (which enter exponents) and precision in the convergence in the Newton-Kantorovich.

The likelihood values are within 1% of the original values. This magnitude is consistent with the exact match with choice probabilities for the same parameter values in Figure 1 and the coding discrepancy in the values of $j.$ Also note that the differences in log-likelihoods is approximately the same for $\delta=0$ as for $\delta=0.9999.$ In the former case convergence to the fix point is immediate, so any differences in tolerances or other aspects of the Newton-Kantorovich implementation have no effect on that case. Standard errors are all similar in magnitude between the original and replicated values.

Replicating Table IX of Rust 1987 (3rd Stage FIML)
Column 1
Groups 1-3
Column 2
Group 4
Column 3
Groups 1-4
A.$\delta=0.9999$.
Param. Orig. Repl. Orig. Repl. Orig. Repl.
$\theta_{30}$ 0.30100
(0.0075)
0.3077
(0.0068)
0.39190
(0.0075)
0.3996
(0.0089)
0.3489
(0.0052)
0.3561
(0.0040)
$\theta_{31}$0.6884
(0.0074)
0.6819
(0.0066)
0.5953
(0.0075)
0.5876
(0.0095)
0.6394
(0.0053)
0.6323
(0.0041)
RC 11.727
(2.6020)
11.7270
(2.7377)
10.075
(1.5820)
10.0750
(1.6512)
9.7558
(1.2270)
9.7558
(1.2361)
$\theta_1$ 4.8259
(1.7920)
4.8259
(1.8380)
2.293
(0.6390)
2.2930
(0.6594)
2.6275
(0.6180)
2.6275
(0.5782)

$\ln L$ -2708.37 -2725.28 -3304.16 -3317.41 -6055.25 -6086.07
B.$\delta=0$.
Param. Orig. Repl. Orig. Repl. Orig. Repl.
$\theta_{30}$ 0.30100
(0.0074)
0.3077
(0.0068)
0.3919
(0.0075)
0.3996
(0.0089)
0.3489
(0.0052)
0.3561
(0.0040)
$\theta_{31}$ 0.68840
(0.0075)
0.6819
(0.0066)
0.5953
(0.0075)
0.5876
(0.0095)
0.6394
(0.0053)
0.6323
(0.0041)
RC 8.2985
(1.0417)
8.2985
(1.1257)
7.6538
(0.7197)
7.6538
(0.8080)
7.3055
(0.5067)
7.3055
(0.5459)
$\theta_1$ 109.903
(26.163)
109.9031
(27.2148)
71.5133
(13.778)
71.5133
(15.0571)
70.2769
(10.750)
70.2769
(10.5554)

$\ln L$ -2710.75 -2727.66 -3306.03 -3319.31 -6061.64 -6092.54
C. Likelihood ratio test for null of $\delta=0.$ (Myopia Test)
LR4.764.763.743.7912.7812.94
p-value.029.029.053.052.0004.0003
Source of Original: Rust (1987), Table IX and author's calculation using Newton-Kantorovich and Newton/BHHH optimization. Standard errors reported in parentheses.

Replicating Table X of Rust 1987 (3rd Stage FIML)
Column 1
Groups 1-X
Column 2
Group 4
Column 3
Groups 1-4
A.$\delta=0.9999$.
Param.Orig.Repl. Orig.Repl.Orig.Repl.
$\theta_{30}$0.0937
(0.005)
0.0996
(0.0025)
0.1191
(0.005)
0.1256
(0.0055)
0.1071
(0.0034)
0.1133
(0.0024)
$\theta_{31}$0.4475
(0.008)
0.4426
(0.0074)
0.5762
(0.008)
0.5706
(0.0072)
0.5152
(0.0055)
0.5099
(0.0036)
$\theta_{30}$0.4459
(0.008)
0.4449
(0.0077)
0.2868
(0.007)
0.2854
(0.0059)
0.3621
(0.0053)
0.3609
(0.0028)
$\theta_{31}$0.0127
(0.002)
0.0127
(0.0018)
0.0158
(0.002)
0.0163
(0.0017)
0.0143
(0.0013)
0.0146
(0.0012)
RC 11.7257
(2.597)
11.7361
(2.7486)
10.896
(1.581)
10.0982
(1.6645)
9.7687
(1.226 )
9.7802
(1.2251)
$\theta_1$ 2.4569
(0.912)
2.4520
(0.9459)
1.1732
(0.327)
1.1702
(0.3355)
1.3428
(0.315 )
1.3410
(0.2933)

$\ln L$ -3993.99-4029.11-4495.14-4535.65-8607.89-8683.84
B.$\delta=0$.
Param.Orig.Repl. Orig.Repl.Orig.Repl.
$\theta_{30}$0.0937
(0.005)
0.0996
(0.0026)
0.1191
(0.005)
0.1256
(0.0055)
0.1071
(0.003)
0.1133
(0.0024)
$\theta_{31}$0.4475
(0.008)
0.4425
(0.0074)
0.5762
(0.008)
0.5706
(0.0073)
0.5152
(0.006)
0.5099
(0.0036)
$\theta_{30}$0.4459
(0.008)
0.4449
(0.0077)
0.2868
(0.007)
0.2854
(0.0059)
0.3621
(0.005)
0.3610
(0.0028)
$\theta_{31}$0.0127
(0.002)
0.0127
(0.0018)
0.0158
(0.002)
0.0163
(0.0017)
0.0143
(0.014)
0.0146
(0.0012)
RC 8.2969
(1.048)
8.3071
(1.1364)
7.6423
(0.720)
7.6483
(0.8451)
7.3113
(0.507)
7.3213
(0.5537)
$\theta_1$ 56.1657
(13.421)
56.2896
(14.0936)
36.6692
(7.068)
36.7245
(7.9807)
36.0175
(5.515)
36.1160
(5.4576)

$\ln L$ -3996.35-4031.50-4497.00-4537.52-8614.24-8690.27
C. Likelihood ratio test for null of $\delta=0.$ (Myopia Test)
LR4.724.773.723.7612.712.84
p-value.030.029.054.053.0004.0003
See Table 4 except source of original is Rust (1987), Table IX.

Inference: Replicating the Myopia Test

One way to aggregate the overall effect of the data discrepancies and their effect on the estimates is to recompute the likelihood ratio test of the myopia restriction (Panel B versus Panel A in each table, one degree of freedom). The results are reported in Panel C of the tables. The p-values are essentially the same between the original and replicated results. The the myopic and forward-looking versions share the same transition data ($j$) and the effects on the distribution of $j$ is almost a constant shift in log-likelihood resulting in nearly identical LR statistics. This is likely because mileage $x$ and subsequent costs make modest changes month-to-month relative to the large reset when replacement occurs. Shifting the distribution of $j$ has little effect on the replacement dynamics.

Conclusion

Despite some discrepancies in the original analysis, a complete re-analysis of the main estimates from Rust (1987) are replicated. The claim of "clear rejection" of myopic decision making in the original paper survives the effect of these changes

Perhaps more important is the fact that the original model and estimation procedure have been independently replicated within a modern open-source platform. This reduces the cost of replication existing empirical DP results, and it makes modifying existing models for both teaching and research purposes substantially easier.

Acknowledgements

I thank Brad Larsen for providing his code and data files, John Rust for his responses to many inquiries, and Victor Aguirreibiria for suggestions on an earlier version.

References

Aguirregabiria, Victor and Pedro Mira 2002. "Swapping The Nested Fixed Point Algorithm: A Class Of Estimators For Discrete Markov Decision Models," Econometrica 70, 4, 1519-43. ---- 2010. "Dynamic Discrete Choice Structural Models: A Survey," Journal of Econometrics, 156, 1 (May), 38-67. Data archive: https://sites.google.com/view/victoraguirregabiriaswebsite/computer-code, (accessed November 2020). Blesch, Maximilian 2019. Ruspy Software Package, https://ruspy.readthedocs.io/en/latest/index.html, accessed January 2022. Ferrall, Christopher 2022. "Object Oriented (Dynamic) Programming: Closing the 'Structural' Estimation Coding Gap," Computational Economics, https://doi.org/10.1007/s10614-022-10280-4. Larsen, Bradley J. , Florian Oswald, Gregor Reich, and Dan Wunderli 2012. "A test of the extreme value type I assumption in the bus engine replacement model," Economics Letters, 116, 2, 213-216. Müller, Philipp and Gregor Reich 2021. "Harold Zurcher and Paul Volcker: The Protocol of a Meeting that Never Took Place," SSRN Working Paper, October, http://dx.doi.org/10.2139/ssrn.3303999. niqlow Documentation. https://ferrall.github.io/niqlow/, accessed March 2022. Rust Data Repository. Open Source Economics, https://github.com/OpenSourceEconomics/rust-data, accessed March 2022. Su, Che-Lin and Kenneth L. Judd 2012. "Constrained Optimization Approaches To Estimation Of Structural Models," Econometrica, 80, 5, 2213-2230. Rust, John 1987. "Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher", Econometrica 55, 5 (September), 999-1033. ----- 1988. "Maximum Likelihood Estimation Of Discrete Control Processes," SIAM J. Control and Optimization, 26, 5, 1006-24. ----- 2000. "Nested Fixed Point Algorithm Documentation Manual," version 6, https://editorialexpress.com/jrust/nfxp.pdf (accessed November 2020).

Supplemental Material