1. Applications
    This section discusses important applications of simulated randomness. They let us do things that would be very difficult without the ability to generate pseudo-random numbers that appear to be independently distributed.

  1. Monte Carlo Experiments
  2. NR 7.6
    Monte Carlo is a catch-all name for using pseudo-randomness to simulate truly random processes. The term comes from the casinos of Monte Carlo, where rolling dice and spinning wheels repeat the outcomes of random experiments over and over. Until pRNG algorithms were invented it was difficult to see what was supposed to happen when randomness was part of a model.

    One way to use Monte Carlo is to demonstrate the implications of statistical results like Central Limit Theorems. Typically a CLT will say that, under some conditions, some sort of average of random variables will follow the normal distribution as the sample size goes to $\infty$. Let $X$ be a random variable with distribution $F(x)$ that has a mean $\mu$ and variance $\sigma^2.$ The sample mean of $N$ draws of $X$ is, of course, $$\bar{X}_N = {1\over N}\sum_{i=1}^N X_i.$$ The notation includes the sample size in the definition of the mean since a CLT concerns what happens as $N$ gets large. We know that $E[\bar{X}_N]=\mu$ and $Var(\bar{X}_N) = \sigma^2/N$ for any sample size $N$ as long as the observations are independent. (Otherwise the variance of the mean would have to account for the covariance across observations.) So in the IID case the variance of the mean goes to $0$ as $N\to\infty,$ and its distribution "collapses" to a point at $\mu$, which is not the normal distribution.

    A CLT will need to "blow up" the distribution to keep it from collapsing. The way to do that, in this simple case, is to look at the distribution of $$Y_N = \sqrt{N}(\bar{X}_N-\mu).\label{YN}$$ Now we know that $E[Y_N] = 0$ and $Var[Y_N]= N\cdot Var(\bar{X}_N) = \sigma^2.$ That is, the mean and variance of the scaled deviations from the mean do not depend on the sample size. The factor $\sqrt{N}$ is the rate at which the distribution converges. That rate can differ if the observations are not identical in distribution and/or sampling is not independent.

    Then the CLT says $$Y_N\ {\buildrel d \over \longrightarrow}\ {\cal N}(0,\sigma^2)\quad\hbox{as } N\to\infty.$$ What this says in practice is that for a large but finite $N$ the sample mean is approximately normally distributed with mean $\mu$ and variance $\sigma^2/N.$ But a CLT is usually stated in terms of a scaled deviation like $Y_N.$

    It doesn't matter what $F(x)$ looks like (within some technical limits, which in some cases might fail to hold). There are different CLTs depending on whether the draws of $X$ are correlated, whether the distributions are identical or change over $i$, and many other nuances.

    Be careful about interpreting this result. For any given sample, the sample mean is simply a number and has no distribution. A CLT says that if you repeatedly draw samples to get different sample means, they will approximately follow the normal distribution for "large" $N.$ For $N=1$ the sample mean is just the random variable itself, so unless $X$ is itself a normal random variable the distribution won't look anything like a normal. But as you increase $N$ the distribution looks more and more normal.
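    To see the repeated-sampling interpretation in action, here is a minimal Monte Carlo sketch in Python (assuming NumPy is available; the exponential distribution, the seed, and the values of $R$ and $N$ are illustrative choices, not from the text). It draws $R$ samples of size $N$, forms $Y_N$ for each, and compares the simulated moments to the ${\cal N}(0,\sigma^2)$ values the CLT predicts.

```python
import numpy as np

rng = np.random.default_rng(20469)   # seeded pRNG so the experiment is reproducible

mu, sigma2 = 1.0, 1.0        # the exponential(1) distribution has mean 1, variance 1
R, N = 10_000, 500           # R replications, each a sample of size N

samples = rng.exponential(scale=mu, size=(R, N))
Y = np.sqrt(N) * (samples.mean(axis=1) - mu)   # R draws of Y_N

print(f"mean of Y_N: {Y.mean():+.4f}   (CLT: 0)")
print(f"var  of Y_N: {Y.var():.4f}    (CLT: {sigma2})")
# The skewness of Y_N shrinks toward 0 (the normal value) as N grows.
print(f"skew of Y_N: {np.mean(((Y - Y.mean()) / Y.std())**3):+.4f}")
```

    Rerunning with a smaller $N$ shows the skewness of the underlying exponential persisting in $Y_N$; increasing $N$ drives it toward the normal shape.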

  3. Monte Carlo Integration of a Function
  4. Another use of simulated randomness is to calculate integrals over a distribution. Numerical integration over multiple dimensions can be computationally intensive (as briefly discussed previously). Monte Carlo integration is a way to approximate an integral that gets a good answer without complicated formulas.

    Algorithm 23. Use Monte Carlo to Approximate an Integral

    Given
    a random vector, $x\in \Re^n$, $x\sim F(x)$
    a function $g(x)$ whose average (expected value) with respect to the distribution is to be approximated: $$\hat S\ \approx\ S\ =\ \int g(x)f(x)dx,$$ where $f(x)$ is the density corresponding to $F(x)$.
    A number of replications $R$ or a desired precision of the approximation.
    1. Draw a pseudo-random $x^r_1 \sim F(x)$ and define $S_1 \equiv g(x^r_1) $.
    2. Then $E[S_1] = \int g(x^r)f(x^r)dx^r = S$. Since this is independent of the dimension $n$, Monte Carlo integration breaks the curse of dimensionality on average.
    3. Improve the precision of the estimate by repeatedly drawing $R$ independent values, $$\hat S_R = {1\over R}{\sum}_{i=1}^R S_i.$$
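    As a concrete illustration, here is a minimal sketch of the algorithm in Python (assuming NumPy; the choices of $F$ as a standard normal in $n$ dimensions, $g(x)=\sum_i x_i^2$, and the values of $n$ and $R$ are illustrative assumptions). With these choices the true value is $S = n$, so the approximation, and its precision shrinking like $1/\sqrt{R}$ regardless of $n$, can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(7)

n, R = 10, 100_000                     # dimension and number of replications
x = rng.standard_normal(size=(R, n))   # R pseudo-random draws x^r ~ F = N(0, I_n)
S_r = (x**2).sum(axis=1)               # S_r = g(x^r) = ||x^r||^2

S_hat = S_r.mean()                     # \hat S_R, the Monte Carlo approximation
se = S_r.std(ddof=1) / np.sqrt(R)      # its standard error, O(1/sqrt(R)) for any n

print(f"estimate: {S_hat:.4f} +/- {se:.4f}   (true value: {n})")
```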
  5. Simulated Annealing
  6. NR 10.9
    The optimization methods discussed previously are designed to find a local maximum, $x^\star_V$, where the subscript $V$ indicates the output of the algorithm as opposed to a true global maximum of the function, $x^\star$. The local maximum they find may happen to be the global maximum, but not necessarily. The reason they seek local maxima is that they converge (decide to stop) based on how the value of the function changes very close to, or locally around, $x^\star_V$. But a global maximum produces a larger value than any other input vector, not just those nearby. Once these methods get close to a local maximum they simply zero in on it to the exclusion of what may happen far away.

    Methods designed to look for global maxima are based on the idea of annealing, which in short means purposely making local mistakes in order not to zero in too soon. Simulated annealing works by searching randomly for new points at which to evaluate the objective. If the new point, call it $x^d$, is an improvement on the current vector, call it $x_t$ at the $t$th iteration, then $x^d$ becomes the new current vector, $x_{t+1}$. If it is not an improvement it still might be chosen as the next current vector with a positive probability. Let this probability be called $p_t$. So if $p_t$ is close to 1 a 'mistake' is very likely: a better current vector will be abandoned in favour of a worse, randomly chosen vector.

    The analogy to annealing is that the probability of a mistake is designed to fall over time (over iterations). Formally, the code will have $p_t \to 0$ as $t\to\infty$. In other words, eventually only improvements will be accepted and the process will only move 'uphill'. Earlier, however, the algorithm will sometimes go 'downhill' in order to jump away from a local but not global maximum. The reason it is called simulated annealing is that the decision to go downhill or not and the random candidate $x^d$ are generated using the pseudo-random number algorithms discussed earlier.

    The analogy to physical annealing is carried through by talking of the current temperature $T$ as determining the probability of making a mistake; the algorithm then lowers the temperature as $t$ increases. Writing $f^d = f(x^d)$ and $f_t = f(x_t)$, define $$\delta \equiv f^d - f_t.$$ The acceptance probability $P(\delta,T)$ should have the following properties. If $\delta>0$ then $P(\delta,T) = 1$. In other words, always take an improvement on the current vector regardless of the temperature.

    $0\le P(\delta,T) \le 1$ for all $T\ge 0$ and for all $\delta$. In other words, $P()$ is a proper probability for any arguments.

    ${\partial P(\delta,T) \over \partial\delta} \ge 0$. In other words, the bigger $\delta$ is algebraically (the smaller the loss from a downhill move), the greater the chance of accepting the candidate.

    ${\partial P(\delta,T) \over \partial T} \ge 0$. In other words, the higher the temperature, the more likely the algorithm is to go downhill.
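    One concrete functional form with all of these properties (an illustrative assumption; the text does not commit to a particular $P$) is the Metropolis criterion $$P(\delta,T) = \min\left\{1,\ e^{\delta/T}\right\}.$$ For $\delta\ge 0$ it equals 1; for $\delta<0$ it is increasing in $\delta$, since ${\partial P \over \partial\delta} = e^{\delta/T}/T > 0$, and increasing in $T$, since ${\partial P \over \partial T} = -\delta\, e^{\delta/T}/T^2 > 0$ when $\delta<0$.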

    The last bit of the algorithm is how fast to let the temperature cool down. That is, we need a cool-down schedule, $T(t)$.

    Algorithm 24. Simulated Annealing

    Given
    $f(x)$, and $x_0\in \Re^n$
    a cool down schedule $T(t)$
    a probability function $P(\delta,T)$.
    an iteration max $t_{max}$.
    1. Set $t=0$ and $f_0 = f(x_0)$.
    2. → Generate a new vector $x^d$ and compute $f^d = f(x^d)$.
    3. $$x_{t+1} = \cases{ x^d & with prob. $P\left(f^d-f_t,T(t)\right)$\cr x_t & otherwise.\cr}$$ Set $f_{t+1} = f(x_{t+1})$.
    4. Increment $t$. Stop if $t=t_{max}$; Otherwise, return to →.
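    Here is a minimal sketch of the algorithm in Python (assuming NumPy). The objective $f$, the Gaussian proposal for $x^d$, the Metropolis acceptance probability, and the geometric cool-down schedule $T(t)=T_0\rho^t$ are all illustrative assumptions; the algorithm as stated leaves these choices open.

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    # An illustrative bumpy objective with many local maxima; global maximum at x = 0.
    return -np.sum(x**2) + 2.0 * np.sum(np.cos(3.0 * x))

def anneal(x0, t_max=20_000, T0=5.0, rho=0.999, step=0.5):
    x_t = np.asarray(x0, dtype=float)
    f_t = f(x_t)
    best_x, best_f = x_t, f_t
    for t in range(t_max):
        T = T0 * rho**t                                    # geometric cool-down schedule T(t)
        x_d = x_t + step * rng.standard_normal(x_t.shape)  # random candidate x^d
        f_d = f(x_d)
        delta = f_d - f_t                                  # delta = f^d - f_t
        # Metropolis rule: always accept improvements; accept a 'mistake'
        # with probability exp(delta/T), which falls to 0 as T -> 0.
        if delta > 0 or rng.random() < np.exp(delta / T):
            x_t, f_t = x_d, f_d
            if f_t > best_f:
                best_x, best_f = x_t, f_t
    return best_x, best_f

x_star, f_star = anneal(x0=np.array([3.0, -2.5]))
print(f"x* ~ {x_star},  f(x*) = {f_star:.4f}")
```

    Tracking the best vector visited so far, as the sketch does, is a common safeguard: late in the run the chain may still wander 'downhill' away from the best point it has seen, and keeping the tracked best costs nothing.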

Exercises