C⁴E: Computation For Economists

The Style Council^♭Paul Weller's group after the Jam:

Background

Oscar and Felix: prototype coders

Mr. E.: main(), func1(), Ablsx()

Ms. P.: main(), Initialize(), SolveModel(), FindEquilibrium(), CheckObservation()

Most people who do a lot of programming cycle back and forth between these two styles.

The Eye of the Camel

The Principle: use meaningful names for variables and functions that reflect their purpose, to make your code clearer to you and others.

FindEquilibrium is written in CamelCase; related naming conventions include Pascal case and Hungarian notation.

Compare uninformative names such as i, x, g(y) with meaningful ones such as year, income, cost(output).

I Call Your Name

The Principle: Name constants in your code and protect them from accidental changes

Pi

I2

N

3

4

3

initialized

decl Pi = 3.1416;

RT0

Pi

Pi = 5

Hi = 5

Pi

RunTime

RT0

const

const decl Pi = 3.1416;

Pi

Pi = 5
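
A minimal sketch of the difference between an ordinary global and a named constant. The variable Hi and its value are made up for illustration. The commented-out assignment is the kind of typo that const is meant to catch; I am assuming the compiler reports it as an error rather than a warning.

```ox
#include "oxstd.h"

const decl Pi = 3.1416;     // named constant: the compiler should reject any assignment to Pi
decl Hi = 0;                // ordinary global (hypothetical): may be changed at RunTime

main() {
    Hi = 5;                 // allowed, Hi is an ordinary variable
    // Pi = 5;              // uncomment to see the compiler refuse the typo
    println("Pi = ",Pi,"  Hi = ",Hi);
    }
```

Without const, the typo Pi = 5 would compile and silently change every later computation that uses Pi.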

enum{}

enum

enumerate

Enumerations

    enum{Zero,One,Two}
    enum{Asia,Africa,Europe,NorthAmerica,SouthAmerica,Australia,Antarctica,Ncontinents}
    enum{Three=3,NegThree=-3,NegTwo,Six=Three+Three}

Without any =, an enumeration is just the numbers 0,1,2,.... The identifier is the new alias for that integer. The first enumeration says that in this program Zero ≡ 0, One ≡ 1, and Two ≡ 2. The Ox compiler will not let your code change the value. (No memory cell is used to store Zero so it cannot be changed during execution.) It appears as its integer value in the object code. In Ox, an enumerated value is not a left-object: it cannot appear on the left side of the assignment operator =.

The second enumeration lists the seven continents and assigns each an integer code, starting with 0 for Asia. So in this program the number 0 has two names: Zero and Asia. Antarctica gets a value of 6. A convenient way to define the number of items enumerated is to add a final enumeration, so Ncontinents ≡ 7. Now, if later the code is modified to define India as a continent of its own, the tag India can be put in the list and the number of continents will become 8 automatically (once the code is recompiled). This can be a very handy way to label the elements of a vector. For example, the specification of the X variables in a regression can be an enumeration:

enum{Cons,Gender,Age,AgeSquare,Ncoeff}

If beta is the vector of coefficients then it has Ncoeff elements, and beta[Age] is the coefficient on age.
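
A short sketch of how the enumeration can label a coefficient vector. The numerical values are invented for illustration only and have no economic meaning.

```ox
#include "oxstd.h"

enum{Cons,Gender,Age,AgeSquare,Ncoeff}

main() {
    decl beta = zeros(Ncoeff,1);    // one element per enumerated regressor
    beta[Cons]      = 1.5;          // hypothetical values
    beta[Age]       = 0.04;
    beta[AgeSquare] = -0.0005;
    println("Number of coefficients: ",Ncoeff);
    println("Coefficient on age: ",beta[Age]);
    }
```

If a regressor is later added to the list before Ncoeff, the vector grows automatically and beta[Age] still refers to the age coefficient.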

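The third enumeration assigns the values 3,-3,-2,6. NegTwo has no = of its own, so it takes the value that follows NegThree. Inside enum, = sets the value of a tag explicitly.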
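
One way to check how the values are assigned is simply to print the tags. This sketch reuses the third enumeration from above; if the rules work as described, the four tags print as 3, -3, -2 and 6.

```ox
#include "oxstd.h"

enum{Three=3,NegThree=-3,NegTwo,Six=Three+Three}

main() {
    // each tag is replaced by its integer value when the code is compiled
    println(Three," ",NegThree," ",NegTwo," ",Six);
    }
```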

Space Oddity ^♭David Bowie (1969)

The Principle: use indentation and lines in your code to make its structure clear to you and other humans.

VERSION 1

f(x){return x*x;}main(){ decl i,s;s=0;for(i=1;i<=10;++i)s=s+f(i);println("Sum of squares = ",s);}

VERSION 2

f(x) {
return x*x;}
main() {
decl i, s;
s=0;
for(i=1;i<=10;++i) {
s=s+f(i);}
println("Sum of squares = ",s);}

VERSION 3

f(x) {
    return x*x;
    }
main() {
    decl i, s;
    s=0;
    for(i=1;i<=10;++i) {
        s=s+f(i);
        }
    println("Sum of squares = ",s);
    }

indented

main()

indentation styles

Ratliff Style

f(x) {
    return x*x;
    }

Tabs vs. Spaces

Example of using naming, indentation and comments to make your code readable to a human. When asked to submit code for major assignments you should emulate these ideas or follow another guide that you have been taught. Most of the code I provide to you, and the code that you submit for practice problems, does not bother with some of these details. That's because the code is very simple and the context is clear from the course. But when working in a team or contributing to a project these ideas are important.

I'm Bad, I'm Nationwide^♭ZZ Top 1970s

The Principle: use global identifiers sparingly

Run Time Environment

RTE

main()

main

.ox

main()

declared

main()

#include "oxstd.h"

main()

the globe

global

decl

oxvalue

RT0

f(); declares f to be a function. That does not say what it does, merely that it exists on the globe. The function is defined by placing the statements that it will execute inside curly brackets:

f();                        // f declared

g() {                       // g defined and declared
   println("I am g");
   }

f() {                       // f defined
  println("I am f");
  }

Lines in the source code do not really mean anything.

A statement may span many lines, and a single line can contain several statements. What matters is the semi-colon ; that ends a statement and the right curly bracket } that ends a group of statements started with {.

New lines do matter for one thing: comments that start with //.

Anything after // on the line will be ignored by the compiler. You can put explanatory notes there to help a human reader understand the code. The comment ends when the next line in the source code begins. Comments within a line and across lines are delimited by /* */.
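
A small sketch that puts these rules together: several statements on one line, one statement spread over two lines, and both kinds of comment. The variable names are arbitrary.

```ox
#include "oxstd.h"

main() {
    decl a, b, total;           // this comment ends where the line ends
    a = 1; b = 2;               // two statements on one line
    total = a                   /* a comment inside a statement ... */
          + b;                  // ... and the statement ends at the semi-colon
    /* A comment delimited this way
       can also cover several lines. */
    println("total = ",total);
    }
```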

Functions do not have to be declared separately.

If the definition appears then that will declare the identifier. One reason for separating the definition and declaration has already been seen. The .h file can contain only declarations. The definitions can then be linked in from a separate file. This can get a little tricky when your program has more than one source file, so a whole section below is devoted to this.

local

declared

{ }

This code will create a compiler error:

 1          decl v;
 2          f() {
 3              v = 5;
 4              y = 6;
 5              }
 6          decl y;
 7          g() {
 8              y = 7;
 9              }
10          main() { }

v

y

after

f()

The error looks something like:

new06.ox (4): 'y' undeclared identifier
new06.ox (4): 'y' left-value expected (need storage object)
Ox reports errors: exit code= 1!!

g()

y

g()

19-ping-pong.ox
#include "oxstd.h"
ping();
pong() {
    println("pong");
    ping();
    }
ping()  {
    print("ping-");
    pong();
    }
main() {
    ping();
    }

ping()

pong()

ping()

pong()

Call Me^♭Al Green 1974: functions and their arguments

The Principle: Use functions to avoid duplicate code and to protect data.

Formal and Actual Parameters

20-arguments.ox
#include "oxstd.h"
cobb(x,y,aU);

main() {
    decl two = 2.0, u = 0.0, ok;
    ok = cobb(two,2.5,&u);      // address of u
    println("Output: ",ok," ",u);
    }

cobb(x,y,aU) {
    if (x < 0 || y < 0)
        return 0;
    aU[0] = x^0.2 * y^0.8;
    return 1;
    }

--------------- Ox at 15:30:31 on 18-Sep-2012 ---------------

Ox Console version 6.21 (Windows/U) (C) J.A. Doornik, 1994-2011
This version may be used for academic research and teaching only
b 7
Give me five: 5
What is d? -5

formal

actual

b

five()

two

b
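
A minimal sketch of what happens to an actual parameter when the function changes its formal parameter. The function names byValue and byAddress are hypothetical, chosen only for this illustration.

```ox
#include "oxstd.h"

byValue(x) {
    x = x + 1;              // changes only the function's own copy
    }

byAddress(ax) {
    ax[0] = ax[0] + 1;      // changes the variable whose address was sent
    }

main() {
    decl two = 2;
    byValue(two);
    println("after byValue:   ",two);
    byAddress(&two);
    println("after byAddress: ",two);
    }
```

Only the call that sends &two should alter the variable in main(). This is the same mechanism cobb() uses to send utility back through aU.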

Fast, Flexible, Portable

Rule of Thumb #13. Learn a fast language, such as C or Fortran.
Use Google to find author

We have already discussed the first step required to understand this issue: C/FORTRAN are compiled languages. Ox/Matlab/R/Python are interpreted languages. Their interpreters are programs written in a compiled language, so they derive their efficiency in computing from a program written in another language by another person (or group of people). This can make interpreted languages appear slower than compiled ones, but it depends on several factors.

The previous chapter also discussed another aspect of speed. A proper accounting for the time taken by the programming cycle includes not just how long RunTime takes, but all the time from an idea to a finished result. In some cases this may be primarily RunTime. If the problem is very similar to things you have already programmed (or someone else has programmed it, so you can use a canned package) and it involves a lot of computation, then you only care about how long production runs take. An example in econometrics is a Monte Carlo or bootstrap computation on a simple data generating process.

Otherwise, a full accounting for speed includes time spent coding (and debugging) as well as RunTime. Rule of Thumb #13 suggests it is better to pay the upfront cost of learning a compiled language. Having paid that upfront cost you will not later on be stuck with the high marginal cost of a seemingly slow derived language during your production run. But everyone would agree that there is a limit to that argument. Why stop with a language like C? Why not learn to program in assembly language to wring out even more inefficiency from your code? No one advocates that solution because the fixed costs swamp the reduction in marginal cost, and in many cases decreasing returns kick in early enough to rule out compiled languages as well.

Another way to pay an overhead cost to avoid high marginal costs later on is to study this chapter and learn the lessons in it. For most problems these lessons allow you to have the best of both worlds: a convenient interpreted language without the pitfalls that lead to glacial execution times.

Flexible Flyer^♭Husker Du 1985: Write Code that can be reused and relied upon

As a young economist how will you use the computer to do your research? This is difficult to answer, but here are some extreme cases. First, you may end up using the same framework over and over again. In this case it makes sense to write code specific to that task and to optimize its execution speed. That is, you plan to stay in CodingTime for a short time. If you spend two weeks optimizing the code and use it over and over again you might save yourself months of time waiting for the code to finish.

On the other hand, you might end up working on many different kinds of models requiring very different kinds of computation. In this case you will end up spending a long time in CodingTime. You will not find it worthwhile to squeeze out all the computational inefficiency in each project because it will only enter ProductionTime once. Two weeks optimizing the code may save you two hours of computing time.

Most people fall in between these extremes. As a researcher you will probably adopt one or two approaches (paradigms, frameworks, etc) to answer economic questions. But within your chosen approaches you will be continually changing the details of your model, and therefore your code. In this case you would like to balance the concern for computational efficiency with programming flexibility. Ideally you program the shared aspects of your models once and efficiently, but it should be done in such a way that making changes does not require going back to the beginning. One important reason to avoid this is that every time you change your code there is a risk that new bugs are introduced. But if you can reuse the same basic code and modify only details you can limit the chances that changes cause errors.

Perhaps the best way to follow this strategy is to use object oriented programming. Later in this chapter OOP will be introduced and emphasized. One very important by-product of good OOP code is that your code can be shared with other people with some chance that they can modify and expand it easily, or at least more easily than if they had to tweak all your code to do something different.

Let $f(x)$ be a function of a single real number $x$. We can write this more formally as $f:\Re\to\Re$. To begin, we want to use the cube-root of $x$: $$f(x) \equiv x^{1\over 3}.\tag{F1}\label{F1}$$ A symbolic function can be represented on a computer as a sequence of instructions to be applied to an input value. The steps are machine instructions in a compiled language, and pseudo-instructions in an interpreted one. The program that works with $f(x)$ has to be able to run these instructions from different places in the code and to send different values of $x$. Finally, the value of $f(x)$ computed by the instructions has to be retrievable.

In Ox, as in most other languages, you can compute $x^{1/3}$ inline: in Ox this is just x^(1/3). If we are translating a symbolic model into a computer program then using inline operators has a potential drawback. If we wish to change $f(x)$ from the cube root of $x$ to something else, but otherwise leave the rest of the model the same, we have to find every use of it in the code and replace it.

Inline coding of a named function is a bad idea

-----------------------------------------------------------------
    x = 5.0;
    ...
    y = x^(1/3) - z;
    ...
    z = 25 -3*y^(1/2);   // is this the square root of y, or a typo??
    ...

Use functions to code functions.

   Version 1                  Version 2               Version 3
----------------------------------------------------------------------
f(x) {                     cubert(x) {              f(x,av) {
   return x^(1/3);            return x^(1/3);        av[0] = x^(1/3);
    }                         }                      return TRUE;
                                                     }
...                        ...                      ...
y = f(8.0);                y = cubert(8.0);         ok = f(8.0,&y);

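Note the parentheses around the exponent. The expression 0.001^1/3 does not give the cube root 0.1: the power is applied before the division, so 0.001 is raised to the power 1 and the result is then divided by 3.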
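
A two-line check of why the parentheses in x^(1/3) matter. This is only a sketch; it relies on ^ being applied before / in Ox.

```ox
#include "oxstd.h"

main() {
    println("0.001^(1/3) = ",0.001^(1/3));  // the cube root of 0.001
    println("0.001^1/3   = ",0.001^1/3);    // (0.001^1)/3, not a cube root
    }
```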

The first two versions are almost identical, and both allow the program that uses the function to assign the value of the function to a variable. So in both cases y would hold the value 2.0 after the line is executed. The routine is passed $x$ stored as a FPR and returns another FPR as a numerical approximation to $f(x)$. The only difference between Version 1 and Version 2 is the name we give the function: the generic f() or the specific cubert().

Which kind of name to use depends on the role $f(x)$ plays in your model. If $x^{1/3}$ is really fundamental to your model, in the sense that it makes no sense to use another function, then use a name like cubert(). In essence, you are "binding" the specific name to the function because they go together. However, if $x^{1/3}$ is just a convenient form and you might want to solve your model for other forms of $f(x)$, then the generic name is better. That way, if down the road you switch to, say, $f(x)=\log x$, the reference in the rest of the code to f() is still clear, whereas the specific name cubert would now be misleading.

One difference between the first two is that the name cubert() is more descriptive. It says what function it is (the cube root of its argument). But that is not necessarily better. What if we later decided to study $x^{1/4}$ with the same program? In the first case the name $f(x)$ is not misleading and we simply change it to return $x^{1/4}$. But cubert() becomes a misleading name, which may end up with errors if the change in the exponent is forgotten.

Thus, which name is better depends on the context. If the main program is meant to deal with an arbitrary function then f(x) is better because it is generic. But if the function is going into a library to be used across many programs, and it will always be $x^{1/3}$, then the name cubert() is better.

Version 3 is different. Now the value of the function is returned to the main program through a second argument, av. As discussed earlier, output arguments in Ox must be addresses. So the function's code assigns $x^{1/3}$ to the variable pointed to by av. The return value is still set, in this case to TRUE or 1. This is closer to the way that built-in Ox procedures want functions to be defined. The reason is that the function can tell the program whether the function evaluation can be trusted. TRUE is returned because built-in Ox procedures typically expect the user's function to return 1 (TRUE) if everything is fine and 0 if the evaluation of the function failed.
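
For reference, here is Version 3 written out as a complete program. It is only a sketch: the names ok and y follow the table above, and the printed message is my own addition.

```ox
#include "oxstd.h"

f(x,av) {
    av[0] = x^(1/3);    // the value goes back through the address av
    return TRUE;        // the flag says the evaluation can be trusted
    }

main() {
    decl y, ok;
    ok = f(8.0,&y);     // send the address of y, not y itself
    println("ok = ",ok,"  y = ",y);
    }
```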

For example, consider a different root: $x^{1\over 4}$. This function is not defined (as a real function) for negative $x$, for example $x=-1$. We could write our code to simply kill the program based on a numerical exception. But it may be preferable to let the program continue to run and deal with the fact that the function value was undefined.

Quad Root Code

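cubert(x,av) {                  // name left over from the cube root; it now computes x^(1/4)
   if (x >= 0) {
      av[0] = x^(1/4);
      return TRUE;
      }
   av[0] = .NaN;
   return FALSE;
   }
⋮
if (cubert(x,&y))
   println("root = ",y);        // proceed, y holds x^(1/4)
else
   println("x is negative");    // deal with negative x, y holds .NaN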

x

cubert()

av[0]

FALSE

valid argument flag

value of the function

Roots

   Version 4             Version 5           Version 6
--------------------------------------------------------
const decl n=3.0;        decl n;             g(x,n=3.0) {
g(x) {                   g(x) {                 return x^(1/n);
  return x^(1/n);          return x^(1/n);      }
  }                        }
...                      ...
y = g(x);                n = 3.0;             y = g(x);
                         y = g(x);            z = g(x,4.0);
                         n = 4.0;
                         z = g(x);

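Version 4 stores n in a global constant. Its value is bound at CompileTime, so n cannot be changed while the program executes.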

Version 5 uses a global too, but it is not a const. So its value is bound at RunTime and can be changed while the program executes. This can be good or bad. It is bad if it should not change, because then a typo in the code (e.g. n = 33; when you meant m = 33;) will cause confusion and incorrect output.
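
A runnable sketch of Version 5 that shows the global staying in effect after it is changed. The input values 64.0 and 16.0 are arbitrary.

```ox
#include "oxstd.h"

decl n;                             // global, bound at RunTime

g(x) {
    return x^(1/n);
    }

main() {
    n = 3.0;
    println("n = 3: ",g(64.0));     // cube root
    n = 2.0;                        // from here on every call to g() uses n = 2
    println("n = 2: ",g(64.0));     // square root
    println("n = 2: ",g(16.0));     // still the square root
    }
```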

Version 6 does not use a global value for n, so it looks more like $g(x,n)$, except that it uses Ox's default-value feature. The user can call g(x) or g(x,n). If $n$ is not provided then the default value of 3.0 will be used. Unlike Version 5, in which the global parameter is changed and is then in effect for every subsequent call to g(x), the value of n passed in Version 6 just applies for that function evaluation.
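
Version 6 as a complete program, again only as a sketch with arbitrary inputs. It assumes a version of Ox that supports default values for arguments, as the text describes.

```ox
#include "oxstd.h"

g(x,n=3.0) {
    return x^(1/n);
    }

main() {
    println("default n: ",g(16.0));       // n takes its default value 3.0
    println("n = 4:     ",g(16.0,4.0));   // n = 4 for this call only
    println("default n: ",g(16.0));       // the default applies again
    }
```

The third call is the point of the design: passing n = 4.0 leaves no trace, whereas changing the global in Version 5 affects every later call.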

A final version of a way to code a parametric function that uses objects is discussed later.

Exercises

The program below codes a mathematical operation you are familiar with. It uses a built-in Ox function to check the result (please look up the function in the Ox function summary, it can come in handy). The code runs and produces verified output for the given values. However, there are no comments, no new lines and no indentation to help a human read the code. Note that it does use one feature that is good coding: it uses enum to give labels for elements of the input vector. The tags in the enum should make sense, or the tags might help you recognize the operation being coded.
1. Save a copy of the file to a new one called 17-addstyle.ox. Using that file modify it to make it readable following the guidelines given in the text, or another style if you have been taught a different way and prefer that. Changes to include: comments, line breaks, and indentations. Once you start doing that you should be able to figure out what the program is doing. Then give the things in the program (functions and variables) more meaningful names that reflect their purpose.
2. Once you have done that, make sure the program still works (you haven't added an error).
3. Extra Credit: look in the Ox function summary to find a built-in function related to the one already used by the program that would solve the problem as well. See if you can get it to do the same work as the function in the file. However, note, the documentation is a bit unclear and that function will return the reciprocal of the values we are looking for. If you get that far, try picking new inputs that lead to undefined results for the function in the file but that the built-in function handles just fine.
```
17-addstyle.ox
 1:    #include "oxstd.h" enum{c,b,a} q(co){decl rad=sqrt(sqr(co[b])-4*co[a]*co[c]); return (-co[b]+(-rad~rad))/(2*co[a]);}main(){decl myc=<2,3,1>,myr; myr=q(myc);println("coeffs:",myc,"solutions:",myr,"check:",polyeval(myc,myr));}
```
Open the code below and run it. Verify a compile-time error occurs. Now move the declaration of the global y above the first reference to it and see if the error goes away. Explain why.

18-global.ox
#include "oxstd.h"

decl v;

f() {
    v = 5;
    y = 6;
    }

decl y;
g() {
    y = 7;
    }

main() { }

19-ping-pong.ox

Run the program as is. Then stop it by clicking on the STOP button icon on the OxEdit menu (or it may die on its own).
Add to the code a global variable called rally to count how many hits by both sides have occurred so far. Do this by adding a statement inside each function that increments rally.
Now change the code so the rally ends after 10 volleys.
[Harder]. Add #include "oxprob.h" to the top of the code. Look up the ransubsample() function. Use ransubsample(1,11) so that the rally lasts anywhere from 0 to 10 hits.

Exhibit 25. Programming Style Guide
