1. The Style CouncilPaul Weller's group after the Jam: 
  1. Background
  2. The previous chapters presented some simple Ox programs to illustrate how a computer works. And in later chapters we will show how to do math and economics in Ox. This chapter is about programming, using Ox for examples. It teaches the syntax and other aspects of the language, but it is not intended as a quick way to learn Ox if you have previously taken a formal programming class. This chapter asks the reader to first think of a program as whole, like the canvas for a painting. Eventually we fill in the small details to complete the picture, but first we think of about the general outline.

    One reason to take this leisurely path towards programs that do real work is that the Ox documentation follows the more direct approach. If you have programmed before, then you know what you need to get started: examples and a complete description of syntax, predefined constants, functions and classes. You will start modifying the examples and programming right away. As you encounter compiler errors you will go back to the documentation for clarification. The HTML documentation does all that for you.

    For readers who are less sure of their preparation, I encourage you to first look at the big picture. The concern here is developing a sense of programming without much concern to write programs that are sensible.

  3. Oscar and Felix: prototype coders
  4. The programmer decides what functions to declare and what they will do. The programmer chooses names of variables and functions. The programmer decides if the program should be in one file are spread across files. Consider two prototypes.

    You might think that the part of the parable is to explain how Mr. E spends weeks getting his code to work, while Ms. P. steadily develops the code and gets it working properly in much less time. But actually, most people who do a lot of programming cycle back and forth between these two styles. When the issue seems straightforward it is natural to just tackle it directly ("bang out some code"). But at some point the head-on approach becomes unwieldy and you start to doubt whether the code is correct. Just as you might realize a sentence you have written tries to say too much so you rewrite into two simpler ones, so you might look at your code and re-organize so the logic is clear. At that point it helps to understand the way Ms. P. programs. This makes complex code readable and easier to correct and extend. It also allows Ms. P. to test components separately, to isolate problems and re-use code within changing contexts. However, the ability to switch to Ms. P. style does not happen automatically.

  5. The Eye of the Camel
  6. The Principle: use meaningful names for variables and functions that reflect their purpose, to make your code clearer to you and others.

    The names above, like FindEquilibrium, might look strange to you. In fact, this style of naming functions and variables has a name: CamelCase. Using it betrays my old-fashioned Pascal background. The developers of Ox prefer a different convention called Hungarian notation, which packs more information into the name. Several other conventions exist. As you program you will probably change how you name things, and as a function gets re-coded to fix errors or to generalize the code you should always ask if the name is still accurate. Most people who program a lot for themselves settle on one style or the other without being perfectly consistent.

    So, as you are writing simple programs in Ox for 354, think about the names of things. Does the variable's name describe the information in it? Again, most people cycle between quick-and-dirty names (i, x, g(y)) when starting code to renaming things (year, income, cost(output0) as the program's purpose and structure take shape. In this class you won't have time for this process until the final part of the assignment.

  7. I Call Your Name
  8. The Principle: Name constants in your code and protect them from accidental changes

    Programs rely on special constant values, such as the value of $\pi$, or the 2$\times$2 identity matrix, or the number of players in the game your program analyzes. Code is much easier to read and more reliable to use if each constant is given a name, such as Pi or I2 or N. The code is more reliable because the correct value only has to be set once when it is assigned to the identifier.

    If instead you leave the number of players as a number, say 3, then 3 may appear scattered in your code. To change the number of players to 4 each instance must be changed. If you miss one it introduces a difficult-to-find bug. And 3 might show up for some other reason, so you need to change only those 3s that refer to the number of players.

    In Ox, variables can be initialized when they are declared, as in decl Pi = 3.1417;. At RT0 the memory cell for Pi will indeed contain the rough approximation to $\pi$. But the danger of assigning a constant to a variable is that variables ... vary. You may accidentally change the value of Pi in some line of code, perhaps because you typed Pi = 5 but meant to type Hi = 5. The compiler does not catch typos like this, as long as Pi was declared above the line. But from then in RunTime the value has changed, and again this is a difficult error to locate.

    There are two ways to name constants in Ox and be guaranteed they stay constant from RT0 until the program terminates. One way is to add the attribute const to the declaration: const decl Pi = 3.1416;. Now the Ox compiler knows that Pi should never change, and if it encounters the statement Pi = 5 it will complain and not finish creating object code, let alone execute your program. You can then fix the typo and compile again.

    The other way to create a constant only works for integer values, but it can be very helpful in writing good code. It is the enum{} statement, which is similar to enum in C, but not exactly the same. All enum does is let the program enumerate a sequence of integer values and assign an alias for each:

    Enumerations
        enum{Zero,One,Two}
        enum{Asia,Africa,Europe,NorthAmerica,SouthAmerica,Australia,Antarctica,Ncontinents}
        enum{Three=3,NegThree=-3,NegTwo,Six=Three+Three}
    
    Without any =, an enumeration is just the numbers 0,1,2,.... The identifer is the new alias for that integer. Fhe first enumeration says that in this program Zero ≡ 0, One ≡ 1, and Two ≡ 2. The Ox compiler will not let your code change the value. (No memory cell is used to store Zero so it cannot be changed during execution.) It appears as its integer value in the object code. In Ox, an enumerated value is not a left-object because it is impossible to put it on the left side of the assignment operator =.

    The second enumeration lists the seven continents and assigns each an integer code, starting with 0 for Asia. So in this program the number 0 has two names: Zero and Asia. Antarctica gets a value of 6. A convenient way to define the number of items enumerated is to add a final enumeration, so Ncontinents ≡ 7. Now, if later the code is modified to define India as a continent of its own, the tag India can be put in the list and the number of continents will become 8 automatically (once the code is recompiled). This can be a very handy way to label the elements of a vector. For example, the specification of the X variables in a regression can be an enumeration:
    enum{Cons,Gender,Age,AgeSquare,Ncoeff}
    
    Now if beta is a vector of coefficients, it will be of length Ncoeff and beta[Age] will, for example, return the coefficient on age.

    The final enumeration above shows how to create any integer constant. The list converts to 3,-3,-2,6. NegTwo is assigned as a name for -2 because enum increments after each comma unless = appears.

  9. Space Oddity David Bowie (1969)
  10. The Principle: use indentation and lines in your code to make its structure clear to you and other humans.

    Consider two short programs that are identical in syntax and meaning:
    VERSION 1
    f(x){return x*x;}main(){ decl i,s;s=0;for(i=1;i<=10;++i)s=s+f(i);println("Sum of squares = ",s);}
    
    VERSION 2
    f(x) {
    return x*x;}
    main() {
    decl i, s;
    s=0;
    for(i=1;i<=10;++i) {
    s=s+f(i);}
    println("Sum of squares = ",s);}
    
    VERSION 3
    f(x) {
        return x*x;
        }
    main() {
        decl i, s;
        s=0;
        for(i=1;i<=10;++i) {
            s=s+f(i);
            }
        println("Sum of squares = ",s);
        }
    
    In some languages a new line in the code file means the end of the statement or command. In Ox a semi-colon does that job, so the programmer is free to break their code up into as few or as many lines as they wish. So Version 1 and Version 2 are the same program, but I think it is obvious that the second version is easier to follow than Version 1. Each new line is a new statement so a human reader does not need to find ";" to parse the text into separate statements.

    Further, a line in the file can have whitespace (spaces or tabs) at the start or not. So Versions 2 and 3 are also the same program. The difference now is that statements that belong to a function are indented relative to the name of the function. This makes it easier to see the function as a whole. In addition, the statement inside the for loop is indented relative to the start of the loop. Since the loop belongs to main() the inside of the loop is a third level of indent.

    Just as there are different naming styles, there are different indentation styles. Good good follows a style so that a human looking at the code can see the structure without finding items like ";" and "}". Those symbols are important parts of the language to avoid ambiguous code. But they are not the best way for humans to understand code. It is easier for them to use space, both vertical and horizontal, to display the structure.

    I only learned recently that my personal preferred style has a name: Ratliff Style. This means opening brackets are on the same line as the statement that starts a block. The lines of the block are indented and the closing bracket is indented but appears on its own line:
    f(x) {
        return x*x;
        }
    
    I can't explain why I like this over styles. You should find the style you like and use it.

    The other thing to decide is whether you use spaces or tab characters to indent. I use tabs, but be careful because a relationship with another nerd may be threatened by a difference of opinion on this: Tabs vs. Spaces.

    Exhibit 25. Programming Style Guide

    Example of using naming, indentation and comments to make your code readable to a human. When asked to submit code for major assignments you should emulate these ideas or follow another guide that you have been taught. Most of the code I provide to you, and the code that you submit for practice problems does not bother with some of these details. That's because the code is very simple and the context is clear from the course. But when working in a team or contributing to a project these ideas are important.
  11. I'm Bad, I'm NationwideZZ Top 1970s
  12. The Principle: use global identifiers sparingly
    A program creates an environment of things that will exist at some point in the memory of the computer while the program is running. Outside this world is the Run Time Environment, which sets things up for the program and lets it get information in and out. The RTE needs to know the entry point into the world: the first command to be executed when the program gains control of the CPU (compiled) or interpreter.

    In C and Ox, the program's entry point is a single function called main(). main might be the last thing in the .ox file, but it is always where execution starts. You may have a lot code that compiles, but without a main() it will not do anything. Once started the program may terminate at any point (for example an fatal error occurs), but it definitely terminates if/when the program counter (the PC) gets to the end of main().

    It might make sense that the entry point is the first thing in the source code. However, the job of parsing the source code is made simpler if identifiers are declared before they are used. So in most Ox programs many items will be declared and/or defined above main(). As discussed earlier, #include "oxstd.h" declares all the standard functions and constants in Ox above main() to ensure any of them can be used in the code.

    We call the world of the program the globe, which is not typical, because identifiers in it are said to be global. A global identifier, and the location it refers to, can be accessed by any line of code in the file after it is declared. (When your program is spread over other files there is a bit more to what it means to be global.) The two main types of global identifiers in Ox are variables and functions. Every variable must be listed in a decl statement. As discussed in the previous chapter, decl tells the Ox interpreter to create a new dynamically allocated variable of C type oxvalue. We can think of this allocation as happening before RT0 for global variables.

    A function can be declared by simply stating its name:
    f(); declares f to be a function. That does not say what it does, merely that it exists on the globe. The function is defined by placing the statements that it will execute inside curly brackets:
    f();                        // f declared
    
    g() {                       // g defined and declard
       println("I am g");
       }
    
    f() {                       // f defined
      println("I am f");
      }
    
    A function creates a sub-world between the left and right bracket of its definition. Variables declared inside the function are local to it. As in C a function cannot be defined within another function, only at top level as a part of the globe. However, we will see that some functions are actually declared below the global level. No statements outside { } can access a local variable. A statement inside a function can reference any global declared before the function is defined.

    This code will create a compiler error:
     1          decl v;
     2          f() {
     3              v = 5;
     4              y = 6;
     5              }
     6          decl y;
     7          g() {
     8              y = 7;
     9              }
    10          main() { }
    
    Line 3 is okay because v is used ("referenced") after it is declared on line 1. But the global y is declared after it is first used inside f(). Making the programmer declare before using identifiers means the parser does not have scan the file once to find all the declared identifiers and then scan again.

    The error looks something like:
    new06.ox (4): 'y' undeclared identifier
    new06.ox (4): 'y' left-value expected (need storage object)
    Ox reports errors: exit code= 1!!
    
    Line 8 inside g() is okay because y was declared before g() was defined. Consider this program:
    19-ping-pong.ox
     1:    #include "oxstd.h"        
     2:    ping();
     3:    pong() {
     4:    	println("pong");      
     5:    	ping();
     6:        }                     
     7:    ping()  {
     8:    	print("ping-");
     9:    	pong();
    10:    	}
    11:    main() {
    12:    	ping();
    13:    	}
    
    If the function declaration for ping() were not separate from its definition then line 2 would be deleted and the code would not compile, because ping() is defined after it is used in pong(). If we were to swap their order then pong() would used before it is defined. But since ping() can be declared before pong() and then defined afterwards the code is fine.
  13. Call MeAl Green 1974: functions and their arguments
  14. The Principle: Use functions to avoid duplicate code and to protect data.
    Formal and Actual Parameters
    In following a top-down approach to programming, we are now concerned with getting information in and out of functions. A bottom-up approach would focus on statements and operators and later consider grouping them into functions.

    Information passing can be confusing even if you have programmed before, because languages differ on how to pass information. So to explain this the functions in our examples will continue to do little more than assign values to variables and print them out.
    20-arguments.ox
     1:    #include "oxstd.h"
     2:    cobb(x,y,aU);
     3:    
     4:    main() {
     5:        decl two = 2.0,u=0.0, ok;
     6:        ok = cobb(two,2.5,&u);		//address of u
     7:        println("Output: ",ok," ",u);
     8:        }
     9:    
    10:    cobb(x,y,aU) {
    11:    	if (x < 0 || y < 0 )
    12:    		return 0;
    13:    	aU[0] = x^0.2 * y^0.8;
    14:    	return 1;
    15:        }
    16:    
    
    --------------- Ox at 15:30:31 on 18-Sep-2012 ---------------
    
    Ox Console version 6.21 (Windows/U) (C) J.A. Doornik, 1994-2011
    This version may be used for academic research and teaching only
    b 7
    Give me five: 5
    What is d? -5
    
    First, the word "arguments" refers to items listed in parentheses after a function name. There there are both formal arguments and actual arguments. Formal arguments appear in the functions declaration and definition. So on line 3 b is the second formal argument to five(). Actual arguments are the items listed in a call to the function within the code. five() is called on the second to last line of the program. The second value there is two, a local variable. So at this call two is the second actual argument.

    The formal arguments are placeholders. The actual arguments are what goes in the placeholder at a particular point in run time. So two is associated with the formal argument b at that call to the function.

  15. Fast, Flexible, Portable
  16. Rule of Thumb #13. Learn a fast language, such as C or Fortran.
    Use Google to find author
    This section argues that rule-of-thumb #13 is not very helpful. This chapter is a guide for using languages like Ox (or Matlab or Python) so that your code's speed compares to the code you could write in C or FORTRAN. A few hours thinking about the topics discussed here and using them to write your own code can save you weeks of waiting for results or weeks of re-coding your problem in another language because your code takes longer than you think.

    We have already discussed the first step required to understand this issue: C/FORTRAN are compiled languages. Ox/Matlab/R/Python are interpreted languages. Their interpreters are programs written in a compiled language, so they derive their efficiency in computing from a program written in another language by another person (or group of people). This can make interpreted language appear slower than compiled ones, but it depends on several factors.

    The previous chapter also discussed another aspect of speed. A proper accounting for the time taken by the programming cycle includes not just how long RunTime takes, but all the time from an idea to a finished result. In some cases this may be primarily RunTime. If the problem is very similar to things you have already programmed (or someone else has programmed it), so you can use a canned package. If it involves a lot of computation, then you only care about how long production runs take. An example in econometrics is a Monte Carlo or bootstrap computation on a simple data generating process.

    Otherwise, a full accounting for speed includes time spent coding (and debugging) as well as RunTIme. Rule of Thumb #13 suggests it is better to pay the upfront cost of learning a compiled language. Having paid that upfront cost you will not later on be stuck with high marginal cost of a seemingly slow derived language during your production run. But everyone would agree that there is a limit to that argument. Why stop with a language like C? Why not learn to program in assembly language to wring out even more inefficiency from your code? No one advocates that solution because the fixed costs swamp the reduction in marginal cost, but in many cases decreasing returns kick in to rule out using compiled languages.

    Another way to pay an overhead cost to avoid high marginal costs later on is to study this chapter and learn the lessons in it. For most problems these lessons allow you to have the best of both worlds: use of a convenient interpreted language that avoids pitfalls into glacial execution times.

  17. Flexible FlyerHusker Du 1985: Write Code that can be reused and relied upon
  18. As a young economist how will you use the computer to do your research? This is difficult to answer, but here are some extreme cases. First, you may end up using the same framework over and over again. In this case it makes sense to write code specific to that task and to optimize its execution speed. That is, you plan to stay in CodingTime for a short time. If you spend two weeks optimizing the code and use it over and over again you might save yourself months of time waiting for the code to finish.

    On the other hand, you might end up working on many different kinds of models requiring very different kinds of computation. In this case you will end up spending a long time in CodingTime. You will not find it worthwhile to squeeze out all the computational inefficiency in each project because it will only enter ProductionTime once. Two weeks optimizing the code may save you two hours of computing time.

    Most people fall in between these extremes. As a researcher you will probably adopt one or two approaches (paradigms, frameworks, etc) to answer economic questions. But within your chosen approaches you will be continually changing the details of your model, and therefore your code. In this case you would like to balance the concern for computational efficiency with programming flexibility. Ideally you program the shared aspects of your models once and efficiently, but it should be done in such a way that making changes does not require going back to the beginning. One important reason to avoid this is that every time you change your code there is a risk that new bugs are introduced. But if you can reuse the same basic code and modify only details you can limit the chances that changes cause errors.

    Perhaps the best way to follow this strategy is to use object oriented programming. Later in this chapter OOP will be introduced and emphasized. One very important by-product of good OOP code is that your code can be shared with other people with some chance that they can modify and expand it easily, at least easier than if they have to tweak all your code to do something different.

    Let $f(x)$ be a function of a single real number $x$. We can write this more formally as $f:\Re\to\Re$. To begin, we want to use the cube-root of $x$: $$f(x) \equiv x^{1\over 3}.\tag{F1}\label{F1}$$ A symbolic function can be represented on a computer as a sequence of instructions to be applied to an input value. The steps are machine instructions in a computed language, and pseudo-instructions in an interpreted one. The program that works with $f(x)$ has to be able to run these instructions from different places in the code and to send different values of $x$. Finally, the value of $f(x)$ computed by the instructions has be retrievable.

    In Ox and all other languages, you can compute $x^{1/3}$ inline: just x^(1/3). If we are translating a symbolic model into a computer program then using inline operators has a potential drawback. If we wish to change $f(x)$ from the cube root of $x$ to something else, but otherwise leave the rest of the model the same, we have to find every use of it in the code and replace it.

    Inline coding of a named function is a bad idea
    -----------------------------------------------------------------
        x = 5.0;
        ...
        y = x^(1/3) - z;
        ...
        z = 25 -3*y^(1/2);   // is this the square root of y, or a typo??
        ...
    
    All computer languages have a syntax for calling functions, sending input values and receiving output values. In Ox we can then simply code $f(x)$ as a an Ox function.

    Use functions to code functions.
       Version 1                  Version 2               Version 3
    ----------------------------------------------------------------------
    f(x) {                     cubert(x) {              f(x,av) {
       return x^(1/3);            return x^(1/3);        av[0] = x^(1/3);
        }                         }                      return TRUE;
                                                         }
    ...                        ...                      ...
    y = f(8.0);                y = cubert(8.0);         ok = f(8.0,&y);
    
    Each of these code segments are valid ways to represent $x^3$ in Ox. Recall from the puzzle that opened this book that it is possible that these values are not exact. In particular, 0.0011/3 will not be exactly 0.1 because in the program 0.001 and $x^3$ are both floating point reals.

    The first two versions are almost identical, and both allow the program that uses the function to assign the value of the function to a variable. So in both cases y would hold the value 2.0 in after the line is executed. The routine is passed $x$ stored as a FPR and returns another FPR as a numerical approximation to $f(x)$. The only difference between Version 1 and Version 2 is the name we give function: the generic f() or the specific cubert.

    Which kind of name to use depends on the role $f(x)$ plays in your model. If $x^{1/3}$ is really fundamental to your model, in the sense that it makes no sense to use another function, then use a name like cubert(). In essence, you are "binding" the specific name to the function because they go together. However, if $x^{1/3}$ is just a convenient form and you might want to solve your model for other forms of $f(x)$, then the generic name is better. That way, if down the road you switch to, say, $f(x)=\log x$, the reference in the rest of the code to f() is still clear, whereas the specific name cubert would now be misleading.

    One difference between the first two is that the name cubert() is more descriptive. It says what function it is (the cube root of its argument). But that is not necessarily better. What if we later decided to study $x^{1/4}$ with the same program? In the first case the name $f(x)$ is not misleading and we simply change it to return $x^{1/4}$. But cubert() becomes a misleading name, which may end up with errors if the change in the exponent is forgotten.

    Thus, which name is better depends on the context. If the main program is meant to deal with an arbitrary function then f(x) is better because it is generic. But if the function is going into a library to be used across many programs, and it was always be $x^3$, then the name cube() is better.

    Version 3 is different. Now the value of the function is returned to the main program through a second argument, av. As discussed earlier, output arguments in Ox must be addresses. So the function's code assigns $x^3$ to the variable pointed to by av. The return value is still set, in this case to TRUE or 1. This is closer to the way that built-in Ox procedures want functions to be defined. The reason is that the function can tell the program whether the function evaluation can be trusted. TRUE is returned because built-in Ox procedures typically expect the user's function to return 1 (TRUE) if everything is fine and 0 if the evaluation of the function failed.

    For example, consider a different root: $(x)^{1\over 4}$. This function is not defined (as a real function) if $x=-1$. We could write our code to simply kill the program based on a numerical exception. But it may be preferred to let the program continue to run but deal with the fact that the function value was undefined.

    Quad Root Code
    cubert(x,av) {
       if (x >= 0) {
          av[0] = x^(1/4);
          return TRUE;
          }
       av[0] = .NaN;
       return FALSE;
       }
    ⋮
    if (f(x,&y))
       //proceed, y holds x^(1/4)
    else
       // deal with negative x, y holds .NaN
    
    In the case of a negative value of x, cubert() assigns .NaN to av[0] but also sets the flag to FALSE so that the program that set a negative value might be able to address. This can be very important. For example, suppose the main program is searching for values of $x$ that satisfy some condition. Then it needs to know that $x=-3.0$ is not valid when dealing with quad roots, but if $f(x)$ were $x^{1/3}$ it would not be a problem. Returning both a valid argument flag and the value of the function if the argument is valid allows the program to continue executing even when invalid input occurs.

    Now consider the generalization: $$g_n(x) = x^{1/n}.\tag{Gn}\label{Gn}$$ Here there are now two quantities that may take on different values, $x$ and $n$. The notation is used to suggest that the value of $n$ is, in some sense, fixed first and then values of $x$ are used. This is different than writing $g(x,n)$, which suggests that both $x$ and $n$ are varying simultaneously. In the first notation we often would call $n$ a parameter of the function and $x$ the argument.

    Your Ox code can reflect this difference between $n$ and $x$.
    Roots
       Version 4             Version 5           Version 6
    --------------------------------------------------------
    const decl n=3.0;        decl n;             g(x,n=3.0) {
    g(x) {                   g(x) {                 return x^(1/n);
      return x^(1/n);          return x^(1/n);      }
      }                        }
    ...                      ...
    y = g(x);                n = 3.0;             y = g(x);
                             y = g(x);            z = g(x,4.0);
                             n = 4.0;
                             z = g(x);
    
    As with Versions 1-3, none of these ways to code $g(x)$ is wrong, but depending on the way the function is to be used, one method will be better (more reliable) than the others. In Version 4, the parameter n is declared a global constant. This binds the value 3.0 to n at CompileTime. The only way to set n to a different value is to change the code.

    Version 5 uses a global too, but it is not a const. So its value is bound at RunTime and can be changed while the program executes. This can be good or bad. It is bad if it should not change, because then a typo in the code (e.g. n = 33; when you meant m = 33;) will cause confusion and incorrect output.

    Version 6 does not use a global value for n, so it looks more like $g(x,n)$, except that it uses Ox's default-value feature. The user can call g(x) or g(x,n). If $n$ is not provided then the default value of 3.0 will be used. Unlike Version 5, in which the global parameter is changed and is then in effect for every subsequent call to g(x), the value of n passed in Version 6 just applies for that function evaluation.

    A final version of a way to code a parametric function that uses objects is discussed later.

Exercises