C<sup>4</sup>E: Computation For Economists

The Ties that Bind^♭Bruce Springsteen 1980: When Things Happen in a Program

Introduction

stuff happens

RT

run time

RT0

RT=0

RT0

Besides the syntax, the built-in functions of a programming language there are ways in which these tasks work that determine how a program executes and what it can or cannot do to the computer. A static feature of a language occurs (or is determined) before RT0. Static features do not respond to, and are not determined by, the execution of the program. They do not depend on inputs that could be different from one run of the program to the other. A feature of a language is a dynamic feature of a language if it occurs or is determined after RT0. For example, the length of a vector exists in the mind of the programmer even as the code is being written. But this does not mean it is a static feature. If the language has dynamic vector sizes then the length remains an unspecified integer until some time after RT0.

The same feature may be static in one computer language and dynamic in another. As the names suggest, if a feature is static in one language and dynamic in another, then the static language is less flexible. But flexibility comes at a cost: a static feature is something that can be checked and made secure before run time. If the feature is fixed during compile time then the compiler can be sure that it is handled appropriately at all points in the program. But if the feature is dynamic (such as how long a vector is), the compiler must include extra code in the object file to ensure the feature is handled properly.

A dynamic feature can then be slower to execute because the checking has to take place as the program runs. Or it might be less safe since a dynamic change without checks might do something that shouldn't happen, such as erase a memory cell allocated to another running program. Operating systems are designed to stop things like that from happening, and they monitor what is happening while a program is executing. For example, on any OS you might use to do economics, the run time system will not let your program corrupt the memory cells of another running program.

Static versus Dynamic Value Assignment

things

enum{Zero,One,Two}

Zero

One

1

One

static

One

One the other hand, we have seen that decl cnt = 1; is a dynamic assignment of 1 to a thing with the label cnt. For this reason the count can be changed after RT0: cnt=cnt+1; happens during execution and it modifies the memory cell associated with cnt.

There is a way to create a static assignment to a thing by using the const qualifier to the decl command:

#include "oxstd.h"
const decl I = <1 , 0; 0, 1>;
      decl x = 5.0,            //static assignment of value
           y;
main() {
    y = 1-I;
    x = x + 5;                 //dynamic assignment of value
    }

A const thing must be initialized but its value may not be changed thereafter. So notice the first line above declares a constant 2x2 identify matrix named I. The programmer can be sure that I will hold that matrix at all times in the program. If the program accidentally included a statement that modified I the compiler would catch that and produce a compile-time error. So this is a case of static assignment of value to an identifier.

Notice that the second line of the code above initializes x to 5.0. That is also a static assignment of value in the sense that it happens at or before RT0, the start of execution. However, because x is not declared x this static assignment is not permanent. So the executed line inside main() that modifies the value of x does not produce an error, but it is a dynamic assignment of value. The memory cell of x changes to 10 during execution, hence it is a dynamic assignment of value. Finally, you can see that y is not initialized to any value when it is declared. Because it is not const this is allowed, but there is something subtle here.

Because I, x, and y are declared outside of brackets { }, they are called global. In Ox, global things exists during all the execution times. But this means there are limits to what values can be computed when initializing a global variable. Only expressions that the Ox compiler can handle are allowed. An example is that log(5) can't appear outside of brackets because it means the Ox function log() must be called (so it must happen after RT0).

Since globals are assigned values at RT0 it is not possible to initialize them with values that depend on Ox functions. Further, variables declared inside brackets only exist during the time the code inside the brackets is executing. This means (in Ox, not all languages), that a thing cannot be declared const inside a function. So here are examples of errors produced related to const versus non-constant declarations:

#include "oxstd.h"
const decl I;               //ERROR! Must get a value since it is const
      decl x = ln(5.0),     //ERROR! Can't refer to Ox functions in global space
           y;
main() {
    const decl z = 2;      //ERROR! local variables can't be const
    I = I + 5;             //ERROR! can't assign to a const
    }

y

#include "oxstd.h"
    decl y = 0;    // same as just y

Static vs. Dynamic Memory Allocation.

Two versions of the a C program called seven are given here:

seven.c                              sevenb.c
----------------------------------------------------------------
#include <stdio.h>                #include <stdio.h>
main() {                          main() {
   int h, *d;                        int h, *d;
   h = 6;                            h = 6;
   d = malloc(1);                    d = &h;
   *d = h;                           h=7;
   h = 7;                            printf(h,*d);
   printf(h, d);
   }                                 }

The first version creates the sequence of memory states in the Figure. The modified version creates the sequence in the Figure.

At RT0 the executable seven.exe is in memory and starting to run. The static variables h and d have space allocated for them. However, note that d is declared with a * before it. This means that it stores not an integer but a pointer to an integer. For our purposes we can imagine that d is simply a memory cell, but in some cases more memory is store with a pointer than just the address.

At RT0 the variables are not initialized so their contents are unknown, but two constants referred to in the program are ready for use, 6₁₀= 0110₂ and 7₁₀ = 0111₂. As in six.c, the contents of h are replaced by 6 as the first user instruction after RT0.

The next statement is very C-specific, although something like it is available in nearly any language. It calls a predefined function which asks the OS for some new dynamic memory, in this case 1 memory cell. The function returns the address of the memory cell allocated to the program, in the Figure F299₁₆. Before it returns control back to seven.exe the run time system has ensured that the new memory cell is not being used by another program. This address is stored at d, and that location is known at RT0 because d is statically allocated.

In seven.c we can write h=6 but not 6=h. That is, = is not the symmetric mathematical operator =. It is an assignment operator, and which side of it you are on matters. (Some languages recognize this difference by using := for assignment to distinguish it from mathematical equality.) But why can't we write 6=h in C? Because the left side of the assignment has to be a location in memory at runtime (although some other uses of = occur before RT0, but not the ordinary statements above.) And 6 has no location.

It is true, and the figures emphasize it, that the constant 6 has to be stored somewhere in the object file for this assignment to happen. So you might think that 6 on the left side of = could refer to that location. But then the contents of that address are no longer guaranteed to be constant during run time. Suddenly, half way through execution, 6 may equal 9, a language feature that might appeal to Jimi Hendrix (ask your parents/grandparents).

Consider this segment of C code:

main() {
    int a, b, c;
    a = 5;
    b = a;
    c = a*a+3;
    }

a

content

a

a*a+3

then

c

So the assignment operator = does indeed assign the contents of one location to the contents of another location. The difference is that the location of a*a+3 is implicit. It is temporarily used to store the result so that in the next round of program execution its contents can be copied to the permanent location c. Once that is done, the location may be used again for temporary storage and the results of a*a+3 are destroyed.

On the other hand, the locations labeled a, b, c are the same locations throughout execution. They are allowed to be placed on the left side of =, but a*a+3 does not refer to a fixed location and the compiler will not allow this to show up on the left side. It is not a left-object, but it is a valid expression that can appear on the right side of an assignment. If it does, its value is stored in a temporary location that is controlled by the run time system and will be re-used repeatedly as scratch space.

Here is a different segment of C code, which has been compiled into an executable called seven.exe:

RT      Statement
---------------------------
0.      int h, *d;
1.      h = 6;
2.      d = malloc(1);
3.      *d = h;
4.      h = &d+1;
5.      printf(h, &d);

The figure below shows how memory changes while this code is segmented.

Execution time is shown by moving from left to right, starting at RT=0. At this point the memory for seven includes storage for the literal constants 6 and 1 that appear in the code. Since h and d are declared as int (integer) type variables there is a memory location for each. However, d has a * in front of it, which tells C it will not contain an integer but rather a pointer to an integer.
The next expression says to place the constant integer value 7 into the address labeled h.
The next line calls a function in C to allocate one cell of memory and put the address of that location in d. The compiler allowed this to happen because it could check that d is *int. That is, where d points to is dynamic, but what interpretation to give the thing it points to is static (in a C program). The operating system finds new memory for seven to use at address F2999, so that is what is put in the contents of address d at RT=2. However, the content of the new memory is unknown at this point.
The next line can be translated into English as take the contents of h and put them where d points to. Since h contains 6 (binary 0110) those contents are copied into address F2999. Notice that seven is now modifying the content of memory cells beyond the memory it had when it started. The operating system is in charge of guaranteeing no other program can change that cell and the cell itself was free when it was allocated. The compiler will only let seven.c treat *d as an integer. At execution no extra code has to run to ensure that the expression on the right hand side, h, is of type int, because its type is also know at compile time.
Then next line translates to go to where d points, get the contents and add 1 to them, then store the result at location h. Since d is a pointer, &d converts to the contents of F299.
Finally h and d are printed to the screen. What values should show up?

sevenb.c

int h, *d;
h = 6;
d = *h;
h = &d + 1;
printf(h,&d);

sevenb

Instead of calling malloc() the program says that d should point to h, so its address appears as the contents of d at RT=2. This location is only known once the loader has moved the executable into memory, but that all occurs before RT0, so &d is static. Then h is changed as in the first version and both h and the value pointed to by d are printed. Again, what shows up?

How to think about changes in memory as a program executes. Things like `h` and `d` refer to a location in memory. Since `d` is a pointer its memory cell contains an address pointing to another location. In the program it is assigned the location of `h`. That pointer can be followed to change the contents of `h` with directly referring to `h`.

This shows that pointers do not require dynamic memory allocation. The slightly different program sevenb.c works fully within static memory, with d assigned to the address of h using the C operator &. The effect of the second instruction is to place in d the location of the statically allocated h. Also, note that the second argument to printf() is no longer d but *d.

Static vs. Dynamic Variable Size

3, -1, 6, 8

int d0, d1, d2, d3

missing

-1

However, now suppose that the list of integers is the ethnicity of children in classroom. Now the number of kids may not be known until runtime. It may be that the maximum class size is, say, 30. So a naive programmer may write int d0, ..., d29 (but they could not use ..., they would must list every variable). Of course, a better solution is to store one variable d and have it contain the list of values.

ethnicb.c
-------------------------------------------------
int d[30];           *int d, size;
                          scanf("%d",&size);
                          d = malloc(size);

scanf("%d","&size")

&

size

In the first version, the length of the vector d is in the source code, so it is static. In the second version, the size of the class is read as input after RT0 and stored in size. Then, size is used to allocate enough memory dynamically to store the needed vector. The different memory states are illustrated in the Figure above.

How would the program process a single student's ethnicity? In the case of the grandparent code, the source would just use d0 for the first grandparent. In the classroom example, this will not work. d0 is in the form of an identifier, so the lexical analyzer will parse it as a variable. However, in ethnic.c there is no identifer with the name d0. So it would be a disaster to use d0 as a way to point to the first element of the vector d. In math notation we might use d₀ to denote the first element of d, but plain text source code cannot handle subscripts like this (and both Ox and C were developed to take plain text as the source code). Instead, the syntax definition of both languages (and many others) says that d[0] refers to the first element of the vector d, d[1] to the second, etc.

It is worth thinking about what d[0] means. Recall that &d is the address of the (statically allocated) variable d. And if p is a pointer, then *p is the inverse operation: go to the contents of the location of p and interpret those contents as an address. Because that is an address, *p is a left-object as long as p was declared as a pointer. We can think of d[ as similar to *p. It means go to the location pointed to by the contents of d. But that is just step one.

The next step is to evaluate the expression that appears before ] and move down in memory that many elements of the vector. So d[0] is the first element, d[2*3+4] is the eleventh element because the expression evaluates to 10, so ten steps are taken starting from the first.

There's Power in the Union: dynamic typing in a static language

Here are three true statements:

C is a statically typed language.
The Ox language interpreter is a compiled C program.
Ox is dynamically typed language.

In light of the discussion so far this should be confusing, because it was confusing to me. After using Ox for a while I began to wonder how this trick was played. I had an idea how to do it because languages like C and Pascal have a backdoor feature of their syntax that might allow this. But I was also curious to find out how it was done in Ox exactly. It was not hard to discover this, because it is not a hidden aspect of the language, even though Ox is not an open source product.

How it is done is of no interest to most users of a language like Ox. But understanding this does shed light on how things work. Understanding the trick of dynamic typing in a static language prepares us to the integrate outside resources in an Ox program.

Why is C statically typed? As in the case of most static/dynamic tradeoffs, the main reasons are code efficiency and code safety. A C compiler can ensure that during run time memory contents will contain values that have the interpretation the programmer intended, because at compile time it can and does enforce the defined type of each variable. It generates compile errors if the user tries to do something illegal with the declared type. But, memory contents at run time are not typed. They are simply patterns of 0s and 1s and could be interpreted in half-a-dozen different ways by the CPU regardless of what the programmer thinks the contents should be. Thus, static typing of variables is a language feature, not a hardware feature.

C has several basic variable types, including:

integer, float (real), and character. It also has pointers and multi-dimensional arrays of these basic types.

Further, it has a very subtle item called a union, which is the back door alluded to above.

To illustrate its use, consider a model in which you have a random variable $X$ that takes on positive values and has an upper bound on its support, denoted $B$. That is, the density of $X$ is 0 for $x $<$ 0$ and $x $>$ B$. However, your model has two special cases: $X$ is a continuous random variable that takes on all values in the interval [0,B]. And $X$ is a count variable that takes on only integer values [0,1,...,B-1,B]. In the continuous case, $B$ could be an integer value, as in $B=6.0$, but this is different than the count case of $B=6$.

The complication is that you want your code to be used for both cases. One way to think about doing this is:

Version 1

int Bdisc, IsDiscrete;
float Bcont;
...
if (IsDiscrete) {    // process X as a count variable, using Bdisc
    ...
    }
else   {            // process X as a continuous variable, using Bcont
    ...
    }

The statement if (A) { } else { } means to check the condition A during execution, and if A is true, then execute the statements between the first two brackets. Otherwise (else), execute the statements in the brackets following else. The integer variable IsDiscrete can store a True/False logical condition, because C will interpret a value of 0 as False and any other value of IsDiscrete as True.

There is a very minor inefficiency here. In any one model either Bdisc or Bcont is not used. That is one memory cell in RAM, which is not an issue. But the inefficiency could be an issue if the information to be stored might be large arrays of either integer or real (float) values. More importantly, this code is hard to maintain. It has to have two different names for a concept that means only one will apply at a time.

union allows the programmer to eliminate this inefficiency:

Version 2

union Bound {
     int disc;
     float cont;
     int IsDiscrete;
    }

Bound B;
if (IsDiscrete) {
    ...   B.disc   ....
    }
else   {
    ...   B.cont  ...
    }

The variable B is a variable of type Bound. This is an example of a C user-defined type, and is a combination of built-in types. The union type Bound contains two variables, one int and one float. The idea is that the bound is a concept in the model (technically, it is the upper bound of support of the random variable $X$). And that concept has a property, the numerical value of the bound. In addition, that property is either integer or real, and the difference is important for the rest of the model. So rather than two separate variables, Bdisc and Bcont, there is one variable B with one of two possible values, disc or cont.

The source code accesses the elements of the union using the . operator. B.disc tells the compiler to go to the memory cell referenced by B and then access its component named disc. The key is that the C compiler will use the same memory location for both disc and cont. There are not two separate addresses, one containing the integer value and one containing the float or real value. And it will interpret the memory cell as the corresponding type depending on which variable name is used in the code.

Note that, there may be multiple variables of type Bound. If Bound B; is changed to Bound B1, B2 then B1.disc would refer to one location and B2.disc another. Now at a given point in the running program we could be using a continuous random variable as B1 and a discrete one in B2, or two continuous random variables with different bounds, etc.

Just because a program compiles and executes does not mean the results are correct or meaningful. For example, this C program segment

A C union definition

union {
    float x;
    char a;
     B;
     }
B.a = 'Q;
B.x = B.x / 2.2;

The programmer is saying that B should store the real (floating value) interpretation of the character "Q". Because a is declared a char variable the C compiler is happy to store "Q" in it. Whatever binary code that means "Q", the next line says to divide it by 2.2. The C compiler will allows this happen because x is declared float.

This is a mistake, but not a compile-time error. It is a conceptual error with the program that the C compiler cannot keep the programmer from making. It is up to the programmer to ensure that their program is written so that the contents of a union are properly interpreted. Again, the memory contents are not intrinsically typed. So the CPU is quite happy to treat the contents of B.x as a real number even though they correspond to a coding for the character Q.

In basic C programming there is rarely a need for a union variable, and many programmers would never need to use it. We have introduced union not as an advanced and esoteric feature but as something fundamental because it is the backdoor C provides through which Ox can be a dynamically typed language. But the code above shows that a union on its own is prone to create errors that are hard to catch. In the first example we kept track of whether the random variable was discrete or not with another variable named IsDiscrete. But in a program with several such variables it would be a nuisance to have to keep track of pairs of separate variables.

Instead, it is better (more convenient, less likely to create errors) to package the two aspects of the random variable together into one C variable. This could be an array of length 2, one element for IsDiscrete and the other for disc/cont. but the array would either have to be of base type int or float, and the second element could not be a union.

What we need is a heterogeneous array, which in C is called a struct, short for structure:

struct RV {
   int IsDiscrete;
   union {
     int disc;
     float cont;
      B ;
     };
   }
RV X, Y;

if (X.IsDiscrete) {
   ... X.B.disc ...
   }
else {
   ... X.B.cont ...
   }

A memory map for this is given in the below. The variables X and Y are two variables declared as type RV, a user-defined type that contains two memory cells. One cell is for IsDiscrete, which the programmer can use to keep track of which kind of random variable is being processed. The second is B, the union. Elements of a structure are also accessed with the . operator, as in X.IsDiscrete. But note that each element of a structure refers to a unique location in memory, but in a union the same location. So X.B refers to the location for the bound. To use the bound the programmer refers either to X.B.disc to give the location an integer interpretation, or X.B.cont a real-valued interpretation.

Note that there are two locations labeled IsDiscrete, but this does not confuse the compiler because the programmer cannot directly refer to either one. The code must specify which variables IsDiscrete fields is being referenced.

What we now have is a C program for which a random variable's type, either continuous or discrete, is determined dynamically. And in one case something about the random variable is interpreted as integer, in the other case continuous. That something is the upper bound, and it is stored in the same location regardless of the type.

Code as above allows the programmer to keep track of which interpretation to give B as the program executes. This is done by tracking for every variable declared as type RV and indicator for whether it is discrete or not. And, as long as the programmer is careful there is nothing that prevents X above from being discrete for part of run time and continuous for another part.

X.IsDiscrete

If (X.IsDiscrete) { ... } else { ... }

X

Y

X.B.cont

Y

IsDiscrete

X.IsDiscrete

Return to the original puzzle

Puzzle
How does a language like Ox, which is interpreted by a C program, have dynamic types but the underlying C program has static types?

First, in the example above the variable X has a static type. It is always of type RV. And X.B is always of type union { int disc; float cont; }. What is dynamic is the interpretation of the union, not the type.

Answer 1:
Everything in Ox is stored in the underlying C code as the same (static) struct, which includes an int code for the current type and a union for the current type of the variable.

That is, the little man behind the curtain is indeed an ordinary C program, but one that exploits the union data type to emulate dynamic typing, a form of Ox wizardry that it is a bit like Oz wizardry. This Ox code displays the wizardry of dynamic typing:

Dynamic typing in Ox

#include "oxstd.h"
main() {
  decl x;
  x = 5;       println(x);
  x = 5.0;     println(x);
  x = "five";  println(x);
  x = x * x;   // generates an error
  }

x

union

x

Anytime an operation is performed, the interpreter checks that the types of the operands are the correct or allowed types. If not, it does not just let the operation happen as in seven.c above, which would result in junk. Instead, it ends execution and reports the error to the programmer. We do not have to guess about the way Ox, as a C program, handles data of different types The answer is provided for us by looking at the file Ox/dev/oxtypes.h, which is part of the Ox development download. In that file, you find the following:

Everything in Ox is a C structure of type oxvalue

    Full Declaration                           Trimmed Declaration
------------------------------------------------------------------
struct oxvalue                            struct oxvalue
{   int type;                             {   int type;
    union                                     union
    {                                         {
        double dval;                              double dval;
        int    ival;                              int    ival;
        struct oxstring    sval;             ⋮
        struct oxmatrix    mval;             } t;
        struct oxrange     rval;          }
        struct oxarray     aval;
        struct oxfunction  fval;
        struct oxclass     clval;
        struct oxfile      ioval;
	struct oxreturn	   retval;
	struct oximport	   impval;	
    } t;
}

RV

t

B

double

float

oxvalue

struct

You can easily find the declaration of the struct oxmatrix to see what the C program that interprets Ox keeps track of to handle matrices.

When we combine these features of C:

dynamic memory allocation

pointers such as *p

the union type

the struct type

decl x;

oxl.exe

decl

oxl.exe

*int

*float

x

RT0

Instead, it allocates a *oxvalue, which will enable it to store anything that x might represent during execution of the Ox program. These allocations happen at RT0 for declarations that are global. For declarations within functions they will occur each time the function begins to execute.

Answer 2
Every variable (every left-object) in an Ox program is a dynamically allocated structure of type oxvalue in the C program that runs Ox programs.

During execution oxl.exe checks that the current interpretation of the union t is consistent with the current reference to the variable. This answer is one reason that "Real economists write in FORTRAN". On each operation, such as x*x+3, the Ox interpreter is executing extra machine instructions to check the current (dynamically determined) type of the operands. In a statically typed compiled language like C, these runtime checks are unnecessary since the checking was done at compile time.

How much this checking costs depends, and must be traded off with the advantages in terms of programming time. Further, C code for such things as matrix operations, which may appear in a library used by the programmer, must carry out its own checking on input. So some of the overhead in a language like Ox occurs in compiled languages as well.

Exercises

What output will be produced by seven.c? By sevenb.c?

The Ties that Bind^♭Bruce Springsteen 1980: When Things Happen in a Program

Exhibit 16. Dynamic Memory Allocation

Exhibit 17. Static Memory Allocation with a Pointer

Exhibit 18. Two struct variables with a union element

Puzzle

Answer 1:

Answer 2

Exercises

The Ties that Bind♭Bruce Springsteen 1980: When Things Happen in a Program

Exhibit 16. Dynamic Memory Allocation

Exhibit 17. Static Memory Allocation with a Pointer

Exhibit 18. Two struct variables with a union element

Puzzle

Answer 1:

Answer 2

Exercises

The Ties that Bind^♭Bruce Springsteen 1980: When Things Happen in a Program