- Introduction When you ask your computer to execute an Ox program a lot stuff happens in order for it to work. By
- Static versus Dynamic Value Assignment These notes call quantities that Ox stores in a memory location
#include "oxstd.h" const decl I = <1 , 0; 0, 1>; decl x = 5.0, //static assignment of value y; main() { y = 1-I; x = x + 5; //dynamic assignment of value }
#include "oxstd.h" const decl I; //ERROR! Must get a value since it is const decl x = ln(5.0), //ERROR! Can't refer to Ox functions in global space y; main() { const decl z = 2; //ERROR! local variables can't be const I = I + 5; //ERROR! can't assign to a const }
Here is one subtle thing about global variables in Ox. The variable #include "oxstd.h" decl y = 0; // same as just y
This feature can be put to use in complex Ox programs, but occasionally it can create confusion.
- Static vs. Dynamic Memory Allocation. We have already seen a case in an interpreted language where the memory a program can use for storing data can change while the program is running. But since the interpreter is a program written in a compiled language then obviously they too can allocate memory dynamically.
- Two versions of the a C program called
seven
are given here:seven.c sevenb.c ---------------------------------------------------------------- #include <stdio.h> #include <stdio.h> main() { main() { int h, *d; int h, *d; h = 6; h = 6; d = malloc(1); d = &h; *d = h; h=7; h = 7; printf(h,*d); printf(h, d); } }
The first version creates the sequence of memory states in the Figure. The modified version creates the sequence in the Figure. - Consider this segment of C code:
main() { int a, b, c; a = 5; b = a; c = a*a+3; }
In the first assignment what matters is the location of - Here is a different segment of C code, which has been compiled into an executable called
seven.exe
:RT Statement --------------------------- 0. int h, *d; 1. h = 6; 2. d = malloc(1); 3. *d = h; 4. h = &d+1; 5. printf(h, &d);
- Execution time is shown by moving from left to right, starting at RT=0. At this point the memory for
seven
includes storage for the literal constants 6 and 1 that appear in the code. Sinceh
andd
are declared as int (integer) type variables there is a memory location for each. However,d
has a*
in front of it, which tells C it will not contain an integer but rather apointer
to an integer. - The next expression says to place the constant integer value 7 into the address labeled
h
. - The next line calls a function in C to allocate one cell of memory and put the address of that location in
d
. The compiler allowed this to happen because it could check thatd
is*int
. That is, whered
points to is dynamic, but what interpretation to give the thing it points to is static (in a C program). The operating system finds new memory forseven
to use at addressF2999
, so that is what is put in the contents of addressd
at RT=2. However, the content of the new memory is unknown at this point. - The next line can be translated into English as
take the contents of
Sinceh
and put them whered
points to.h
contains 6 (binary 0110) those contents are copied into addressF2999
. Notice thatseven
is now modifying the content of memory cells beyond the memory it had when it started. The operating system is in charge of guaranteeing no other program can change that cell and the cell itself was free when it was allocated. The compiler will only letseven.c
treat*d
as an integer. At execution no extra code has to run to ensure that the expression on the right hand side,h
, is of typeint
, because its type is also know at compile time. - Then next line translates to
go to where
Sinced
points, get the contents and add 1 to them, then store the result at locationh
.d
is a pointer,&d
converts to the contents ofF299
. - Finally
h
andd
are printed to the screen. What values should show up? int h, *d; h = 6; d = *h; h = &d + 1; printf(h,&d);
The figure below illustrates the timing again. The difference is that this code segment does not use dynamic memory allocation, so - Static vs. Dynamic Variable Size We hold off discussing different types of data until the next chapter, but consider at least a vector or list of integer values, such as
ethnicb.c ------------------------------------------------- int d[30]; *int d, size; scanf("%d",&size); d = malloc(size);
How C's - There's Power in the Union: dynamic typing in a static language
- Here are three true statements:
- C is a statically typed language.
- The Ox language interpreter is a compiled C program.
- Ox is dynamically typed language.
- integer, float (real), and character. It also has pointers and multi-dimensional arrays of these basic types.
- Further, it has a very subtle item called a
union
, which is the back door alluded to above. - Version 1
int Bdisc, IsDiscrete; float Bcont; ... if (IsDiscrete) { // process X as a count variable, using Bdisc ... } else { // process X as a continuous variable, using Bcont ... }
- Version 2
union Bound { int disc; float cont; int IsDiscrete; } Bound B; if (IsDiscrete) { ... B.disc .... } else { ... B.cont ... }
- A C union definition
union { float x; char a; B; } B.a = 'Q; B.x = B.x / 2.2;
would compile and execute but the results would not be reasonable
struct RV { int IsDiscrete; union { int disc; float cont; B ; }; } RV X, Y; if (X.IsDiscrete) { ... X.B.disc ... } else { ... X.B.cont ... }
- Return to the original puzzle
- Dynamic typing in Ox
#include "oxstd.h" main() { decl x; x = 5; println(x); x = 5.0; println(x); x = "five"; println(x); x = x * x; // generates an error }
At each assignment to - Everything in Ox is a C structure of type
oxvalue
Full Declaration Trimmed Declaration ------------------------------------------------------------------ struct oxvalue struct oxvalue { int type; { int type; union union { { double dval; double dval; int ival; int ival; struct oxstring sval; ⋮ struct oxmatrix mval; } t; struct oxrange rval; } struct oxarray aval; struct oxfunction fval; struct oxclass clval; struct oxfile ioval; struct oxreturn retval; struct oximport impval; } t; }
The trimmed version simply gets rid of some lines and it should look familiar. It is essentially the same as - dynamic memory allocation
- pointers such as
*p
- the
union
type - the
struct
type
we are in a position to explain what
stuff happensI mean your computer executes many instructions before, during and after the instructions to carry out your program. We have talked about some of these tasks already, including the parser, compiler, linker, and loader. Unless you have studied computers beyond these notes you have at best a fuzzy idea of what all that means. But the important thing is that none of those tasks directly do what your program does. We will use
RT
to denote run time: how long the program has been executing in terms of total time it is in charge of the CPU, either directly or indirectly through the interpreter having charge of the CPU. And RT0
is the moment at which RT=0
. The parser, compiler, linker, and loader tasks all occur before RT0
. At RT0
the intent of the program starts to be implemented.
Besides the syntax, the built-in functions of a programming language there are ways in which these tasks work that determine how a program executes and what it can or cannot do to the computer. A static feature of a language occurs (or is determined) before RT0
. Static features do not respond to, and are not determined by, the execution of the program. They do not depend on inputs that could be different from one run of the program to the other. A feature of a language is a dynamic feature of a language if it occurs or is determined after RT0
. For example, the length of a vector exists in the mind of the programmer even as the code is being written. But this does not mean it is a static feature. If the language has dynamic vector sizes then the length remains an unspecified integer until some time after RT0
.
The same feature may be static in one computer language and dynamic in another. As the names suggest, if a feature is static in one language and dynamic in another, then the static language is less flexible. But flexibility comes at a cost: a static feature is something that can be checked and made secure before run time. If the feature is fixed during compile time then the compiler can be sure that it is handled appropriately at all points in the program. But if the feature is dynamic (such as how long a vector is), the compiler must include extra code in the object file to ensure the feature is handled properly.
A dynamic feature can then be slower to execute because the checking has to take place as the program runs. Or it might be less safe since a dynamic change without checks might do something that shouldn't happen, such as erase a memory cell allocated to another running program. Operating systems are designed to stop things like that from happening, and they monitor what is happening while a program is executing. For example, on any OS you might use to do economics, the run time system will not let your program corrupt the memory cells of another running program.
things. There is stuff that are not
thingsbut that look and act the same way. We have seen the example of
enum{Zero,One,Two}
which would create the label Zero
for the integer 0, One
for 1, etc. These are not things as defined here because as the program is running there is no memory cell that stores 1
with the label One
. Instead, the Ox compiler has simply replaced uses of One
with the integer 1 and then compiled the code. Since this happens at compile time (and therefore before RT0) this is a static association between the identifier One
and the integer 1.
One the other hand, we have seen that decl cnt = 1;
is a dynamic
assignment of 1 to a thing with the label cnt
. For this reason the count can be changed after RT0: cnt=cnt+1;
happens during execution and it modifies the memory cell associated with cnt
.
There is a way to create a static assignment to a thing by using the const
qualifier to the decl
command:
A const
thing must be initialized but its value may not be changed thereafter. So notice the first line above declares a constant 2x2 identify matrix named I
. The programmer can be sure that I
will hold that matrix at all times in the program. If the program accidentally included a statement that modified I
the compiler would catch that and produce a compile-time error. So this is a case of static assignment of value to an identifier.
Notice that the second line of the code above initializes x
to 5.0. That is also a static assignment of value in the sense that it happens at or before RT0, the start of execution. However, because x
is not declared main()
that modifies the value of x
does not produce an error, but it is a dynamic assignment of value. The memory cell of x
changes to 10 during execution, hence it is a dynamic assignment of value. Finally, you can see that y
is not initialized to any value when it is declared. Because it is not const
this is allowed, but there is something subtle here.
Because I, x, and y
are declared outside of brackets { }
, they are called global. In Ox, global things exists during all the execution times. But this means there are limits to what values can be computed when initializing a global variable. Only expressions that the Ox compiler can handle are allowed. An example is that log(5)
can't appear outside of brackets because it means the Ox function log()
must be called (so it must happen after RT0).
Since globals are assigned values at RT0 it is not possible to initialize them with values that depend on Ox functions. Further, variables declared inside brackets only exist during the time the code inside the brackets is executing. This means (in Ox, not all languages), that a thing cannot be declared const
inside a function. So here are examples of errors produced related to const
versus non-constant declarations:
y
is a global and exists at RT0. It turns out that Ox gives it an initial value of integer 0, so the line above is equivalent to
At RT0
the executable seven.exe
is in memory and starting to run. The static variables h
and d
have space allocated for them. However, note that d
is declared with a *
before it. This means that it stores not an integer but a pointer to an integer. For our purposes we can imagine that d
is simply a memory cell, but in some cases more memory is store with a pointer than just the address.
At RT0
the variables are not initialized so their contents are unknown, but two constants referred to in the program are ready for use, 610= 01102
and 710 = 01112
. As in six.c
, the contents of h
are replaced by 6 as the first user instruction after RT0
.
The next statement is very C-specific, although something like it is available in nearly any language. It calls a predefined function which asks the OS for some new dynamic memory, in this case 1 memory cell. The function returns the address of the memory cell allocated to the program, in the Figure F29916
. Before it returns control back to seven.exe
the run time system has ensured that the new memory cell is not being used by another program. This address is stored at d
, and that location is known at RT0
because d
is statically allocated.
In seven.c
we can write h=6
but not 6=h
. That is, =
is not the symmetric mathematical operator =. It is an assignment operator, and which side of it you are on matters. (Some languages recognize this difference by using :=
for assignment to distinguish it from mathematical equality.) But why can't we write 6=h
in C? Because the left side of the assignment has to be a location in memory at runtime (although some other uses of =
occur before RT0
, but not the ordinary statements above.) And 6
has no location.
It is true, and the figures emphasize it, that the constant 6 has to be stored somewhere in the object file for this assignment to happen. So you might think that 6
on the left side of =
could refer to that location. But then the contents of that address are no longer guaranteed to be constant during run time. Suddenly, half way through execution, 6 may equal 9, a language feature that might appeal to Jimi Hendrix (ask your parents/grandparents).
a
. But in the second assignment what matters is the content of the location of a
. But then note that in the third statement the content of a
matters, but it enters into an expression. That expression, a*a+3 must be computed and stored somewhere and then assigned to c
.
So the assignment operator =
does indeed assign the contents of one location to the contents of another location. The difference is that the location of a*a+3
is implicit. It is temporarily used to store the result so that in the next round of program execution its contents can be copied to the permanent location c
. Once that is done, the location may be used again for temporary storage and the results of a*a+3
are destroyed.
On the other hand, the locations labeled a, b, c
are the same locations throughout execution. They are allowed to be placed on the left side of =
, but a*a+3
does not refer to a fixed location and the compiler will not allow this to show up on the left side. It is not a left-object, but it is a valid expression that can appear on the right side of an assignment. If it does, its value is stored in a temporary location that is controlled by the run time system and will be re-used repeatedly as scratch space.
The figure below shows how memory changes while this code is segmented.
sevenb.c
.
sevenb
stays within the memory allocated at RT=0. (Technically things aren't this sample. Students with more advanced training will know about the runtime stack and other dynamic aspects of execution. But for our purposes this explanation is sufficient.)
Instead of calling malloc()
the program says that d
should point to h
, so its address appears as the contents of d
at RT=2. This location is only known once the loader has moved the executable into memory, but that all occurs before RT0
, so &d
is static. Then h
is changed as in the first version and both h
and the value pointed to by d
are printed. Again, what shows up?
This shows that pointers do not require dynamic memory allocation. The slightly different program sevenb.c
works fully within static memory, with d
assigned to the address of h
using the C operator &
. The effect of the second instruction is to place in d
the location of the statically allocated h
. Also, note that the second argument to printf()
is no longer d
but *d
.
3, -1, 6, 8
. One way to store this in C would be four variables: int d0, d1, d2, d3
. This is fine if the program always deals with four integers. For example if these integers are codes for the ethnic origin of a person's grandparents then there will always be four values for any person. In some cases the ethnicity of a grandparent is unknown or missing, but this could be coded with a special value like -1
.
However, now suppose that the list of integers is the ethnicity of children in classroom. Now the number of kids may not be known until runtime. It may be that the maximum class size is, say, 30. So a naive programmer may write int d0, ..., d29
(but they could not use ..., they would must list every variable). Of course, a better solution is to store one variable d
and have it contain the list of values.
scanf("%d","&size")
works is not important, except to say that it gets input from the user's keyboard and translates the results into a binary integer. It then stores the result at the location sent to it, which is why &
appears before size
.
In the first version, the length of the vector d
is in the source code, so it is static. In the second version, the size of the class is read as input after RT0
and stored in size
. Then, size
is used to allocate enough memory dynamically to store the needed vector. The different memory states are illustrated in the Figure above.
How would the program process a single student's ethnicity? In the case of the grandparent code, the source would just use d0
for the first grandparent. In the classroom example, this will not work. d0
is in the form of an identifier, so the lexical analyzer will parse it as a variable. However, in ethnic.c
there is no identifer with the name d0
. So it would be a disaster to use d0
as a way to point to the first element of the vector d
. In math notation we might use d0 to denote the first element of d, but plain text source code cannot handle subscripts like this (and both Ox and C were developed to take plain text as the source code). Instead, the syntax definition of both languages (and many others) says that d[0]
refers to the first element of the vector d
, d[1]
to the second, etc.
It is worth thinking about what d[0]
means. Recall that &d
is the address of the (statically allocated) variable d
. And if p
is a pointer, then *p
is the inverse operation: go to the contents of the location of p
and interpret those contents as an address. Because that is an address, *p
is a left-object as long as p
was declared as a pointer. We can think of d[
as similar to *p
. It means go to the location pointed to by the contents of d
. But that is just step one.
The next step is to evaluate the expression that appears before ]
and move down in memory that many elements of the vector. So d[0]
is the first element, d[2*3+4]
is the eleventh element because the expression evaluates to 10
, so ten steps are taken starting from the first.
In light of the discussion so far this should be confusing, because it was confusing to me. After using Ox for a while I began to wonder how this trick was played. I had an idea how to do it because languages like C and Pascal have a backdoor feature of their syntax that might allow this. But I was also curious to find out how it was done in Ox exactly. It was not hard to discover this, because it is not a hidden aspect of the language, even though Ox is not an open source product.
How it is done is of no interest to most users of a language like Ox. But understanding this does shed light on how things work. Understanding the trick of dynamic typing in a static language prepares us to the integrate outside resources in an Ox program.
Why is C statically typed? As in the case of most static/dynamic tradeoffs, the main reasons are code efficiency and code safety. A C compiler can ensure that during run time memory contents will contain values that have the interpretation the programmer intended, because at compile time it can and does enforce the defined type of each variable. It generates compile errors if the user tries to do something illegal with the declared type. But, memory contents at run time are not typed. They are simply patterns of 0s and 1s and could be interpreted in half-a-dozen different ways by the CPU regardless of what the programmer thinks the contents should be. Thus, static typing of variables is a language feature, not a hardware feature.
C has several basic variable types, including:
To illustrate its use, consider a model in which you have a random variable $X$ that takes on positive values and has an upper bound on its support, denoted $B$. That is, the density of $X$ is 0 for $x $<$ 0$ and $x $>$ B$. However, your model has two special cases: $X$ is a continuous random variable that takes on all values in the interval [0,B]. And $X$ is a count variable that takes on only integer values [0,1,...,B-1,B]. In the continuous case, $B$ could be an integer value, as in $B=6.0$, but this is different than the count case of $B=6$.
The complication is that you want your code to be used for both cases. One way to think about doing this is:
The statement if (A) { } else { }
means to check the condition A
during execution, and if A
is true, then execute the statements between the first two brackets. Otherwise (else), execute the statements in the brackets following else
. The integer variable IsDiscrete
can store a True/False logical condition, because C will interpret a value of 0 as False and any other value of IsDiscrete
as True.
There is a very minor inefficiency here. In any one model either Bdisc
or Bcont
is not used. That is one memory cell in RAM, which is not an issue. But the inefficiency could be an issue if the information to be stored might be large arrays of either integer or real (float
) values. More importantly, this code is hard to maintain. It has to have two different names for a concept that means only one will apply at a time.
union
allows the programmer to eliminate this inefficiency:
The variable B
is a variable of type Bound
. This is an example of a C user-defined type, and is a combination of built-in types. The union type Bound
contains two variables, one int
and one float
. The idea is that the bound is a concept in the model (technically, it is the upper bound of support of the random variable $X$). And that concept has a property, the numerical value of the bound. In addition, that property is either integer or real, and the difference is important for the rest of the model. So rather than two separate variables, Bdisc
and Bcont
, there is one variable B
with one of two possible values, disc
or cont
.
The source code accesses the elements of the union using the .
operator. B.disc
tells the compiler to go to the memory cell referenced by B
and then access its component named disc
. The key is that the C compiler will use the same memory location for both disc
and cont
. There are not two separate addresses, one containing the integer value and one containing the float or real value. And it will interpret the memory cell as the corresponding type depending on which variable name is used in the code.
Note that, there may be multiple variables of type Bound
. If Bound B;
is changed to Bound B1, B2
then B1.disc
would refer to one location and B2.disc
another. Now at a given point in the running program we could be using a continuous random variable as B1
and a discrete one in B2
, or two continuous random variables with different bounds, etc.
Just because a program compiles and executes does not mean the results are correct or meaningful. For example, this C program segment
The programmer is saying that B
should store the real (floating value) interpretation of the character "Q". Because a
is declared a char
variable the C compiler is happy to store "Q" in it. Whatever binary code that means "Q", the next line says to divide it by 2.2. The C compiler will allows this happen because x
is declared float
.
This is a mistake, but not a compile-time error. It is a conceptual error with the program that the C compiler cannot keep the programmer from making. It is up to the programmer to ensure that their program is written so that the contents of a union
are properly interpreted. Again, the memory contents are not intrinsically typed. So the CPU is quite happy to treat the contents of B.x
as a real number even though they correspond to a coding for the character Q.
In basic C programming there is rarely a need for a union
variable, and many programmers would never need to use it. We have introduced union
not as an advanced and esoteric feature but as something fundamental because it is the backdoor C provides through which Ox can be a dynamically typed language. But the code above shows that a union
on its own is prone to create errors that are hard to catch. In the first example we kept track of whether the random variable was discrete or not with another variable named IsDiscrete
. But in a program with several such variables it would be a nuisance to have to keep track of pairs of separate variables.
Instead, it is better (more convenient, less likely to create errors) to package the two aspects of the random variable together into one C variable. This could be an array of length 2, one element for IsDiscrete
and the other for disc/cont
. but the array would either have to be of base type int
or float
, and the second element could not be a union
.
What we need is a heterogeneous array, which in C is called a struct
, short for structure:
A memory map for this is given in the below. The variables X
and Y
are two variables declared as type RV
, a user-defined type that contains two memory cells. One cell is for IsDiscrete
, which the programmer can use to keep track of which kind of random variable is being processed. The second is B
, the union. Elements of a structure are also accessed with the .
operator, as in X.IsDiscrete
. But note that each element of a structure refers to a unique location in memory, but in a union the same location. So X.B
refers to the location for the bound. To use the bound the programmer refers either to X.B.disc
to give the location an integer interpretation, or X.B.cont
a real-valued interpretation.
Note that there are two locations labeled IsDiscrete
, but this does not confuse the compiler because the programmer cannot directly refer to either one. The code must specify which variables IsDiscrete
fields is being referenced.
What we now have is a C program for which a random variable's type, either continuous or discrete, is determined dynamically. And in one case something about the random variable is interpreted as integer, in the other case continuous. That something is the upper bound, and it is stored in the same location regardless of the type.
Code as above allows the programmer to keep track of which interpretation to give B
as the program executes. This is done by tracking for every variable declared as type RV
and indicator for whether it is discrete or not. And, as long as the programmer is careful there is nothing that prevents X
above from being discrete for part of run time and continuous for another part.
X.IsDiscrete
and have code such as If (X.IsDiscrete) { ... } else { ... }
. In the figure, after some execution time we see that X
is currently holding a continuous random variable with upper bound 7.2 and Y
is discrete with upper bound 6.
Again, the memory cell X.B.cont
is not inherently in a real state. Its contents could be interpreted as an integer but the result would be junk. The same is true for Y
. The corresponding value of IsDiscrete
allows the program to ensure there is no confusion. And note that X.IsDiscrete
will only be interpreted as an integer because that is its static type.
Puzzle
How does a language like Ox, which is interpreted by a C program, have dynamic types but the underlying C program has static types?
First, in the example above the variable X
has a static type. It is always of type RV
. And X.B
is always of type union { int disc; float cont; }
. What is dynamic is the interpretation of the union, not the type.
Answer 1:
Everything in Ox is stored in the underlying C code as the same (static)struct
, which includes anint
code for the current type and aunion
for the current type of the variable.
That is, the little man behind the curtain is indeed an ordinary C program, but one that exploits the union
data type to emulate dynamic typing, a form of Ox wizardry that it is a bit like Oz wizardry. This Ox code displays the wizardry of dynamic typing:
x
the type of data stored in it is changed. The Ox interpreter changes the variable type indicator for x
and then stores the value using the appropriate element of the C union
for storing the value. Now, consider the last line of the program. Currently x
is storing the word "five", and multiplication of character strings is not defined. This will generate a runtime error, caught by the Ox interpreter.
Anytime an operation is performed, the interpreter checks that the types of the operands are the correct or allowed types. If not, it does not just let the operation happen as in seven.c
above, which would result in junk. Instead, it ends execution and reports the error to the programmer. We do not have to guess about the way Ox, as a C program, handles data of different types The answer is provided for us by looking at the file Ox/dev/oxtypes.h
, which is part of the Ox development download. In that file, you find the following:
RV
but with different names. The union is called t
not B
, and the real variable is of type double
not float
. The rest of the full declaration of oxvalue
shows that many other things in an Ox program are also stored using this struct
. The difference is that these other types, such as a matrix or an Ox function, require their struct
type to keep track of their values.
You can easily find the declaration of the struct oxmatrix
to see what the C program that interprets Ox keeps track of to handle matrices.
When we combine these features of C:
decl x;
means in Ox. First, the part of oxl.exe
that parses and compiles your program finds instances of decl
. It then knows for each variable declared a storage space has to be created. Since this is happening during execution of oxl.exe
the memory has to be allocated dynamically. It cannot allocate, say a *int
or *float
for x
, because this is happening before RT0
of the Ox program.
Instead, it allocates a *oxvalue
, which will enable it to store anything that x
might represent during execution of the Ox program. These allocations happen at RT0
for declarations that are global. For declarations within functions they will occur each time the function begins to execute.
Answer 2
Every variable (every left-object) in an Ox program is a dynamically allocated structure of typeoxvalue
in the C program that runs Ox programs.
During execution oxl.exe
checks that the current interpretation of the union t
is consistent with the current reference to the variable. This answer is one reason that "Real economists write in FORTRAN". On each operation, such as x*x+3
, the Ox interpreter is executing extra machine instructions to check the current (dynamically determined) type of the operands. In a statically typed compiled language like C, these runtime checks are unnecessary since the checking was done at compile time.
How much this checking costs depends, and must be traded off with the advantages in terms of programming time. Further, C code for such things as matrix operations, which may appear in a library used by the programmer, must carry out its own checking on input. So some of the overhead in a language like Ox occurs in compiled languages as well.
Exercises
seven.c
? By sevenb.c
?