This figure gives you a sense of what happens when a user of Tim, the machine, asks Skyway, the operating system, to start a program.
The operating system (Skyway) would contain a sequence of instructions to load and execute a program. An executable program is a block of machine instructions sitting on the disk in a file. We will use the file extension .exe for a file on Tim's hard drive that contains executable instructions. A.outThe extension .exe is used in DOS/Windows for executable files. Unix-based operating systems do not have a standard extension to denote executables. In the old days the default file name for the file created by the compiler was a.out.)
The program our user happens to ask to run is called thrash.exe. The block of binary machine instructions for thrash must be copied from the hard drive into RAM. Skyway keeps track of parts of RAM that are not being used by itself or other running applications. It will load thrash in some free memory (and then mark that memory as used so next time it won't overwrite it.)
Because the OS decides where programs are copied into RAM to be executed, the instructions in thrash.exe could not know where it will be in memory when it executes. So an executable on disk cannot contain absolute addresses. Instead, addresses must be relative addresses.
Relative addressing
Address Contents Interpretation
--------------------------------------------------------------------------------------
D69A 0000 FE99 0A32 BD02 Top memory cell while running Base
...
D6A1 0887 D066 BD01 B36E Go To Address BASE+6E
...
D69C 1944 FFE7 BA77 0033 Random contents at program startup
0000 0000 0000 0000 After instruction at D6A1 has executed
Here we imagine thrash being loaded at address D69A, which will be the Base address. Then the OS puts the first instruction of trash to execute on the list of running applications. When its turn is up, the MPC will be loaded with that address and the program can start to execute. Somewhere in the program is an instruction that says to set the contents of a memory cell to 0. This instruction happens to be at D6A1, and the memory cell to set is at address 6E past Base (the last 8 bits of the cell contents is the relative address). You should check my hexadecimal math to confirm that BASE+6E =D69C The CPU on Tim adds 6E to Base, goes to that address (D69C), and sets all the bits in the memory cell to 0. The next time thrash runs it will be loaded at a different location in memory, but the relative location of the cell to clear will still be 6E down from the new base address.
On Tim, the addresses are 16 bits long, but the program contains relative addresses that are only 8 bits long (such 6E). This means that a program like thrash can only refer to memory cells within 28-1 = 255 addresses of the Base. For the programmer, it does not help that there are many more memory cells than that. Their program is limited in the amount of data it can process, without some mechanism to use more than one base address or to store longer relative addresses. This is why a program running on a computer with, say, 8 Gigabytes of RAM might run out of memory well before that, perhaps after 2GB of data are used. The machine language and the operating system only have 31 bits to store relative addresses. 230 = 1GB, and adding a 31st bit allows addressing over 2GB of memory cells. The operating system can reach all of the memory because its memory registers have, say, 36 bits, which allows for 64GB addresses.
One way the operating system can put the extra capacity in the length of the addresses to good use is to treat disk space as virtual memory. That is, locations on the disk can be logically treated as addresses beyond RAM. Disk storage is slower to access than RAM, but in return many large programs could be running at once by using virtual memory. However, this does not change the limit on the amount of memory a single program can access. When the CPU and the OS move from, say, 36 bit addressing to 64 bits, and this is accompanied by more relative address bits then it is possible for programs to process much larger amounts of data.
Think like a machine
Does programming require you to learn the binary instruction codes on a computer? Fortunately, no, because many ways have developed since the 1950s to let humans deal with instructions they can understand and then let something else translate that to machine instructions. For example, consider ways to set the value of one variable to the value of another.
How do I assign z? Let me enumerate the ways.
English Let y be z.
Ox or C y = z;
Stata gen y = z;
Unix C shell set y = z
Pascal y :=z;
BASIC LET y = z
For humans each of these statements are conceptually similar, but what happens at the machine level is very different. What would the Tim version be? Here I make up some up binary instructions on the 16-bit address, 64-bit word machine that might be the Tim-version of "Let y be z" at an address in memory and with human-readable explanation:
Let y be z in Tim-speak.
Address Contents Explanation
------------------------------------------------------------------------------------
069A 0000 FE99 0062 BD01 Copy contents of Base+62 (z) to Register R1
069B 0887 D066 BD01 006E Copy contents of R1 to Base+6E (y)
069C 1944 0000 0000 0033 Set MPC to the next statement to process (Base+33)
The example for Tim-speak is made up, but real computer instructions would seem even more arbitrary.
It appears that 0000 FE99 is the machine instruction for fetching the contents of a memory cell and placing them in a register. As humans we might give 0000FE99 a name such as LoadA, which is exactly the kind of thing a lower level
assembly language for Tim-speak would refer to. The cell to go to (0A32) comes next followed by the destination, say Register R1, with a code BD01. For this to happen the MPC must be pointing at 069A. The LoadA instruction would automatically increment the MPC by one, changing it from 069A to 069B. The command at the next address is 0887 D066, which completes the assignment by moving contents of a register, in this case R1 with code BD01, back to a memory address. We might give this instruction the mnemonic StoreA. What address does it store in? It appears to be Base+6E. Finally, 1944 0000 is the command for resetting the MPC to a new address (a JUMP instruction). The address to jump to is Base+0033.
Or Else! (hardware version)
Although computers have sped up, have more storage, and are interconnected by networks, the basic operations they perform are more or less unchanged. In fact, just a few basic instructions can support nearly everything you might think to compute. Here is a very small set of pseudo instructions, two of which are part of the machine version of "let y be z."
Pseudo code for basic CPU instructions on Tim
MNEMONIC EXTRA INFO EXPLANATION
-----------------------------------------------------------------------------------------------
STORE: Ri ADDRj Store content of register Ri at address in register ADDRj
LOAD: ADDRj Ri Load contents of address in ADDRj into register Ri
OPERATE: Carry out operations [+,-,*,/,exp(),ln()] on register values
IF(A) THEN{B} If A is true, increment MPC and continue.
Otherwise, set MPC to the address below block B
GOTO ADDR Put the address stored in a register in the MPC and continue
STOP: Return control to the Operating System
An example of an actual instruction set with corresponding assembly language mnemonics is for the famed MC609 chip. Beyond these instructions, our simple computer would also need methods to read and write data from long-term storage and to get input and output from other devices (keyboards, monitors, etc.) It may not be clear that such a short set of instructions can carry out complex operations. But the key is that what instructions are executed can be made conditional on the data (the IF(A) THEN{B} instruction). Another way to illustrate a sequence of instructions in Tim-speak is a flowchart.
Wet. Lather. Rinse. Repeat.
Two of the machine instructions above are vital to understanding how computers work. They are the If and the GOTO instructions. Together they allow a computer to repeat a set of instructions over and over until some condition occurs then move on to the next thing. Here is some pseudo code that shows how you might think about this:
Loop using conditional and unconditional jump instructions
ASSIGN: R1 0
TOP IF (R1 < 50) THEN {
block of instructions to repeat
OPERATE: R1 = R1+1
GOTO TOP
}
This is an example of pseudo code which is close to what an assembly language program might look like. It is a list of symbols that use the pseudo language for machine instructions along with some other features and conventions. The program mimics what Tim would do if the sequence of instructions were executed. A new line in the program corresponds to incrementing the MPC and executing the next instruction. This code is simply a program segment. Instructions would come before and after this to carry out the complete task.
From a human perspective, the pseudo code for a loop shown above is not so easy to follow, because it is not clear that the code will come back to the IF () THEN line 50 times. Our eyes have to see the TOP label and the GOTO TOP command, and then the initializing and incrementing of the loop counter. Using the flowchart symbol for an if structure we can illustrate the loop:
This allows our eyes to visualize the branching and returning in the program. Below the block B is a visual detour on the page which, if taken, always includes incrementing the loop counter and then returning to the test or condition. In the flowchart the program is actually complete after the 50 passes, so it reaches the STOP symbol.
In a long program there may be dozens of loops along with other tests of conditions. This means the programmer has to come up with labels for each loop and ensure that the GOTO statement at the bottom refers to the correct label. The flowchart illustrates a correct jump back to repeat the if instruction, but one loop may be inside another and the loop labels must be kept straight. This makes it very difficult to know what the code is doing and it is because the GOTO can go anywhere in the program, backwards or forwards. This is the cause of rightfully maligned spaghetti code.
Early on the first computer programmers recognized this problem and built ways to avoid spaghetti code. These developments lead to computer languages that are structured so that arbitrary jumps around the code cannot occur. How these structures work in Ox is described in the sections on Plans and Loops.
Summary
Programming is the process of creating a block of instructions to do a task.
A programming language is a tool designed to encode what a human wants the computer to do in a form the human can follow.
The language must then be translated into machine instructions that carry out the tasks. One task or statement in a program will typically required several (perhaps dozens) of simple machine instructions.
Language structures allow the programmer to carry out a complex task by safely using the computer's ability to jump to different machine instructions based on current contents of memory or registers.
Exercises
In the Ox Syntax Reference find the GOTO statement. Below is code to print the first 50 non-negative integers using if() and goto. Move the label TOP so that this becomes an infinite loop. And then replace the loop with a correct for() loop.