1. Numb3rs (TV Show): Integers & Reals
    Run this code and look at the output:
    11-numlimits.ox
    #include <oxstd.h>
    #include <oxfloat.h>   // oxfloat.h declares the numeric-limit constants (M_*, DBL_*, INT_*) used below
    main() {
      println("Some predefined constants ",
        "\n        +Infinity                    : ",M_INF_POS        ,
        "\n        -Infinity                    : ",M_INF_NEG        ,
        "\n        Decimal digits of precision  : ",DBL_DIG          ,
        "\n        Machine precision            : ",DBL_EPSILON      ,
        "\n        Number of bits in mantissa   : ",DBL_MANT_DIG     ,
        "\n        Maximum double value         : ",DBL_MAX          ,
        "\n        Minimum positive double value: ",DBL_MIN          ,
        "\n        Minimum exp(.) argument      : ",DBL_MIN_E_EXP    ,
        "\n        Maximum exp(.) argument      : ",DBL_MAX_E_EXP    ,
        "\n        Maximum integer value        : ",INT_MAX          ,
        "\n        Minimum integer value        : ","%-15i",INT_MIN  );
      println("\nWhat happens when 1 is added to ","%-15i",INT_MAX,"?");
      println(  "Answer:                         ","%15i",INT_MAX+1);

      println("\nAnd if 11 is subtracted from ","%15i",INT_MIN,"?");
      println(  "Answer:                      ","%15i",INT_MIN-11);

      println("\nCompute a big exponent:\n   exp(DBL_MAX_E_EXP-1.0) = ",exp(DBL_MAX_E_EXP-1.0),
              "\nEven Bigger:\n    exp(DBL_MAX_E_EXP+1.0)= ",exp(DBL_MAX_E_EXP+1.0));
      println("\nCut the minimum positive value in half:\n ",DBL_MIN,"/2 =",
              "%25.22f",DBL_MIN / 2.0 );
    }
    
    It is likely that most of the output means next to nothing to you at this point. The point of this chapter is to understand what these values mean for doing computational economics.

  1. Two Bits, Four Bits, Eight Bits, A Dollar
    A computer word (a fixed number of bits) cannot represent every possible integer value. Instead, a word is used to represent a finite number of different integers as different bit patterns. We will use four-bit words in the examples, although word sizes of 32, 64, and 128 bits are standard in hardware.

    A four-bit word has 2$^4$ or 16 different values: 0000, 0001, 0010, …, 1111. So at most we can code 16 different integers in a four-bit word. By analogy, two decimal digits can store 10$^2$ = 100 different values: 0, 1, …, 99. (We apparently adopted the decimal system because we have ten fingers. But we can represent 11 different values with our hands if no fingers raised equals 0. So perhaps if computer geeks had been around in the Stone Age we would count using base 11!) We think of the number 35 as 3$\times 10^1$ + 5$\times 10^0$. With four binary digits a computer engineer can wire arithmetic circuitry to interpret the bits as powers of two. For example, a 4-bit word equal to 0101 would be interpreted (by hardware) as $$0101_2 = 0\times 2^3 +1\times 2^2 +0\times 2^1 + 1\times 2^0 = 5_{10}\nonumber$$ This is an example of a four-bit unsigned binary integer.
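
    To see the same arithmetic in code, here is a minimal Ox sketch in which the four bits are simply hard-coded as scalars (this illustrates the formula, not how the hardware works):
    #include <oxstd.h>
    main() {
        decl b3 = 0, b2 = 1, b1 = 0, b0 = 1;                                    // the four bits of 0101
        println("0101 read as an unsigned integer: ", b3*8 + b2*4 + b1*2 + b0*1);   // prints 5
    }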

    What about negative numbers? Let's first think about how humans do this and then compare it to the ways developed to do it digitally. Humans use a binary code, "+" and "-", to modify the interpretation of a number. We allow + to be implied or the default, which means there are two different symbols for one state: "+" and a blank. Humans place the sign in the leftmost, highest-valued position, as in $-55$ and $+55$.

    Naturally, a computer can store the binary sign information in a single bit, with positive and negative coded as 0 and 1. And it matters where in the circuitry this "sign bit" shows up, since the hardware must treat it differently than a power of 2. For our purposes we can write the sign bit as the leftmost bit, just as with signed decimal numbers. That means that in the example above, 0101, the leftmost 0 is no longer the 2$^3$ position. The convention could have gone either way, but as it turns out a leading 0 is treated as a plus sign. So now we reinterpret the four bits as $0101 = (+)\ 1\times 2^2 + 0\times 2^1 + 1\times 2^0 = +5_{10}$.

    It is worth noting (and this will be repeated several times) that the bit pattern has not changed, only the interpretation of the bit pattern. The interpretation of the pattern is done by the circuitry, and the CPU can have circuitry for both signed and unsigned integer arithmetic. In fact, it does, because both interpretations of a fixed number of bits will be important.

    At this point, the symbolic and digital representations of integers part company. If we were to continue with the human approach, computers would treat 1101 as $-5_{10}$: the sign bit turns on and makes the value stored in the other three bits negative. But, it turns out, this is not the best way to store negative numbers digitally. Instead, computers use 2s complement arithmetic. The complement of a bit switches its value: 0 becomes 1 and 1 becomes 0. In 2s complement arithmetic, a number is negated by complementing each bit in the word and then adding 1 to the result.

    Representing 5 and -5 in binary 2s Complement.
    +5    =     0101
    -5    =     1010 + 0001    =   1011
    -(-5) =     0100 + 0001    =   0101    =   +5
    
    If we carry out ordinary addition for +5 + (-5) we get what we would want. Note that in binary 1+1 equals 0 and "carry the 1". So the 16 possible patterns of bits map into the signed integers in a wrap-around fashion: start at 0 and count up to +7, after which the patterns wrap around to -8 and count back up toward 0.

    2s Complement integers in four bits
        -8   -7   -6   -5   -4   -3    -2    -1    0   +1   +2   +3   +4   +5   +6    +7
    -------------------------------------------------------------------------------------
      1000 1001 1010 1011 1100 1101  1110  1111 0000 0001 0010 0011 0100 0101 0110  0111
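
    The wrap-around mapping in the table can be reproduced with ordinary integer arithmetic. The sketch below is only an illustration (it does not mimic the hardware): it reads each 4-bit pattern as its unsigned value p between 0 and 15 and reports the signed value the pattern represents, namely p itself if p is at most 7 and p-16 otherwise.
    #include <oxstd.h>
    main() {
        decl p;
        for (p = 0; p < 16; ++p)                              // p is the unsigned reading of the 4-bit pattern
            println("unsigned ", p, "  ->  signed ", (p < 8 ? p : p - 16));
    }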
    
    5 + (-5) does equal 0.
         0 1 0 1
    +    1 0 1 1
    ------------
     [1] 0 0 0 0
    
    In the four-bit word we get 0000. But there is a leading 1 that would go in the fifth bit if it existed. Unlike our symbolic operations, there is a finite number of bits, four, and the extra 1 simply disappears into the bit bucket. Of course, if the word had another bit then -5 would be stored as a different pattern, and adding +5 to it would still result in 0 in the word.

    If we add 1 to 0111 we wrap around to 1000 = -8. Keep adding 1 and you make your way back to 0. So does $7+1 = -8$ in four bits? It does, and this is an example of numeric overflow. The number we are trying to represent, +8, does not fit in the fixed number of bits using 2s complement arithmetic. So in the final value position, the extra bit overflows into the sign bit. This would be the same as if on paper we only had two places for decimal digits and we added +01 to +99. Without the ability to magically insert a new place for 10$^2$, the final 1 that we carry over spills into the sign column and turns + into -.

  2. Float On (The Floaters, 1977)
    Understanding the coding of integers, whether signed or unsigned, on a computer is fairly straightforward. Conceptually there is not much more to it than the discussion above. Electronically, semiconductor components are designed to carry out arithmetic operations on integers, but that goes beyond what an economist needs to know.

    Things are a bit different when we ask about real numbers, elements of $\Re$, the real line. Obviously we cannot represent real numbers exactly on a computer because we only have a finite number of states; even if we used all of memory to represent a single number we would still be dealing with a finite system. The digits of an irrational number like $\pi$ go on forever. So a computer can easily represent a concept like 0 or -5 in an exact sense, but not the concept of an arbitrary point on the real number line.

    Let's first rule out irrational numbers like $\pi$ and just consider how to approximate the set of rational numbers. Recall that a rational number $x$ takes the form $x = a/b$ where $a$ and $b$ are integers ($b\neq 0$). There are an infinite number of rationals in any interval, such as $[-1,+1]$. In the case of integers we can use 2s complement to represent all integers within a finite distance of 0 (the distance determined by the number of bits in the word). But this is not true of the rationals: there is no hope of representing all the rationals in any interval of the real line.

    Before we get specific about how rationals are represented digitally, consider the fact that we would not want our digital rational numbers to be evenly spaced on the real line the way the signed binary integers are. We would want them to be closer together where we expect to do most of the computing. There is no reason to suppose that any particular calculation is going to require close approximations to numbers around, say, 10 million or -623.231529. Instead, it makes sense that the digital approximation is best near 0, meaning that we want more digital rationals around 0 than around other points on the real line.

    For a rational number such as 31.94, we can always rewrite it as $31.94 = .3194 \times 10^2$. In this formulation the decimal point "floats" so that the first significant digit in the number, namely 3, is in the $10^{-1}$ place. (Or you could write it as $3.194 \times 10^1$.) Now notice that the rational number is coded using three integers: 3194, 10, and 2. These three elements of scientific notation are called the mantissa, base, and exponent, respectively. If we always use the same base, then two signed integers can be used to represent any (non-repeating) decimal. Of course, the number of bits available to store the integers limits how many different rationals can be stored this way.
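
    As a quick illustration, the following Ox sketch recovers the base-10 mantissa and exponent of 31.94. It assumes the standard Ox functions floor(), log10(), log(), exp(), and fabs(), and it is not meant to mimic what the floating point hardware does:
    #include <oxstd.h>
    main() {
        decl x = 31.94;
        decl e = floor(log10(fabs(x))) + 1;   // exponent chosen so the mantissa lies in [0.1, 1)
        decl m = x / exp(e * log(10));        // exp(e*log(10)) is 10^e
        println(x, " = ", m, " x 10^", e);    // 31.94 = 0.3194 x 10^2, up to round-off
    }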

    Thus, on computers, a real number is approximated by a floating point real number (FPR). In a FPR the base is implicit and always 2, so only two signed integers are needed to code a rational: the exponent and the mantissa. In base 10 the leading digit (the 3 in 31.94) can be anything between 1 and 9, but not 0. In base 2, however, the leading bit is always 1, so it carries no information and does not need to be stored. Instead, the hardware can be wired to treat the FPR as if the 1 were there without actually needing a bit to store it.

    3194 as a FPR
            3194 (base 10)
          = 1100 0111 1010 (base 2)
          = 1.100 0111 1010 x 2^(0000 1011)   [the binary point moves 11 places; 11 (base 10) = 1011 (base 2)]
          = 0 100 0111 1010 x 2^(0000 1011)   [implicit leading 1 handled by hardware]
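
    To check the binary expansion above, add up the powers of two that correspond to the 1 bits (a one-line Ox verification, nothing more):
    #include <oxstd.h>
    main() {
        // 1100 0111 1010 has 1s in bit positions 11, 10, 6, 5, 4, 3 and 1
        println("1100 0111 1010 in base 10: ", 2048 + 1024 + 64 + 32 + 16 + 8 + 2);   // prints 3194
    }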
    
    On our model computer Tim, the exponent and mantissa are stored in a single word. The floating point hardware on the CPU treats the two parts of the word differently: some number of bits is reserved for the signed binary integer holding the mantissa and the rest for the signed binary integer holding the exponent. If the word has 64 bits and, say, 53 bits are used to store the mantissa, that leaves 11 bits for the exponent. With this assumption we can complete a FPR coding for 3194.0:

    3194.0 as a FPR in a 64-bit word
    Base 10:
    3194     = Base 2:
               0 0000 0000 0000 0000 … 0000 0000 0100 0111 1010  0 00 0000 1011
               |--------------------- mantissa ----------------|  |- exponent -|
             = Base 16:
               0000 0000 0023 D00B
    
  3. Making Flippy Floppy (Talking Heads, 1983)
    In computer lingo, a floating point operation (FLOP) is the multiplication or division of two FPRs. So $ab + (c/d)$ involves two FLOPs. It turns out that adding FPRs takes much less time than multiplying them, so typically the number of additions and subtractions is ignored. The cost of carrying out an algorithm can be measured by how many FLOPs it requires. (The acronym FLOPS, on the other hand, sometimes refers to floating point operations per second, a way to measure how quickly a computer system can perform numerical operations.) Multiplication of FPRs is fairly easy to understand:
    3194 × .036
       = .3194 × 10^4   ×   .36 × 10^(-1)
       = (.3194 × .36)  ×   10^(4+(-1))
       = .114984 × 10^3
    
    In other words, FPR multiplication needs an integer multiplication and an integer addition.
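
    Here is a small Ox sketch of that bookkeeping for the base-10 example above. The mantissa/exponent pairs are written out by hand, and the base-10 arithmetic is only an analogy for what the hardware does in base 2:
    #include <oxstd.h>
    main() {
        decl m1 = 0.3194, e1 = 4, m2 = 0.36, e2 = -1;   // 3194 = .3194 x 10^4 and .036 = .36 x 10^(-1)
        decl m  = m1 * m2;                              // one multiplication of mantissas
        decl e  = e1 + e2;                              // one addition of exponents
        println("mantissa ", m, "   exponent ", e);
        println("recombined: ", m * exp(e * log(10)), "   computed directly: ", 3194 * 0.036);
    }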

    The distribution of FPRs on the real line depends on the number of bits in the mantissa and the exponent. Binary integers are distributed evenly; FPRs are not. So there can be many FPRs near 0 and some FPRs much larger than the largest binary integer. There exist a smallest positive FPR and a largest negative FPR different from zero, but as the following analogy shows, being able to store such tiny numbers does not mean we can do precise arithmetic with them.

    Suppose I am working in decimal and can only store 6 digits, but the decimal point can float. The smallest positive number I can store is .000001. And the biggest is 999999. If I add these together I get 999999.000001. However, that takes 12 digits to store exactly and I only have 6. Keeping the most significant digits the result is simply 999999. In other words, I added a positive number to something and got back the same number. Just because I can store .000001 does not mean I can do precise arithmetic with it. Less extreme cases of round-off error can also occur. In 6 digits, 888 + 0.0012 = 888.0012, which would be rounded to 888.001. In this case the result of adding a positive number to 888 is more than 888 but the result is not exact.

    Consider round-off error in scientific notation:
    31,940,000 + 0.0003194 = 31,940,000.0003194 = .3194000000031940 × 10^8
    
    The mantissa shown contains 15 significant decimal digits. Note that $2^{52}$ = 4,503,599,627,370,496: with 53 bits used for the mantissa, one of which is the sign, only about 15 to 16 decimal digits of precision are available. The sum above sits right at that limit, and its fractional part does not even have a terminating binary expansion, so Ox or any other standard application using current hardware could not store the result exactly. Round-off error would occur. While the difference between the exact value and the stored value is not large (in percentage terms), the point is that 0.0003194 is not a very small number (its exponent in scientific notation is only $-4$). A number with a few more leading zeros would result in no detectable change at all when added to the large number 31,940,000.
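
    The same effect is easy to see directly in Ox. The sketch below assumes standard 64-bit doubles; the exact trailing digits printed may differ slightly across machines:
    #include <oxstd.h>
    main() {
        decl sum = 31940000.0 + 0.0003194;
        println("recovered small term: ", "%25.20f", sum - 31940000.0);         // close to, but not exactly, 0.0003194
        println("stored exactly?       ", ((sum - 31940000.0) == 0.0003194) ? "yes" : "no");
        println("1E17 + 1 - 1E17     = ", 1E17 + 1.0 - 1E17);                   // the 1 disappears entirely
    }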

    The standard measure of precision is called machine precision. We will denote machine precision by $\epsilon_m$ and define it as the smallest FPR such that, in floating point arithmetic, $1.0+\epsilon_m \gt 1.0$. This is the value Ox reports as DBL_EPSILON in the listing above.
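
    A quick demonstration in Ox, assuming DBL_EPSILON comes from oxfloat.h (as in the listing at the top of the chapter) and that the hardware uses standard 64-bit doubles with the default rounding:
    #include <oxstd.h>
    #include <oxfloat.h>
    main() {
        println("(1 + DBL_EPSILON)   - 1 = ", (1.0 + DBL_EPSILON) - 1.0);     // the addition is detectable
        println("(1 + DBL_EPSILON/2) - 1 = ", (1.0 + DBL_EPSILON/2) - 1.0);   // half of it is lost: prints 0
    }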

    Adding and subtracting FPRs of very different magnitudes can lead to large round-off errors, even if both of the original numbers can be stored exactly in the given number of bits. A good idea is to define the units of the variables that appear in your program so that the magnitudes of numbers are not wildly different. If so, the round-off error will stay small and the overall precision of the calculation stays within the DBL_DIG = 15 decimal digits of precision that Ox reports. However, any time a FPR operation is performed there is a potential for round-off error, not just when adding numbers of wildly different magnitudes.

  4. Overflow and Underflow
    Overflow occurs when an operation results in a number larger in magnitude than the coding scheme can store in the available number of bits. For example, with signed binary integers in 4 bits the largest value that can be stored is +7 = 0111. So 7+1 in this case results in a number that overflows the storage. The same can happen with FPRs. Underflow occurs for FPRs when an operation results in a non-zero number closer to zero than any value the FPR can store in the space available.

    In the early days of computing, these were major concerns because the results of these operations can be unpredictable. If not handled by the hardware or software, overflow can produce a value that looks perfectly reasonable. From then on, calculations using that value are completely meaningless, but there might be no way to discover this. The extreme case of overflow is trying to compute and store infinity.

    Symbolically, $3/0$ is not a real number, although we can associate it with $+\infty$ in the "extended reals". We can also say $\ln(0)= -\infty$. Note that $0\times 10^1 = 0 \times 10^2 = 0\times 10^{-1} = \dots$. This means there are many bit patterns in a 64-bit FPR that are equivalent to 0. If the hardware is built to ensure that true 0s are only stored one way, the other bit patterns that equal 0 can be used by the hardware for special purposes.

    Numerically, two special bit patterns are designated numerical infinity: $+\infty_N$ and $-\infty_N$. Hardware treats them properly, in the following sense:

    For a positive, finite FPR $x$, operations with $+\infty_N$ result in $+\infty_N$: $$+\infty_N + x\quad=\quad +\infty_N - x \quad=\quad +\infty_N * x \quad=\quad +\infty_N \div x \quad=\quad +\infty_N.\nonumber$$
    Likewise, operations with $-\infty_N$ result in $-\infty_N$: $$-\infty_N + x\quad=\quad -\infty_N - x \quad=\quad -\infty_N * x \quad=\quad -\infty_N \div x \quad=\quad -\infty_N.\nonumber$$
    Dividing by 0 or by infinity gives the proper results: $$x \div 0 \quad=\quad +\infty_N\nonumber$$ $$x \div -\infty_N \quad=\quad x \div +\infty_N \quad=\quad 0\nonumber$$

    What about $0/0$? Even in the extended reals this expression is undefined, and it is misleading to assign it any value. Numerically, it is assigned another special bit pattern called NaN (Not a Number). NaN is treated differently than numerical infinity: $1/\infty_N = 0$, but any operation involving NaN results in NaN.
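
    These rules can be checked directly with the constants M_INF_POS and M_INF_NEG from oxfloat.h that appear in the first listing; the last line below produces NaN because $\infty - \infty$ is undefined:
    #include <oxstd.h>
    #include <oxfloat.h>
    main() {
        decl x = 3.0;
        println("+Inf + x    = ", M_INF_POS + x);
        println("x / +Inf    = ", x / M_INF_POS);
        println("-Inf * x    = ", M_INF_NEG * x);
        println("+Inf - +Inf = ", M_INF_POS - M_INF_POS);   // undefined, so the result is NaN
    }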

  5. The puzzle solved.
    We can now explain the puzzle posed by this bit of code:
    08-If6was9.ox
    #include "oxstd.h"
    main() {
       decl x;
       x = 0.25;
       print( "Is 2.5=10*(0.25)? ");
    //           0 1 2 3 4 5 6 7 8 9   check that there are 10 x's 
       if (2.5== x+x+x+x+x+x+x+x+x+x)
            println("Yes");
       else
            println("No");
       x = 0.1;
       print( "Is 1.0=10*(0.10)?  ");
       if (1.0== x+x+x+x+x+x+x+x+x+x )
            println("Yes");
       else
            println("No");
       }
    
    As you may recall, the problem is that Ox seems to conclude that $10\times 0.1 \ne 1.0$. The first piece of the puzzle is now in place: Ox uses floating point arithmetic and so only some real numbers can be stored exactly. Perhaps there is "round off" error occurring. Yes, it is round-off and it is the kind of round-off you are already familiar with! From elementary school, we learn that
     1 / 3  = 0.3333…
    
    That is, in base 10, the fraction 1/3 cannot be stored exactly in a finite number of places. It is a repeating fraction. Of course, other fractions can be stored exactly, including
    1 / 10 = 0.1
    1 /  4 = 0.25

    However, whether a fraction (a rational number) repeats or not depends on the base it is represented in, and it turns out that $1/10$ is a repeating fraction in base 2. The number 1.0 is of course stored exactly in base 2, but with 53 bits in the mantissa there will always be round-off error in operations that involve 1/10, so adding ten copies of the stored 0.1 does not give exactly 1.0.
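
    Printing the stored values with plenty of digits makes the round-off visible. This is only a sketch; the exact trailing digits depend on the hardware, but with standard 64-bit doubles the sum falls just short of 1:
    #include <oxstd.h>
    main() {
        decl x = 0.1;
        decl sum = x+x+x+x+x+x+x+x+x+x;
        println("0.1 as stored       : ", "%25.20f", x);      // not exactly one tenth
        println("ten copies added up : ", "%25.20f", sum);    // slightly less than 1.0
        println("sum - 1.0           : ", sum - 1.0);
    }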

    The lesson is this:
    NO operation on FPRs should be considered as exact unless it is guaranteed to involve only integer values.
  6. Fuzzy Logic
    To repeat: in general, calculations carried out on FPRs should be expected to contain round-off error. So y==10*x is not a safe way to check whether $y=10x$ or not. Exact comparisons of FPR values (using ==) should not be relied on in your code, as the case of 10*0.1 ≠ 1.0 shows.

    For typical magnitudes (not enormous or infinitesimal), FPRs have no more than 15 decimal digits of accuracy. So a safer way to test equality of FPRs might look like this:
    if ( fabs(y - 10*x) < 1E-15 )
        println("Yes")
    else
        println("No");
    
    fabs() is Ox's absolute value function. And Ox understands scientific notation, so 1E-15 = $10^{-15}$. This says that if the difference between the two FPR expressions is small enough then call them equal.

    Ox has built-in routines for doing this:
    Routine                 Purpose
    --------------------------------------------------------------------
    isfeq(a,b)              Check if a is numerically equal to b
    isdotfeq(a, b)          Check numerical equality element-by-element
    fuzziness(eps)          Set the tolerance on numerical equality (default is 1E-13)

    The first will compare the two arguments accounting for round-off error. It returns 1 (TRUE) if they are close enough and 0 (FALSE) otherwise. In this sense isfeq() is a "fuzzy equality" tester. The default level of fuzziness is 1E-13, or $10^{-13}$.

    Thus, these two code segments are equivalent:
    if ( fabs(y - 10*x) < 1E-13 )               if (isfeq(y,10*x))
        println("Yes")                              println("Yes")
    else                                        else
        println("No");                              println("No");
    
    Further, the very useful ? … : … operator lets you test for a condition and return a value without using if() … else statements. So this is also equivalent:
    println( isfeq(y,10*x) ? "Yes" : "No" );
    

    Definition 1. Numerical Equality

    Numerical equality given a precision $\epsilon>0$ is defined as the condition: $$x\ \mathop{=_\epsilon}\ z \quad \Leftrightarrow\quad |x-z| \lt \epsilon.\nonumber$$ We also say that $x$ equals $z$ with tolerance $\epsilon$. This is equivalent to Ox code of
    fuzziness(epsilon);
    ⋮
    isfeq(x,z);
    
    Note that algorithms should often use a tolerance $\epsilon$ much larger than machine precision $\epsilon_m$. The reason is that machine precision measures how precise one simple arithmetic operation typically is, whereas algorithms involve many calculations and the numerical errors can pile up.
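
    For example, loosening the tolerance before the comparison (a small sketch that uses only the routines listed in the table above):
    #include <oxstd.h>
    main() {
        decl x = 0.1;
        fuzziness(1E-10);                 // tolerance well above machine precision
        println( isfeq(x+x+x+x+x+x+x+x+x+x, 1.0) ? "equal within 1E-10" : "still not equal" );
    }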

Exercises