Characters and Strings
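The number 3194 can be represented in computer memory as a signed or as an unsigned binary integer. Because it is positive, both forms give the same bit pattern; stored in 32 bits it is

$$3194_{10} = 0000\,0000\,0000\,0000\,0000\,1100\,0111\,1010_{2} = \mathrm{00000C7A}_{16}$$

where the hexadecimal form is a compact way of writing the same binary integer, one hex digit per 4 bits.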
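The bit pattern above can be checked mechanically. The sketch below is in Python rather than Ox, and the helper names `int_bits` and `int_hex` are hypothetical. Python integers have no fixed width, so the width is an explicit parameter (32 bits by default, matching the 32 binary digits written out above); masking to that width also yields the two's-complement (signed) pattern for negative numbers.

```python
def int_bits(n: int, width: int = 32) -> str:
    """Return n as a zero-padded binary string of `width` bits.

    Negative n is masked to `width` bits, which gives its two's-complement
    (signed) pattern; for a positive n, signed and unsigned patterns agree.
    """
    return format(n & ((1 << width) - 1), f"0{width}b")


def int_hex(n: int, width: int = 32) -> str:
    """Return the same bit pattern in hexadecimal: one hex digit per 4 bits."""
    return format(n & ((1 << width) - 1), f"0{width // 4}X")


bits = int_bits(3194)
print(bits)            # 00000000000000000000110001111010
print(int_hex(3194))   # 00000C7A
# Grouping the bits into 4-bit nibbles shows how each hex digit arises.
print(" ".join(bits[i:i + 4] for i in range(0, len(bits), 4)))
# 0000 0000 0000 0000 0000 1100 0111 1010
```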
It can also be represented as a floating-point real, which includes an exponent: in decimal terms, $3194 = 3.194 \times 10^{3}$, where the exponent shifts the decimal point 3 places to the right. Clearly you cannot store "T" as either an integer or a real. So as I typed the previous sentence, 3194 had to be represented in yet another way, as typed input.
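The exponent idea can be seen concretely. Hardware floating point (IEEE 754 doubles on nearly all current machines) uses a binary exponent rather than a decimal one. This Python sketch uses `math.frexp` and the `struct` module to pull 3194.0 apart; it illustrates the general format, not anything specific to how Ox stores reals.

```python
import math
import struct

x = 3194.0

# frexp splits x into a mantissa m and a *binary* exponent e: x == m * 2**e, 0.5 <= m < 1.
m, e = math.frexp(x)
print(m, e)   # 0.77978515625 12

# View the 64 raw bits of the IEEE 754 double (big-endian byte order for readability).
raw = struct.unpack(">Q", struct.pack(">d", x))[0]
sign = raw >> 63
exponent = ((raw >> 52) & 0x7FF) - 1023   # stored exponent minus the bias of 1023
fraction = raw & ((1 << 52) - 1)           # bits after the implicit leading 1
print(sign, exponent)   # 0 11  (3194 = 1.5595703125 * 2**11)
```

The two exponents differ (12 versus 11) only because `frexp` normalizes the mantissa into [0.5, 1) while IEEE 754 normalizes it into [1, 2).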
As with numbers, characters are stored using bit patterns, which are given their interpretation by code and hardware (keyboards!). More than one code is used for characters. Most codes in use now are related to the ASCII code, first published in 1963. ASCII uses one byte per character (technically 7 of the 8 bits). In ASCII,
3194 is four characters (32 bits) that happen to be

"3194" in ASCII = 00110011 00110001 00111001 00110100   (hex 33 31 39 34)
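The character codes for "3194" can be listed directly: each of the four characters takes one 8-bit byte, 32 bits in all, and none of them matches the integer bit pattern of 3194. This is a Python sketch; `ascii_bits` is a hypothetical helper name.

```python
def ascii_bits(text: str) -> list[str]:
    """Return the 8-bit ASCII pattern of each character in text."""
    # encode("ascii") raises UnicodeEncodeError for characters outside ASCII.
    return [format(byte, "08b") for byte in text.encode("ascii")]


text = "3194"
print(ascii_bits(text))      # ['00110011', '00110001', '00111001', '00110100']
print(text.encode("ascii"))  # b'3194' (the bytes 0x33 0x31 0x39 0x34)
print(8 * len(text))         # 32 bits for four characters
# int() converts the characters into the binary integer 3194 (a different bit pattern).
print(int(text) == 3194)     # True
```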
ASCII was designed for English text; other alphabets and special symbols must be stored differently, using more bits. Thus there are extensions to ASCII such as Unicode, originally designed as a 16-bit code for symbols (it has since grown to code points up to hexadecimal 10FFFF, which need 21 bits). In the 16-bit form, if the leftmost 9 bits are zero, the remaining 7 bits are the ASCII code and the whole thing is interpreted as ASCII. As these leftmost bits change from all zeros, other character sets are coded, such as Arabic and Cyrillic.
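The relationship between Unicode and ASCII can be checked with Python's `ord`, which returns a character's Unicode code point. In this sketch `code_point_16` is a hypothetical helper, and the non-ASCII characters are written as escapes so the source file stays pure ASCII.

```python
def code_point_16(ch: str) -> int:
    """Return the code point of ch, checking that it fits in 16 bits."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError(f"U+{cp:X} needs more than 16 bits")
    return cp


# T, e-acute, Cyrillic capital Zhe, Arabic letter alef (written as escapes).
for ch in ["T", "\u00e9", "\u0416", "\u0627"]:
    cp = code_point_16(ch)
    kind = "ASCII" if cp >> 7 == 0 else "not ASCII"   # ASCII iff the top 9 of 16 bits are zero
    print(f"U+{cp:04X}", format(cp, "016b"), kind)

# Code points above U+FFFF exist, so modern Unicode is not a pure 16-bit code.
print(hex(ord("\U0001F600")))   # 0x1f600
```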
One or more characters together (such as 3194) is usually called a string. Although Ox is primarily a mathematical language, it handles strings in many ways that can be useful within programs. We have seen string constants. Strings can be concatenated using +:
question = "What are you?";
answer = "I am a string";
QandA = question+answer;
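Python happens to use the same + operator for strings, so the Ox lines above translate almost word for word. The sketch shows that concatenation joins the characters end to end without inserting a space; the Ox example should behave analogously, assuming its + simply joins the two strings.

```python
question = "What are you?"
answer = "I am a string"

QandA = question + answer          # + joins the characters end to end
print(QandA)                       # What are you?I am a string
print(len(QandA) == len(question) + len(answer))   # True

# No separator is added automatically; include one explicitly if wanted.
spaced = question + " " + answer
print(spaced)                      # What are you? I am a string
```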
Strings can be searched for text using find() and related functions. You can also replace characters in a string with replace().
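Searching and replacing look much the same in most languages. The sketch below uses Python's `str.find`, `str.replace`, and `str.translate`; the arguments and return values of Ox's find() and replace() are not given in these notes and may differ, so treat this as an analogy rather than a description of the Ox functions.

```python
s = "3194 is four characters"

# str.find returns the index of the first match, or -1 if there is none.
print(s.find("four"))      # 8
print(s.find("five"))      # -1

# str.replace returns a new string; Python strings are immutable, so s is unchanged.
t = s.replace("4", "5")
print(t)                   # 3195 is four characters
print(s)                   # 3194 is four characters

# To replace several single characters at once, use a translation table.
print("3194".translate(str.maketrans("34", "xy")))   # x19y
```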
Point Blank♭Bruce Springsteen 1980
One of the most important aspects of machine instructions is this:
Memory contents can be interpreted (by the hardware) as the address of another cell.
Consider some memory cells in RAM:

Label   Address   Contents
-----   -------   -------------------
xmpl    0A39      0011 0011 0011 0001
        0A3A      0011 1001 0011 0100
        0A3B      0000 0000 0000 0000
        0A3C      0000 0000 0000 0000
Score   0A3D      0000 1100 0111 1010
Id      0A3E      0000 0000 0000 0011
Age     0A3F      0000 0000 0000 0011
Q       0A40      0000 1010 0011 1001
The labels are identifiers in the human's program that is currently loaded in this part of RAM.
If we convert the 4-bit groups at address 0A40 (labeled Q) into hexadecimal, we get 0A39, which happens to be the address of the first cell shown. That means Q is pointing to another variable in the human's program, named xmpl.
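The memory table can be modeled as a lookup from address to 16-bit contents, which makes the "contents as an address" step explicit. This is a toy Python model, not how Ox or real hardware lays out memory; `memory`, `labels`, and `cell_as_text` are hypothetical names.

```python
# A toy model of the memory cells in the table: address -> 16-bit contents.
memory = {
    0x0A39: 0b0011001100110001,  # xmpl: ASCII "31"
    0x0A3A: 0b0011100100110100,  # ASCII "94"
    0x0A3B: 0,
    0x0A3C: 0,
    0x0A3D: 0b0000110001111010,  # Score: the integer 3194
    0x0A3E: 0b0000000000000011,  # Id
    0x0A3F: 0b0000000000000011,  # Age
    0x0A40: 0b0000101000111001,  # Q: its contents are an address
}
labels = {"xmpl": 0x0A39, "Score": 0x0A3D, "Id": 0x0A3E, "Age": 0x0A3F, "Q": 0x0A40}


def cell_as_text(word: int) -> str:
    """Interpret a 16-bit cell as two ASCII characters (high byte first)."""
    return bytes([word >> 8, word & 0xFF]).decode("ascii")


q_contents = memory[labels["Q"]]
print(f"{q_contents:04X}")                 # 0A39
print(q_contents == labels["xmpl"])        # True: Q points to xmpl
print(cell_as_text(memory[q_contents]))    # 31: follow the address stored in Q
print(memory[labels["Score"]])             # 3194
```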
A pointer is a scalar data type containing a memory address. Hardware instructions can retrieve the pointer's contents, interpret them as an address, and then retrieve or manipulate the contents of the cell at that address.
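Python normally hides addresses, but its standard `ctypes` module exposes real C-style pointers, so the retrieve-then-manipulate cycle can be tried directly. This is a Python sketch, not Ox; the variable names are illustrative, and the printed address varies from run to run.

```python
import ctypes

score = ctypes.c_int(3194)   # an integer in a C-sized memory cell
q = ctypes.pointer(score)    # q holds the address of score

# The address itself is just a number (it differs from run to run).
address = ctypes.cast(q, ctypes.c_void_p).value
print(hex(address))
print(address == ctypes.addressof(score))   # True

# Retrieve the contents at that address ...
print(q.contents.value)   # 3194
# ... and manipulate them: writing through q changes score itself.
q[0] = 3195
print(score.value)        # 3195
```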
Technical Note: In real computer languages a pointer is usually more than just an address, but a key aspect of a pointer is always the address of a location in memory (on modern operating systems, usually a virtual address that the hardware translates into a physical one).
In Ox, a pointer is stored in something it calls an oxarray, which was discussed briefly as a way of storing different kinds of items in a list. The difference between a pointer and other data types (like integers, floating-point reals, etc.) is that the language gives you a way to access and manipulate the contents pointed to.
In Ox, if Q is an array (or what we would call a pointer in these notes), then Q[0] is the syntax for the content that Q points to:

Q[0] = contents of xmpl = 0011 0011 0011 0001 = "31".

Q[1] means go to the address in Q and then move down one cell: Q[1] = "94".
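Indexing through a pointer can be imitated with `ctypes` as well: two consecutive 16-bit cells stand in for xmpl at 0A39 and 0A3A, and Q[1] reads the cell one step past the address in Q. This is a Python sketch with hypothetical names (`as_text`, `next_cell`); it shows the C-style indexing idea, not Ox's implementation.

```python
import ctypes

# Two consecutive 16-bit cells, like xmpl at 0A39 and 0A3A in the table.
xmpl = (ctypes.c_uint16 * 2)(0x3331, 0x3934)   # ASCII "31" and "94"
Q = ctypes.cast(xmpl, ctypes.POINTER(ctypes.c_uint16))


def as_text(word: int) -> str:
    """Read a 16-bit cell as two ASCII characters, high byte first."""
    return bytes([word >> 8, word & 0xFF]).decode("ascii")


print(as_text(Q[0]))   # 31: the cell Q points to
print(as_text(Q[1]))   # 94: one cell further along

# Q[1] lives one cell (here 2 bytes) past the address stored in Q.
next_cell = ctypes.cast(ctypes.addressof(xmpl) + ctypes.sizeof(ctypes.c_uint16),
                        ctypes.POINTER(ctypes.c_uint16))
print(next_cell[0] == Q[1])   # True
```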
To use pointers, the program must be able to get the address of a location. In Ox that involves using the & operator:

&xmpl = 0A39.

That is, &x can be read as "the location of x". In the memory shown earlier, Q holds exactly the value &xmpl.
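One common reason to take a location is so that a function can change the caller's variable: the function receives the location and writes through it. Python has no & operator, so in this sketch `ctypes.pointer(age)` plays its role; `set_to_three` is a hypothetical helper, and this is an analogy to, not a description of, Ox's & operator.

```python
import ctypes


def set_to_three(location):
    """Hypothetical helper: store 3 in the cell that `location` points to."""
    location[0] = 3


age = ctypes.c_int(0)
loc = ctypes.pointer(age)   # plays the role of &age: "the location of age"
set_to_three(loc)
print(age.value)            # 3: the caller's variable was changed through its location
print(ctypes.cast(loc, ctypes.c_void_p).value == ctypes.addressof(age))   # True
```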