Computer Programming Languages -- Numbers
By Herbert J. Bernstein
© Copyright 2000 Herbert J. Bernstein
In mathematics the idea of number arises from providing symbols to represent the cardinalities of sets of object. We commonly use positional decimal notation. Numbers can be represented using other choices of radix, such as 2, 8 and 16, which are commonly used for work with computers.
The internal representation of numbers on most computers is now informally standardized on two's complement integers based on 8-bit bytes and real numbers based on the "IEEE Standard for Floating Point Arithmetic", ANAI/IEEE std 754-1985, IEEE, NY 1985. However, many other internal representations have been used in the based, and some of them are still supported on various computers.
Computer memory is organized into a sequence of addressable units of memory. If the processor needs to handle some portion of information in such a unit it must fetch all of it. If the processor needs to replace some portion of information in such a unit it must replace all of it. On some computers the addressable unit of memory is a word. In some computers the addressable unit of memory is a byte. On some computers it both may be used. On word-addressable computers, software is used to work with bytes within words. Most modern computers have 8 bits in each byte, but this is an accident of history, not a law of nature. If we wish to refer precisely to an 8-bit unit of memory, we say, octet.
There is little ambiguity in how number is stored in a byte. By definition, the low-order binary digit of a number is stored in bit 0. However there is considerable ambiguity about how to store a number which occupies more than one byte. There are two very natural choices: with the high-order digits of the numbers coming first in memory (big-endian) or with the low-order digits of numbers coming first in memory (little-endian). Suppose a single byte can hold digits from 0 through r - 1. Suppose we have a number, N, which requires 4 bytes: N = b0 + b1*r + b2*r**2 + b3*r**3. If we intend to store the number into the memory byte-locations n, n+1, n+2, and n+3 on a big-endian machine we would have:
If we intend to store the number, N, into the memory byte-locations n, n+1, n+2, and n+3 on a little-endian machine we would have:
When mutiple groups of bytes are combined to form higher precision numbers, the same decisions about which end of the overall number to store first can give rise to very complex combinations of byte-orderings.
On modern computers, the most common grouping of bytes is as sets of 4 octets grouped into 32 bit words. A grouping of 2 octets is called a half-word, and a grouping of 8 octets is called a long word.
|Memory Unit||Size in Bits||Range from 0 to ...|
This gives us a natural way to represent unsigned integers. If we wish to represent signed numbers, we need to reserve a bit for a sign, and to choose among:
|One's complement||negative numbers as complement|
|Two's complement||negative numbers as complement+1|
|Signed magnitude||one bit sign followed by magnitude as an unsigned integer|
|Biased||Values shifted by subtracting/adding half the range|
A very natural representation of negative numbers is obtained by "complementing" each bit of a number to represent the negative of that number. Each 0 is changed to a 1 and each 1 in change to a 0. This is conceptually similar to the used of a "9's-complement" representation for negative numbers on old electro-mechanical calculators. An interesting implication of this representation is that we obtain a representation of -0, which is mathmatically anomalous.
Negation of one'c complement numbers is a fast bit-by-bit operation. Addition of one's complement numbers is uniform for positive and negative numbers, but requires an "end-around-carry" from the sign bit to the low-order bit.
The most commonly used representation of signed integers on modern computers is as two' complement numbers. The high order bit is reserved for a sign bit. If a negative number is desired, each bit is "complemented", i.e. each 0 is changed to a 1 and each 1 in change to a 0. Finally, 1 is added to the result. This is the same result we would have obtained if we had subtracted the magnitude of the number from a number one larger than the range.
Surprisingly, the negation operation is symmetric. It is not quite as simple as the negation of the one's complement representation, but addition is simpler, using an end-off carry.
A curious effect of using complements is that small negative numbers have many bits set. An alternative approach is to restrict the effect of negation to the sign bit. As with one's complement respresentation, negation is fast, but gives rise to the possibility of a -0. Unlike the complement-based representations, addition is not uniform for positive and negative numbers.
The biased representation is the same to the two-complement representation, but with the sign bit inverted. Numbers are represented by half the range (i.e. a set sign bit) added to the number, whether positive or negative.
|number||binary representation||one's complement||two's complement||signed magnitude||biased|
Real numbers (numbers which are not whole numbers) can be represented many different ways on computers. If a uniform, fixed decimal point (or binary point) position is desired, the integer formats will suffice. More interesting is the handling of floating point numbers in which differnt numbers may have different relative decimal point positions. For scientific applications, so-called scientific notation is used in which a number is given as a fixed point number multiplied by a power of ten. Internally this is converted to a similar representation using a fixed point number multiplied by a power of two (or a power of some power of two).
There are many variants on such representations. We will look at two groups of floating point representations: The DEC (now Compaq) VMS formats and the IEEE floating point formats.
The VMS floating point formats are the ancestors of the highly popular IEEE floating point formats. They are intended for computers with 8-bit bytes and two-complement integer arithmetic, and are organized so that floating point numbers may be processed efficiently even on computers without specialized floating point hardware. There are 4-, 8-, and 16-byte formats. In each case the format begins with a sign bit followed by an exponent and then by a fraction. Qualitatively, the number is intended to be sign*fraction*(2^exponent), The sign is represented by 0 for a positive number, 1 for a negative number. The exponent is represented in 8, 11 or 15 bits as a biased integer. The biases are 128, 1024 or 16384, so that an exponent field with just the sign bit set represents and exponent of zero. If the sign and exponent fields contains a bit pattern of all zeros, the fraction is ignored and treated as zero. If the exponent field has any bits set, the fraction field represents a normalized binary fraction with a one-bit to the right of the binary point, and from which the leading one-bit has been removed.
See Appendix B of "DEC Fortran User Manual for OpenVMS VAX Systems", DEC Order Number: AA-PUYPA-TE, Digital Equipment Corporation, Maynard, Massachusetts, January 1993.
|number||sign field||exponent field||fraction field|
|(-1)^s*f*2^E for .5 <= f < 1||s||E - bias||f-.5 wo binary point|
The structure of an IEEE floating point number is similar tothe VMS format, consisting of a sign bit followed by an exponent and then by a fraction. In these formats the exponent is represented in 8 or 11 bits as a biased integer. However, instead of specifying the bias as a power of 2, it is specified as one less than a power of 2 (i.e. 127 or 1023), so that an exponent field with just the sign bit set represents an exponent of 1. If the exponent field has any bits set, the fraction field represents the entire fraction or a normalized binary number with a one bit to the left of the binary point, and from which that one-bit to the left of the binary point has been removed. If the exponent field contains a bit pattern of all zeros, the fraction field represents the entire fraction of a denormalized binary number, with no portion to the left of the binary point.
See "IEEE Standard for Binary Floating-Point Arithmetic", ANSI/IEEE Std 754-1985, the Institute of Electrical and Electronics Engineers, Inc., NY 1985.
|number||sign field||exponent field||fraction field|
|0||0 or 1||0...000000||0...0000000|
|(-1)^s*f*2^E for 1 <= f < 2||s||E - bias||f-1 wo binary point|
|(-1)^s*f*2^(-126) for 0 < f < 1||s||0...000000||f wo binary point|
|single NaN||0 or 1||11111111||any non-zero pattern|
|double NaN||0 or 1||11111111111||any non-zero pattern|
|single infinity||0 or 1||11111111||0...0000000|
|double infinity||0 or 1||11111111111||0...0000000|
In business computing it is common practice to represent real numbers as strings of decimal digits, either as one digit per 8-bit byte, or as 2 digits per 8-bit byte (packed decimal).
COBOL makes use of decimal representations. There are many COBOL variants, but one commonly used set of representations is:
DISPLAY: representation of a number as a series of bytes, with each digit represented as the bottom nibble of each byte. The rightmost byte contains a sign (hex F for -) in the high order nibble.
COMPUTATIONAL: represenation as 2's complement binary, may be 2 bytes (halfword), 4 bytes (fullword), 8 bytes (2 fullwords) depending on the number of digits specified.
COMPUTATIONAL-3: representation as packed decimal (one digit per nibble), with the low order nibble used for the sign.
There are also two floating point formats: COMPUTATIONAL-1 and COMPUTATIONAL-2 for single and double precision.