1.2 Binary Numbers

[1.2 Binary Numbers] [1.3 Errors] [2.1 Interation x=g(x)] [2.2 Nonlinear Solutions of f(x)=0]

1.2 Binary Numbers

Q: Why do we use binary numbers?
A: We use binary numbers because it was easy to implement with circuits. Binary only requires two states: "ON" and "OFF"

Example: 537.5₁₀ = 1 x 2⁹ + 0 x 2⁸ + 0 x 2⁷ + 0 x 2⁶ + 0 x 2⁵ + 1 x 2⁴ + 1 x 2³ + 0 x 2² + 0 x 2¹ + 1 x 2⁰ + 1 x 2^-1

How to get from base 10 to base 2
Separate the integer part from the fraction part: N.F
The integer part can be represented as

We wish to find the digit b_i (0 or 1)
We divide by 2:

The remainder determines if b₀ is 0
- Repeat until the integer part is 0

Example: How to convert 34 in base 10 to binary (base 2)
34/2 = 17 R 0 ==> 0
17/2 = 8 R 1 ==> 1
8/2 = 4 R 0 ==> 0
4/2 = 2 R 0 ==> 0
2/2 = 1 R 0 ==> 0
1/2 = 0 R 1 ==> 1

So 34₁₀ = 100010₂
Note : The binary representation is the reverse of the 1's and 0's obtained by dividing

Example:
0.1₂ = 1*2^-1 = 1/2
0.01₂ = 1*2^-2 = 1/4
0.1₁₂ = 1*2^-1 + 1*2^-2 = 1/2 + 1/4 = 3/4

Q: What about 1/3?
A: 1/3 = .3333 = 1/2 * 0 + 1/4 * 1 + 1/8 * 0 + 1/16 * 1 + ... = 0.010101₁₀

Representation of numbers in the machine

Two common representations:
1. Fixed Point
2. Floating Point

Fixed Point
The number of bits for the integer part and the fraction part are fixed

Example: NNNN.FF₂ has 4 bits for the integer part and 2 bits for the fraction
This is just an example: usually you would give more bits for for both the integer part and the fraction.
For example: 2 bytes for the integer and 2 bytes for the fraction

Advantages of fixed point:

	Fast arithmetic - Uses integer arithmetic Example: 1.0 1 = 1.25 + 1 0.1 0 = 2.5 -------------------------- 1 1.1 1 = 3.75
	Used in in signal processors and signal processing programs that require real time operations Example: MP3 Player, MP3 Encoder/Decoder, MPEG, etc

Disadvantages of fixed point:

	Very restrictive range
	Bad precision
	Smallest number is restricted to the number of bits used in the fraction
	Generates a lot intermediate of errors during operations

Floating Point
The decimal point moves around to keep the desired precision.
Uses scientific notation:

Q is called the mantissa
N is called the exponent

The precision of the number depends on the number of bits used to represent Q.
The numbers are represented in the computer as

where Q and N are represented by a fixed number of bits.

Uniqueness

To have a unique representation for each number, numbers are normalized or adjusted so that

except for 0. The uniqueness is useful for some operations such as checking for equality.

Example: Adding (1/3 + 3/5) + 1/15

Assume 4-bit mantissa and 2-bit exponent

1/3 = (.010101...)₂ * 2⁰ = (.10101...)₂ *2^-1 =>(.1011)₂ *2^-1
3/5 = (.1000110011...)₂ * 2⁰ => (.1010)₂ * 2⁰

Adding 1/3+3/5

Normalize the largest exponent before addition
Note: Result has the larger of the two exponents

Note: When processors add numbers, they use registers twice as big for the computation.

Adding 1/15 to the previous result

1/15 = (.100010001...)₂ * 2⁰ => (.1001)₂ * 2^-3

Since 1/3 + 3/5 + 1/15 = 1 and our result was 1.125, then there was an error due to the very small mantissa.

Modern architectures give 2 options to represent real numbers:

Single precision (float)

uses 4 bytes (32 bits)

	24 bit mantissa (1 bit for sign)
	8 bit exponent (1 bit for sign)

Range is 2^-128 to 2¹²⁷

Accuracy is the smallest difference between 2 numbers with exponent
Example: 10000 is 2^-23 and you have about 6 decimal digits of precision.

Double precision (double)

uses 8 bytes (64 bits)

	53 bit mantissa (1 bit for sign)
	11 bit exponent (1 bit for sign)
	Range is 2^-1024 to 2¹⁰²³
	Accuracy is 2^-52 => 2.2*10^-16 (about 15 digits of accuracy)

Webpage by Emil Stefanov