CS314 Numerical Methods
Week 1 Notes
By Melroy Saldanha
Webpage: http://www.cs.purdue.edu/homes/cs314
Goals
· You will learn how to use the computer to approximate the solutions of non-linear and linear equations
· Numerical integration and differentiation
· Differential equations
· You will learn the potential and limitations of the computer to solve mathematical problems
Textbook
Numerical Analysis Using Matlab: Third Edition
Mailing List
Log in to a CS (lore) machine and type: mailer add me to cs314
Grade Distribution
· Projects and Homeworks: 50%
· Midterm: 25%
· Final: 25%
Syllabus
· Floating point, fixed point representation and computational error
· Solutions of non-linear equations of form f(x) = 0
· Solutions of linear systems of form Ax = B
· Interpolation and curve fitting
· Numerical differentiation and integration
· Numerical optimization
· Solution of differential equations
Binary Numbers
Computers use binary numbers because circuits with two states (on/off) are easy to implement.
Example: 537.5_10 (base 10)
Decimal (base 10): 5×10^2 + 3×10^1 + 7×10^0 + 5×10^-1
Binary (base 2): 1×2^9 + 0×2^8 + 0×2^7 + 0×2^6 + 0×2^5 + 1×2^4 + 1×2^3 + 0×2^2 + 0×2^1 + 1×2^0 + 1×2^-1 = 1000011001.1_2
Base Conversions: Base 10 to Base 2
Separate the integer part from the fraction.
· Integer part: N = b_0 + b_1×2^1 + ... + b_k×2^k
To find the digit b_i (0 or 1), we divide N by 2:
· N/2 = b_0/2 + b_1×2^0 + ... + b_k×2^(k-1)
Here b_0/2 gives the remainder and the rest is the integer part.
· New N = b_1×2^0 + ... + b_k×2^(k-1)
The remainder determines whether b_0 is 0 or 1. We repeat this process of dividing N by 2 until the integer part is 0.
Example: 34_10
34/2 = 17, remainder = 0 = b_0
17/2 = 8, remainder = 1 = b_1
8/2 = 4, remainder = 0 = b_2
4/2 = 2, remainder = 0 = b_3
2/2 = 1, remainder = 0 = b_4
1/2 = 0, remainder = 1 = b_5
=> 34_10 = 100010_2
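A minimal sketch of this repeated-division procedure in Python (the function name to_binary is just illustrative):

```python
def to_binary(n):
    """Convert a non-negative integer to a binary digit string
    by repeated division by 2, collecting the remainders b_0, b_1, ..."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:               # stop when the integer part is 0
        n, r = divmod(n, 2)    # r is the remainder, the next digit b_i
        bits.append(str(r))
    return "".join(reversed(bits))  # digits come out b_0 first, so reverse

print(to_binary(34))  # 100010
```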
Other Examples
0.1_2 = 1×2^-1 = 1/2 = 0.5_10
0.01_2 = 0×2^-1 + 1×2^-2 = 1/4 = 0.25_10
0.11_2 = 1×2^-1 + 1×2^-2 = 1/2 + 1/4 = 3/4 = 0.75_10
1/3_10 = 0×2^-1 + 1×2^-2 + 0×2^-3 + 1×2^-4 + ... = 0.0101010101..._2
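A quick numeric check of the last expansion, summing the first few terms of the repeating pattern:

```python
# 0.0101..._2 has 1-bits at positions 2, 4, 6, ...
approx = sum(2.0**-i for i in range(2, 20, 2))
print(approx)              # 0.33333206..., close to 1/3
print(abs(approx - 1/3))   # the error shrinks as more bits are added
```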
Email
Send questions to cs314-ta@cs.purdue.edu
Representation of Numbers in Machine
· Fixed Point and Floating Point
· Fixed Point:
The number of bits for the integer part and the fraction is fixed.
Example: nnnn.ff has 4 bits for the integer part and 2 bits for the fraction. Largest number: 1111.11_2 = 15.75_10. Smallest number: 0000.00_2 = 0_10.
Usually more bits are given to both the integer part and the fraction.
Example: 2 bytes for the integer part, 2 bytes for the fraction.
Advantages:
· Fast arithmetic (the same integer arithmetic can be used)
Example:
1.01_2 + 10.10_2 = 11.11_2 = 3.75_10
1.01_2 = 1.25_10, 10.10_2 = 2.50_10
1.01_2 + 10.10_2 = 1.25_10 + 2.50_10 = 3.75_10
· Used in signal processors and signal processing
Example: MP3 players, MPEG, etc.
Disadvantages:
· Very restrictive range
· Poor precision: the smallest representable number is limited by the number of bits used for the fraction
· Generates a lot of error during operations
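A minimal sketch of the fixed-point idea in Python, storing values as scaled integers with 2 fraction bits (the helper names and layout are illustrative assumptions):

```python
FRAC_BITS = 2                  # nnnn.ff layout: 2 bits after the point
SCALE = 1 << FRAC_BITS         # 4

def to_fixed(x):
    """Quantize x onto the fixed-point grid (multiples of 1/4)."""
    return round(x * SCALE)    # stored as a plain integer

def to_float(f):
    return f / SCALE

# Addition is ordinary integer addition on the scaled values:
a, b = to_fixed(1.25), to_fixed(2.50)   # 1.01_2 and 10.10_2
print(to_float(a + b))                   # 3.75
```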
· Floating Point
The decimal point moves around to keep the desired precision, using the scientific notation that you know:
±Q×10^N, where Q is the mantissa.
The precision of the number depends on the number of bits used to represent Q.
On the computer the numbers are represented as
x = ±q×2^n, where q and n are represented with a fixed number of bits.
To have a unique representation for each number, the numbers are normalized so that
1/2 ≤ q < 1 (exception: 0).
The uniqueness is useful for some operations such as checking for equality.
Example: 4-bit mantissa and 3-bit exponent
Add (1/3 + 3/5) + 1/15
1/3 = (0.01010101...)_2 = 0.101010... × 2^-1 ≈ 0.1011 × 2^-1 (rounding)
3/5 = 0.6 = (0.1001100...)_2 ≈ 0.1010 × 2^0 (rounding)
Adding 1/3 + 3/5:
= 0.1011 × 2^-1 + 0.1010 × 2^0
= 0.01011 × 2^0 + 0.10100 × 2^0 = 0.11111 × 2^0
≈ 1.0000 × 2^0 (rounding) = 0.1000 × 2^1 (Solution Part 1)
Now we add 1/15 to this solution:
1/15 = (0.00010001...)_2 = 0.10001... × 2^-3 ≈ 0.1001 × 2^-3 (rounding)
Adding this to the solution from Part 1:
= 0.1000 × 2^1 + 0.1001 × 2^-3
= 0.10000000 × 2^1 + 0.00001001 × 2^1
= 0.10001001 × 2^1 ≈ 0.1001 × 2^1 (rounding)
= 1.001_2 = 1.125_10
Actual answer = 1
Absolute error = |1 - 1.125| = 0.125
Relative error = |1 - 1.125| / 1 = 0.125 = 12.5%
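This toy arithmetic is easy to simulate. A sketch in Python that rounds every intermediate result to a 4-bit normalized mantissa (the helper name fl is illustrative, not anything from the course):

```python
import math

def fl(x, bits=4):
    """Round x to a normalized `bits`-bit mantissa: x = q * 2^n, 1/2 <= |q| < 1."""
    if x == 0:
        return 0.0
    q, n = math.frexp(x)                 # x = q * 2**n with 0.5 <= |q| < 1
    q = round(q * 2**bits) / 2**bits     # keep `bits` mantissa bits
    return q * 2**n                      # a carry like 0.11111 -> 1.0000 is fine here

s = fl(fl(fl(1/3) + fl(3/5)) + fl(1/15))
print(s)               # 1.125, as in the hand computation
print(abs(1 - s))      # absolute error 0.125
```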
Modern Architecture Representation of Real Numbers
· Single Precision (Float)
Uses 4 bytes (32 bits): 24-bit mantissa (1 bit sign), 8-bit exponent (1 bit sign).
Range is 2^-128 to 2^127.
Accuracy is the smallest difference between two numbers with exponent 0:
c = ±0.1000...01 (23 bits) × 2^0
Accuracy = 2^-23 ≈ 1.2 × 10^-7, i.e., 6 decimal digits of precision.
· Double Precision
Uses 8 bytes (64 bits): 53-bit mantissa (1 bit sign) and 11-bit exponent (1 bit sign).
Range is 2^-1024 to 2^1023.
Accuracy = 2^-52 ≈ 2.2 × 10^-16, i.e., 15 decimal digits of precision.
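Python's float is a double-precision number, so this accuracy can be checked directly:

```python
import sys

print(sys.float_info.epsilon)   # 2.220446049250313e-16, i.e. 2**-52
print(1.0 + 2.0**-53 == 1.0)    # True: below the accuracy, the addition is lost
print(1.0 + 2.0**-52 == 1.0)    # False: this difference is representable
```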
Errors
Absolute error = |p - p*|, where p* is the approximation of p
Relative error = |p - p*| / |p|, for p ≠ 0
Chop-off vs. Round-off
Pi = 3.141592654
Use three digits for the fraction:
· Chop-off = 3.141
· Round-off = 3.142
Absolute error:
Chop-off = 0.59 × 10^-3
Round-off = 0.41 × 10^-3
Relative error:
Chop-off = 1.9 × 10^-4 = 0.019%
Round-off = 1.3 × 10^-4 = 0.013%
=> Round-off is better due to a smaller error
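The same comparison in Python (chopping done with math.floor):

```python
import math

chop = math.floor(math.pi * 1000) / 1000   # 3.141
rnd = round(math.pi, 3)                    # 3.142

for name, p_star in [("chop", chop), ("round", rnd)]:
    abs_err = abs(math.pi - p_star)
    rel_err = abs_err / math.pi
    print(name, abs_err, rel_err)   # chop: 5.9e-4, 1.9e-4; round: 4.1e-4, 1.3e-4
```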
Propagation of Errors
Assume p and q are approximated by p* and q* such that
p* = p + e_p, q* = q + e_q
Sum: p* + q* = (p + q) + (e_p + e_q)
The error of the sum is the sum of the individual errors.
Product: p*q* = (p + e_p)(q + e_q) = pq + p·e_q + q·e_p + e_p·e_q
Error of the product = p·e_q + q·e_p + e_p·e_q
The errors are amplified by the magnitudes of p and q.
Algorithms
Stable: small initial errors stay small.
Unstable: small initial errors get large after several iterations.
Solution of Non-Linear Equations of form f(x) = 0
We will use iteration to solve these equations:
o Start with a value P_0 and iterate a function P_{k+1} = g(P_k)
o We use a stopping criterion to stop the computation
Example: |P_k - P_{k-1}| < E
Definitions
· Fixed Point
A fixed point of g(x) is a real number p such that p = g(p).
Geometrically, the fixed points of a function y = g(x) are the points of intersection between y = g(x) and y = x.
· Fixed Point Iteration
It is an equation of the form P_{n+1} = g(P_n) for n = 0, 1, 2, ...
Example: x^2 - 2x + 1 = 0
Get a fixed point iteration equation:
x^2 + 1 = 2x => x = (x^2 + 1)/2
x_{k+1} = (x_k^2 + 1)/2
Let x_0 = 0
x_1 = (0^2 + 1)/2 = 1/2
x_2 = ((1/2)^2 + 1)/2 = 5/8
x_3 = ((5/8)^2 + 1)/2 = 89/128
...
x_∞ = 1
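A minimal sketch of fixed point iteration in Python (the tolerance and iteration cap are illustrative choices):

```python
def fixed_point(g, x0, tol=1e-8, max_iter=100):
    """Iterate x_{k+1} = g(x_k) until |x_k - x_{k-1}| < tol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x  # may not have fully converged yet

g = lambda x: (x**2 + 1) / 2
print(fixed_point(g, 0.0))   # creeps toward the fixed point 1 (slowly, since g'(1) = 1)
```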
· Fixed Point Theorem
o If |g'(x)| ≤ K < 1 for all x in [a,b], and g(x) is in [a,b], then the iteration P_n = g(P_{n-1}) will converge to a unique fixed point P. In this case P is said to be an attractive fixed point.
o If |g'(x)| > 1, then the iteration P_n = g(P_{n-1}) will not converge.
Convergence
· First case: 0 ≤ g'(p) < 1, monotone convergence
· Second case: -1 < g'(p) < 0, oscillating convergence
What kind of convergence do we have in the earlier example?
g(x) = (x^2 + 1)/2 => g'(x) = 2x/2 = x
If |g'(x)| < 1 for all x in [a,b] and g(x) is in [a,b], then the iteration converges.
For the earlier example it is a monotone convergence, hence
0 ≤ g'(x) < 1 => 0 ≤ x < 1 => the starting point should be between 0 and 1 for convergence.
Example: x_0 = 0
x_1 = (0^2 + 1)/2 = 1/2
x_2 = ((1/2)^2 + 1)/2 = 5/8
...
x_∞ = 1
However, if |g'(x)| > 1 => divergence
Example: Let x_0 = 2
=> |g'(x_0)| = 2 > 1 => divergence
x_1 = (2^2 + 1)/2 = 5/2
x_2 = ((5/2)^2 + 1)/2 = 29/8
...
x_∞ = ∞ => diverges from the answer
Actual answer = 1
Bisection Method
Similar to binary search. We start with two values a, b (an interval) such that a < b and f(a) and f(b) are of opposite signs:
that is, f(a) < 0 and f(b) > 0, or f(a) > 0 and f(b) < 0.
This means that (for continuous f) there is an x, a < x < b, such that f(x) = 0; there is a zero between a and b.
Method:
1. Get the middle point of a, b => c = (a+b)/2
2. If f(a) and f(c) have opposite signs, then the zero lies in [a,c]. Substitute c for b => b ← c
3. If f(b) and f(c) have opposite signs, then the zero lies in [c,b]. Substitute c for a => a ← c
4. If f(c) = 0, then c is a zero => stop
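A minimal Python sketch of these steps, using the example below as the test function:

```python
import math

def bisection(f, a, b, tol=1e-6):
    """Find a zero of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    while (b - a) / 2 > tol:
        c = (a + b) / 2
        if f(c) == 0:
            return c              # c is exactly a zero
        if f(a) * f(c) < 0:       # opposite signs: the zero lies in [a, c]
            b = c
        else:                     # otherwise the zero lies in [c, b]
            a = c
    return (a + b) / 2

f = lambda x: math.sin(math.radians(x)) - 0.5   # x in degrees
print(bisection(f, 0, 50))                       # ≈ 30.0
```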
Example: sin(x) = 1/2 (x is in degrees)
f(x) = sin(x) - 1/2 = 0
Let's start with a = 0, b = 50 => f(0) = -0.5, f(50) = 0.266
f(a) < 0 and f(b) > 0, so the conditions are fine to begin.
a  | b     | c      | f(a)   | f(b)   | f(c)
0  | 50    | 25     | -0.5   | 0.266  | -0.077
25 | 50    | 37.5   | -0.077 | 0.266  | 0.1087
25 | 37.5  | 31.25  | -0.077 | 0.1087 | 0.018
25 | 31.25 | 28.125 | -0.077 | 0.018  | -0.02
This process continues until a solution is found. In this example c is approaching 30 (solution = 30).
Analysis of Algorithm
At each iteration the range is reduced by half:
|a_1 - b_1| = |a_0 - b_0| / 2
|a_2 - b_2| = |a_0 - b_0| / 2^2
=> |a_n - b_n| = |a_0 - b_0| / 2^n
If we want an approximation error < E, how many steps do we need?
|a_n - b_n| = |a_0 - b_0| / 2^n < E => |a_0 - b_0| / E < 2^n
log(|a_0 - b_0| / E) < log 2^n => log(|a_0 - b_0| / E) < n·log 2
log(|a_0 - b_0| / E) / log 2 < n
Example: E = 0.01
log(|0 - 50| / 0.01) / log 2 < n
(approx) 12 < n
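The bound for this example can be evaluated directly:

```python
import math

a0, b0, E = 0, 50, 0.01
n_min = math.log(abs(a0 - b0) / E) / math.log(2)
print(n_min)   # ≈ 12.3, so at least 13 halvings are needed
```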
Divergence
· Monotone divergence: g'(p) > 1
· Divergent oscillation: g'(p) < -1
False Position (Regula Falsi)
Similar to the bisection method. It also needs an interval [a,b] where a zero is found. It usually converges faster than bisection.
Method:
· A line is drawn between (a, f(a)) and (b, f(b))
· c is the intersection of this line with the x-axis
· We use the same rules as bisection:
· If f(a) and f(c) have different signs, then assign b ← c
· If f(b) and f(c) have different signs, then assign a ← c
· To compute c, we do the following:
m = (f(a) - f(b)) / (a - b) (Equation 1)
m = (f(b) - 0) / (b - c) (Equation 2)
Equating the two expressions for the slope m, we get
f(b) / (b - c) = (f(a) - f(b)) / (a - b)
b - c = f(b)·(a - b) / (f(a) - f(b))
c = b - f(b)·(a - b) / (f(a) - f(b))
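A minimal Python sketch of the method (the horizontal stopping tolerance is an illustrative choice):

```python
import math

def false_position(f, a, b, tol=1e-6, max_iter=100):
    """Find a zero of f in [a, b], assuming f(a) and f(b) have opposite signs."""
    for _ in range(max_iter):
        c = b - f(b) * (a - b) / (f(a) - f(b))   # x-intercept of the secant line
        if abs(f(c)) < tol:                      # horizontal convergence test
            return c
        if f(a) * f(c) < 0:                      # zero lies in [a, c]
            b = c
        else:                                    # zero lies in [c, b]
            a = c
    return c

f = lambda x: math.sin(math.radians(x)) - 0.5
print(false_position(f, 0, 50))   # ≈ 30, matching the table below
```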
Example: sin(x) = 1/2 => f(x) = sin(x) - 1/2 = 0
Let a = 0, b = 50
a | b      | f(a) | f(b)   | c       | f(c)
0 | 50     | -0.5 | 0.266  | 32.637  | 0.0393
0 | 32.637 | -0.5 | 0.0393 | 30.25   | 0.0039
0 | 30.25  | -0.5 | 0.0039 | 30.0248 | 0.000375
· c seems to approach 30 as f(c) approaches 0. We stop the iterations when we reach the desired precision.
o Horizontal Convergence
Instead of expecting f(x) = 0 exactly, we accept f(x) within some small error E such that |f(x)| < E.
Example: if E = 0.001, then we would stop at iteration i = 2 (|0.000375| < 0.001).
Geometrically this means that the zero is in some horizontal band.
o Vertical Convergence
Stop when x is at a distance S from p (the solution): stop when |x - p| < S.
However, this is not possible to check directly, since it implies we already know p. It is approximated by determining when |P_n - P_{n-1}| < S.
o Both Horizontal and Vertical Convergence
Stop when |P_n - P_{n-1}| < E and |f(x)| < E.
Troublesome Functions
Functions where obtaining the solution of f(x) = 0 is difficult.
If the graph y = f(x) is steep near the root (p, 0), then the root finding is well-conditioned, i.e., a solution with good precision is easy to obtain.
If the graph y = f(x) is shallow near (p, 0), then the root finding is ill-conditioned, i.e., the computed root may only have a few significant digits.
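To see the difference, compare how wide the acceptance band |f(x)| < E is for a steep and a shallow function with the same root p = 1 (both functions are illustrative choices):

```python
# |f(x)| < E accepts every x in a band around the root p = 1.
E = 1e-6

# Steep near the root: f(x) = 1000(x - 1); |f(x)| < E  <=>  |x - 1| < E/1000
print(E / 1000)     # 1e-9: accepted x values are very close to the root

# Shallow near the root: f(x) = (x - 1)^3; |f(x)| < E  <=>  |x - 1| < E**(1/3)
print(E ** (1/3))   # 0.01: x can be far from the root and still pass the test
```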