CS314 Numerical Methods

Week 1 Notes

By Melroy Saldanha

 

 

Webpage    http://www.cs.purdue.edu/homes/cs314

 

Goals

·        You will learn how to use the computer to approximate the solutions of non-linear and linear equations

·        Numerical integration and differentiation

·        Differential equations

·        You will learn the potential and limitations of the computer to solve mathematical problems

 

Textbook 

Numerical Analysis Using Matlab: Third Edition

 

Mailing List

Log in to a CS (lore) Machine and type ‘mailer add me to cs314’

 

Grade Distribution

·        Projects and Homeworks – 50%

·        Midterm – 25%

·        Final – 25%

 

Syllabus

·        Floating point, fixed point representation and computational error

·        Solutions of non-linear equations of form f(x) = 0

·        Solutions of linear systems of form Ax = B

·        Interpolation and curve fitting

·        Numerical differentiation and integration

·        Numerical optimization

·        Solution of differential equations

 

Binary Numbers

Computers use binary numbers because it is easier to implement circuits with two states (on/off).

          Example: 537.5 (base 10)

Decimal (base 10): 5 x 10^2 + 3 x 10^1 + 7 x 10^0 + 5 x 10^-1

Binary (base 2): 1 x 2^9 + 0 x 2^8 + 0 x 2^7 + 0 x 2^6 + 0 x 2^5 + 1 x 2^4 + 1 x 2^3 + 0 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1 = 1000011001.1 (base 2)

 

Base Conversions: Base 10 to Base 2

Separate the integer part from the fraction

=>     Integer part: N = b0 + b1 x 2^1 + ... + bk x 2^k

To find the digit bi (0 or 1), we divide N by 2

=>     N/2 = b0/2 + b1 x 2^0 + ... + bk x 2^(k-1)

Here b0/2 gives the remainder and the rest is the integer part

=>     New N = b1 x 2^0 + ... + bk x 2^(k-1)

The remainder determines whether b0 is 0 or 1. We repeat this process of dividing the new N by 2 until the integer part is 0

Example: 34 (base 10)

34/2 = 17, remainder = 0 = b0

17/2 = 8, remainder = 1 = b1

8/2 = 4, remainder = 0 = b2

4/2 = 2, remainder = 0 = b3

2/2 = 1, remainder = 0 = b4

1/2 = 0, remainder = 1 = b5

=>     34 (base 10) = 100010 (base 2)
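
A minimal sketch of the repeated-division procedure in Python (the function name dec_to_bin_int is our own):

# Convert a non-negative integer from base 10 to base 2 by repeated division by 2,
# collecting the remainders as the bits b0, b1, b2, ...
def dec_to_bin_int(n):
    if n == 0:
        return '0'
    bits = []
    while n > 0:
        bits.append(str(n % 2))   # the remainder is the next bit bi
        n //= 2                   # the integer part becomes the new N
    return ''.join(reversed(bits))

print(dec_to_bin_int(34))         # prints 100010, matching the example above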

Other Examples

          0.1 (base 2) = 1 x 2^-1 = 1/2 = 0.5 (base 10)

          0.01 (base 2) = 0 x 2^-1 + 1 x 2^-2 = 1/4 = 0.25 (base 10)

          0.11 (base 2) = 1 x 2^-1 + 1 x 2^-2 = 1/2 + 1/4 = 3/4 = 0.75 (base 10)

          1/3 (base 10) = 0 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4 + ... = 0.0101010101... (base 2)
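
The fractional part can be converted by repeated multiplication by 2; a short sketch (frac_to_bin is our own name):

# At each step, the integer part of 2*f is the next bit after the binary point.
def frac_to_bin(f, num_bits=10):
    bits = []
    for _ in range(num_bits):
        f *= 2
        bit = int(f)              # 0 or 1
        bits.append(str(bit))
        f -= bit
    return '0.' + ''.join(bits)

print(frac_to_bin(1/3))           # 0.0101010101, matching 1/3 above
print(frac_to_bin(0.75))          # 0.1100000000, i.e. 0.11 in base 2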

 

Email

          Send questions to ‘cs314-ta@cs.purdue.edu’

 

 

Representation of Numbers in Machine

·        Fixed Point and Floating Point

·        Fixed Point:

The number of bits for integer part and fraction is fixed.

Example: nnnn.ff – 4 bits for the integer part, 2 bits for the fraction. Largest number – 1111.11 (base 2) = 15.75 (base 10). Smallest number – 0000.00 (base 2) = 0 (base 10)

In practice, more bits are given to both the integer part and the fraction

Example: 2 bytes integer part, 2 bytes fraction

Advantages:

·        Fast Arithmetic (Use the same integer arithmetic)

Example – 1.01 (base 2) + 10.10 (base 2) = 11.11 (base 2) = 3.75 (base 10) (see the sketch after this list)

1.01 (base 2) = 1.25 (base 10), 10.10 (base 2) = 2.50 (base 10)

1.01 + 10.10 = 1.25 + 2.50 = 3.75

·        Used in signal processors and signal processing

Example – MP3 players, MPEG, etc.
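
A rough illustration of the "same integer arithmetic" point, assuming the nnnn.ff format above (2 fraction bits, so every value is stored as a whole number of quarters; the helper names are ours):

SCALE = 4                          # 2^2, since there are 2 fraction bits

def to_fixed(x):
    return round(x * SCALE)        # e.g. 1.25 -> 5, 2.50 -> 10

def from_fixed(i):
    return i / SCALE

a = to_fixed(1.25)                 # 1.01 in base 2
b = to_fixed(2.50)                 # 10.10 in base 2
print(from_fixed(a + b))           # plain integer addition gives 3.75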

                   Disadvantages

·        Very restrictive range

·        Bad precision – the smallest representable step is limited by the number of bits used for the fraction

·        Generates a lot of error during operation

 

·        Floating Point

The decimal point moves around to keep the desired precision, using the scientific notation you already know:

 ± Q x 10^N, where Q is the mantissa and N is the exponent.

The precision of the number depends on number of bits used to represent Q.

The numbers are represented on the computer as

x = ± q x 2^n, where q and n are stored with a fixed number of bits.

To have a unique representation for each number, the numbers are normalized so that

                   1/2 ≤ q < 1 (exception: 0)

The uniqueness is useful for some operations such as checking for equality.

Example – 4-bit mantissa and 3-bit exponent

Add (1/3 + 3/5) + 1/15

1/3 = (0.01010101...) (base 2) = 0.101010... x 2^-1

                                  = 0.1011 x 2^-1 (rounding)

                             3/5 = 0.6 = (0.1001100...) (base 2) = 0.1010 x 2^0 (rounding)

                             Adding 1/3 + 3/5

                             = 0.1011 x 2^-1 + 0.1010 x 2^0

                             = 0.01011 x 2^0 + 0.10100 x 2^0 = 0.11111 x 2^0

                             = 1.0000 x 2^0 (rounding) = 0.1000 x 2^1 (Solution Part 1)

                             Now we add 1/15 to this solution

                             1/15 = (0.00010001...) (base 2) = 0.10001... x 2^-3

                             = 0.1001 x 2^-3 (rounding)

                             Adding this to the solution from Part 1

                             = 0.1000 x 2^1 + 0.1001 x 2^-3

                             = 0.10000000 x 2^1 + 0.00001001 x 2^1

                             = 0.10001001 x 2^1 = 0.1001 x 2^1 (rounding)

                             = 1.001 (base 2) = 1.125 (base 10)

                             Actual Answer = 1

 

                             Absolute Error = | 1-1.125| = 0.125

                             Relative Error = | 1-1.125| / 1 = 0.125 = 12.5 %
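
The 12.5 % error above can be reproduced with a small helper that rounds every intermediate result to a normalized 4-bit mantissa (a sketch; round4 is our own name and Python's built-in round does the rounding):

import math

def round4(x):
    # Round x to the form q x 2^n with 1/2 <= |q| < 1 and 4 binary digits in q.
    if x == 0:
        return 0.0
    n = math.floor(math.log2(abs(x))) + 1    # exponent that normalizes x
    q = x / 2**n                             # 1/2 <= |q| < 1
    q = round(q * 16) / 16                   # keep 4 binary digits (a carry to 1.0 still gives the right value)
    return q * 2**n

s = round4(round4(1/3) + round4(3/5))        # 0.1000 x 2^1 = 1.0
s = round4(s + round4(1/15))                 # 0.1001 x 2^1 = 1.125
print(s)                                     # 1.125, versus the exact answer 1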

 

Modern Architecture – Representation of Real Numbers

·        Single Precision (Float)

Uses 4 bytes (32 bits), 24 bits mantissa (1 bit sign), 8 bits exponent (1 bit sign)

Range is 2^-128 to 2^127

Accuracy is the smallest difference between two numbers with exponent 0.

c = ± 0.10000...001 (23 bits) x 2^0

Accuracy = 2^-23 ≈ 1.2 x 10^-7, about 6 decimal digits of precision

 

·        Double Precision

Uses 8 bytes (64 bits), 53 bits mantissa (1 bit sign) & 11 bits exponent (1 bit sign)

Range is about 2^1024

Accuracy is 2^-52 ≈ 2.2 x 10^-16, about 15 decimal digits of precision
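
Python's built-in float is a double, so its machine epsilon matches the 2^-52 figure above; the single-precision gap can be written directly as 2^-23 (a quick check using only the standard library):

import sys

print(sys.float_info.epsilon)      # 2.220446049250313e-16, i.e. 2^-52 (double precision)
print(2.0 ** -23)                  # 1.1920928955078125e-07 (single-precision accuracy)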

 

Errors

          Absolute Error = | p – p* | where p* is the approximation of p

          Relative Error = | p – p* | / | p | for p ≠ 0

          Chop-off vs. Round-off

          Pi = 3.141592654…

          Using three digits for the fraction:

·        Chop-off = 3.141

·        Round-off = 3.142

Absolute Error

          Chop-off = 0.59 x 10^-3

          Round-off = 0.41 x 10^-3

Relative Error

          Chop-off = 1.9 x 10^-4 = 0.019 %

          Round-off = 1.3 x 10^-4 = 0.013 %

 

=>     Round-off is better because it gives a smaller error
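
A quick check of the chop-off vs. round-off numbers above (math.floor does the chopping; the variable names are ours):

import math

p = math.pi                              # 3.141592654...
chop = math.floor(p * 1000) / 1000       # 3.141
rnd  = round(p, 3)                       # 3.142

print(abs(p - chop), abs(p - chop) / p)  # ~0.59e-3 and ~1.9e-4 (0.019 %)
print(abs(p - rnd),  abs(p - rnd)  / p)  # ~0.41e-3 and ~1.3e-4 (0.013 %)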

 

Propagation of Errors

          Assume p and q are approximated by p* and q* such that

          p* = p + e_p,  q* = q + e_q

          Sum: p* + q* = (p + q) + (e_p + e_q)

                   The error of the sum is the sum of the individual errors

          Product: p* q* = (p + e_p)(q + e_q) = pq + p e_q + q e_p + e_p e_q

                   Error of the product = p e_q + q e_p + e_p e_q

                   The errors are amplified by the magnitudes of p and q
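
A small numeric check of the product formula, using made-up values for p, q and their errors:

# Error in a product: (p + ep)(q + eq) - pq = p*eq + q*ep + ep*eq
p, q = 100.0, 200.0                # exact values (chosen arbitrarily)
ep, eq = 0.01, 0.02                # small errors in the approximations

actual_error = (p + ep) * (q + eq) - p * q
predicted    = p * eq + q * ep + ep * eq
print(actual_error, predicted)     # both ~4.0002: errors of 0.01-0.02 grow to ~4 because
                                   # they are multiplied by the magnitudes of p and q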

 

Algorithms

          Stable: small initial errors stay small

          Unstable: small initial errors become large after several iterations

 

Solution of Non-Linear Equations of form f(x) = 0

          We will use 'iteration' to solve these equations

o       Start with a value P0 and iterate P_{k+1} = g(P_k)

o       We use a stopping criterion (a 'base step') to end the computation

Example - | P_k - P_{k-1} | < E

 

Definitions

·        Fixed Point

A fixed point of g(x) is a real number p such that p = g(p)

Geometrically, the fixed points of a function y = g(x) are the points of intersection of y = g(x) and y = x

 

 

·        Fixed Point Iteration

It is an equation of the form P_{n+1} = g(P_n) for n = 0, 1, 2, ...

Example: x^2 - 2x + 1 = 0

Get a fixed point iteration equation

x^2 + 1 = 2x => x = (x^2 + 1)/2

x_{k+1} = (x_k^2 + 1)/2

                             Let x0 = 0

x1 = (0^2 + 1)/2 = 1/2

x2 = ((1/2)^2 + 1)/2 = 5/8

x3 = ((5/8)^2 + 1)/2 = 89/128

....

x_n → 1 as n → ∞
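
A minimal iteration loop for this example (g and the starting value are as in the notes; the tolerance and iteration cap are our own choices):

def fixed_point_iteration(g, x0, eps=1e-6, max_iter=1000):
    # Iterate x_{k+1} = g(x_k) until successive values differ by less than eps.
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < eps:
            return x_next
        x = x_next
    return x

g = lambda x: (x**2 + 1) / 2
print(fixed_point_iteration(g, 0.0))   # roughly 0.998 after the iteration cap; the iterates
                                       # creep toward the fixed point 1, slowly because g'(1) = 1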

·        Fixed Point Theorem

o       If | g'(x) | ≤ k < 1 for all x in [a,b], and g(x) is in [a,b], then the iteration P_n = g(P_{n-1}) will converge to a unique fixed point P in [a,b]. In this case P is said to be an attractive fixed point

o       If | g'(x) | > 1, then the iteration P_n = g(P_{n-1}) will not converge

 

 

Convergence

·        First Case : 0 <= g’(p) < 1, Monotone Convergence

 

·        Second Case :  -1 < g’(p) < 0, Oscillating Convergence

What kind of convergence in earlier example?

g(x) = (x^2 + 1)/2 => g'(x) = 2x/2 = x

If | g'(x) | < 1 for all x in [a,b] and g(x) in [a,b], then the iteration converges

For the earlier example the convergence is monotone, hence

0 <= g'(x) < 1 => 0 <= x < 1 => the starting point should be between 0 and 1 for convergence

Example: x0 = 0

x1 = (0^2 + 1)/2 = 1/2

x2 = ((1/2)^2 + 1)/2 = 5/8

....

x_n → 1

 

However if | g’(x) | > 1 => Divergence

Example: Let x0 = 2,

=> | g’(x0) | = 2 > 1 => divergence

x1 = (2^2 + 1)/2 = 5/2

x2 = ((5/2)^2 + 1)/2 = 29/8

....

x_n → ∞ => the iteration diverges from the answer

Actual answer = 1

 

Bisection Method

Similar to Binary Search. We start with two values a,b (an interval) such that a<b and f(a) and f(b) are of opposite signs

That is, f(a) < 0 and f(b) > 0, or f(a) > 0 and f(b) < 0

This means (for a continuous f) that there is an x, a < x < b, such that f(x) = 0 (there is a zero between a and b).

Method:

1.     Compute the midpoint of a and b => c = (a+b)/2

2.     If f(a) and f(c) have opposite signs then ‘0’ lies in [a,c]. Then substitute c for b => b ← c

3.     If f(b) and f(c) have opposite signs then ‘0’ lies in [c,b]. Then substitute c for a => a ← c

4.     If f(c) = 0 then c is a zero => stop

 

          Example:

sin(x) = ½ (x is in degrees)

          f(x) = sin(x) – ½ = 0

          Let's start with a = 0, b = 50 => f(0) = -0.5, f(50) = 0.266

          f(a) < 0 and f(b) > 0 so the conditions are fine to begin.

         

     a        b        c        f(a)      f(b)      f(c)
     0        50       25       -0.5      0.266     -0.077
     25       50       37.5     -0.077    0.266     0.1087
     25       37.5     31.25    -0.077    0.1087    0.018
     25       31.25    28.125   -0.077    0.018     -0.02

This process continues until a solution is found. It seems c is approaching 30 in this example (the solution is 30)
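
A sketch of the method in code (math.radians handles the degree-based sine; the function names are ours):

import math

def f(x):
    # f(x) = sin(x) - 1/2, with x in degrees as in the example
    return math.sin(math.radians(x)) - 0.5

def bisection(f, a, b, eps=1e-4):
    # Requires f(a) and f(b) to have opposite signs.
    while abs(b - a) > eps:
        c = (a + b) / 2
        if f(c) == 0:
            return c
        if f(a) * f(c) < 0:        # the zero lies in [a, c]
            b = c
        else:                      # the zero lies in [c, b]
            a = c
    return (a + b) / 2

print(bisection(f, 0, 50))         # ≈ 30.0, as the table above suggests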

 

Analysis of Algorithm

                   At each iteration the range is reduced by half.

                   |a1 - b1| = |a0 - b0| / 2

                   |a2 - b2| = |a0 - b0| / 2^2

=> |a_n - b_n| = |a0 - b0| / 2^n

If we want an approximation error < E, how many steps do we need?

|a_n - b_n| = |a0 - b0| / 2^n < E => |a0 - b0| / E < 2^n

log(|a0 - b0| / E) < log 2^n => log(|a0 - b0| / E) < n log 2

log(|a0 - b0| / E) / log 2 < n

 

Example: E = 0.01

log(|0 - 50| / 0.01) / log 2 < n

12.3 < n (approximately), so at least 13 steps are needed
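
The bound can be evaluated directly (math.log with an explicit base 2):

import math

n_min = math.log(abs(0 - 50) / 0.01, 2)
print(n_min)                       # ≈ 12.3, so at least 13 bisection steps are needed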

Divergence

·        Monotone divergence – 1 < g’(p)

 

·        Divergent Oscillation – g'(p) < -1

 

 

False Position (Regula-Falsi)

Similar to the bisection method: it also needs an interval [a,b] in which a zero lies, and it usually converges faster than bisection.

Method:

·        A secant line is drawn between (a, f(a)) and (b, f(b))

·        'c' is the intersection of this line with the x-axis

·        We use same rules as bisection

·        If f(a) and f(c) have opposite signs, then assign b ← c

·        If f(b) and f(c) have opposite signs, then assign a ← c

·        To compute c, we do the following

m = (f(a) – f(b)) / (a-b)   (Equation 1)

m = (f(b) – 0) / (b-c)    (Equation 2)

Equating the above two equations, we get

f(b) / (b-c) = (f(a) – f(b)) / (a-b)

b-c = f(b) * (a-b) / (f(a) – f(b))

c = b - f(b) * (a-b) / (f(a) – f(b) )
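
A sketch of the procedure in code, using the same degree-based f(x) = sin(x) - 1/2 and stopping when |f(c)| is below a small tolerance (the names and the tolerance are our own choices):

import math

def f(x):
    return math.sin(math.radians(x)) - 0.5      # f(x) = sin(x) - 1/2, x in degrees

def false_position(f, a, b, eps=1e-4, max_iter=50):
    # Requires f(a) and f(b) to have opposite signs.
    for _ in range(max_iter):
        c = b - f(b) * (a - b) / (f(a) - f(b))  # x-intercept of the secant line
        if abs(f(c)) < eps:                     # horizontal convergence test
            return c
        if f(a) * f(c) < 0:                     # the zero lies in [a, c]
            b = c
        else:                                   # the zero lies in [c, b]
            a = c
    return c

print(false_position(f, 0, 50))    # ≈ 30, matching the table in the example below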

 

Example:

sin(x) = ½  => f(x) = sin(x) – ½ = 0

Let a = 0 , b = 50

 

     a        b          f(a)     f(b)       c          f(c)
     0        50         -0.5     0.266      32.637     0.0393
     0        32.637     -0.5     0.0393     30.25      0.0039
     0        30.25      -0.5     0.0039     30.0248    0.000375

 

·        'c' seems to approach 30 as f(c) approaches 0. We stop the iterations when we reach the desired precision

o       Horizontal Convergence

Instead of requiring f(x) = 0 exactly, we accept any x for which |f(x)| is below some small tolerance E: |f(x)| < E

Example: if E = 0.001, then we would stop at the iteration where f(c) = 0.000375 (|0.000375| < 0.001)

Geometrically this means that zero is in some horizontal band

o       Vertical Convergence

Stop when x is within a distance S of the solution p: stop when | x - p | < S

However this cannot be checked directly, since it implies we already know p. It is approximated by stopping when | P_n - P_{n-1} | < S

o       Both Horizontal and Vertical Convergence

Stop when | P_n - P_{n-1} | < E and | f(P_n) | < E

Troublesome Functions

          Functions for which obtaining the solution of f(x) = 0 is difficult.

If the graph y = f(x) is steep near the root (p, 0), then the root finding is 'well-conditioned', i.e., a solution with good precision is easy to obtain.

If the graph y = f(x) is shallow (nearly flat) near (p, 0), then the root finding is 'ill-conditioned', i.e., the computed root may have only a few significant digits.
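
A rough illustration of the ill-conditioned case, using f(x) = (x - 1)^3, which is very flat near its root at x = 1 (the example function is our own):

# Near a flat root, many different x values make |f(x)| tiny, so a tolerance on
# |f(x)| pins down only a few significant digits of the root.
f_flat = lambda x: (x - 1) ** 3

for x in (0.999, 1.0005, 1.001):
    print(x, f_flat(x))            # |f(x)| <= 1e-9 for every one of these x values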