CS314 Numerical Methods

Week 1 Notes

By Melroy Saldanha

 

 

Webpage    http://www.cs.purdue.edu/homes/cs314

 

Goals

·        You will learn how to use the computer to approximate the solutions of non-linear and linear equations

·        Numerical integration and differentiation

·        Differential equations

·        You will learn the potential and limitations of the computer to solve mathematical problems

 

Textbook 

Numerical Analysis Using Matlab: Third Edition

 

Mailing List

Log in to a CS (lore) Machine and type ‘mailer add me to cs314’

 

Grade Distribution

·        Projects and Homeworks – 50%

·        Midterm – 25%

·        Final – 25%

 

Syllabus

·        Floating point, fixed point representation and computational error

·        Solutions of non-linear equations of form f(x) = 0

·        Solutions of linear systems of form Ax = B

·        Interpolation and curve fitting

·        Numerical differentiation and integration

·        Numerical optimization

·        Solution of differential equations

 

Binary Numbers

Computers use binary numbers because it is easier to implement circuits with two states (on/off).

          Example: 537.5 (base 10)

Decimal (base 10): 5 x 10^2 + 3 x 10^1 + 7 x 10^0 + 5 x 10^-1

Binary (base 2): 1 x 2^9 + 0 x 2^8 + 0 x 2^7 + 0 x 2^6 + 0 x 2^5 + 1 x 2^4 + 1 x 2^3 + 0 x 2^2 + 0 x 2^1 + 1 x 2^0 + 1 x 2^-1 = 1000011001.1 (base 2)

 

Base Conversions: Base 10 to Base 2

Separate the integer part from the fraction

=>     Integer part: N = b0 + b1 x 2^1 + ... + bk x 2^k

To find the digit bi (0 or 1), we divide N by 2

=>     N/2 = b0/2 + b1 x 2^0 + ... + bk x 2^(k-1)

Here b0/2 gives the remainder and the rest is the integer part

=>     New N = b1 x 2^0 + ... + bk x 2^(k-1)

The remainder determines whether b0 is 0 or 1. We repeat this process of dividing the new N by 2 until the integer part is 0

Example: 34 (base 10)

34/2 = 17, remainder = 0 = b0

17/2 = 8, remainder = 1 = b1

8/2 = 4, remainder = 0 = b2

4/2 = 2, remainder = 0 = b3

2/2 = 1, remainder = 0 = b4

1/2 = 0, remainder = 1 = b5

=>     34 (base 10) = 100010 (base 2)
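
A minimal sketch of the repeated-division procedure in Python (the function name dec_to_bin_int is our own):

# Convert a non-negative integer from base 10 to base 2 by repeated division by 2,
# collecting the remainders as the bits b0, b1, b2, ...
def dec_to_bin_int(n):
    if n == 0:
        return '0'
    bits = []
    while n > 0:
        bits.append(str(n % 2))   # the remainder is the next bit bi
        n //= 2                   # the integer part becomes the new N
    return ''.join(reversed(bits))

print(dec_to_bin_int(34))         # prints 100010, matching the example above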

Other Examples

          0.1 (base 2) = 1 x 2^-1 = 1/2 = 0.5 (base 10)

          0.01 (base 2) = 0 x 2^-1 + 1 x 2^-2 = 1/4 = 0.25 (base 10)

          0.11 (base 2) = 1 x 2^-1 + 1 x 2^-2 = 1/2 + 1/4 = 3/4 = 0.75 (base 10)

          1/3 (base 10) = 0 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4 + ... = 0.0101010101... (base 2)
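
The fractional part can be converted by repeated multiplication by 2; a short sketch (frac_to_bin is our own name):

# At each step, the integer part of 2*f is the next bit after the binary point.
def frac_to_bin(f, num_bits=10):
    bits = []
    for _ in range(num_bits):
        f *= 2
        bit = int(f)              # 0 or 1
        bits.append(str(bit))
        f -= bit
    return '0.' + ''.join(bits)

print(frac_to_bin(1/3))           # 0.0101010101, matching 1/3 above
print(frac_to_bin(0.75))          # 0.1100000000, i.e. 0.11 in base 2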

 

Email

          Send questions to ‘cs314-ta@cs.purdue.edu’

 

 

Representation of Numbers in Machine

·        Fixed Point and Floating Point

·        Fixed Point:

The number of bits for integer part and fraction is fixed.

Example: nnnn.ff – 4 bits for the integer part, 2 bits for the fraction. Largest number – 1111.11 (base 2) = 15.75 (base 10). Smallest number – 0000.00 (base 2) = 0 (base 10)

In practice, more bits are given to both the integer part and the fraction

Example: 2 bytes integer part, 2 bytes fraction

Advantages:

·        Fast Arithmetic (Use the same integer arithmetic)

Example – 1.01 (base 2) + 10.10 (base 2) = 11.11 (base 2) = 3.75 (base 10) (see the sketch after this list)

1.01 (base 2) = 1.25 (base 10), 10.10 (base 2) = 2.50 (base 10)

1.01 + 10.10 = 1.25 + 2.50 = 3.75

·        Used in signal processors and signal processing

Example – MP3 players, MPEG, etc.
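
A rough illustration of the "same integer arithmetic" point, assuming the nnnn.ff format above (2 fraction bits, so every value is stored as a whole number of quarters; the helper names are ours):

SCALE = 4                          # 2^2, since there are 2 fraction bits

def to_fixed(x):
    return round(x * SCALE)        # e.g. 1.25 -> 5, 2.50 -> 10

def from_fixed(i):
    return i / SCALE

a = to_fixed(1.25)                 # 1.01 in base 2
b = to_fixed(2.50)                 # 10.10 in base 2
print(from_fixed(a + b))           # plain integer addition gives 3.75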

                   Disadvantages

·        Very restrictive range

·        Bad precision – the smallest representable step is limited by the number of bits used for the fraction

·        Generates a lot of error during operation

 

·        Floating Point

The decimal point moves around to keep the desired precision, using the scientific notation you already know:

 ± Q x 10^N, where Q is the mantissa and N is the exponent.

The precision of the number depends on number of bits used to represent Q.

The numbers are represented on the computer as

x = ± q x 2^n, where q and n are stored with a fixed number of bits.

To have a unique representation for each number, the numbers are normalized so that

                   1/2 ≤ q < 1 (exception: 0)

The uniqueness is useful for some operations such as checking for equality.

Example – 4-bit mantissa and 3-bit exponent

Add (1/3 + 3/5) + 1/15

1/3 = (0.01010101...) (base 2) = 0.101010... x 2^-1

                                  = 0.1011 x 2^-1 (rounding)

                             3/5 = 0.6 = (0.1001100...) (base 2) = 0.1010 x 2^0 (rounding)

                             Adding 1/3 + 3/5

                             = 0.1011 x 2^-1 + 0.1010 x 2^0

                             = 0.01011 x 2^0 + 0.10100 x 2^0 = 0.11111 x 2^0

                             = 1.0000 x 2^0 (rounding) = 0.1000 x 2^1 (Solution Part 1)

                             Now we add 1/15 to this solution

                             1/15 = (0.00010001...) (base 2) = 0.10001... x 2^-3

                             = 0.1001 x 2^-3 (rounding)

                             Adding this to the solution from Part 1

                             = 0.1000 x 2^1 + 0.1001 x 2^-3

                             = 0.10000000 x 2^1 + 0.00001001 x 2^1

                             = 0.10001001 x 2^1 = 0.1001 x 2^1 (rounding)

                             = 1.001 (base 2) = 1.125 (base 10)

                             Actual Answer = 1

 

                             Absolute Error = | 1-1.125| = 0.125

                             Relative Error = | 1-1.125| / 1 = 0.125 = 12.5 %
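
The 12.5 % error above can be reproduced with a small helper that rounds every intermediate result to a normalized 4-bit mantissa (a sketch; round4 is our own name and Python's built-in round does the rounding):

import math

def round4(x):
    # Round x to the form q x 2^n with 1/2 <= |q| < 1 and 4 binary digits in q.
    if x == 0:
        return 0.0
    n = math.floor(math.log2(abs(x))) + 1    # exponent that normalizes x
    q = x / 2**n                             # 1/2 <= |q| < 1
    q = round(q * 16) / 16                   # keep 4 binary digits (a carry to 1.0 still gives the right value)
    return q * 2**n

s = round4(round4(1/3) + round4(3/5))        # 0.1000 x 2^1 = 1.0
s = round4(s + round4(1/15))                 # 0.1001 x 2^1 = 1.125
print(s)                                     # 1.125, versus the exact answer 1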

 

Modern Architecture – Representation of Real Numbers

·        Single Precision (Float)

Uses 4 bytes (32 bits), 24 bits mantissa (1 bit sign), 8 bits exponent (1 bit sign)

Range is 2^-128 to 2^127

Accuracy is the smallest difference between two numbers with exponent 0.

c = ± 0.10000...001 (23 bits) x 2^0

Accuracy = 2^-23 ≈ 1.2 x 10^-7, about 6 decimal digits of precision

 

·        Double Precision

Uses 8 bytes (64 bits), 53 bits mantissa (1 bit sign) & 11 bits exponent (1 bit sign)

Range is about 2^1024

Accuracy is 2^-52 ≈ 2.2 x 10^-16, about 15 decimal digits of precision
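
Python's built-in float is a double, so its machine epsilon matches the 2^-52 figure above; the single-precision gap can be written directly as 2^-23 (a quick check using only the standard library):

import sys

print(sys.float_info.epsilon)      # 2.220446049250313e-16, i.e. 2^-52 (double precision)
print(2.0 ** -23)                  # 1.1920928955078125e-07 (single-precision accuracy)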

 

Errors

          Absolute Error = | p – p* | where p* is the approximation of p

          Relative Error = | p – p* | / | p | for p ≠ 0

          Chop-off vs. Round-off

          Pi = 3.141592654…

          Using three digits for the fraction:

·        Chop-off = 3.141

·        Round-off = 3.142

Absolute Error

          Chop-off = 0.59 x 10^-3

          Round-off = 0.41 x 10^-3

Relative Error

          Chop-off = 1.9 x 10^-4 = 0.019 %

          Round-off = 1.3 x 10^-4 = 0.013 %

 

=>     Round-off is better because it gives a smaller error
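
A quick check of the chop-off vs. round-off numbers above (math.floor does the chopping; the variable names are ours):

import math

p = math.pi                              # 3.141592654...
chop = math.floor(p * 1000) / 1000       # 3.141
rnd  = round(p, 3)                       # 3.142

print(abs(p - chop), abs(p - chop) / p)  # ~0.59e-3 and ~1.9e-4 (0.019 %)
print(abs(p - rnd),  abs(p - rnd)  / p)  # ~0.41e-3 and ~1.3e-4 (0.013 %)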

 

Propagation of Errors

          Assume p and q are approximated by p* and q* such that

          p* = p + e_p,  q* = q + e_q

          Sum: p* + q* = (p + q) + (e_p + e_q)

                   The error of the sum is the sum of the individual errors

          Product: p* q* = (p + e_p)(q + e_q) = pq + p e_q + q e_p + e_p e_q

                   Error of the product = p e_q + q e_p + e_p e_q

                   The errors are amplified by the magnitudes of p and q
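
A small numeric check of the product formula, using made-up values for p, q and their errors:

# Error in a product: (p + ep)(q + eq) - pq = p*eq + q*ep + ep*eq
p, q = 100.0, 200.0                # exact values (chosen arbitrarily)
ep, eq = 0.01, 0.02                # small errors in the approximations

actual_error = (p + ep) * (q + eq) - p * q
predicted    = p * eq + q * ep + ep * eq
print(actual_error, predicted)     # both ~4.0002: errors of 0.01-0.02 grow to ~4 because
                                   # they are multiplied by the magnitudes of p and q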

 

Algorithms

          Stable: small initial errors stay small

          Unstable: small initial errors become large after several iterations

 

Solution of Non-Linear Equations of form f(x) = 0

          We will use 'iteration' to solve these equations

o       Start with a value P0 and iterate P_{k+1} = g(P_k)

o       We use a stopping criterion (a 'base step') to end the computation

Example - | P_k - P_{k-1} | < E

 

Definitions

·        Fixed Point

A fixed point of g(x) is a real number p such that p = g(p)

Geometrically, the fixed points of a function y = g(x) are the points of intersection of y = g(x) and y = x

 

 

·        Fixed Point Iteration

It is an equation of the form P_{n+1} = g(P_n) for n = 0, 1, 2, ...

Example: x^2 - 2x + 1 = 0

Get a fixed point iteration equation

x^2 + 1 = 2x => x = (x^2 + 1)/2

x_{k+1} = (x_k^2 + 1)/2

                             Let x0 = 0

x1 = (0^2 + 1)/2 = 1/2

x2 = ((1/2)^2 + 1)/2 = 5/8

x3 = ((5/8)^2 + 1)/2 = 89/128

....

x_n → 1 as n → ∞
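
A minimal iteration loop for this example (g and the starting value are as in the notes; the tolerance and iteration cap are our own choices):

def fixed_point_iteration(g, x0, eps=1e-6, max_iter=1000):
    # Iterate x_{k+1} = g(x_k) until successive values differ by less than eps.
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < eps:
            return x_next
        x = x_next
    return x

g = lambda x: (x**2 + 1) / 2
print(fixed_point_iteration(g, 0.0))   # roughly 0.998 after the iteration cap; the iterates
                                       # creep toward the fixed point 1, slowly because g'(1) = 1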

·        Fixed Point Theorem

o       If | g'(x) | ≤ k < 1 for all x in [a,b], and g(x) is in [a,b], then the iteration P_n = g(P_{n-1}) will converge to a unique fixed point P in [a,b]. In this case P is said to be an attractive fixed point

o       If | g'(x) | > 1, then the iteration P_n = g(P_{n-1}) will not converge

 

 

Convergence

·        First Case : 0 <= g’(p) < 1, Monotone Convergence

 

·        Second Case :  -1 < g’(p) < 0, Oscillating Convergence

What kind of convergence in earlier example?

g(x) = (x^2 + 1)/2 => g'(x) = 2x/2 = x

If | g'(x) | < 1 for all x in [a,b] and g(x) in [a,b], then the iteration converges

For the earlier example the convergence is monotone, hence

0 <= g'(x) < 1 => 0 <= x < 1 => the starting point should be between 0 and 1 for convergence

Example: x0 = 0

x1 = (0^2 + 1)/2 = 1/2

x2 = ((1/2)^2 + 1)/2 = 5/8

....

x_n → 1

 

However if | g’(x) | > 1 => Divergence

Example: Let x0 = 2,

=> | g’(x0) | = 2 > 1 => divergence

x1 = (2^2 + 1)/2 = 5/2

x2 = ((5/2)^2 + 1)/2 = 29/8

....

x_n → ∞ => the iteration diverges from the answer

Actual answer = 1

 

Bisection Method

Similar to Binary Search. We start with two values a,b (an interval) such that a<b and f(a) and f(b) are of opposite signs

That is, f(a) < 0 and f(b) > 0, or f(a) > 0 and f(b) < 0

This means (for a continuous f) that there is an x, a < x < b, such that f(x) = 0 (there is a zero between a and b).

Method:

1.     Compute the midpoint of a and b => c = (a+b)/2

2.     If f(a) and f(c) have opposite signs then ‘0’ lies in [a,c]. Then substitute c for b => b ← c

3.     If f(b) and f(c) have opposite signs then ‘0’ lies in [c,b]. Then substitute c for a => a ← c

4.     If f(c) = 0 then c is a zero => stop

 

          Example:

sin(x) = ½ (x is in degrees)

          f(x) = sin(x) – ½ = 0

          Let's start with a = 0, b = 50 => f(0) = -0.5, f(50) = 0.266

          f(a) < 0 and f(b) > 0 so the conditions are fine to begin.

         

     a        b        c        f(a)      f(b)      f(c)
     0        50       25       -0.5      0.266     -0.077
     25       50       37.5     -0.077    0.266     0.1087
     25       37.5     31.25    -0.077    0.1087    0.018
     25       31.25    28.125   -0.077    0.018     -0.02

This process continues until a solution is found. It seems c is approaching 30 in this example (the solution is 30)
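
A sketch of the method in code (math.radians handles the degree-based sine; the function names are ours):

import math

def f(x):
    # f(x) = sin(x) - 1/2, with x in degrees as in the example
    return math.sin(math.radians(x)) - 0.5

def bisection(f, a, b, eps=1e-4):
    # Requires f(a) and f(b) to have opposite signs.
    while abs(b - a) > eps:
        c = (a + b) / 2
        if f(c) == 0:
            return c
        if f(a) * f(c) < 0:        # the zero lies in [a, c]
            b = c
        else:                      # the zero lies in [c, b]
            a = c
    return (a + b) / 2

print(bisection(f, 0, 50))         # ≈ 30.0, as the table above suggests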

 

Analysis of Algorithm

                   At each iteration the range is reduced by half.

                   |a1 - b1| = |a0 - b0| / 2

                   |a2 - b2| = |a0 - b0| / 2^2

=> |a_n - b_n| = |a0 - b0| / 2^n

If we want an approximation error < E, how many steps do we need?

|a_n - b_n| = |a0 - b0| / 2^n < E => |a0 - b0| / E < 2^n

log(|a0 - b0| / E) < log 2^n => log(|a0 - b0| / E) < n log 2

log(|a0 - b0| / E) / log 2 < n

 

Example: E = 0.01

log(|0 - 50| / 0.01) / log 2 < n

12.3 < n (approximately), so at least 13 steps are needed
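
The bound can be evaluated directly (math.log with an explicit base 2):

import math

n_min = math.log(abs(0 - 50) / 0.01, 2)
print(n_min)                       # ≈ 12.3, so at least 13 bisection steps are needed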

Divergence

·        Monotone divergence – 1 < g’(p)

 

·        Divergent Oscillation – g'(p) < -1

 

 

False Position (Regula-Falsi)

Similar to the bisection method: it also needs an interval [a,b] in which a zero lies, and it usually converges faster than bisection.

Method:

·        A secant line is drawn between (a, f(a)) and (b, f(b))

·        'c' is the intersection of this line with the x-axis

·        We use same rules as bisection

·        If f(a) and f(c) have opposite signs, then assign b ← c

·        If f(b) and f(c) have opposite signs, then assign a ← c

·        To compute c, we do the following

m = (f(a) – f(b)) / (a-b)   (Equation 1)

m = (f(b) – 0) / (b-c)    (Equation 2)

Equating the above two equations, we get

f(b) / (b-c) = (f(a) – f(b)) / (a-b)

b-c = f(b) * (a-b) / (f(a) – f(b))

c = b - f(b) * (a-b) / (f(a) – f(b) )
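
A sketch of the procedure in code, using the same degree-based f(x) = sin(x) - 1/2 and stopping when |f(c)| is below a small tolerance (the names and the tolerance are our own choices):

import math

def f(x):
    return math.sin(math.radians(x)) - 0.5      # f(x) = sin(x) - 1/2, x in degrees

def false_position(f, a, b, eps=1e-4, max_iter=50):
    # Requires f(a) and f(b) to have opposite signs.
    for _ in range(max_iter):
        c = b - f(b) * (a - b) / (f(a) - f(b))  # x-intercept of the secant line
        if abs(f(c)) < eps:                     # horizontal convergence test
            return c
        if f(a) * f(c) < 0:                     # the zero lies in [a, c]
            b = c
        else:                                   # the zero lies in [c, b]
            a = c
    return c

print(false_position(f, 0, 50))    # ≈ 30, matching the table in the example below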

 

Example:

sin(x) = ½  => f(x) = sin(x) – ½ = 0

Let a = 0 , b = 50

 

     a        b          f(a)     f(b)       c          f(c)
     0        50         -0.5     0.266      32.637     0.0393
     0        32.637     -0.5     0.0393     30.25      0.0039
     0        30.25      -0.5     0.0039     30.0248    0.000375

 

·        'c' seems to approach 30 as f(c) approaches 0. We stop the iterations when we reach the desired precision

o       Horizontal Convergence

Instead of requiring f(x) = 0 exactly, we accept any x for which |f(x)| is below some small tolerance E: |f(x)| < E

Example: if E = 0.001, then we would stop at the iteration where f(c) = 0.000375 (|0.000375| < 0.001)

Geometrically this means that zero is in some horizontal band

o       Vertical Convergence

Stop when x is within a distance S of the solution p: stop when | x - p | < S

However this cannot be checked directly, since it implies we already know p. It is approximated by stopping when | P_n - P_{n-1} | < S

o       Both Horizontal and Vertical Convergence

Stop when | P_n - P_{n-1} | < E and | f(P_n) | < E

Troublesome Functions

          Functions for which obtaining the solution of f(x) = 0 is difficult.

If the graph y = f(x) is steep near the root (p, 0), then the root finding is 'well-conditioned', i.e., a solution with good precision is easy to obtain.

If the graph y = f(x) is shallow (nearly flat) near (p, 0), then the root finding is 'ill-conditioned', i.e., the computed root may have only a few significant digits.
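
A rough illustration of the ill-conditioned case, using f(x) = (x - 1)^3, which is very flat near its root at x = 1 (the example function is our own):

# Near a flat root, many different x values make |f(x)| tiny, so a tolerance on
# |f(x)| pins down only a few significant digits of the root.
f_flat = lambda x: (x - 1) ** 3

for x in (0.999, 1.0005, 1.001):
    print(x, f_flat(x))            # |f(x)| <= 1e-9 for every one of these x values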