In IEEE single precision, numbers have the form
Step 1: Representation of .
Thus the stored exponent is and the significand has all zero fraction bits.
Step 2: Spacing (ULP) at .
For a normalized number with exponent , the spacing between adjacent floating point numbers is
Step 3: Number that follows .
Binary:
Decimal:
Step 4: Number that precedes .
Binary:
Decimal:
Final answer:
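The neighbours found above can be checked directly on the bit patterns. The following is a minimal sketch, assuming a positive normalized single precision value; the helper names and the sample value 1.0 are illustrative choices, not taken from the problem.

```python
import struct

def f32_to_bits(x):
    """Return the 32-bit pattern of x rounded to IEEE single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_f32(bits):
    """Return the single precision value with this bit pattern (as a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def neighbours(x):
    """Predecessor and successor of a positive normalized single precision x."""
    bits = f32_to_bits(x)
    return bits_to_f32(bits - 1), bits_to_f32(bits + 1)

before, after = neighbours(1.0)  # 1.0 is an illustrative value only
```

At a power of two the gap below the number is half the gap above it, which is why the two neighbours of 1.0 are not symmetric.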
Out of these numbers which ones can be represented exactly in the Single Precision Floating Point system? Justify your answer.
SOLUTION
In IEEE single precision, a real number is represented exactly if and only if
its binary expansion is finite and fits within 24 significant bits
(including the hidden leading 1), with an exponent in the representable range. In particular, a rational number can be
exactly representable only if its denominator (in lowest terms) is a power of
2.
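The criterion can be tested mechanically by rounding a rational number to single precision and comparing the result with the original. This is a sketch only; the function name is hypothetical, and the candidate numbers from the problem are not reproduced here.

```python
import struct
from fractions import Fraction

def is_exact_in_single(q):
    """True if the rational q is exactly representable in IEEE single precision."""
    try:
        # float(q) rounds to double; packing with "f" rounds again to single.
        rounded = struct.unpack("<f", struct.pack("<f", float(q)))[0]
    except OverflowError:
        return False  # magnitude exceeds the single precision range
    # Fraction(rounded) is the exact value stored, so equality means no rounding occurred.
    return Fraction(rounded) == q
```

For instance, `is_exact_in_single(Fraction(3, 8))` holds, while `Fraction(1, 10)` fails because its denominator is not a power of 2, and `Fraction(2**24 + 1)` fails because it needs 25 significant bits.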
Calculate approximately what fraction of Single Precision
Floating Point Numbers lie between 0 and ?
SOLUTION
Single precision IEEE floating point numbers consist of
Step 1: Count numbers between 0 and .
Since
Each normalized exponent contributes positive numbers.
Thus,
(Subnormals contribute only one additional exponent block and do not affect the estimate at leading order.)
Step 2: Total number of floating point numbers.
The total number of single precision floating point numbers is
Step 3: Fraction.
Final answer:
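The counting argument can be cross-checked with the bit patterns, because positive single precision numbers are ordered in the same way as their 32-bit patterns. The sketch below assumes a finite positive upper bound; the bound 1.0 is an illustrative value, not the one from the problem, and the total of 2^32 patterns includes infinities and NaNs.

```python
import struct

def count_positive_up_to(x):
    """Number of single precision values y with 0 < y <= x, for finite x > 0.

    Positive floats are ordered like their bit patterns, so the count
    equals the integer value of the pattern of x.
    """
    return struct.unpack("<I", struct.pack("<f", x))[0]

TOTAL_PATTERNS = 2 ** 32  # rough total: also counts infinities and NaNs

upper = 1.0  # illustrative upper bound only
fraction = count_positive_up_to(upper) / TOTAL_PATTERNS
```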
Consider the function
SOLUTION
Newton's method is defined by
For
, we have
Thus the Newton iteration becomes
Analysis: The iteration formula is
Clearly, the sequence does not converge to 0. Instead, the iterates alternate in sign and grow in magnitude:
Hence, Newton's method does not converge to the root for this function.
From the iteration formula
, the magnitude of the iterates is
Thus, unless , the iterates diverge. The only "initial guess" that leads to the root is
The standard Newton convergence theorem states that if is a simple root of
(
,
) and the initial guess is sufficiently close to
, then Newton's method converges quadratically.
Here, is a root, but
The function
has a very steep slope near the origin. Its tangent line at a point
is
Because becomes very large as
, the tangent line is nearly vertical, and the Newton step
A sketch of
and the tangent lines near the origin would show very steep tangents, causing iterates to overshoot and alternate signs.
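The behaviour described above can be reproduced numerically. The sketch below assumes the function is the real cube root, f(x) = x^(1/3), which is the classic example with these properties; the function itself is not legible in the text, so this is an assumption, and the starting guess 0.1 is arbitrary.

```python
import math

def cbrt(x):
    """Real cube root, defined for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def newton_step(x):
    """One Newton step for the assumed f(x) = x**(1/3).

    f'(x) = 1 / (3 * cbrt(x)**2), so f(x) / f'(x) = 3 * cbrt(x)**3.
    """
    return x - cbrt(x) * 3.0 * cbrt(x) ** 2

x = 0.1  # illustrative starting guess
iterates = [x]
for _ in range(5):
    x = newton_step(x)
    iterates.append(x)
```

Under this assumption each step maps x to approximately -2x, so the iterates flip sign and double in magnitude.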
A criticism of the bisection method is that it does not use the values
of and
, only their signs. A way to use these values
is to find the equation of the straight line connecting
and
and then take
to be the
point where the line intersects the x-axis. Other than this, the
bisection algorithm is unchanged. This is
known as the method of false position.
(a) Show that
(b) To solve
The method of false position (regula falsi) uses the line connecting
and
. The equation of the line is
Set to find the x-intercept
:
Thus,
Method of false position:
Determine the new interval. Since
, set
Next iteration:
Bisection method:
Check the sign:
, so
Comparison:
- Accuracy: Method of false position produces
and
while bisection produces
and
. The false position method gives iterates closer to the root
after the first iteration.
- Flops: Both methods require evaluating
each iteration, but false position requires extra multiplications/divisions to compute
from the weighted formula. Bisection only needs a simple average. Therefore, bisection is slightly cheaper in flops per iteration.
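The comparison can be repeated for any bracketed root with a few lines of code. This is a sketch under stated assumptions: the test equation x^2 - 2 = 0 on [1, 2] is a hypothetical stand-in for the equation in part (b), and the function names are illustrative.

```python
def false_position_point(f, a, b):
    """x-intercept of the line through (a, f(a)) and (b, f(b))."""
    fa, fb = f(a), f(b)
    return (a * fb - b * fa) / (fb - fa)

def midpoint(f, a, b):
    """Bisection choice: the plain average of the end points."""
    return (a + b) / 2.0

def bracket_iterates(f, a, b, choose, steps):
    """Run `steps` iterations, picking each new point with `choose`."""
    points = []
    for _ in range(steps):
        c = choose(f, a, b)
        points.append(c)
        if f(a) * f(c) <= 0:
            b = c  # root lies in [a, c]
        else:
            a = c  # root lies in [c, b]
    return points

f = lambda x: x * x - 2.0  # hypothetical test equation
fp = bracket_iterates(f, 1.0, 2.0, false_position_point, 2)
bi = bracket_iterates(f, 1.0, 2.0, midpoint, 2)
```

Only the rule for choosing the new point differs between the two methods; the interval update by sign is shared.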
Suppose you have traveled to the Ice Age and the computer in your Time Machine is broken. To come back to 2026 and complete the numerical computing exam, you need to calculate “three to the power one third”, i.e. 3^(1/3).
Set up the nonlinear equation
Initial bracket: Since
and
, a suitable initial bracket is
Bisection algorithm: At each step, compute
Number of steps for accuracy :
The bisection error after steps satisfies
Set
Take logarithm base 2:
Newton iteration for
is
Check: Correct iteration formula:
Start with (a reasonable guess).
Convergence: Newton's method is quadratically convergent. For tolerance , typically 4–5 iterations suffice.
Estimation of steps: Roughly, the error decreases as
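Both step counts can be estimated in code. The sketch below assumes the equation x^3 - 3 = 0, the bracket [1, 2], the starting guess 1.5 and the tolerance 1e-6; these specific values are illustrative assumptions, since the ones used in the solution are not legible in the text.

```python
import math

def bisection_steps(a, b, tol):
    """Smallest n with (b - a) / 2**n <= tol."""
    return math.ceil(math.log2((b - a) / tol))

def newton_cuberoot3(x0, tol, max_iter=50):
    """Newton iteration x <- x - (x**3 - 3) / (3 * x**2).

    Stops when two successive iterates differ by at most tol and
    returns (approximate root, number of iterations used).
    """
    x = x0
    for k in range(1, max_iter + 1):
        x_new = x - (x ** 3 - 3.0) / (3.0 * x ** 2)
        if abs(x_new - x) <= tol:
            return x_new, k
        x = x_new
    return x, max_iter

tol = 1e-6  # hypothetical tolerance
n_bisect = bisection_steps(1.0, 2.0, tol)
root, n_newton = newton_cuberoot3(1.5, tol)
```

With these assumed values bisection needs about twenty halvings, while Newton's method stops after a handful of iterations.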
Consider the function
For the secant, Newton, and bisection methods: Looking at
iterative error when solving , which of these
methods converges to an error of
the fastest?
True or False:
If Newton's method converges to a solution
for a particular choice of
, then it will converge to
for
any starting point between
and
.
To get credit for this
problem, you will need to give a comprehensive explanation of your
answer.
For computing the midpoint of an interval
, which of the following two equations is preferable in a floating point system? Why? When? Devise
an example in which the midpoint given by the equation lies outside
of the
interval.
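A toy decimal arithmetic makes the effect easy to see. The sketch below assumes the two candidate formulas are (a + b)/2 and a + (b - a)/2, which are the usual pair in this question but are not legible in the text; the 3-digit precision and the interval end points are illustrative choices.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)  # toy 3-digit decimal arithmetic

a, b = Decimal("0.982"), Decimal("0.984")  # illustrative interval
two = Decimal(2)

# (a + b) / 2: the sum 1.966 is rounded to 1.97 before the division.
mid_sum = ctx.divide(ctx.add(a, b), two)
# a + (b - a) / 2: the small difference is computed exactly here.
mid_offset = ctx.add(a, ctx.divide(ctx.subtract(b, a), two))
```

In this arithmetic the first formula returns a value above b, while the second stays inside the interval.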
Solve the following equation to three digits of accuracy:
The “divide and average” method for computing a square root of a given number
can be formulated
as follows:
Show that if
Newton method for solving a scalar nonlinear equation
requires computation of the derivative of
at each
iteration. Suppose that we instead replace the true derivative with a
constant value
, that is we use the iteration scheme
(a) Under what condition on the value of will this scheme be locally
convergent?
(b) What is the convergence rate of this scheme?
(c) Is there any value of that gives quadratic convergence?
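A small experiment can help build intuition before answering. The sketch below writes the constant slope as `d` and uses the test function x^2 - 2; both the name and the function are hypothetical choices. It compares the observed error ratio with |1 - f'(x*)/d|, the derivative of the fixed point map x - f(x)/d at the root.

```python
def constant_slope_iteration(f, x0, d, steps):
    """Iterate x <- x - f(x) / d with a fixed slope d; returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - f(xs[-1]) / d)
    return xs

f = lambda x: x * x - 2.0  # hypothetical test function, root sqrt(2)
root = 2.0 ** 0.5
xs = constant_slope_iteration(f, 1.5, d=3.0, steps=8)
errors = [abs(x - root) for x in xs]
# Successive error ratios settle to a constant for linear convergence.
ratios = [errors[k + 1] / errors[k] for k in range(5)]
predicted = abs(1.0 - 2.0 * root / 3.0)  # |1 - f'(root) / d|
```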
(a) For given values of ,
, and
, will the bisection
method use fewer or more steps to solve
(b) If and
, then what is the maximum number of
steps the bisection method will require to solve
and
have at least six correct binary digits?
(c) Answer part (b) for the equation
A criticism of the bisection method is that it does not use the values
of and
, only their signs. A way to use these values
is to find the equation of the straight line connecting
and
and then take
to be the
point where the line intersects the x-axis. Other than this, the
bisection algorithm is unchanged. This is
known as the method of false position.
(a) Show that
(b) To solve
True or False: If Newton’s method converges to a solution for
a particular choice of
, then it will converge to
for any
starting point between
and
.
If it is true, please explain why; if it is not true, give a counterexample.
In class we have shown that Newton's method
has a quadratic rate of convergence for a simple root (i.e. when
Prove that Newton's method has a linear rate of convergence
for a triple root, i.e. when
.
Extra Credit: Prove it for roots of multiplicity ,
i.e. when
In class we have shown that fixed point iterations
To gain an intuition on this problem, consider the fixed point iterations
Consider the function
As you see from the graph, this function has three roots, given by
,
and
.
The following two functions are defined as
| (5) |
We have studied Secant method
Express the Newton iteration method for solving the following
system of nonlinear equations:
Carry out one iteration of Newton's method applied to
the system
The following values for the solution of were computed
using Matlab. What method was used (bisection, Newton or secant)? Make sure you explain in detail
why it is the method you claim:
X=50.25125628140704
X=25.378140640072242
X=12.944094811287638
X=6.7320923307183715
X=3.636103634563207
X=2.1079101939699143
X=1.3816957571715662
X=1.0826201384421688
X=1.0058580941730362
X=1.0000339198559194
X=1.0000000011504786
X=1.0000000000000000
X=1.0000000000000000
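One way to examine the printed values is to compute error ratios with respect to the limit 1.0. The sketch below uses only the numbers listed above; a ratio e_{k+1}/e_k that stays constant would suggest linear convergence, while a ratio e_{k+1}/e_k^2 that levels off would suggest quadratic convergence. The two final values equal to 1.0 are left out because their error is zero.

```python
xs = [50.25125628140704, 25.378140640072242, 12.944094811287638,
      6.7320923307183715, 3.636103634563207, 2.1079101939699143,
      1.3816957571715662, 1.0826201384421688, 1.0058580941730362,
      1.0000339198559194, 1.0000000011504786]

root = 1.0  # the printed values settle at 1.0
errors = [abs(x - root) for x in xs]

# e_{k+1} / e_k: roughly constant for a linearly convergent method.
linear = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
# e_{k+1} / e_k**2: roughly constant for a quadratically convergent method.
quadratic = [errors[k + 1] / errors[k] ** 2 for k in range(len(errors) - 1)]
```

Far from the root the linear ratio is close to one half, but near the root it collapses while the quadratic ratio levels off near 1.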
Chapter THREE
Show that for arbitrary square matrices and
the following
is true
Show that for arbitrary square matrices and
the following
is true
Consider the following systems of equations:
Suppose that you use Newton method
(a)
Let be an arbitrary square matrix, and let
be an arbitrary
scalar.
Prove or disprove the following statements:
(i)
(ii)
(b)
Let be an
diagonal matrix with
all its diagonal entries equal to
.
(i) What is the value of
?
(ii) What is the value of
?
Consider the linear system
In this problem
(b) Find a vector such that
Consider the problem with
and
HINT For what values of , if any, will the matrix
be
ill-conditioned?
HINT If
The Kronecker delta (named after Leopold Kronecker) is a function of two variables, usually two integers, defined as
| (9) |
Consider the matrix
defined as
In other words, is a diagonal matrix with diagonal entries being equal to
For the matrix
Consider the system
where
or
(a) Find explicitly factorization of
.
(b) Use this decomposition to solve (11).
Consider the matrix
In this problem is a small positive number. Sketch the
two lines in the
plane, and describe how they change,
including the point of intersection, as
approaches
zero. Also, calculate the condition number for the matrix, and
describe how it changes, when
approaches zero.
Prove that the one norm is the maximum absolute column sum; use matrices.
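A numerical check does not replace the proof, but it shows what has to be proved. The sketch below computes the column-sum and row-sum formulas for a hypothetical 2 by 2 matrix and confirms that the column-sum value is attained by the unit vector that selects the largest column; all names and the matrix are illustrative.

```python
def one_norm(A):
    """Maximum absolute column sum of a matrix given as a list of rows."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def inf_norm(A):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def vec_one_norm(x):
    """Sum of absolute values of the entries of x."""
    return sum(abs(v) for v in x)

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[1.0, -2.0], [3.0, 4.0]]  # hypothetical example matrix
e2 = [0.0, 1.0]                # unit vector picking out the largest column
attained = vec_one_norm(matvec(A, e2))
```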
Consider
| (11) |
a) Find the LU factorization of the given matrix
b) Calculate the norms ,
, and
for
this matrix
c) Calculate the condition number for the matrix A.
Consider
Use this definition of the matrix norm through the
vector norm to prove that the
norm of a matrix is the
maximum absolute row sum.
Complete the proof for matrices.
b
Extra credit 10% Prove this statement for general matrices.
Consider the system
where
is small. The system has two approximate
solutions
and
Find the norms of the respective
residuals. Which one is smaller? Find the condition number of the
matrix. Explain why you should not use residuals in this case to
determine the quality of a solution.
Hint: Find an exact solution to (13).
Hint
Calculate an inverse of the following matrix:
| (13) |
| (14) |
Consider the matrix
Consider the following system of equations:
Let a matrix
be factored as
follows:
Chapter FOUR
Consider the system of equations
The Kronecker delta (named after Leopold Kronecker) is a function of two variables, usually two integers, defined as
| (15) |
Consider the matrix
defined as
In other words, is a diagonal matrix with diagonal entries being equal to
Let the matrix
have eigenvalues
, and eigenvectors
. Let the
matrix
have eigenvalues
and same
eigenvectors
. What are eigenvectors and eigenvalues
of the matrix
given by
| (16) |
a) Calculate the eigenvalues and eigenvectors for the given matrix
b) What eigenvalue would the inverse power method converge to? Why?
c) What eigenvalue would the power method converge to? Why?
d) Define
. For this matrix, what eigenvalue would the
power method converge to? Why?
Consider the matrix
Suppose that is a symmetric
matrix with
eigenvalues
a) To which of these eigenvalues will the power method converge? Why?
b) To which of these eigenvalues will the inverse power iteration converge? Why?
c) To what eigenvalue the power iteration method applied to the matrix
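The behaviour of the power method can be observed on a small example. The sketch below is a minimal implementation with max-norm scaling and a Rayleigh quotient estimate; the 2 by 2 matrix, with eigenvalues 3 and 1, is a hypothetical stand-in for the matrix in the problem.

```python
def power_method(A, x0, iterations=100):
    """Return a Rayleigh quotient eigenvalue estimate and the scaled vector."""
    x = list(x0)
    for _ in range(iterations):
        y = [sum(a * v for a, v in zip(row, x)) for row in A]
        norm = max(abs(v) for v in y)  # scale to avoid overflow or underflow
        x = [v / norm for v in y]
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    lam = sum(p * q for p, q in zip(Ax, x)) / sum(q * q for q in x)
    return lam, x

A = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical symmetric matrix, eigenvalues 3 and 1
lam, v = power_method(A, [1.0, 0.0])
```

The iteration picks out the eigenvalue of largest magnitude, provided the starting vector has a component along its eigenvector. Applying the same routine to the inverse of the matrix gives the inverse power method.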
Consider
(a) Find eigenvalues and eigenvectors of .
(b) Find eigenvalues and eigenvectors of
Note that you do not have to calculate explicitly.
Chapter FIVE
Given
a) Find the piecewise linear interpolation function
b) Find the quadratic interpolation function
Use a Lagrange interpolating polynomial of degree 2 to find an approximate value for the following. Not all of the data points are needed, and you should explain which ones you use and why.
(a) if
,
(b) if
,
(c) if
.
Answer:
(a) Lagrange Polynomial is
(b) Lagrange Polynomial is
(c) Lagrange Polynomial is
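The Lagrange form can be evaluated without expanding the polynomial. The sketch below is a generic evaluator; the three data points are hypothetical (they lie on x^2 + x + 1) and are not the data from the problem.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)  # factor of the i-th basis polynomial
        total += yi * basis
    return total

xs = [0.0, 1.0, 2.0]  # hypothetical nodes
ys = [1.0, 3.0, 7.0]  # hypothetical values
value = lagrange_eval(xs, ys, 1.5)
```

For a degree 2 interpolant one would pass the three nodes closest to the evaluation point.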
Consider
for
for
.
Solve for so that the given function is a natural cubic
spline from
.
Use the theorems we studied in class to calculate the maximum error
in interpolating the function by a polynomial of degree four
using five equally spaced points on the interval
.
Suppose you want to interpolate the function
by a
polynomial of degree
by using
equally spaced points on the
interval
. How many points should you use so that the
difference between the sine function and your interpolation is less
than
?
This exercise explores some of the differences between a cubic polynomial
and a cubic spline. In this problem the data are:
, and
.
(a) Find the global interpolation polynomial that fits this data, and then evaluate this function at x = 1/2.
(b) Find the natural cubic spline that fits this data, and then evaluate this function at x = 1/2.
(c) The cubic in part (a) satisfies the interpolation and smoothness conditions required of a spline, yet it produces a different result than the cubic spline in part (b). Why?
(d) What boundary conditions should be used so the cubic spline produces the cubic in part (a)?
Solution
5.26
(a)
Global polynomial interpolation is
Writing the spline as
The result of part (a) does indeed satisfy the interpolation and smoothness conditions, but it is not a natural spline.
(d) To obtain the result in (a) one would have to use a clamped spline with
Find the clamped cubic spline , which goes through the
points
Find the clamped cubic spline , which goes through the
points
Consider the following piece-wise cubic function
| (17) |
Consider the following piece-wise quadratic polynomial
| (18) |
For a set of given data points
, define the function
In general, is it possible to interpolate data points by a piecewise
quadratic polynomial, with knots at a given data points, such that the
interpolant is
Consider the following data:
(a) Find the global interpolation polynomial that fits these data.
(b) Find the piecewise linear interpolation function that fits these data.
(c) Find the natural cubic spline that fits these data.
answer
(a) By using methods we learned in class, we obtain the polynomial of degree two:
(b) Using Lagrange interpolation for the two subintervals, we obtain
| (19) |
(c) By repeating calculations that we have done in class, we obtain
| (20) |
Here we explore some of the differences between a cubic polynomial and a cubic spline. In this problem the data are:
(a) Find the global interpolation polynomial that
fits this data, and then evaluate this function at .
(b) Find
the natural cubic spline that fits this data, and then evaluate this
function at .
(c) The cubic in part (a) satisfies the interpolation and smoothness conditions required of a spline, yet it produces a different result than the cubic spline in part (b). Why?
(d) What boundary conditions should be used so the cubic spline produces the cubic in part (a)?
Solution:
5.26
(a)
Global polynomial interpolation is
Writing the spline as
The result of part (a) does indeed satisfy the interpolation and smoothness conditions, but it is not a natural spline.
(d) To obtain the result in (a) one would have to use a clamped spline with
Consider the following data
Suppose that some measurements had produced the following data:
(i)
Write down second degree polynomial passing through all three points by using Lagrange interpolation
(ii)
Write down second degree polynomial passing through all three points by using Newton interpolation
(iii)
Show that the two polynomials obtained in (i) and (ii) are equivalent
Use an appropriate Lagrange interpolating polynomial of to interpolate the following data:
What is the degree of the interpolating polynomial? There is a catch in this question.
Consider the following data:
Determine the parabola (interpolating polynomial of
degree two) that interpolates the values of for
Find the clamped cubic spline , which goes through the
points
Find the clamped cubic spline , which goes through the
points
Consider the following function
Consider the logarithmic function evaluated at the points
1,2 and 3:
HINT: not-a-knot spline requires that the third derivative is
continuous at the first and last points.
Do not solve this system.
Suppose you were to define a piece-wise quadratic spline that
interpolates given values
Write down in general form quadratic polynomials that
interpolate these
points, such that the resulting piece wise
quadratic polynomial has continuous first derivative. How many
additional conditions are required to make a square system for the
coefficients of this quadratic spline?
This problem considers the function
| (21) |
This problem considers the function
| (22) |
Suppose you are given 4 points:
a
Suppose that you would like to obtain a quadratic spline that
interpolates the function between the nodes 0,
and
.
b Given a function on a discrete set of data points, explain the
difference between interpolation and approximation. What is the difference
between an interpolating polynomial of degree
and an approximating
polynomial of the same degree
?
Is it possible to interpolate three points
Is it possible to interpolate four points
Suppose that you would like to obtain a quadratic spline that
interpolates the function between the nodes
.
Suppose that you are given 4 experimental points:
.
Is it possible to interpolate these 4 data points by a piecewise quadratic polynomial with knots at these given data points, such that the interpolant is
In each case, if the answer is “yes”, explain why and outline the procedure to find the interpolating function (you may use a short form of the equations; do not solve the resulting equations); if the answer is “no”, explain why.
In class we have studied cubic splines, i.e. interpolation by a piecewise cubic polynomial with continuous first and second derivatives. It is also possible to introduce a quadratic spline, i.e. a piecewise quadratic polynomial with a continuous first derivative. Such a quadratic spline is the focus of this problem.
Consider the same data:
Interpolate this data by a piecewise quadratic polynomial with continuous first and second derivatives at the middle point.
Extra Credit, 30 percent Suppose that one additional point is added to the above data, so that there are four points.
Is it possible to interpolate this data by a piecewise quadratic interpolant with continuous first and second derivatives at the interior points? Is it possible to do it in general?
Chapter SIX
Let be a degree three polynomial, and let
be
its degree two interpolating polynomial at the three points
,
and
. Prove by direct calculation that
In class we have studied quadrature rules of the form
Choose the weights
to maximize the precision of the quadrature. What is the
order of the resulting quadrature?
Derive the error term of this quadrature.
In class we have studied quadrature rules of the form
HINT Solution of this problem will be much easier if you guess
the relationship between ,
,
and
.
variation
Derive the error term of this quadrature.
Consider the quadrature rule of the form
Choose and
to maximize the accuracy of the resulting
quadrature. Calculate the truncation error of the resulting
quadrature.
Is it more accurate or less accurate than Midpoint quadrature?
Given the points
:
a) Evaluate the integral of the function using the midpoint rule
b) Evaluate the integral using Simpson's rule
c) Evaluate the integral using the trapezoid rule
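The three rules can be written side by side when the data consist of three equally spaced points. The sketch below assumes the points are the two end points and the midpoint of [a, b], and uses the composite trapezoid rule over the two panels; the function name and the test integrand x^2 on [0, 2] are hypothetical.

```python
def three_point_rules(f, a, b):
    """Midpoint, trapezoid and Simpson estimates using x = a, (a + b) / 2, b."""
    m = (a + b) / 2.0
    h = (b - a) / 2.0  # spacing between neighbouring points
    fa, fm, fb = f(a), f(m), f(b)
    midpoint = (b - a) * fm
    trapezoid = h / 2.0 * (fa + 2.0 * fm + fb)  # composite rule, two panels
    simpson = h / 3.0 * (fa + 4.0 * fm + fb)
    return midpoint, trapezoid, simpson

mid, trap, simp = three_point_rules(lambda x: x * x, 0.0, 2.0)
```

For this quadratic integrand Simpson's rule reproduces the exact value 8/3, while the midpoint and trapezoid estimates err in opposite directions.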
(a) Consider the integration rule of the form
(b) Calculate truncation error of this quadrature.
This problem concerns using numerical methods to calculate the integral
We are going to compare different ways to calculate this integral by
using the value of the function
on only three points,
,
and
.
Let us denote
As you know, we can calculate the value of this integral analytically:
Calculate the value of this integral numerically by using
Compare the result of your calculations with the exact value. Which of these four methods give the most accurate result? Is this consistent with your expectations?
(a) Consider the integration rule of the form
(b) Calculate truncation error of this quadrature.
Find 3-point Gaussian rule for
if the standard 3-point Gaussian rule is given by
(a) (10 points) In the three point quadrature rule
(b) (5 points) What is the degree of the resulting scheme? Demonstrate this by showing that the scheme correctly integrates a polynomial of that degree, and that it does not correctly integrate a polynomial of degree one higher.
Find
in
Hint Derivation is similar to calculation of the trapezoid
error term we did in class. You will need to express
via
and its derivatives.
In class we have studied two point Gauss quadrature. In this problem you are to derive one point Gauss quadrature.
Consider the quadrature rule of the form
Choose and
to maximize the order (accuracy) of the resulting
quadrature. What is the truncation order of this quadrature?
Suppose that you have tabular data, that is to say that the
function is given only on the
equidistant points
,
such that
,
. Propose a way to
numerically evaluate
Chapter SEVEN
Find an approximation of
that utilizes
,
, and
.
You are producing a final project for your Master's
degree and need to solve an initial value problem
numerically. You prefer to have an accurate numerical
solution. Out of the Forward Euler, Backward Euler,
Trapezoidal, Heun (RK2), and Runge-Kutta (RK4) methods,
which method would you choose and why?
Consider RK2 (Heun's) method:
This can be done, for example, by showing that if at step the
numerical solution is
Suppose you are solving Initial Value Problem
Derive a finite difference approximation to
that
uses
,
and
. Calculate the
truncation error term of the resulting finite difference approximation.
variation
,
and
. Calculate the
Find an approximation of the first derivative that uses
,
and
. What is the error term of your
approximation?
For the equation
consider the following
numerical method
Consider the following -step method for solving
SOLUTION Method of undetermined coefficients, is satisfied
automatically,
gives
,
gives
.
Solving equations we get
second order accurate
Determine whether the method
Suppose you want to solve numerically
for
using 100 time steps (so,
). The methods to be tried
are (i) the Euler method (ii) the backward Euler method (iii) the trapezoidal method (iv) the RK2 (Heun) method (v) the RK4 method.
(a) Which method do you expect to finish the calculation the fastest? Why?
(b) Which would be the second fastest method? Why?
(c) Which would be the most accurate method? Why?
(d) If stability is a concern, which method would be the best? Why?
This is a general question. It is sufficient to give an answer without proof.
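The trade-off between cost per step and accuracy can be seen on a simple test problem. The sketch below covers only the explicit methods (Euler, Heun and RK4); the implicit methods would also need an equation solve at every step. The test problem y' = -y, y(0) = 1 on [0, 1] is a hypothetical choice, not the equation from the problem.

```python
import math

def euler_step(f, t, y, h):
    return y + h * f(t, y)  # one function evaluation per step

def heun_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    return y + h / 2.0 * (k1 + k2)  # two evaluations per step

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2.0, y + h / 2.0 * k1)
    k3 = f(t + h / 2.0, y + h / 2.0 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)  # four evaluations

def integrate(step, f, y0, t_end, n):
    """Advance from t = 0 to t_end in n equal steps with the given one-step method."""
    h, t, y = t_end / n, 0.0, y0
    for _ in range(n):
        y = step(f, t, y, h)
        t += h
    return y

f = lambda t, y: -y  # hypothetical test problem y' = -y, y(0) = 1
exact = math.exp(-1.0)
methods = [("euler", euler_step), ("heun", heun_step), ("rk4", rk4_step)]
errors = {name: abs(integrate(step, f, 1.0, 1.0, 100) - exact)
          for name, step in methods}
```

With 100 steps the errors drop sharply from Euler to Heun to RK4, while the number of function evaluations per step grows from one to two to four.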
Consider the following initial value problem
The following algorithm is proposed for its numerical solution:
Consider the following initial value problem:
Using the Euler method with
calculate
and
10 pt.
Is the proposed method numerically stable for the proposed time step? 2 pt
Extra credit - 2 pt compare the result with the exact analytical solution.
Compute the solution of
Please answer the following questions:
Suppose you want to solve numerically
for
using 100 time steps (so,
). The methods to be tried
are (i) the Euler method (ii) the backward Euler method (iii) the trapezoidal method (iv) the RK2 (Heun) method (v) the RK4 method.
(a) Which method do you expect to finish the calculation the fastest? Why?
(b) Which would be the second fastest method? Why?
(c) Which would be the most accurate method? Why?
(d) If stability is a concern, which method would be the best? Why?
Is Heun's method
Consider RK2 (Heun's) method:
This can be done, for example, by showing that if at step the
numerical solution is
Consider the following initial value problem:
(4 points) Compare the result with the exact analytical solution.
State whether the following methods are (i) explicit or implicit (ii) single step or multi-step (iii) selfstarting or not. Cross out wrong statements and underline correct ones:
(i)explicit/implicit (ii)single step/multi-step (iii)selfstarting/not selfstarting.
(i)explicit/implicit (ii)single step/multi-step (iii)selfstarting/not selfstarting.
(i)explicit/implicit (ii)single step/multi-step (iii)selfstarting/not selfstarting.
The Bernoulli equation is
Consider the equation
Consider the fourth order RK method:
Apply this RK4 method for solving the equation
Consider the fourth order RK method:
Apply this RK4 method for solving the equation
Chapter EIGHT
Set up the linear least squares problem for
fitting the model
to the
four data points
,
,
,
.
Set up and solve the linear least squares system
(a) Suppose you would like to fit the data points
(b) Also calculate the residual.
Suppose you measure as a function of
and you get the following:
| 0 | 2 |
| 1 | 1 |
| 2 | -2 |
| 3 | -1 |
Suppose you would like to fit this data by
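The tabulated data can be fitted in a few lines once a model is fixed. The sketch below assumes a straight line c0 + c1 x, which is only one possible model since the intended one is not legible in the text, and solves the 2 by 2 normal equations in closed form; the names are illustrative.

```python
def fit_line(xs, ys):
    """Least squares fit y ~ c0 + c1 * x via the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    c1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c0 = (sy - c1 * sx) / n
    return c0, c1

xs = [0.0, 1.0, 2.0, 3.0]    # first column of the table
ys = [2.0, 1.0, -2.0, -1.0]  # second column of the table
c0, c1 = fit_line(xs, ys)
residual = [y - (c0 + c1 * x) for x, y in zip(xs, ys)]
```

For a model that contains a constant term the residual components sum to zero, which is a quick sanity check on the computed coefficients.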
Consider the data
Approximate this data by a constant, i.e. find such that
is a good fit for this data.
a Calculate the value of which minimizes the
square of the second norm
of the residual
b Calculate and
. Sketch
,
,
and
. Verify that
. Is it so on your graph?
c Extra-credit, 5 points Obtain an equation for that minimizes
instead of
traditional
. Do not solve the resulting equation.
What are the advantages and disadvantages of using
instead of
?
Suppose that an experiment produced the following data:
a Calculate the value of which minimizes the
square of the second norm
of the residual
b Calculate and
. Sketch
,
,
and
. Verify that
. Is it so on your graph?
c Extra-credit, 5 points Obtain an equation for that minimizes
instead of
traditional
. Do not solve the resulting equation.
What are the advantages and disadvantages of using
instead of
?
Find the least squares solution of
Let A be the identity matrix
| (36) |