Possible Problems for Exams

Chapter ONE

  1. Construct a floating point system in which the numbers

    $\displaystyle 1000,\frac{1}{2}, \frac{1}{3},\frac{1}{4},\frac{1}{5},\frac{1}{6},\frac{1}{2048}$

    are all exactly representable.
  2. Consider the floating point system with $N=5$ digits, base $\beta=2$, and lower and upper exponent limits $L=-5$ and $U=5$. Subnormals are allowed.

  3. Let $x$ and $y$ be two adjacent positive Normalized Single Precision floating point numbers.

  4. IEEE Single Precision has $\beta=2$, $p=24$, $L=-126$, $U=127$, and subnormals are allowed. What is the maximum integer value of $k$ for which the number

    $\displaystyle x = 2^k+2^{-k}$

    can be represented exactly in Single Precision?
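    Here is a possible numerical check. It assumes CPython's `struct` module, whose `'f'` format converts a double to the nearest IEEE single. The helper name `is_float32` is hypothetical. Only $k\le 26$ is tried. The reason is that $2^k+2^{-k}$ spans $2k+1$ bits, so it is exact in double precision only while $2k+1\le 53$. Beyond that, the round-trip test would report false positives.

```python
import struct

def is_float32(x: float) -> bool:
    """Return True if the double x is exactly representable in IEEE single precision."""
    try:
        return struct.unpack('<f', struct.pack('<f', x))[0] == x
    except OverflowError:  # magnitude too large for single precision
        return False

# For k <= 26 the sum 2**k + 2**-k spans 2k+1 <= 53 bits, so it is exact in double.
exact = [k for k in range(1, 27) if is_float32(2.0**k + 2.0**-k)]
print(exact)
print(max(exact))
```

    The same check applies to problem 40, which asks the same question with $n$ in place of $k$.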
  5. IEEE Single Precision has $\beta=2$, $p=24$, $L=-126$, $U=127$, and subnormals are allowed.

    In class we have shown that if $x$ is a real number, and $\bar x$ is its floating point representation, then the maximum possible relative error in representing this number is given by

    $\displaystyle \vert\frac{x-\bar{x}}{x}\vert\le\frac{\epsilon}{2},$ (1)

    where $\epsilon$ is machine precision, $\epsilon = \beta^{1-N}.$

    How does equation (1) change for subnormal numbers?

    To answer this question,

    1. First, consider $y = 2^{-128}$. How is $y$ going to be represented in IEEE SP?
    2. What is the floating point number that follows number $y$ on the computer number line?
    3. What is the maximum possible absolute error in representing numbers between $y$ and $2 y$?
    4. Consider number $z$ such that

      $\displaystyle y<z<2y.$

      What is the maximum possible relative error in representing the number $z$?
    5. (Extra credit) Generalize the result of the previous question to any number $x$ with $2^{-149}<x<2^{-126}$. You can guess the correct result by contemplating equation (1).
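    Steps 1 to 4 can be checked numerically by manipulating bit patterns. This sketch assumes CPython's `struct` module. The helper names `f32_bits` and `f32_from_bits` are hypothetical. Positive singles are ordered like their bit patterns, so adding 1 to the pattern of $y$ gives the next number.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern (unsigned int) of x rounded to IEEE single."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f32_from_bits(b: int) -> float:
    """IEEE single with bit pattern b, returned as a Python float."""
    return struct.unpack('<f', struct.pack('<I', b))[0]

y = 2.0**-128
bits_y = f32_bits(y)                 # exponent field is 0, so y is stored as a subnormal
next_y = f32_from_bits(bits_y + 1)   # the single that follows y
gap = next_y - y                     # spacing of the subnormal numbers
max_abs_err = gap / 2                # round to nearest
max_rel_err = max_abs_err / y        # bound on the relative error for y < z < 2y
print(hex(bits_y), gap, max_abs_err, max_rel_err)
```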
  6. In single precision, what is the floating point number that follows “32”? In other words, what is the smallest positive floating point number $x$ such that

    $\displaystyle x>32?$

    Please write the result the way it is stored in the computer, i.e. in binary floating point form.
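    Here is a sketch for finding the successor of a single precision number. It uses the same assumptions: CPython's `struct` module and hypothetical helper names. For positive finite numbers, consecutive bit patterns are consecutive floating point numbers. `fields` prints the sign, the biased exponent and the 23 stored fraction bits.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f32_from_bits(b: int) -> float:
    return struct.unpack('<f', struct.pack('<I', b))[0]

def next_f32(x: float) -> float:
    """Next larger single after a positive, finite single x."""
    return f32_from_bits(f32_bits(x) + 1)

def fields(x: float) -> str:
    """Sign, biased exponent and fraction bit fields of the single x."""
    b = format(f32_bits(x), '032b')
    return f"{b[0]} {b[1:9]} {b[9:]}"

nxt = next_f32(32.0)
print(nxt, fields(nxt))
```

    The same helper gives the successors of 17 (problem 33) and 256 (problem 45). Subtracting 1 from the bit pattern instead gives the preceding number.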

  7. Consider a floating-point arithmetic with base $2$, precision $8$ and exponent range $[-16,16]$. In other words, each number in this system can be represented as

    $\displaystyle x = \left( d_0 + \frac{d_1}{2}+\frac{d_2}{4}+\frac{d_3}{8}+\frac{d_4}{16}+\frac{d_5}{32}+\frac{d_6}{64}+\frac{d_7}{128}\right)\times 2^{E}, \ \ -16\le E\le 16.$

    Here $E$ is an integer, and each $d_i$, $i=0,\dots,7$, is either $0$ or $1$.
    1. Show that addition is not necessarily associative. I.e., give an example of 3 numbers $a,$ $b,$ and $c,$ such that $(a+b)+c$ is not equal to $a+(b+c).$
    2. Write down two adjacent normalized numbers $x$ and $y$ such that $\vert x-y\vert$ is maximal.
    3. In this floating point system, what is the range of possible relative errors in representing a given number by a machine number?
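    For part 1, it helps to model the toy system exactly. The sketch below uses Python's `fractions.Fraction` and assumes round to nearest with ties to even. `fl` is a hypothetical helper. It does not enforce the exponent range $[-16,16]$, but the example numbers stay well inside it.

```python
from fractions import Fraction

def fl(x, p: int = 8) -> Fraction:
    """Round x to the nearest binary number with p significant bits (ties to even).
    The exponent range is not checked."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    x = abs(x)
    # exponent e with 2**e <= x < 2**(e+1)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    scale = Fraction(2) ** (p - 1 - e)   # puts x*scale in [2**(p-1), 2**p)
    return sign * Fraction(round(x * scale)) / scale   # round() on Fraction: ties to even

a, b, c = Fraction(1), Fraction(1, 256), Fraction(1, 256)   # all exactly representable
left = fl(fl(a + b) + c)    # (a + b) + c
right = fl(a + fl(b + c))   # a + (b + c)
print(left, right)
```

    Here $1+2^{-8}$ lies exactly halfway between $1$ and $1+2^{-7}$, so each partial sum on the left rounds back to $1$. On the right, $b+c=2^{-7}$ is large enough to survive the addition.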
  8. (a) Find the largest open interval around $x=16$ such that all real numbers in the interval are rounded to $x_f =16$. That is, find the smallest value of $L$ and the largest value of $R$ with $L<16<R$ such that any number in the interval $(L,R)$ is rounded to the floating point number $16$. Assume double precision is used (53 binary digits).

    (b)

    Redo part (a) for $x=50$, that is, find the interval $(L,R)$ that rounds to the floating point number

    $\displaystyle x_f = 50 = \left(1+\frac{1}{2}+\frac{1}{2^4}\right)\times 2^5.$
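    The answers can be checked numerically. This sketch assumes Python 3.9+ (for `math.nextafter`) and that Python floats are IEEE doubles with round to nearest. The interval endpoints are the midpoints between $x$ and its two neighbours. These midpoints are not doubles themselves, so `Fraction` is used to compute them exactly. `rounding_interval` is a hypothetical helper name.

```python
import math
from fractions import Fraction

def rounding_interval(x: float):
    """Open interval (L, R) of reals that round to the double x.
    The endpoints are the exact midpoints to the neighbouring doubles."""
    below = math.nextafter(x, -math.inf)
    above = math.nextafter(x, math.inf)
    L = (Fraction(x) + Fraction(below)) / 2
    R = (Fraction(x) + Fraction(above)) / 2
    return L, R

for x in (16.0, 50.0):
    L, R = rounding_interval(x)
    # exact distances from x to each endpoint, printed as floats
    print(x, float(Fraction(x) - L), float(R - Fraction(x)))
```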

  9. IEEE SP has $\beta=2$, $p=24$, $L=-126$, $U=127$.

    Calculate $UFL$, $OFL$ and $\epsilon_{\rm mach}$ for this system. Assume rounding by chopping.

    How many floating point numbers are there between any successive powers of $2$? For example, how many floating point numbers are there between 2 and 4?
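    These quantities can be read off from bit patterns. This sketch assumes CPython's `struct` module and uses hypothetical helper names. Under chopping, $\epsilon_{\rm mach}=\beta^{1-p}$, which equals the gap between 1 and the next single. The count below covers numbers $x$ with $2\le x<4$.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f32_from_bits(b: int) -> float:
    return struct.unpack('<f', struct.pack('<I', b))[0]

ufl = f32_from_bits(0x00800000)                      # exponent field 1, fraction 0
ofl = f32_from_bits(0x7F7FFFFF)                      # exponent field 254, fraction all ones
gap_at_1 = f32_from_bits(f32_bits(1.0) + 1) - 1.0    # distance from 1 to the next single
count = f32_bits(4.0) - f32_bits(2.0)                # singles x with 2 <= x < 4
print(ufl, ofl, gap_at_1, count)
```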

  10. Consider the floating point system with

    $\displaystyle \beta=2, p=4, L=-5,U=5.$

    (a) What is the distance from the number $8$ to the next larger floating point number in this floating point system?

    (b) What is the distance from the number $8$ to the next smaller floating point number in this floating point system?

    (c) What distance is larger, (a) or (b)? Why? What is the relation between these distances and machine precision $\epsilon_{\rm machine}$?

  11. Consider IEEE SP, which has $\beta=2$, $p=24$, $L=-126$, $U=127$. What is the closest floating point number to the number

    $\displaystyle x=75+\frac{1}{3}$

    in IEEE SP?

  12. Consider IEEE SP, which has $\beta=2$, $p=24$, $L=-126$, $U=127$. What is the absolute error in representing the number

    $\displaystyle x=75+\frac{1}{3}$

    in IEEE SP?
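    This sketch computes both the nearest single and the exact absolute error for problems 11 and 12. It assumes CPython's `struct` module for the rounding and `fractions.Fraction` for exact arithmetic. Rounding the double `75 + 1/3` to single gives the same result as rounding the real number directly, because the real number is far from any halfway point between two singles.

```python
import struct
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE single (ties to even)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

x_exact = Fraction(75) + Fraction(1, 3)
x_single = to_f32(75 + 1 / 3)
abs_err = abs(Fraction(x_single) - x_exact)   # exact absolute error
print(x_single, abs_err, float(abs_err))
```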
  13. Consider IEEE SP, which has $\beta=2$, $p=24$, $L=-126$, $U=127$. What is the closest floating point number to the number

    $\displaystyle x=75+\frac{1}{3}$

    in IEEE SP?

  14. What is the spacing of the floating point numbers between $16$ and $32$, i.e. $16 \le x \le 32$?
  15. Define machine epsilon and explain its significance.
  16. What would be the output in MATLAB for the ratio $\epsilon /\epsilon$?
  17. Consider

    $\displaystyle f(x) = (e^x - 1) / x.$

    a) Approximate $e^x - 1$ using the third-degree Taylor polynomial expanded about $x = 0$. Use this expansion to show that

    $\displaystyle \lim\limits_{x\to 0}f(x) = 1.$

    b) Explain why MATLAB would compute the limit of $f(x)$ to be 0.
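    Here is a quick experiment in Python, whose floats are IEEE doubles like MATLAB's default type. `f_naive` is the formula as written. `f_taylor` uses the third-degree Taylor polynomial from part a). `f_expm1` uses `math.expm1`, which evaluates $e^x-1$ without first rounding $e^x$. The function names are illustrative.

```python
import math

def f_naive(x: float) -> float:
    return (math.exp(x) - 1) / x

def f_taylor(x: float) -> float:
    # (x + x**2/2 + x**3/6) / x from the third-degree Taylor polynomial
    return 1 + x / 2 + x**2 / 6

def f_expm1(x: float) -> float:
    return math.expm1(x) / x

for x in (1e-5, 1e-10, 1e-17):
    print(x, f_naive(x), f_taylor(x), f_expm1(x))
```

    For $x=10^{-17}$, the computed $e^x$ rounds to exactly $1$, so the numerator of `f_naive` is $0$.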

  18. What is the largest value of $k$ such that

    $\displaystyle {\rm {float}}(19+2^{-k})> {\rm {float}}(19)$

    in IEEE SP system? Here ${\rm float(x)}$ is a floating point representation of the number $x$.
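    A brute-force check is possible, assuming CPython's `struct` module for rounding to single precision. For the values of $k$ tried here, $19+2^{-k}$ is exact in double precision. The conversion to single therefore involves only one rounding.

```python
import struct

def to_f32(x: float) -> float:
    return struct.unpack('<f', struct.pack('<f', x))[0]

# 19 + 2**-k needs k + 5 significant bits, so it is exact in double for k < 40.
ks = [k for k in range(1, 40) if to_f32(19 + 2.0**-k) > to_f32(19.0)]
print(max(ks))
```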
  19. We have shown in class that the maximum possible relative error for normalized numbers is

    $\displaystyle \vert\frac{x - {\rm float}(x)}{x}\vert\le\frac{\epsilon_{\rm mach}}{2}.$

    What is the range of possible relative errors for subnormal floating point numbers?

  20. Consider the function

    $\displaystyle f(x)=\frac{1-(1-x)}{x}.$

    The following values were obtained in MATLAB:

    $\displaystyle \begin{array}{c|c} x & f(x)\\ \hline \frac{\epsilon}{4}(1+10^{-10}) & 2\\ \frac{\epsilon}{4}(1+10^{-10})+\frac{\epsilon}{2} & 4/3\\ \frac{\epsilon}{4}(1+10^{-10})+\epsilon & 6/5\\ -\frac{\epsilon}{2}(1+10^{-10})-\epsilon & 4/3 \end{array}$

    Please explain in detail the values in the right column of this table.
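    The table can be reproduced in Python, whose floats are IEEE doubles. Python's `sys.float_info.epsilon` equals MATLAB's `eps` ($2^{-52}$). MATLAB's default display rounds the values, so the results below match the table only to about ten digits. The $10^{-10}$ perturbation shows up in the last digits.

```python
import sys

eps = sys.float_info.epsilon   # 2**-52, the same value as MATLAB's eps

def f(x: float) -> float:
    return (1 - (1 - x)) / x

d = 1e-10
xs = [eps / 4 * (1 + d),
      eps / 4 * (1 + d) + eps / 2,
      eps / 4 * (1 + d) + eps,
      -eps / 2 * (1 + d) - eps]
values = [f(x) for x in xs]
for x, v in zip(xs, values):
    print(f"{x:.6e}  {v:.10f}")
```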

  21. For which positive integers $k$ can the number $5+2^{-k}$ be represented exactly, with no rounding error, in the IEEE SP floating point system?
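    A brute-force check works under the same assumptions: CPython's `struct` module and a hypothetical helper `is_float32`. The number $5+2^{-k}$ needs $k+3$ significant bits, so it is exact in double precision for $k\le 50$, which is the range tried.

```python
import struct

def is_float32(x: float) -> bool:
    """True if the double x is exactly representable in IEEE single precision."""
    try:
        return struct.unpack('<f', struct.pack('<f', x))[0] == x
    except OverflowError:
        return False

ks = [k for k in range(1, 51) if is_float32(5 + 2.0**-k)]
print(ks)
```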

  22. Consider IEEE SP, which has base $\beta=2$, $N=24$ digits, and lower and upper exponent limits $L=-126$ and $U=127$. As we discussed in class, the number $\frac{1}{3}$ is not representable in IEEE SP. What is the floating point number that precedes $x=\frac{1}{3}$? In other words, find the largest floating point number that is less than $x=\frac{1}{3}$.

    Hint: your answer should contain 24 binary digits of the mantissa and the value of the exponent.
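    Here is a sketch for finding the preceding single. It assumes CPython's `struct` module and uses `fractions.Fraction` for exact comparison. `largest_f32_below` is a hypothetical helper. It rounds $x$ to the nearest single, then steps down one bit pattern if that single is not below $x$. The printed fields are the sign, the biased exponent (bias 127) and the 23 stored fraction bits. The leading 1 of the 24-digit mantissa is implicit.

```python
import struct
from fractions import Fraction

def f32_bits(x: float) -> int:
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f32_from_bits(b: int) -> float:
    return struct.unpack('<f', struct.pack('<I', b))[0]

def largest_f32_below(x: Fraction) -> float:
    """Largest single strictly less than the positive real x.
    Assumes float(x) rounds to the same single as x itself (no double-rounding tie)."""
    s = f32_from_bits(f32_bits(float(x)))   # nearest single to x
    if Fraction(s) >= x:                     # rounded up or exact: step down once
        s = f32_from_bits(f32_bits(s) - 1)
    return s

p = largest_f32_below(Fraction(1, 3))
b = format(f32_bits(p), '032b')
print(p, b[0], b[1:9], b[9:])   # sign, biased exponent, 23 stored fraction bits
```

    Called with `Fraction(1, 10)`, the same function addresses problem 28.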

  23. (a) Which of the following operations on two positive floating point numbers can produce overflow?

    — addition

    — subtraction

    — multiplication

    — division

    If you answered “yes” for any of the operations, please give one example of two Single Precision numbers that produce overflow under that operation.

    (b) Which of the following operations on two positive floating point numbers can produce underflow?

    — addition

    — subtraction

    — multiplication

    — division

    If you answered “yes” for any of the operations, please give one example of two Single Precision numbers that produce underflow under that operation.

  24. As we discussed in class, the number $\frac{1}{3}$ is not representable in IEEE SP. What is the floating point number that precedes $x=\frac{1}{3}$? In other words, find the largest floating point number that is less than $x=\frac{1}{3}$.

    Hint: your answer should contain 24 binary digits of the mantissa and the value of the exponent.

  25. (a) Which of the following operations on two positive floating point numbers can produce overflow?

    — addition

    — subtraction

    — multiplication

    — division

    If you answered “yes” for any of the operations, please give one example of two Single Precision numbers that produce overflow under that operation.

    (b) Which of the following operations on two positive floating point numbers can produce underflow?

    — addition

    — subtraction

    — multiplication

    — division

    If you answered “yes” for any of the operations, please give one example of two Single Precision numbers that produce underflow under that operation.

  26. Consider IEEE SP, which has base $\beta=2$, $N=24$ digits, and lower and upper exponent limits $L=-126$ and $U=127$. What is the number that follows zero in IEEE single precision? In other words, find the smallest positive number that is representable in this system. Write the result in both binary and decimal format.

  27. Consider the following expression:

    $\displaystyle \frac{1}{1-x}-\frac{1}{1+x},$

    assuming $x\ne\pm 1.$

    (a) For what values of $x$ is it difficult to calculate this expression accurately in floating point arithmetic?

    (b) Give a rearrangement of the terms such that, for the range of $x$ in part (a), the computation is more accurate in floating point arithmetic.
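    Here is a quick comparison in IEEE double precision (Python floats). `rearranged` uses one possible rearrangement: it combines the two fractions over a common denominator. Both function names are illustrative.

```python
def naive(x: float) -> float:
    return 1 / (1 - x) - 1 / (1 + x)

def rearranged(x: float) -> float:
    # the same expression over a common denominator: 2x / (1 - x^2)
    return 2 * x / (1 - x * x)

for x in (1e-3, 1e-9, 1e-17):
    print(x, naive(x), rearranged(x))
```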

  28. Consider IEEE SP, which has base $\beta=2$, $N=24$ digits, and lower and upper exponent limits $L=-126$ and $U=127$. What is the floating point number that precedes $x=0.1$? In other words, what is the largest floating point number that is less than $x=0.1$?

  29. Consider the IEEE floating point system with base $\beta=2$, $N=24$ digits, and lower and upper exponent limits $L=-126$ and $U=127$. Also, assume that the “round to nearest” rule is used, and that ties are broken by choosing the smaller number.

    — For what numbers $x$ will the computer claim that the inequality $1<x<2$ is true?

    — For what real numbers $x$ will a computer claim that $x=4$?

    — Suppose it is claimed that the solution $x$ of $x^2-2=0$ is exactly representable in this system. Why is this not possible? What is the distance between the two floating point numbers immediately above and immediately below the solution of $x^2-2=0$ in this system?

  30. Consider the following toy floating point system: base $\beta=2$, with $N=8$ digits, with lower exponent of $L=-16$ and upper exponent $U=16$.

    Consider the following claim: if two positive binary floating point numbers $x$ and $y$ in this toy floating point system are such that

    $\displaystyle \frac{1}{2}\le \frac{x}{y}\le 2,$

    then their difference $x-y$ is exactly representable in the floating point system.

    Is this claim true or false? If it is true, explain why; if it is false, find a counterexample.

  31. Consider the following function:

    $\displaystyle f(x)=\frac{1-(1-x)}{x}.$

    For any $x\ne 0$ the value of this function is equal to one. However, when we calculate $f(x)$ on the computer for small values of $x$, the result is not equal to $1$. Here is a computer-generated graph of $f(x)$ for small values of $x$.

    Please explain why this graph looks the way it does.

    In particular, answer the following questions:

    1. Why is $f(x)$ zero for $0<x<0.555 \times 10^{-16}$?
    2. Why is $f(x)$ zero for $-1.1\times 10^{-16} <x<0$? Why is the interval where $f(x)=0$ on the left twice as long as the one on the right?
    3. Why, after being zero, does $f(x)$ jump to the value $2$ at $x\simeq 0.5556 \times 10^{-16}$?
    4. Why does $f(x)$ oscillate around $1$ for $0.5556 \times 10^{-16} <x< 5\times 10^{-16}$?
    5. Why does the oscillation diminish as $x$ becomes larger?
    6. Why are the oscillations twice as frequent for positive $x$ as for negative $x$?
    7. Explain why the second jump appears at $x=1.665 \times 10^{-16}$.

  32. Assume a normalized floating point system with base $\beta=10$, three digits of accuracy $d=3$ and the lowest possible exponent of $L=-98$.
    1. What is the smallest possible positive floating point number that is representable in this system (also called “UFL” for underflow level)?
    2. If $x=6.87\times 10^{-97}$ and $y=6.81\times 10^{-97}$, what is the result of computing $x-y$?
    3. If the subnormal numbers were to be allowed, what would be the result of $x-y$?
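    Python's `decimal` module can model this system for part 3, using `Context(prec=3, Emin=-98)` as the model. The module always supports subnormal results (gradual underflow). It therefore does not reproduce a system without subnormals, as in part 2.

```python
from decimal import Decimal, Context

# Assumption: three significant digits, smallest normalized exponent -98.
ctx = Context(prec=3, Emin=-98)

x = Decimal("6.87E-97")
y = Decimal("6.81E-97")
diff = ctx.subtract(x, y)
ufl = Decimal("1.00E-98")   # smallest normalized number, 1.00 x 10^-98
print(diff, diff.is_subnormal(ctx), ufl.is_subnormal(ctx))
```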

  33. IEEE SP has $\beta=2$, $p=24$, $L=-126$, $U=127$. In single precision, write down the floating point number that follows the number $x=17$. (In other words, find the smallest number greater than $17$ that is exactly representable in this floating point system.)

  34. IEEE SP has $\beta=2$, $p=24$, $L=-126$, $U=127$. What is the smallest possible positive integer that is not a single precision number?
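    Every integer up to $2^{24}$ has at most 24 significant bits, so only integers just above $2^{24}$ need checking. This sketch assumes CPython's `struct` module. `is_float32` is a hypothetical helper.

```python
import struct

def is_float32(x: float) -> bool:
    """True if the double x is exactly representable in IEEE single precision."""
    try:
        return struct.unpack('<f', struct.pack('<f', x))[0] == x
    except OverflowError:
        return False

n = 2**24
for m in (n - 1, n, n + 1, n + 2):
    print(m, is_float32(float(m)))
```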

  35. In a floating point system with precision $p=6$ decimal digits, $\beta=10$, let $x=1.23456$ and $y=1.23579$.
    1. How many significant digits does the difference $y-x$ contain?
    2. If the floating point system is normalized, what is the minimum exponent range for which $x$, $y$ and $y-x$ are exactly representable?
    3. Is the difference $y-x$ exactly representable, regardless of exponent range, if gradual underflow is allowed? Why?

  36. Suppose one calculates using computer arithmetic the following number:

    $\displaystyle A=3*(\frac{4}{3}-1)-1.$

    We have shown in class that one can estimate the machine precision by the number $A$, so that

    $\displaystyle \epsilon_{\rm machine}\simeq A.$

    Determine whether the following examples may be used to determine machine precision:

    $\displaystyle B=(\frac{7}{3} - \frac{4}{3}) - 1,$

    and

    $\displaystyle C= (\frac{4}{3} - \frac{1}{3}) - 1.$

    Explain your reasoning using a floating point system with two digits of precision.

    You may find it helpful to use a calculator to gain intuition for this problem.
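    In IEEE double precision (Python floats), the three expressions can be evaluated directly. This does not replace the two-digit analysis the problem asks for. It does show which expressions produce $\pm\epsilon$ in double precision.

```python
import sys

eps = sys.float_info.epsilon   # 2**-52 in IEEE double precision
A = 3 * (4 / 3 - 1) - 1
B = (7 / 3 - 4 / 3) - 1
C = (4 / 3 - 1 / 3) - 1
print(A, B, C, eps)
```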

  37. Consider a floating-point number system with base $2$, precision $5$ and exponent range $[-10,10]$. That is, in this system any number can be written as $1.a_{1}a_{2}a_{3}a_{4} \times 2^{E}$ with $-10\le E\le 10$.

    (a) Write down two adjacent normalized numbers $x$ and $y$ such that $\left\vert x-y\right\vert $ is minimal.

    (b) In this floating-point system, what is the maximal possible error of representing $\pi $ by a machine number? What is the possible relative error in representing $\pi $ by a machine number?

    (c) In this floating-point system, how many numbers are there between $2$ and $4$?

  38. The following statements are either true or false. If a statement is true, explain why; if it is false, give an example demonstrating this, along with an explanation of why it shows the statement is incorrect. In this problem $x$ is a real number and $x_f$ is its floating-point representation in Single Precision with subnormals allowed. Furthermore, $\epsilon$ is machine precision (unit roundoff).

    1. If $x\ne 0$, then $x_f\ne 0$.
    2. If $x > y$, then $x_f > y_f$ .
    3. If $x_f$ is a floating-point number with $1\le x_f \le 2$, then there is an integer $k$ so that $x_f = 1 + k \epsilon$.
    4. If $x_f$ is a floating-point number, then $\frac{1}{x_f}$ is also a floating-point number.
    5. If $x$ is a solution of $a x = b$, then $x_f$ is also a solution.
    6. Approximately 25 percent of the positive floating-point numbers are in the interval $0<x_f<1$.
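    Statement 6 can be tested by counting bit patterns, assuming CPython's `struct` module. Positive finite singles, including subnormals, correspond in increasing order to the bit patterns $1$ through `0x7F7FFFFF`. Those below 1 end at the pattern just before that of $1.0$.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack('<I', struct.pack('<f', x))[0]

n_positive = 0x7F7FFFFF               # patterns 1 .. 0x7F7FFFFF: all positive finite singles
n_below_one = f32_bits(1.0) - 1       # patterns 1 .. 0x3F7FFFFF: singles in (0, 1)
print(n_below_one / n_positive)
```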

  39. Determine whether the following numbers can be exactly represented in the Single Precision IEEE system. If a number is exactly representable, write down its exact floating point representation. If not, explain why it cannot be represented.

    1. $\displaystyle 15,$

    2. $\displaystyle -6,$

    3. $\displaystyle 7/2,$

    4. $\displaystyle 7/3,$

    5. $\displaystyle \pi,$
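    This sketch extracts the sign, biased exponent and fraction fields. It assumes CPython's `struct` module and uses hypothetical helper names. For $7/3$ and $\pi$, the round-trip test only shows that their double approximations do not fit in single precision. The underlying reason is that their binary expansions do not terminate.

```python
import math
import struct

def f32_fields(x: float):
    """(sign, biased exponent, 23-bit fraction) of x rounded to IEEE single."""
    b = struct.unpack('<I', struct.pack('<f', x))[0]
    return b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF

def is_float32(x: float) -> bool:
    return struct.unpack('<f', struct.pack('<f', x))[0] == x

for v in (15.0, -6.0, 7 / 2, 7 / 3, math.pi):
    s, e, f = f32_fields(v)
    print(v, is_float32(v), s, format(e, '08b'), format(f, '023b'))
```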

  40. What is the largest value of $n$ for which the number

    $\displaystyle 2^n+ 2^{-n}$

    can be exactly represented in the IEEE Single Precision Format?

  41. Please calculate by hand what a computer with single-precision arithmetic, with subnormals allowed, would produce in these examples.

    Note that the terms must be evaluated in the order indicated by the parentheses. Make sure to explain your steps.

    1. $\displaystyle (1 -\frac{1}{3}\epsilon) - 1,$

    2. $\displaystyle (1 - 2^{-24}) - 1,$

    3. $\displaystyle (1 + 2^{-25})^4,$

    4. $\displaystyle (1 + \frac{1}{4}\epsilon)(1 + \frac{1}{3}\epsilon),$

    5. $\displaystyle (1 - \frac{\epsilon}{8})/(1 + \frac{\epsilon}{3}),$
    6. $\displaystyle (1 + (2^{-24} + 2^{-25}))^4 - 1.$
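    Items 2, 3 and 6 do not depend on the value of $\epsilon$, so they can be simulated in Python. This sketch assumes CPython's `struct` module for rounding to single. Each operation is carried out in double precision and then rounded to single. For a single operation on single-precision operands, this gives the correctly rounded single result, because a double has at least $2\cdot 24+2$ bits. The power is evaluated as $((x\cdot x)\cdot x)\cdot x$, which is an assumption about evaluation order.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def pow4(x: float) -> float:
    # x**4 as ((x*x)*x)*x, rounding to single after every product
    r = x
    for _ in range(3):
        r = f32(r * x)
    return r

item2 = f32(f32(1 - 2.0**-24) - 1)
item3 = pow4(f32(1 + 2.0**-25))
item6 = f32(pow4(f32(1 + f32(2.0**-24 + 2.0**-25))) - 1)
print(item2, item3, item6)
```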

  42. Assume a decimal (base 10) floating point system having machine precision $\epsilon_{mach}=10^{-5}$ and an exponent range of $\pm 20$. What is the result of each of the following floating point arithmetic operations?

    $\displaystyle 1+ 10^{-7}=$

    $\displaystyle 1+10^3=$

    $\displaystyle 1+ 10^7=$

    $\displaystyle 10^{10}+10^3=$

    $\displaystyle 10^{10}/10^{-15}=$

    $\displaystyle 10^{-10}\times 10^{-15}=$

Chapter TWO

  43. Suppose you are solving $f(x)=0$ by iterations, and you have obtained $(x_{k-1},f(x_{k-1}))$ and $(x_k,f(x_k))$. Now use linear interpolation through these two points to find $x_{k+1}$ such that $f(x_{k+1})\simeq 0$. What is the resulting method for solving $f(x)=0$?
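    The interpolation step can be sketched as follows. `interpolation_step` and `solve` are hypothetical names, and $f(x)=x^2-2$ with starting points 1 and 2 is only an example.

```python
def interpolation_step(x_prev, f_prev, x_curr, f_curr):
    """Zero of the straight line through (x_prev, f_prev) and (x_curr, f_curr)."""
    return x_curr - f_curr * (x_curr - x_prev) / (f_curr - f_prev)

def solve(f, x0, x1, tol=1e-12, max_iter=50):
    """Repeat the interpolation step, keeping the two most recent iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:          # horizontal line: no zero crossing can be computed
            break
        x0, f0, x1 = x1, f1, interpolation_step(x0, f0, x1, f1)
        f1 = f(x1)
        if abs(x1 - x0) < tol:
            break
    return x1

root = solve(lambda x: x * x - 2, 1.0, 2.0)
print(root)
```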

  44. Consider the fixed point iteration scheme

    $\displaystyle x_{k+1} = g(x_k),$ (2)

    where

    $\displaystyle g(x) = a x^3 + b x^2 + c x + d,$ (3)

    where $a$, $b$, $c$ and $d$ are parameters.

  45. What is the Single Precision floating point number that follows 256, and the Single Precision floating point number that precedes 256? Write the answers in both binary and decimal form.


