Homework One

  1. 1.8

    a $\epsilon_{\rm machine}\times \beta^{E_{min}}$, where machine epsilon $\epsilon_{\rm machine}=1/\beta^{p-1}$, $\beta$ is the base of the system, and $E_{min}$ is the lowest possible value of the exponent.

    b $\epsilon_{\rm machine}\times \beta^{E_{max}}$, with $E_{max}$ being the maximum value of the exponent.


    $\displaystyle 2^{53-24}-1=536870911.$

  2. 1.8


    $\displaystyle \sum\limits_{k=0}^{1000}\frac{e^k}{e^k+1} =
\sum\limits_{k=0}^{1000}\frac{1}{1+e^{-k}} =1000.03583648423874.$

    Here we divided numerator and denominator by $e^k$ to avoid overflow.
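The trick can be checked with a short script (Python here for illustration; the original computations were presumably done in Matlab):

```python
import math

# The naive numerator e^k overflows in double precision once k > ~709:
try:
    math.exp(1000)
    overflowed = False
except OverflowError:
    overflowed = True

# The rewritten form 1/(1 + e^(-k)) never produces a large exponent.
stable_sum = sum(1.0 / (1.0 + math.exp(-k)) for k in range(1001))
print(overflowed, stable_sum)
```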


    $\displaystyle \sum\limits_{k=0}^{1000}\frac{\cosh k}{1+\sinh k} =
\sum\limits_{k=0}^{1000}\frac{1+e^{-2k}}{1+2 e^{-k}-e^{-2 k}}
= 1000.38056571994274.$

    Here we divided numerator and denominator by $e^k$ to avoid overflow.


    $\displaystyle \sum\limits_{k=0}^{1000}\left(\sqrt{3+e^k} - e^{k/2}\right) =
\sum\limits_{k=0}^{1000}\frac{3}{\sqrt{3+e^k}+e^{k/2}}.$

    Here we multiplied and divided by the conjugate $\sqrt{3+e^k}+e^{k/2}$ to avoid catastrophic cancellation for large $k$.


    $\displaystyle \frac{\sum\limits_{k=0}^{1000}e^k}{\sum\limits_{n=0}^{1000} n e^n} =
\frac{\sum\limits_{k=0}^{1000}e^{k-1000}}{\sum\limits_{n=0}^{1000} n e^{n-1000}} = 0.00100058231560098515.$

    Here we divided numerator and denominator by $e^{1000}$ to avoid overflow.
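The scaled form evaluates without difficulty (a Python sketch for illustration):

```python
import math

# Numerator and denominator both overflow if formed directly (e^1000 ~ 1e434),
# but after scaling every term by e^(-1000) all terms are modest in size.
num = sum(math.exp(k - 1000) for k in range(1001))
den = sum(n * math.exp(n - 1000) for n in range(1001))
ratio = num / den
print(ratio)
```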


    $\displaystyle \sum\limits_{k=1}^{1000}\ln(1+e^k) =
\sum\limits_{k=1}^{1000}\ln\left(e^k(1+e^{-k})\right) =
\sum\limits_{k=1}^{1000}\left(k+\ln(1+e^{-k})\right).$

    Here we factored $e^k$ out of the logarithm to avoid overflow. In Matlab:

     x = 1:1000; y = x + log(1+exp(-x)); sum(y)
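The same sum in Python, using `log1p`, which evaluates $\ln(1+y)$ accurately for small $y$:

```python
import math

# ln(1 + e^k) = k + ln(1 + e^(-k)); log1p keeps the tiny correction
# term accurate even when e^(-k) is far below machine epsilon times 1.
total = sum(k + math.log1p(math.exp(-k)) for k in range(1, 1001))
print(total)
```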


    $\displaystyle \log\sum\limits_{k=1}^{1000} e^k=
\log\left(e^{1000}\sum\limits_{k=1}^{1000}e^{k-1000}\right) =
1000+ \log \sum\limits_{k=1}^{1000} e^{k-1000} \simeq$      
    $\displaystyle 1000.458675.$     (37)

    Here we factored out the largest term $e^{1000}$ to avoid overflow; the remaining sum equals $1.581976706869$, and its logarithm is $0.458675$.
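Factoring out the largest term is the standard "log-sum-exp" trick; a Python sketch:

```python
import math

# log(sum e^k) = 1000 + log(sum e^(k-1000)): after factoring out the
# largest term, the shifted sum is a modest number (about 1.58).
shifted = sum(math.exp(k - 1000) for k in range(1, 1001))
result = 1000 + math.log(shifted)
print(result)
```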



    $\displaystyle X=\sum\limits_{k=1}^{1000}k\left[ \sin\left(\pi(k^{10}+1/k)\right)-
\sin\left(\pi(k^{10}-1/k)\right)\right]$

    Since $\sin(n\pi+x) = (-1)^n\sin(x)$ and $\sin(n\pi-x) = -(-1)^n\sin(x)$ for integer $n$, and since $k^{10}$ has the same parity as $k$ (an even number raised to the power of 10 is even, and an odd number raised to the power of 10 is odd), we obtain

    $\displaystyle X=2 \sum\limits_{k=1}^{1000}(-1)^k k \sin\left(\frac{\pi}{k}\right).$
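Direct evaluation of the original terms fails in doubles for large $k$, since $k^{10}$ reaches $10^{30}$ and the argument $\pi k^{10}$ cannot be represented accurately; the reduced form has no such problem. A consistency check for small $k$, where $k^{10}$ is still exactly representable (a Python sketch):

```python
import math

# For small k the argument pi*(k^10 ± 1/k) is still accurate enough in
# doubles to compare the original bracket against the reduced form
# 2*(-1)^k * sin(pi/k) (the common factor k is omitted on both sides).
max_err = 0.0
for k in range(2, 12):
    direct = math.sin(math.pi * (k**10 + 1.0 / k)) \
           - math.sin(math.pi * (k**10 - 1.0 / k))
    reduced = 2.0 * (-1.0) ** k * math.sin(math.pi / k)
    max_err = max(max_err, abs(direct - reduced))
print(max_err)
```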

  3. 1.20 Homer states

    $\displaystyle 3987^{12}+4365^{12}=4472^{12}.$

    Is it true?

    a If Homer is right,

    $\displaystyle X_1= 3987^{12}+4365^{12}-4472^{12}$

    should be 0, yet Matlab returns $1.2119e+33$.

    b If Homer is right,

    $\displaystyle X_2=\left(3987^{12}+4365^{12}\right)^{\frac{1}{12}}-4472$

    should be zero, yet Matlab returns $7.0577e-09$.

    c If Homer is right,

    $\displaystyle X_3=\frac{\left(3987^{12}+4365^{12}\right)}{4472^{12}}$

    should be $1$, yet Matlab returns $1.000000000018943$.

    d If Homer is right,

    $\displaystyle X_4=\left[\left(3987^{12}+4365^{12}\right)
^{\frac{1}{12}}\right]^{12} - 4472^{12}$

    should be zero, yet Matlab returns $1.211616482290016e+33$.

    e The product of two $p$-digit numbers may have up to $2p$ digits. Therefore $3987^{12}$ may have up to 48 digits, which is longer than Matlab's 16 digits of precision. Therefore we cannot use doubles in Matlab to either prove or disprove this claim.

    Note that $X_1$ can be explicitly calculated on a computer with infinite precision, because it is an integer (check out the vpa command in Matlab). Using the “sym” command in Matlab, or Mathematica, allows us to get the answer:

    $\displaystyle X_1=1211886809373872630985912112862690.$
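This is easy to confirm with any arbitrary-precision integer arithmetic; for instance, Python integers are exact:

```python
# Integer arithmetic in Python is arbitrary precision, so the 44-digit
# powers are computed without rounding; the difference is clearly nonzero.
x1 = 3987**12 + 4365**12 - 4472**12
print(x1)
```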

  4. 1.24 Here we deal with

    $\displaystyle f(x)=\frac{e^x-1}{x},$

    for small values of $x$. Expanding $e^x\simeq 1+x,$ we obtain

    $\displaystyle f(x)\simeq \frac{(1+x)-1}{x}.$

    This is very close to the example we studied in class, so all information can be taken directly from the lecture notes.
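As a reminder of the cancellation (a Python sketch; `expm1` is the standard library routine that computes $e^x-1$ without cancellation):

```python
import math

x = 1e-15
naive = (math.exp(x) - 1.0) / x    # subtracts two nearly equal numbers
accurate = math.expm1(x) / x       # expm1 avoids the cancellation
print(naive, accurate)             # the naive value is off by about 10%
```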

  5. 1.27

    (a) For some large integer value of $k$ the increment $1/k^2$ becomes too small to change the sum: it drops below half the spacing between floating point numbers near the current value of the sum. At that point the values of $s$ and $ss$ become equal, and the sum stops changing.


    s=1; ss=0; k=1; while s~=ss, k=k+1; ss=s; s=s+1/k^2; end; s
    returns the value of

    $\displaystyle 1.644934057834575.$

    This is 16 digits. To round it to 12 decimal digits, use round(ans,12), which returns

    $\displaystyle 1.644934057835000,$

    which is the answer rounded to 12 decimal digits.

    Since the answer is between $1$ and $2$, the floating point numbers there are machine epsilon apart. Therefore the algorithm stops when $1/k^2 \simeq \epsilon/2$, i.e., at the integer closest to $k=\sqrt{2/\epsilon}$.

    Comparing the result of the code to $\pi^2/6$ we see an error of about $9e-9$, so about 8 digits are correct.
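The stopping index can be sketched as follows (Python; `sys.float_info.epsilon` is the double precision machine epsilon):

```python
import math
import sys

eps = sys.float_info.epsilon        # 2^-52, about 2.22e-16
s = math.pi**2 / 6                  # the limit lies between 1 and 2
k_stop = math.sqrt(2.0 / eps)       # predicted stopping index, ~9.5e7

# Far past k_stop the increment is below half an ulp of s, so adding it
# leaves s unchanged; well before k_stop it still changes the sum.
frozen = (s + 1.0 / (2 * 10**8)**2 == s)
changing = (s + 1.0 / (10**7)**2 != s)
print(k_stop, frozen, changing)
```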

    (c) In the previous part we have seen that, making calculations that are accurate to 16 digits, we get an answer that is correct to only 8 digits.

    One way to increase accuracy is to use the “vpa” command as follows:

    s=1; ss=0; k=1; while s~=ss, k=k+1; ss=s; s=vpa(s+1/k^2,20); end; s

    Alternatively, here is one more idea to check out. If the result is correct to $m$ digits, it means that the effective epsilon of the result is $2^{1-m}$. Therefore the computer has obtained the correct result for

    $\displaystyle k=\sqrt{2/2^{1-m}} = 2^{m/2}.$ (38)

    We then may compute a separate sum for $k$ between the value in (38) and the new value of

    $\displaystyle k=\sqrt{2/2^{1-\tilde m}} = 2^{\tilde{m}/2}.$ (39)

  6. (a)

    The number of flops to compute $x^n = x \cdot x \cdots x$ by straightforward repeated multiplication is about $n$. The proposed algorithm will use $1+n/2$ flops for even $n$ and $2+(n-1)/2$ for odd $n$.


    Since the computer operates in binary, the proposed formula for $x^{28}$ takes advantage of the binary representation. The trick is to decompose $28$ into a sum of powers of two, calculate each power-of-two power of $x$ by using the shortcut of (a), and then multiply the results.

    The proposed algorithm will use 11 flops instead of 28.


    $\displaystyle x^{100} = ((((((x^2)^2)^2)^2)^2)^2)* (((((x^2)^2)^2)^2)^2)* (x^2)^2.$

    Number of flops here is 14. Since 100 is even, the algorithm in (a) will use $51$ flops.
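For comparison, a standard square-and-multiply routine (a Python sketch; it counts every multiplication and reuses each intermediate square, which is why its counts come out even lower than the ones above):

```python
def power_by_squaring(x, n):
    """Compute x**n via the binary expansion of n, counting multiplications."""
    result, base, mults = 1.0, x, 0
    while n > 0:
        if n & 1:          # this binary digit of n is set: fold base in
            result *= base
            mults += 1
        n >>= 1
        if n:              # square the running base for the next digit
            base *= base
            mults += 1
    return result, mults

val, mults = power_by_squaring(2.0, 100)
print(val, mults)          # only 9 multiplications for x^100
```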


    Using Table 1.3 (computing $x^n$ as $e^{n\ln x}$) we find the total flops to be $12+1+14=27$. This number is independent of $n$, so this approach is faster for large $n$ and significantly faster for very large $n$.

Old Problems


a Use Taylor’s theorem to show that

$\displaystyle \vert \sin x-\sin x_f \vert \le \vert x-x_f \vert$ (40)

Denoting $x=x_f+\delta$ and using

$\displaystyle \sin(x_f+\delta) \simeq \sin(x_f)+\delta\cos(x_f),$

we obtain

$\displaystyle \sin(x)-\sin(x_f)\simeq \delta\cos(x_f).$

Since $\vert\cos(x_f)\vert\le 1$, we obtain (40).

b Why

$\displaystyle \vert x-x_f\vert\le \vert\tilde{ x_f} - \tilde{ \tilde{ x_f}}\vert$ (41)

This is due to “rounding to nearest”: $x$ is rounded to $x_f$, and $x_f$ is either $\tilde{ x_f}$ or $\tilde{ \tilde{ x_f}}$. In other words, the distance between a number and its floating point representation is less than half the distance between the two nearest floating point numbers.


c Since $2^E<x<2^{E+1}$ and $2^E<x_f<2^{E+1}$, we get $\vert x-x_f\vert<2^E$. More precisely, adjacent floating point numbers in this range are $\epsilon\, 2^E$ apart, so $\vert x-x_f\vert\le \epsilon\,
2^{E-1}$. Use this together with (40) and (41) to get

$\displaystyle \vert\sin x-\sin
x_f\vert\le \epsilon 2^{E-1}$ (42)

d This question implies that

$\displaystyle x<2^{E+1}\le L.$     (43)

If this property is not satisfied, then the result cannot be obtained.

To obtain this result, write

$\displaystyle \epsilon 2^{E-1}=\frac{1}{4}\epsilon 2^{E+1},$

then use (43).

Author's note: in (d) the right hand side should be $\epsilon L/2$ and in (e) the right hand side should be $\epsilon(L+2)/2$.

e The RHS can be rewritten as

$\displaystyle \frac{1}{4}\epsilon(L+4) = \frac{\epsilon L}{4}+\epsilon.$

Now use $\vert \sin x_f-s_f\vert\le \epsilon$ to obtain the desired result.

f Using $\epsilon=10^{-16}$ we get $L=4E8$. The figure shows that $10^{-8}$ accuracy is obtained at $k=25$, i.e., $L\simeq 3E7$. In other words, the estimate is optimistic.