OSU Fall 2004 mth351 Homework 1
Jason Siefken
October 13, 2004

1. Assume decimal base floating point with a 4-digit mantissa and roundoff implemented by chopping.

(a) What is the value of $\varepsilon$?

A 4-digit mantissa means numbers of the form $d.ddd \times 10^e$, so the next number larger than 1 is 1.001. Therefore
$$\varepsilon = 1.001 - 1 = 0.001.$$

(b) Compute $x = fl(1/3)$ and $y = fl(1001/3000)$. What are the relative errors of $x$ and $y$?

The relative error of $fl(1/3) = 0.3333$ is
$$\frac{\frac{1}{3} - \frac{3333}{10000}}{\frac{1}{3}} = \frac{1}{10000}.$$
The relative error of $fl(1001/3000) = 0.3336$ is
$$\frac{\frac{1001}{3000} - \frac{3336}{10000}}{\frac{1001}{3000}} = \frac{1}{5005}.$$
These relative errors are about 5 to 10 times smaller than the theoretical maximum error $\varepsilon$.

(c) Compute the relative error of $z = fl(x - y)$ and explain why it is bad to subtract nearly equal numbers.

$x = 3.333 \cdot 10^{-1}$ and $y = 3.336 \cdot 10^{-1}$, so
$$fl(x - y) = fl(3.333 \cdot 10^{-1} - 3.336 \cdot 10^{-1}) = -3.000 \cdot 10^{-4}.$$
The relative error is
$$\left| \frac{\left(\frac{1}{3} - \frac{1001}{3000}\right) - \left(-\frac{3}{10000}\right)}{\frac{1}{3} - \frac{1001}{3000}} \right| = \frac{1}{10}.$$
The relative error in $z$ is 500 to 1000 times greater than the relative error in $x$ or $y$. When you subtract nearly equal numbers, the leading significant digits cancel each other out, and you are left with only the least significant digits, which are exactly the ones already corrupted by the chopping of $x$ and $y$. So subtraction of nearly equal numbers should be avoided wherever possible.

2. Write a MATLAB function to evaluate
$$f(x) = \frac{e^x - 1 - x}{x^2}$$
for $-100 \le x \le 100$ with an error of at most $10^{-12}$.

The only place where the function can have a large error is where it subtracts nearly equal numbers. This happens when $x$ is near 0, where $e^x \approx 1 + x$. To avoid this, we replace $f(x)$ with a Taylor series when $x$ is small.
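The cancellation that makes small $x$ dangerous is the same effect computed by hand in Problem 1. As a rough check of those numbers, here is a minimal Python sketch (Python is used only for illustration) that models the 4-digit chopped decimal machine with the standard `decimal` module: a context with `prec=4` and `ROUND_DOWN` stands in for chopping, and exact relative errors are computed with `fractions.Fraction`. The names `CHOP4`, `fl_div`, and `rel_err` are hypothetical helpers, not part of the assignment.

```python
from decimal import Decimal, Context, ROUND_DOWN
from fractions import Fraction

# Assumed model of the homework machine: base 10, 4-digit mantissa,
# results chopped (truncated toward zero) instead of rounded.
CHOP4 = Context(prec=4, rounding=ROUND_DOWN)


def fl_div(a: int, b: int) -> Decimal:
    """Chopped 4-digit quotient a/b of two exact integers."""
    return CHOP4.divide(Decimal(a), Decimal(b))


def rel_err(true, approx) -> Fraction:
    """Exact relative error |true - approx| / |true| as a Fraction."""
    true, approx = Fraction(true), Fraction(approx)
    return abs(true - approx) / abs(true)


# (a) machine epsilon: gap between 1 and the next representable number
eps = CHOP4.subtract(CHOP4.next_plus(Decimal(1)), Decimal(1))

# (b) chopped values of 1/3 and 1001/3000
x = fl_div(1, 3)          # 0.3333
y = fl_div(1001, 3000)    # 0.3336

# (c) the subtraction itself is exact here; the damage comes from x and y
z = CHOP4.subtract(x, y)  # -0.0003

print(eps, x, y, z)                                        # 0.001 0.3333 0.3336 -0.0003
print(rel_err(Fraction(1, 3), x))                          # 1/10000
print(rel_err(Fraction(1001, 3000), y))                    # 1/5005
print(rel_err(Fraction(1, 3) - Fraction(1001, 3000), z))   # 1/10
```

The last three lines reproduce the jump from relative errors of order $10^{-4}$ in $x$ and $y$ to $10^{-1}$ in $z$.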
We start out with about 16 significant digits in double precision, and the tolerance of $10^{-12}$ lets us lose at most about 4 of them. So, to decide where to switch from $e^x - (1 + x)$ to a Taylor series, we find where $e^x$ and $1 + x$ agree in fewer than their first four digits.

When $x = 0.15$, $1 + x = 1.150000000000000 \cdot 10^0$ and $e^x \approx 1.161834242728283 \cdot 10^0$. Only the first two digits match, so we lose only about two digits of accuracy when we subtract these numbers. That leaves about 14 accurate digits, which is more than we need. Similarly, for $x = -0.15$, $1 + x = 8.500000000000000 \cdot 10^{-1}$ and $e^x \approx 8.607079764250578 \cdot 10^{-1}$. Only the leading digit matches, so it is safe to subtract these numbers without losing much accuracy. We therefore use the Taylor series for $|x| < 0.15$ and the direct formula otherwise.

Constructing a Taylor series for $e^x - 1 - x$, we get
$$e^x - 1 - x \approx \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \frac{x^6}{720} + \frac{x^7}{5040} + \frac{x^8}{40320} + \frac{x^9}{362880} + \frac{x^{10}}{3628800}.$$
Therefore, a good approximation near 0 is
$$\frac{e^x - 1 - x}{x^2} \approx \frac{1}{2} + \frac{x}{6} + \frac{x^2}{24} + \frac{x^3}{120} + \frac{x^4}{720} + \frac{x^5}{5040} + \frac{x^6}{40320} + \frac{x^7}{362880} + \frac{x^8}{3628800}.$$
The error of this approximation, evaluated at $x$, satisfies
$$|\text{Error}| \le \frac{e^{C_x} |x|^9}{39916800}, \qquad 39916800 = 11!,$$
for some $C_x$ between 0 and $x$, so $C_x \in [-0.15, 0.15]$. $e^{C_x}$ is largest at $C_x = 0.15$, so $e^{0.15} |x|^9 / 39916800 \approx 2.91 \cdot 10^{-8} \cdot |x|^9$, and $|x|^9$ is largest at $x = \pm 0.15$. So
$$|\text{Error}| \le 2.91 \cdot 10^{-8} \cdot (0.15)^9 \approx 1.12 \cdot 10^{-15}.$$
The error of our Taylor series is well below the maximum allowed error of $10^{-12}$.

Now we must make sure the error of our function, evaluated normally, is less than $10^{-12}$.
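Putting the pieces together, here is a minimal sketch of the resulting function, assuming the switch at $|x| < 0.15$ and Horner evaluation of the degree-8 polynomial. The assignment asks for MATLAB; this is written in Python instead (Python floats are the same IEEE double precision, so the analysis carries over), and the names `f`, `rel_error`, `COEFFS`, and `CUTOFF` are hypothetical. `rel_error` compares against a 60-digit reference from the standard `decimal` module and treats "error" as relative error, an assumption forced by the range: near $x = 100$, $f(x) \approx 2.7 \cdot 10^{39}$, so no double could meet an absolute tolerance of $10^{-12}$ there.

```python
import math
from decimal import Decimal, localcontext

# Taylor coefficients 1/(k+2)! for k = 0..8, i.e. 1/2, 1/6, 1/24, ..., 1/3628800,
# matching the truncated series for (e^x - 1 - x)/x^2.
COEFFS = [1.0 / math.factorial(k + 2) for k in range(9)]

# Switch point from the digit-matching argument: |x| < 0.15 uses the series.
CUTOFF = 0.15


def f(x: float) -> float:
    """Sketch of (e^x - 1 - x) / x^2 for -100 <= x <= 100."""
    if abs(x) < CUTOFF:
        # Horner's rule: 1/2 + x*(1/6 + x*(1/24 + ... + x/10!))
        result = 0.0
        for c in reversed(COEFFS):
            result = result * x + c
        return result
    # Away from 0, e^x and 1 + x differ early, so the direct formula is safe.
    return (math.exp(x) - 1.0 - x) / (x * x)


def rel_error(x: float) -> float:
    """Relative error of f(x) against a 60-digit decimal reference (x != 0)."""
    with localcontext() as ctx:
        ctx.prec = 60
        X = Decimal(x)  # exact value of the double x
        ref = (X.exp() - 1 - X) / (X * X)
        return float(abs((Decimal(f(x)) - ref) / ref))


if __name__ == "__main__":
    samples = [-100.0, -10.0, -1.0, -0.15, -0.1, -1e-3, -1e-8,
               1e-8, 1e-3, 0.1, 0.15, 1.0, 10.0, 100.0]
    worst = max(rel_error(t) for t in samples)
    print(f"f(0) = {f(0.0)}, worst sampled relative error = {worst:.2e}")
```

Horner's rule keeps the series to nine multiply-adds. The sampled comparison is only a spot check at a few points on both sides of the cutoff, not a proof of the bound over the whole interval.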