Catastrophic cancellation

Loss of precision in numerical analysis

In numerical analysis, catastrophic cancellation[1][2] is the phenomenon that subtracting good approximations to two nearby numbers may yield a very bad approximation to the difference of the original numbers.

For example, if there are two studs, one L 1 = 253.51 cm {\displaystyle L_{1}=253.51\,{\text{cm}}} long and the other L 2 = 252.49 cm {\displaystyle L_{2}=252.49\,{\text{cm}}} long, and they are measured with a ruler that is good only to the centimeter, then the approximations could come out to be L ~ 1 = 254 cm {\displaystyle {\tilde {L}}_{1}=254\,{\text{cm}}} and L ~ 2 = 252 cm {\displaystyle {\tilde {L}}_{2}=252\,{\text{cm}}} . These may be good approximations, in relative error, to the true lengths: the approximations are in error by less than 2% of the true lengths, | L 1 L ~ 1 | / | L 1 | < 2 % {\displaystyle |L_{1}-{\tilde {L}}_{1}|/|L_{1}|<2\%} .

However, if the approximate lengths are subtracted, the difference will be L ~ 1 L ~ 2 = 254 cm 252 cm = 2 cm {\displaystyle {\tilde {L}}_{1}-{\tilde {L}}_{2}=254\,{\text{cm}}-252\,{\text{cm}}=2\,{\text{cm}}} , even though the true difference between the lengths is L 1 L 2 = 253.51 cm 252.49 cm = 1.02 cm {\displaystyle L_{1}-L_{2}=253.51\,{\text{cm}}-252.49\,{\text{cm}}=1.02\,{\text{cm}}} . The difference of the approximations, 2 cm {\displaystyle 2\,{\text{cm}}} , is in error by almost 100% of the magnitude of the difference of the true values, 1.02 cm {\displaystyle 1.02\,{\text{cm}}} .

Catastrophic cancellation is not affected by how large the inputs are—it applies just as much to large and small inputs. It depends only on how large the difference is, and on the error of the inputs. Exactly the same error would arise by subtracting 52 cm {\displaystyle 52\,{\text{cm}}} from 54 cm {\displaystyle 54\,{\text{cm}}} as approximations to 52.49 cm {\displaystyle 52.49\,{\text{cm}}} and 53.51 cm {\displaystyle 53.51\,{\text{cm}}} , or by subtracting 2.00052 km {\displaystyle 2.00052\,{\text{km}}} from 2.00054 km {\displaystyle 2.00054\,{\text{km}}} as approximations to 2.0005249 km {\displaystyle 2.0005249\,{\text{km}}} and 2.0005351 km {\displaystyle 2.0005351\,{\text{km}}} .

Catastrophic cancellation may happen even if the difference is computed exactly, as in the example above—it is not a property of any particular kind of arithmetic like floating-point arithmetic; rather, it is inherent to subtraction, when the inputs are approximations themselves. Indeed, in floating-point arithmetic, when the inputs are close enough, the floating-point difference is computed exactly, by the Sterbenz lemma—there is no rounding error introduced by the floating-point subtraction operation.

Formal analysis

Formally, catastrophic cancellation happens because subtraction is ill-conditioned at nearby inputs: even if approximations x ~ = x ( 1 + δ x ) {\displaystyle {\tilde {x}}=x(1+\delta _{x})} and y ~ = y ( 1 + δ y ) {\displaystyle {\tilde {y}}=y(1+\delta _{y})} have small relative errors | δ x | = | x x ~ | / | x | {\displaystyle |\delta _{x}|=|x-{\tilde {x}}|/|x|} and | δ y | = | y y ~ | / | y | {\displaystyle |\delta _{y}|=|y-{\tilde {y}}|/|y|} from true values x {\displaystyle x} and y {\displaystyle y} , respectively, the relative error of the difference x ~ y ~ {\displaystyle {\tilde {x}}-{\tilde {y}}} of the approximations from the difference x y {\displaystyle x-y} of the true values is inversely proportional to the difference of the true values:

x ~ y ~ = x ( 1 + δ x ) y ( 1 + δ y ) = x y + x δ x y δ y = x y + ( x y ) x δ x y δ y x y = ( x y ) ( 1 + x δ x y δ y x y ) . {\displaystyle {\begin{aligned}{\tilde {x}}-{\tilde {y}}&=x(1+\delta _{x})-y(1+\delta _{y})=x-y+x\delta _{x}-y\delta _{y}\\&=x-y+(x-y){\frac {x\delta _{x}-y\delta _{y}}{x-y}}\\&=(x-y){\biggr (}1+{\frac {x\delta _{x}-y\delta _{y}}{x-y}}{\biggr )}.\end{aligned}}}

Thus, the relative error of the exact difference x ~ y ~ {\displaystyle {\tilde {x}}-{\tilde {y}}} of the approximations from the difference x y {\displaystyle x-y} of the true values is

| x δ x y δ y x y | . {\displaystyle \left|{\frac {x\delta _{x}-y\delta _{y}}{x-y}}\right|.}

which can be arbitrarily large if the true values x {\displaystyle x} and y {\displaystyle y} are close.

In numerical algorithms

Subtracting nearby numbers in floating-point arithmetic does not always cause catastrophic cancellation, or even any error—by the Sterbenz lemma, if the numbers are close enough the floating-point difference is exact. But cancellation may amplify errors in the inputs that arose from rounding in other floating-point arithmetic.

Example: Difference of squares

Given numbers x {\displaystyle x} and y {\displaystyle y} , the naive attempt to compute the mathematical function x 2 y 2 {\displaystyle x^{2}-y^{2}} by the floating-point arithmetic fl ( fl ( x 2 ) fl ( y 2 ) ) {\displaystyle \operatorname {fl} (\operatorname {fl} (x^{2})-\operatorname {fl} (y^{2}))} is subject to catastrophic cancellation when x {\displaystyle x} and y {\displaystyle y} are close in magnitude, because the subtraction can expose the rounding errors in the squaring. The alternative factoring ( x + y ) ( x y ) {\displaystyle (x+y)(x-y)} , evaluated by the floating-point arithmetic fl ( fl ( x + y ) fl ( x y ) ) {\displaystyle \operatorname {fl} (\operatorname {fl} (x+y)\cdot \operatorname {fl} (x-y))} , avoids catastrophic cancellation because it avoids introducing rounding error leading into the subtraction.[2]

For example, if x = 1 + 2 29 1.0000000018626451 {\displaystyle x=1+2^{-29}\approx 1.0000000018626451} and y = 1 + 2 30 1.0000000009313226 {\displaystyle y=1+2^{-30}\approx 1.0000000009313226} , then the true value of the difference x 2 y 2 {\displaystyle x^{2}-y^{2}} is 2 29 ( 1 + 2 30 + 2 31 ) 1.8626451518330422 × 10 9 {\displaystyle 2^{-29}\cdot (1+2^{-30}+2^{-31})\approx 1.8626451518330422\times 10^{-9}} . In IEEE 754 binary64 arithmetic, evaluating the alternative factoring ( x + y ) ( x y ) {\displaystyle (x+y)(x-y)} gives the correct result exactly (with no rounding), but evaluating the naive expression x 2 y 2 {\displaystyle x^{2}-y^{2}} gives the floating-point number 2 29 = 1.8626451 4923095703125 _ × 10 9 {\displaystyle 2^{-29}=1.8626451{\underline {4923095703125}}\times 10^{-9}} , of which less than half the digits are correct and the other (underlined) digits reflect the missing terms 2 59 + 2 60 {\displaystyle 2^{-59}+2^{-60}} , lost due to rounding when calculating the intermediate squared values.

Example: Complex arcsine

When computing the complex arcsine function, one may be tempted to use the logarithmic formula directly:

arcsin ( z ) = i log ( 1 z 2 i z ) . {\displaystyle \arcsin(z)=i\log {\bigl (}{\sqrt {1-z^{2}}}-iz{\bigr )}.}

However, suppose z = i y {\displaystyle z=iy} for y 0 {\displaystyle y\ll 0} . Then 1 z 2 y {\displaystyle {\sqrt {1-z^{2}}}\approx -y} and i z = y {\displaystyle iz=-y} ; call the difference between them ε {\displaystyle \varepsilon } —a very small difference, nearly zero. If 1 z 2 {\displaystyle {\sqrt {1-z^{2}}}} is evaluated in floating-point arithmetic giving

fl ( fl ( 1 fl ( z 2 ) ) ) = 1 z 2 ( 1 + δ ) {\displaystyle \operatorname {fl} {\Bigl (}{\sqrt {\operatorname {fl} (1-\operatorname {fl} (z^{2}))}}{\Bigr )}={\sqrt {1-z^{2}}}(1+\delta )}

with any error δ 0 {\displaystyle \delta \neq 0} , where fl ( ) {\displaystyle \operatorname {fl} (\cdots )} denotes floating-point rounding, then computing the difference

1 z 2 ( 1 + δ ) i z {\displaystyle {\sqrt {1-z^{2}}}(1+\delta )-iz}

of two nearby numbers, both very close to y {\displaystyle -y} , may amplify the error δ {\displaystyle \delta } in one input by a factor of 1 / ε {\displaystyle 1/\varepsilon } —a very large factor because ε {\displaystyle \varepsilon } was nearly zero. For instance, if z = 1234567 i {\displaystyle z=-1234567i} , the true value of arcsin ( z ) {\displaystyle \arcsin(z)} is approximately 14.71937803983977 i {\displaystyle -14.71937803983977i} , but using the naive logarithmic formula in IEEE 754 binary64 arithmetic may give 14.719 644263563968 _ i {\displaystyle -14.719{\underline {644263563968}}i} , with only five out of sixteen digits correct and the remainder (underlined) all incorrect.

In the case of z = i y {\displaystyle z=iy} for y < 0 {\displaystyle y<0} , using the identity arcsin ( z ) = arcsin ( z ) {\displaystyle \arcsin(z)=-\arcsin(-z)} avoids cancellation because 1 ( z ) 2 = 1 z 2 y {\textstyle {\sqrt {1-(-z)^{2}}}={\sqrt {1-z^{2}}}\approx -y} but i ( z ) = i z = y {\displaystyle i(-z)=-iz=y} , so the subtraction is effectively addition with the same sign which does not cancel.

Example: Radix conversion

Numerical constants in software programs are often written in decimal, such as in the C fragment double x = 1.000000000000001; to declare and initialize an IEEE 754 binary64 variable named x. However, 1.000000000000001 {\displaystyle 1.000000000000001} is not a binary64 floating-point number; the nearest one, which x will be initialized to in this fragment, is 1.0000000000000011102230246251565404236316680908203125 = 1 + 5 2 52 {\displaystyle 1.0000000000000011102230246251565404236316680908203125=1+5\cdot 2^{-52}} . Although the radix conversion from decimal floating-point to binary floating-point only incurs a small relative error, catastrophic cancellation may amplify it into a much larger one:

double x = 1.000000000000001;  // rounded to 1 + 5*2^{-52}
double y = 1.000000000000002;  // rounded to 1 + 9*2^{-52}
double z = y - x;              // difference is exactly 4*2^{-52}

The difference 1.000000000000002 1.000000000000001 {\displaystyle 1.000000000000002-1.000000000000001} is 0.000000000000001 = 1.0 × 10 15 {\displaystyle 0.000000000000001=1.0\times 10^{-15}} . The relative errors of x from 1.000000000000001 {\displaystyle 1.000000000000001} and of y from 1.000000000000002 {\displaystyle 1.000000000000002} are both below 10 15 = 0.0000000000001 % {\displaystyle 10^{-15}=0.0000000000001\%} , and the floating-point subtraction y - x is computed exactly by the Sterbenz lemma.

But even though the inputs are good approximations, and even though the subtraction is computed exactly, the difference of the approximations y ~ x ~ = ( 1 + 9 2 52 ) ( 1 + 5 2 52 ) = 4 2 52 8.88 × 10 16 {\displaystyle {\tilde {y}}-{\tilde {x}}=(1+9\cdot 2^{-52})-(1+5\cdot 2^{-52})=4\cdot 2^{-52}\approx 8.88\times 10^{-16}} has a relative error of over 11 % {\displaystyle 11\%} from the difference 1.0 × 10 15 {\displaystyle 1.0\times 10^{-15}} of the original values as written in decimal: catastrophic cancellation amplified a tiny error in radix conversion into a large error in the output.

Benign cancellation

Cancellation is sometimes useful and desirable in numerical algorithms. For example, the 2Sum and Fast2Sum algorithms both rely on such cancellation after a rounding error in order to exactly compute what the error was in a floating-point addition operation as a floating-point number itself.

The function log ( 1 + x ) {\displaystyle \log(1+x)} , if evaluated naively at points 0 < x 1 {\displaystyle 0<x\lll 1} , will lose most of the digits of x {\displaystyle x} in rounding fl ( 1 + x ) {\displaystyle \operatorname {fl} (1+x)} . However, the function log ( 1 + x ) {\displaystyle \log(1+x)} itself is well-conditioned at inputs near 0 {\displaystyle 0} . Rewriting it as

log ( 1 + x ) = x log ( 1 + x ) ( 1 + x ) 1 {\displaystyle \log(1+x)=x{\frac {\log(1+x)}{(1+x)-1}}}
exploits cancellation in x ^ := fl ( 1 + x ) 1 {\displaystyle {\hat {x}}:=\operatorname {fl} (1+x)-1} to avoid the error from log ( 1 + x ) {\displaystyle \log(1+x)} evaluated directly.[2] This works because the cancellation in the numerator log ( fl ( 1 + x ) ) = x ^ + O ( x ^ 2 ) {\displaystyle \log(\operatorname {fl} (1+x))={\hat {x}}+O({\hat {x}}^{2})} and the cancellation in the denominator x ^ = fl ( 1 + x ) 1 {\displaystyle {\hat {x}}=\operatorname {fl} (1+x)-1} counteract each other; the function μ ( ξ ) = log ( 1 + ξ ) / ξ {\displaystyle \mu (\xi )=\log(1+\xi )/\xi } is well-enough conditioned near zero that μ ( x ^ ) {\displaystyle \mu ({\hat {x}})} gives a good approximation to μ ( x ) {\displaystyle \mu (x)} , and thus x μ ( x ^ ) {\displaystyle x\cdot \mu ({\hat {x}})} gives a good approximation to x μ ( x ) = log ( 1 + x ) {\displaystyle x\cdot \mu (x)=\log(1+x)} .

References

  1. ^ Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Gewerbestrasse 11, 6330 Cham, Switzerland: Birkhäuser. p. 102. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9.{{cite book}}: CS1 maint: location (link)
  2. ^ a b c Goldberg, David (March 1991). "What every computer scientist should know about floating-point arithmetic". ACM Computing Surveys. 23 (1). New York, NY, United States: Association for Computing Machinery: 5–48. doi:10.1145/103162.103163. ISSN 0360-0300. S2CID 222008826. Retrieved 2020-09-17.