[Prev][Next][Index][Thread]

Re: Type theory vs floating-point arithmetic



Mathias has raised an interesting point which deserves further emphasis.
For floating point numbers there are two possible interpretations and they
are not always compatible with each other:

The "point interpretation": The floating point numbers represent a finite
set of rational numbers.

The "interval interpretation": The floating point numbers represent a
finite set of intervals which together cover the whole of the real line.

In the point interpretation it is easy to see what the definition of the
various operations should be: You return the floating point number
closest to the exact result. A corresponding formulation for the interval
interpretation (having either the same or a different effect on the actual
implementation) is not so easy.

The interval interpretation, on the other hand, retains some information
about the transition from "real" real numbers to the finite set of
floating point numbers. This is helpful in understanding some aspects of
the behaviour of the latter. For example, we can say that 0 represents the
interval [0,0], whereas +0 represents the interval (0,\epsilon] where
\epsilon is half-way between 0 and the smallest positive number
representable. In this sense, +0 has a very useful role to play: It is a
number which is positive (and not equal to 0) but too small to represent
any digits for it. Likewise, +infinity stands for the interval (M,\infty)
where M is half a unit of precision above the largest representable float.
So it, too, represents perfectly reasonable numbers. NaN, finally,
represents (-\infty,\infty) and should be the result of trying to add
-infinity to +infinity as the answer could be anything. Division by zero
should raise an exception, and not return NaN.

Both interpretations have their difficulties: The point interpretation is
misleading because the number displayed to the user can be different to
the number represented in the machine. So, for example, if you type in
1E40 ("ten to the forty") then many users of calculators and computers
expect this to be one of the set of floating point numbers but because of
the conversion into binary, and because of limited precision, it can not
be represented exactly in either single, double, or extended double
precision.

The interval interpretation is problematic because, as I mentioned at the
beginning, it is not preserved by the operations. In other words, the
interval associated to the result floating point number is not (and does
not in general contain) the set of possible answers from numbers in the
input intervals. Even addition fails this criterion in almost all cases.
The good news, however, is that the interval interpretation can be taken
as the starting point for "exact real computation" but this goes beyond
simple floating point. Look at Martin Escardo's homepage for more pointers
to relevant work, or check out the proceedings of "Computability and
Complexity in Analysis".

The tension between the two interpretations can be illustrated by the
question "What is sin(1E40)?": For the point interpretation, the sine
function is one of the pleasant ones, because the answer is always between
-1 and 1, and so can be computed with high precision. Unfortunately, I
don't know of any arithmetic coprocessor which goes to the trouble of
actually computing the exact result for big input numbers (but please
correct me if I am wrong). Furthermore, given that 1E40 is not exactly
represented (against appearances), it is certainly wrong to return any
result apart from NaN.

In the interval interpretation, the answer should be the interval [-1,1]
because the input interval contains many periods of \pi (even for extended
double precision). Unfortunately, this is not one of the intervals
available in any floating point system and so in this interpretation, too,
the answer should be NaN.

You can have some fun checking various systems for their answers to the
question sin(1E40). The exact answer at the exact mathematical point 1E40
starts with -.5696334. You should also replace 1E40 with
10000000000000000000000000000000000000000 if your systems allows you to do
so. The answer you get may well depend on that.

Achim.

-------------------------------------------------------------------------
 Prof Achim Jung			Tel.: (+44) 121 41 44776
 School of Computer Science		Sec.: (+44) 121 41 43711
 The University of Birmingham		Fax.: (+44) 121 41 44281
 Edgbaston				Email: A.Jung@cs.bham.ac.uk
 BIRMINGHAM, B15 2TT			Web: http://www.cs.bham.ac.uk
 England
-------------------------------------------------------------------------

Follow-Ups: