
Re: Type theory vs floating-point arithmetic



On Saturday, August 23, 2003, at 01:57  AM, Tim Sweeney wrote:

 There is a further problem regarding subtyping.  It might first appear
 that the type of 32-bit floating point values is a subtype of the type
 of 64-bit floating point values.  Though one can losslessly convert
 from 32-bit to 64-bit and back, performing a 32-bit arithmetic
 operation on such 32-bit numbers can produce a different result than
 performing a 64-bit arithmetic operation on their 64-bit extensions,
 and then converting back.  So from a type theoretic point of view, the
 types of 32-bit and 64-bit floating point numbers must be disjoint.
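
This point is easy to demonstrate.  Here is a small sketch in Python
(the helper f32 and the variable names are mine; struct.pack to format
'f' performs the round-to-nearest binary32 conversion):

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Three values exactly representable in binary32.
a, b, c = f32(2.0**25), f32(1.0), f32(-2.0**25)

# 32-bit arithmetic: round after every operation.
# a + b already loses the 1.0, because 2^25 + 1 needs 26 significand bits.
sum32 = f32(f32(a + b) + c)

# 64-bit arithmetic on the 64-bit extensions, converted back at the end.
# 2^25 + 1 is exact in binary64, so the 1.0 survives.
sum64 = f32((a + b) + c)

print(sum32, sum64)   # 0.0 1.0
```

So the same expression over the same 32-bit inputs yields 0.0 one way
and 1.0 the other, which is exactly why widening cannot be treated as a
behavior-preserving coercion.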

Here is another way of seeing why 32-bit IEEE cannot be a subtype of
64-bit IEEE:

   _THERE ARE NO VALUES THAT THE TWO TYPES HAVE IN COMMON_

Why?  Well, n-bit IEEE numbers are n-bit approximations of
infinite-precision reals.  In other words, the number

    SIGN . DIG1 DIG2 ... DIGn E EXP

should be interpreted as

    SIGN . DIG1 DIG2 ... DIGn ? ? ? ? ...  E EXP

where each "?" in the infinite sequence of "?"s stands for "I don't
know the value of this digit".  With this, it is clear that +.1011e7 in
a four-bit-mantissa representation is different from +.10110000e7 in an
eight-bit-mantissa representation, since the former really is
+.1011????????... e7 while the latter is +.10110000????... e7.
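
The same "unknown digits" reading shows up concretely when a decimal
constant is involved.  A sketch in Python (variable names are mine;
binary32-to-binary64 widening itself is exact, so the comparison is
fair):

```python
import struct

# Nearest binary64 to the real number 1/10.
x64 = 0.1

# Nearest binary32 to 1/10, widened back to binary64 exactly.
x32 = struct.unpack('f', struct.pack('f', 0.1))[0]

# The two approximations of the same real number are different values.
print(x32 == x64)   # False
```

The binary32 approximation, once widened, has its trailing digits
filled in by the 32-bit rounding, not by the true continuation of
1/10's binary expansion, so it does not coincide with any binary64
approximation of 1/10.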

Matthias