Transparent and opaque interpretations of datatypes

A number of us on the TIL project recently were surprised to discover that
the opaque (or generative) interpretation of datatypes in ML types some
programs that do not type under the transparent interpretation.  This has
significant implications to type-preserving compilation of languages with
opaque datatypes, such as Standard ML.  This note presents the
counter-example and discusses some of those implications.

Karl Crary
Carnegie Mellon University

	    Transparent and Opaque Interpretations of Datatypes

Karl Crary    Robert Harper    Perry Cheng    Leaf Petersen    Chris Stone

			     November 13, 1998

Standard ML employs an opaque (or "generative") interpretation of datatype
specifications, in which every datatype specification provides a new,
abstract type that is different from any other type, including other
identically specified datatypes.  An alternative interpretation is the
transparent one, in which a datatype specification exposes the underlying
recursive type implementation of the datatype.

It is commonly believed that the transparent interpretation is strictly
more permissive than the opaque interpretation; that all programs typable
under the opaque discipline are also typable under the transparent
discipline.  The purpose of this note is to illustrate that this common
belief is incorrect (in the usual equational theory for types), and to
discuss some of the implications of that fact.


To see the issue involved, consider the signatures SIG1 and SIG2:

     signature SIG1 =
           datatype u = C of u * u | D of int
           type t = u * u

     signature SIG2 =
           type t
           datatype u = C of t | D of int

Is SIG1 a subsignature of SIG2?  In an opaque interpretation (and in
Standard ML) the answer is yes.  But in a transparent interpretation the
answer is no.  To show why this is so, we give the opaque and transparent
interpretations of SIG1 and SIG2 in a type theory without datatypes but
with sums and iso-recursive types (recursive types in which fold and unfold
must be mediated by an explicit isomorphism).

In an opaque interpretation, a datatype specification provides an abstract
type along with introduction and elimination functions for that type [1]:

     signature SIG1_opaque =
           type u
           type t = u * u
           val u_in  : (u * u + int) -> u
           val u_out : u -> (u * u + int)

     signature SIG2_opaque =
           type t
           type u
           val u_in  : (t + int) -> u
           val u_out : u -> (t + int)

In this interpretation, SIG1 matches SIG2 because (u * u + int) -> u is
equal to (t + int) -> u under the assumption that t = u * u, and similarly
u -> (u * u + int) is equal to u -> (t + int).

However, in a transparent interpretation, a datatype specification exposes
the underlying recursive type:

     signature SIG1_transparent =
           type u = Rec(A) A * A + int
           type t = u * u

     signature SIG2_transparent =
           type t 
           type u = Rec(A) t + int

In this interpretation, SIG1 does not match SIG2 because u's abbreviation
in SIG1 is not equal to its abbreviation in SIG2.  Invoking t = u * u, the
latter may be shown equal to

     Rec(A) (Rec(A) A * A + int) * (Rec(A) A * A + int) + int

which, in the usual equational theory for types, is not the same as SIG1's

     Rec(A) A * A + int

What is happening here is, in order for SIG1 to match SIG2, the datatype
specification for u in SIG2 must be able to "capture" a definition given to
t, even when t is defined in terms of u.  This is possible in the opaque
setting because t and u are independent abstract types, and any interplay
between them is deferred to value fields.  In a transparent setting, the
necessary capture is impossible; u and A are different variables and the
recursive binding of A cannot capture any occurrences of u.


This example illustrates that under the usual equality rules for
iso-recursive types, Standard ML is incompatible with a transparent
interpretation.  However, in an implementation it is unacceptable to incur
the cost of a function call for every datatype construction and pattern
match, so the transparent interpretation is required.  In a type-preserving
compiler, one may adopt internally a new interpretation of the language,
but only when that internal interpretation is at least as permissive as the
external one, which we have shown is not the case here.  This poses no
problem to those compilers that erase types before compiling, but how can
Standard ML be implemented in a type-preserving manner?

Shao, in the FLINT compiler [2], addresses this problem with what we call
"Shao's equation" (where we write E [E'/X] to mean the capture-avoiding
substitution of E' for X in E):

     Rec(A) T  =  Rec(A) (T [Rec(A) T / A])

Shao's equation addresses the problem with the example above by rendering
the two abbreviations equal.  More generally, for any equational theory one
may prove that if the transparent interpretation accepts every program
typable under the opaque interpretation, then that theory must include all
instances of Shao's equation.  Thus, we argue that Shao's equation is
essential to efficient, type-preserving compilation of languages with
opaque datatypes, such as Standard ML.

Note that this equation falls short of the equation for equi-recursive
types (recursive types in which fold and unfold need not be performed

     EquiRec(A) T  =  T [EquiRec(A) T / A]

Since the right-hand side of Shao's equation is still a recursive type (in
contrast to the right-hand side of the equi-recursive type equation) it is
possible that the type equality problem with Shao's equation may be solved
more efficiently than the problem for equi-recursive types [3].  Indeed,
Shao claims to have an efficient algorithm for the problem [4].

Nevertheless, there is some question as to the validity of Shao's equation.
In many semantic contexts, though certainly not all, the equation may be
justifiable.  Note that terms having the left-hand side type,

     Rec(A) T  =  Tleft

and terms having the right-hand side type,

     Rec(A) (T [Rec(A) T / A])  =  Tright

both unfold to members of the same type:

     T [Rec(A) T / A]

Thus, terms may be coerced from one type to the other by unfolding them at
one type and refolding them at the other type.  For instance, if

     e : Tleft
     unfold [Tleft] e : (T [Rec(A) T / A])
     fold [Tright] (unfold [Tleft] e) : Tright

Thus, Shao's equation is justifiable in a semantic framework in which such
an fold-unfold operation (at different types) is the identity.
(Fold-unfold at the same type would be the identity in nearly any semantic
framework.)  A particularly important case where this is true is when fold
and unfold themselves are no-ops, as is the case in most implementations.


Opaque datatypes are purported to carry software engineering benefits, but
datatypes must be transparent, at least internally, to achieve efficient
compilation.  Were the transparent discipline more permissive than the
opaque one, this would not pose a problem, but we show that this is not so.

The opaque and transparent disciplines can be reconciled only by adopting
Shao's equation.  Therefore, we argue that Shao's equation is essential to
efficient, type-preserving compilation of any language with opaque
datatypes.  This equation is not valid in every semantic context, and
although it may be permissible in many important ones, at the very least it
complicates typechecking.  Thus, there are good reasons why one could
prefer to reject Shao's equation.  However, in a type-preserving compiler,
if we wish not to embrace Shao's equation, we are left with no choice but
to abandon opaque datatypes as well.



[1] Robert Harper and Chris Stone.  A type-theoretic interpretation of
    Standard ML.  In "Proof, Language and Interaction: Essays in Honour of
    Robin Milner."  The MIT Press, 1998.  To appear.

[2] Zhong Shao.  An overview of the FLINT/ML compiler.  In "1997 Workshop
    on Types in Compilation," Amsterdam, June 1997.  Published as Boston
    College technical report BCCS-97-03.

[3] Roberto Amadio and Luca Cardelli.  Subtyping recursive types.  ACM
    Transactions on Programming Languages and Systems.  15(4):575-631.

[4] Zhong Shao.  Personal Communication.


Suppose an equational theory is given and suppose that the transparent
interpretation accepts every program typable under the opaque
interpretation.  Let T[A] be an arbitrary type with free variable A and let
SIG1' and SIG2' be defined as follows:

     signature SIG1' =
           datatype u = C of T[u]
           type t = T[u]

     signature SIG2' =
           type t
           datatype u = C of t

In the opaque interpretation SIG1' matches SIG2', so SIG1' must match SIG2'
under the transparent interpretation as well.  In SIG1' the abbreviation
for u is

     Rec(A) T[A]

and in SIG2' it is

     Rec(A) t

which, invoking t = T[u], is equal to

     Rec(A) T[Rec(A) T]

Since these abbreviations must be equal, we conclude

     Rec(A) T[A]  =  Rec(A) T[Rec(A) T]