RE: Type names vs type structure

A fully formal analysis of the kind you suggest is given in my paper with
Chris Stone entitled "A Type-Theoretic Interpretation of Standard ML" which
is to appear in the forthcoming Milner Festschrift.  (See
http://www.cs.cmu.edu/~rwh/papers/ttisml/ttisml.ps for a preprint.)  This
paper outlines a static and dynamic semantics for Standard ML that is based
on an interpretation into an internal language with a type system that
includes recursive types and abstract types (in the form of translucent
sums).

I believe this work will confirm your intuitions about what is "really"
going on under the hood in Standard ML.  (In fact, we even implement the
language using this interpretation.)  A separate question is whether the
external language (Standard ML) could have been designed differently to
expose more of the underlying mechanism rather than bundling up so many
distinct concepts into one construct.  Here are a few pertinent points:

1. Caml experimented with the exact opposite approach to the one you
suggest: make records more like datatypes.  Record types are required to be
named by a declaration analogous to a datatype declaration.  Not only does
this restore symmetry, but it has the advantage of simplifying type
inference since now a given label uniquely determines a record type (in a
given context), much as constructors uniquely determine a datatype (in a
given context).  A disadvantage of this approach is that it discourages the
use of records for keyword parameter-passing, which is one significant use
of records in programming practice.
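For concreteness, here is a small OCaml sketch of the named-record approach (the type and labels are purely illustrative).  Once `window` is declared, its labels uniquely determine the record type in that context, so no annotations are needed, yet records can still be used in a keyword-argument style:

```ocaml
(* Record types must be declared up front; each label is then
   tied to its declaring type, which simplifies inference. *)
type window = { width : int; height : int; title : string }

(* The labels alone identify the type [window]; no annotation needed. *)
let make_window () = { width = 640; height = 480; title = "demo" }

(* Records still work as keyword-style parameters, but only with
   labels that belong to some declared record type. *)
let area { width; height; _ } = width * height
```

The trade-off the paragraph describes is visible here: you cannot conjure up an ad hoc record of keyword arguments without first declaring a type for it.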

2. A difficulty with exposing labelled sum types is how to handle type
inference.  Given an injection [lab=exp], what is the type of this
expression?  Some sort of subtyping (via subsumption or row polymorphism) is
required to handle this case, otherwise the programmer is forced to write
something like [lab=exp]:tau, where tau is a sum type.  This gets pretty
tedious.  I'm not sure whether Caml has exposed sum types, but its handling
of records via row polymorphism seems to be very practical and natural.
Standard ML does a poor job of this with its half-hearted "..." notation.
(To be fair, this design long precedes the development of row polymorphism.)
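Present-day OCaml does, in fact, expose labelled sums via polymorphic variants, and they illustrate the row-polymorphic solution to the inference problem above: an injection like `` `Lab exp `` is given an "open" variant type by inference, so no annotation of the form [lab=exp]:tau is required.  A minimal sketch (the labels here are just for illustration):

```ocaml
(* An injection gets an open, inferable variant type:
   inj : 'a -> [> `Ok of 'a ]
   No sum-type annotation is needed at the injection site. *)
let inj x = `Ok x

(* A match closes the row over exactly the labels it handles. *)
let describe = function
  | `Ok n    -> "ok: " ^ string_of_int n
  | `Error s -> "error: " ^ s
```

Applying `describe` to `inj 3` unifies the open row of the injection with the labels the match covers, which is precisely the kind of machinery the annotation-free style demands.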

3. A separable question is whether datatypes ought to be "generative", i.e.,
introduce a new abstract type, or whether they ought to compare structurally
so that two separate, but identical, datatype declarations introduce
equivalent types.  This is more of a methodological question since, as you
point out, the module system already provides an abstraction mechanism, so
perhaps the datatype mechanism ought not itself introduce a separate form of
abstraction (generativity).  This is a thorny bush.  Roughly speaking, there
are two modes of use of the module system, one that stresses the use of
opaque signature ascription to control type abstraction, and one that
stresses transparent ascription and relies on datatype declarations to
introduce abstract types.  While the former has (to my mind) a clear
intuitive appeal, the latter is in fact much easier to use in practice
because it requires much less explicit finagling with type sharing and where
type specifications.  (It's analogous to type inference vs explicit typing.)
Having said that, I believe that one can have a perfectly sensible language
design in which datatypes are not intrinsically abstract.
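The two modes of use can be sketched in OCaml, where datatype declarations are likewise generative (the module and type names below are illustrative only):

```ocaml
(* Generativity: two structurally identical declarations
   introduce distinct types. *)
type t1 = Leaf | Node of t1 * t1
type t2 = Leaf | Node of t2 * t2
(* let bad : t1 -> t2 = fun x -> x   -- rejected: t1 and t2 differ *)

(* The other mode: rely on opaque signature ascription, not the
   datatype itself, to introduce the abstraction boundary. *)
module Stack : sig
  type 'a t                     (* abstract: representation hidden *)
  val empty : 'a t
  val push : 'a -> 'a t -> 'a t
end = struct
  type 'a t = 'a list
  let empty = []
  let push x s = x :: s
end
```

In a design where datatypes compare structurally, only the second mechanism would create new abstract types, which is exactly the separation of concerns at issue.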

Thanks for raising the issue.  We've been discussing this in the ML2000
group, and you've given us a good opportunity to open the issue to further
debate.  I'd be very interested to hear what people have to say on the
subject.

Bob Harper