[Prev][Next][Index][Thread]

RE: type safety



This discussion on type safety has been interesting, but seems to founder 
occasionally on what exactly the definitions are for the terms being 
bandied about.  I'm in the midst of writing a book on types and semantics 
for object-oriented languages.  While writing it, I found remarkably little 
published on the definitions of such terms as "type-safe" and 
"strongly-typed".  In the hopes of both contributing to the discussion and 
(hopefully) getting some feedback on what I wrote, I've attached 
two excerpts from my manuscript.  I am presenting them in the reverse order 
that they actually appear in the manuscript because the later one is both 
shorter and more relevant.  I would be interested in comments on both.  
The reference to Cardelli is his chapter on Type Systems in the Handbook of
Computer Science and Engineering (Matthias also referred to this article in 
one of his posts).  \SOOL refers to a simple OO language described in the 
text.

	Kim

>From Chapter 12 of my book manuscript:

Cardelli \cite{CardelliHBCS} distinguishes between what he calls 
\emph{trapped} and \emph{untrapped errors}.  Trapped errors cause a 
computation to stop immediately, while untrapped errors may allow the 
computation to continue (at least for a while).  An example of an 
untrapped error might be accessing data beyond the end of an array in 
a language like C. Cardelli argues that a \emph{safe} language is one 
that does not allow any untrapped errors to occur at run-time.

In a given language, a subset of the errors may be designated as 
forbidden errors.  The forbidden errors should include all untrapped 
errors as well as a subset of trapped errors.  A language where none 
of the legal programs can generate forbidden errors is said to be 
\emph{strongly checked}.  

For \SOOL, as with most typed programming languages, the forbidden 
errors will include all run-time type errors.  It is difficult to 
characterize exactly what type errors are without listing them all 
explicitly.  However, a reasonable definition of a type error is an 
attempt to perform an operation that makes no sense based on the types 
(not values) of the arguments.  

For example, if \texttt{f} is a function taking integers to integers, 
then it makes no sense to apply \texttt{f} to an object of type 
\texttt{PointType}.  Similarly it makes no sense to attempt to apply 
an object (as though it were a function) to an integer.  The type 
error we are most concerned with in object-oriented programming 
languages is to send a message to an object which does not have a 
method corresponding to the message.

Examples of non-type errors include dividing an integer by zero (since 
both operands are integers, the division makes sense based on the 
types).  Similarly sending a message to a \texttt{nil} object is not a type 
error as long as the static type of the \texttt{nil} object includes 
the message name.  Accessing data beyond the end of an array is also 
not a type error (though it hopefully is a forbidden error in the 
language).  Finally, a non-terminating program is generally not the 
result of a type error.

Statically type-safe languages ensure that programs which pass the 
type-checker will not generate type errors at run-time (though other 
forbidden errors may be detected and handled at run-time).  Programs 
that do not pass the type-checker are not considered legal programs.  
Semantics are generally only provided for legal programs of the 
language.

--------------------------------------------------------------
>From the introductory chapter:

Every value generated in a program is associated with a type.  In a 
strongly typed language, the language implementation is required to 
provide a correct type checker that ensures that no type errors will 
occur at run time.  For example, it must 
check the types of operands in order to ensure that nonsensical 
operations, like dividing the integer 5 by the string ``hello'', are 
not performed.  Strongly typed languages may either be dynamically or 
statically type checked.  Dynamic type-checking normally occurs 
during program execution, while static type-checking normally occurs 
at compile time.  Other type-related checks may also take place at program 
link-time.

In a dynamically typed language like LISP or Scheme, many 
operations are type-checked just before they are performed.  Thus the 
code for a plus operation may have to check the type of its operands 
just before the addition is performed.  If both operands 
are integers, then an integer addition is performed.  If both operands 
are floating point numbers or one is floating point and the other is an 
integer, then a floating point addition is performed.  However, if one 
operand is a string and the other is a floating point number then 
execution is terminated with an error message.  In some languages, an 
exception may be raised, which may either be handled by the program before 
resuming normal execution or, if there is no handler or no handler 
can successfully execute, the program terminates.

In a statically typed language, every expression of the language is
assigned a type at compile time.  If the type system can ensure that the 
value of each expression has a type compatible with the statically 
assigned type of the expression,
then type checking of most operations can be moved to compile time.  

Dynamically typed programming languages can be more expressive and 
flexible than statically typed languages, because the type checking 
is postponed until run-time.  It is undecidable in general to determine 
statically for an arbitrary program whether 
a type error will occur at run time.\footnote{We leave it as an 
exercise for the more sophisticated reader to show this problem can 
be reduced to the halting problem.  Hint:  Have a type error result 
only if a program input as data halts.}  As a result, static type 
checkers will rule out some programs as potentially unsafe
that would actually execute without a type error.  

While the exclusion of safe programs would seem to be a major problem
with static type checking, there are many advantages to having a
statically type-checked language.  These include:
\begin{itemize}
    \item providing earlier, and usually more accurate, information on
	programmer errors,

    \item providing documentation on the interfaces of components (\eg,
    procedures, functions, and packages or modules),

    \item eliminating the need for run-time type checks that can slow
	program execution and increase program size, and
    
    \item providing extra information that can be used in compiler
	optimizations.  
\end{itemize}
As a result most modern languages have static type systems.

Procedural languages like Pascal \cite{Pascal}, Clu \cite{Liskclu},
Modula-2 \cite{Modula2}, and Ada 83 \cite{DoD}, and functional languages like 
ML \cite{HMM86} and Haskell \cite{Haskell}
have reasonably safe static typing systems.  While some 
of these languages have a few minor
holes in the type system (\eg, variant records in Pascal), languages
like ML, Haskell, CLU, and Ada provide fairly secure type systems.

	Kim Bruce