# FUN types

Type *inference* for functional programming languages

```
{-# LANGUAGE FlexibleContexts, FlexibleInstances, OverlappingInstances #-}
module FunTypes where
```

```
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Map (Map)
import qualified Data.Map as Map
import Control.Monad.State
import Control.Monad.Reader
import Control.Monad.Error
import Control.Monad.Writer
import Text.PrettyPrint.HughesPJ (Doc, (<+>),($$),(<>))
import qualified Text.PrettyPrint.HughesPJ as PP
import Parser hiding (get)
```

In this lecture, we will implement the Hindley-Milner type inference algorithm, a function of whose type is

` typeInference :: Expression -> Either String Type`

That is, the function takes a FUN expression, and returns either an error string, if the expression has a type error, or the type inferred for the expression.

This algorithm is the basis for Haskell type inference. It was originally developed by Robert Hindley (1969) in his study of combinatory logic and rediscovered by Robin Milner (1978) for use in ML. Milner's student Luis Damas analyzed the properties of this system in his Ph.D. dissertation (1985).

There are many presentations of the type system, and its type inference algorithm, called *Algorithm W*. We'll follow the structure that is most similar to the algorithm that GHC actually uses, though of course GHC optimizes this algorithm to make it efficient.

For more information about type inference in Haskell:

Simon Peyton Jones's third lecture from OPLSS, slides and video from the website.

Dimitrios Vytiniotis, Simon Peyton Jones, Tom Schrijvers, and Martin Sulzmann, OutsideIn(X): Modular type inference with local assumptions, in Journal of Functional Programming, Cambridge University Press, September 2011

## Overview

We'll work with the syntax and semantics of the simple, functional language FUN. Recall the modules FunSyntax and FunEnv from a few lectures ago.

```
import FunSyntax
import FunEnv hiding (ask)
```

As with typechecking WHILE, we will design type inference as an abstraction of the evaluation function. Recall that we had three sorts of values in this language:

```
data Value =
IntVal Int
| BoolVal Bool
| FunVal (Value -> Value)
```

However, compared to WHILE, there is a problem. What should be the type of a function like this?

`example1 = "fun X -> X"`

By analogue with Haskell, the type should be `a -> a`

. Therefore, types include type variables so that we will be able to compute that type for the expression.

```
data Type =
IntTy
| BoolTy
| FunTy Type Type -- i.e. t1 -> t2
| VarTy TypeVariable -- i.e. type 'a'
deriving (Eq, Show)
```

`type TypeVariable = Char`

As with the type checker for the WHILE language, we will write our type inferencer in a monad called `TcMonad`

.

` inferTy :: Expression -> TcMonad Type`

But what should this monad contain?

Like the environment base interpreter, we need to keep track of the types of free variables. We'll use a type environment for that:

`type TypeEnv = Map Variable Scheme`

and make sure that the monad has the ability to lookup the types of variables:

```
tLookup :: Variable -> TcMonad Scheme
tLookup x = do
m <- ask
case (Map.lookup x m) of
Just ty -> return ty
Nothing -> throwError "Unbound variable"
```

and add a new binding when a variable is introduced by a function or a let rec expression.

```
extendEnv :: Variable -> Scheme -> TcMonad a -> TcMonad a
extendEnv x t =
local (Map.insert x t)
```

However, when we type check a function expression, like `fun X -> X`

, we don't know what type `X`

should be. We'll use a type variable for its type in the typing environment, but that type variable should be different any other type variable that has been introduced. For example, consider

`example2 = "fun X -> fun Y -> X"`

This expression should have type `a -> b -> a`

, so we need to make sure that the type variable we introduce for Y is different from the one that we introduce for X. Therefore, our monad also needs the ability to generate *fresh* type variables.

```
fresh :: TcMonad TypeVariable
fresh = do
tv <- get
put $ succ tv
return tv
```

Now consider this function

`example3 = "fun X -> X + 1"`

In this example, the type of the argument to the function should be `Int`

instead of `a`

. But we won't know that until we see that `X`

is used as the argument to addition. To accommodate this sort of reasoning, we'll break type inference into two steps:

First, traverse the syntax of the expression, using type variables for unknown types, gathering constraints about types. This part only fails for unbound type variables.

Then, solve those constraints, potentially resolving some type variables to concrete types. It may be that the constraints are impossible to solve, in which case there is a type error in the expression.

For example, type inference for `example3`

above will determine that it has type `a -> Int`

under the constraint that `a = Int`

. That means that the actual type of the expression is `Int -> Int`

.

Therefore, we need a data structure for type equality constraints and a way to *tell* the type inference monad to record these constraints (if they are nontrivial).

`data Constraint = Equal Type Type`

```
equate :: Type -> Type -> TcMonad ()
equate t1 t2 | t1 == t2 = return ()
| otherwise = tell [Equal t1 t2]
```

In summary, we put this all together in the monad:

```
type TcMonad a =
WriterT [Constraint] -- gathered constraints
(ReaderT TypeEnv -- records types of expression variables
(StateT TypeVariable -- generate new type variables
(Either String))) -- error messages (unbound variables)
a
```

Running the first part of the type checker produces either an error (such as for unbound variables) or succeeds with a type and a list of constraints.

```
runTc :: TcMonad a -> Either String (a, [Constraint])
runTc m = evalStateT (runReaderT (runWriterT m) Map.empty) 'a'
```

# Constraint Generation

Now, we are ready to write the first part of type inference, which works with the above monad to infer the type of an expression, under a set of constraints.

`inferTy :: Expression -> TcMonad Type`

The function has the precondition that the type environment contains bindings for all free variables of the expressions.

First, the easiest cases are the literals whose types are trivially inferred.

```
inferTy (IntExp _) = return IntTy
inferTy (BoolExp _) = return BoolTy
```

For `if`

expressions, we first determine the types of the three subexpressions. Then we record two constraints: the condition must have boolean type, and both branches of the `if`

must have equal types (which is the type of the entire expression).

```
inferTy (If e1 e2 e3) = do
t1 <- inferTy e1
t2 <- inferTy e2
t3 <- inferTy e3
equate t2 t3
equate t1 BoolTy
return t2
```

Likewise, inference for boolean operators constrains the argument types to be integers and returns the appropriate result type depending on the particular operator.

```
inferTy (Op b e1 e2) = do
t1 <- inferTy e1
equate t1 IntTy
t2 <- inferTy e2
equate t2 IntTy
if b `elem` [Plus,Times, Minus]
then return IntTy
else return BoolTy
```

For variables, we simply look up in the type environment the type for the variable.

```
inferTy (Var x) = do
ts <- tLookup x
instantiate ts
```

For the `Fun`

case (ie function definition) we assign the formal `x`

a fresh type variable and analyze the body `e`

using that type variable. The type of a function is a function type from that type variable to the type of the body. (Though of course, there may be constraints on that type variable.)

```
inferTy (Fun x e) = do
tv <- fresh
let argTy = VarTy tv
bodyTy <- extendEnv x (Forall Set.empty argTy) (inferTy e)
return (FunTy argTy bodyTy)
```

In the `App e1 e2`

case, we know that the first subexpression must have a function type, where the type of the argument is the same as the type of `e2`

. However, we don't know what the result type should be, so we generate a fresh type variable.

```
inferTy (App e1 e2) = do
t1 <- inferTy e1
t2 <- inferTy e2
tv <- fresh
equate t1 (FunTy t2 (VarTy tv))
return (VarTy tv)
```

Finally, for the case of a (recursive) let-binding we infer the type for `e1`

and then add it to the environment bound to `x`

when analyzing `e2`

.

```
inferTy (LetRec x e1 e2) = do
tv <- fresh
ts1 <- generalize tv $ extendEnv x (Forall Set.empty (VarTy tv)) $ inferTy e1
extendEnv x ts1 $ inferTy e2
```

## Seeing the generated constraints

With a little code for pretty printing types and constraints, we can take a look at what constraints are generated for the various examples.

```
instance PP Type where
pp (VarTy i) = PP.text [i]
pp (FunTy t1@(FunTy _ _) t2) = (PP.parens (pp t1)) <+> PP.text "->" <+> pp t2
pp (FunTy t1 t2) = pp t1 <+> PP.text "->" <+> pp t2
pp IntTy = PP.text "Int"
pp BoolTy = PP.text "Bool"
```

```
instance PP Constraint where
pp (Equal t1 t2) = pp t1 <+> PP.text "=" <+> pp t2
```

```
genConstraints :: Expression -> Either String (Type, [Constraint])
genConstraints = runTc . inferTy
```

```
parseExp :: String -> Either String Expression
parseExp s = case parse s of
Just e -> Right e
Nothing -> Left "parse error"
```

```
putConstraints :: String -> IO ()
putConstraints s = case parseExp s >>= runTc . inferTy of
Left err -> putStrLn err
Right (t, c) ->
putStrLn (show (PP.text s <+> PP.text ":" <+> pp t $$ PP.text "where"
<+> (PP.vcat (map pp c))))
```

`example4 = "let rec F = fun X -> if X <= 1 then 1 else F (X - 1) in F"`

```
*Main> putConstraints example1
*Main> putConstraints example2
*Main> putConstraints example3
*Main> putConstraints example4
```

# Unification

As we saw in our informal overview, the algorithm proceeds by generating fresh type variables for unknown types, and then traverses the expression to generate constraints about the types of the sub-expressions based on how they are used with each other. These constraints are solved by a process called *unification*, which produces a mapping from type variables to types. We can use this mapping, called a *substitution*, to replace all occurrences of type variables with their definitions.

Next, we formalize the notion of substitution, and use it to define the unification procedure.

`data Substitution = Subst (Map TypeVariable Type) deriving Show`

The function `subst`

takes a substitution and applies it to a type by replacing variables in the type with the corresponding mapping in the substitution (if one exists.)

```
subst :: Substitution -> Type -> Type
subst (Subst s) (VarTy a) = case Map.lookup a s of
Just ty -> ty
Nothing -> (VarTy a)
subst s (FunTy t1 t2) = FunTy (subst s t1) (subst s t2)
subst s IntTy = IntTy
subst s BoolTy = BoolTy
```

It is useful to define an empty substitution (that leaves a type unchanged)

```
empSubst :: Substitution
empSubst = Subst Map.empty
```

and we can *compose* two substitutions as `s1 after s2`

```
after :: Substitution -> Substitution -> Substitution
Subst s1 `after` Subst s2 = Subst (Map.map (subst (Subst s1)) s2 `Map.union` s1)
```

which yields a single substitution that carries out the substitutions in `s2`

*after which* it carries out those in `s1`

. These two substitutions should have different domains, although the range of one may mention variables in the domain of the other.

(Note: `empSubst`

and `after`

make `Substitution`

a Monoid.)

## Most General Unifier

Armed with the above, we can now formally define the notion of type *unification*. We want unification to have the following example informal behavior:

T1 |
T2 |
Unified |
Substitution |
---|---|---|---|

`a` |
`Int` |
`Int` |
`a := Int` |

`a` |
`b` |
`b` |
`a := b` |

`a->b` |
`a->d` |
`a->d` |
`b := d` |

`a->Int` |
`Bool->b` |
`Bool->Int` |
`a := Bool, b:=Int` |

`Int` |
`Int` |
`Int` |
`empSubst` |

`Int` |
`Bool` |
Error |
Error |

`Int` |
`a->b` |
Error |
Error |

`a` |
`a->Int` |
Error |
Error |

The first few cases are where unification is indeed possible, and the last few cases are where it fails corresponding to a *type error* in the source program. The very last case is an interesting one; the failure is because the type variable `a`

in the first type *occurs* free inside the second type. Thus, if we try substituting `a`

with `a->Int`

we will just keep spinning forever! Hence, this also throws a unification failure.

**Exercise:** Write a Haskell program that is rejected by the typechecker because it fails the above *occurs check*. This is not difficult, the chances are you've done this any number of times already!

Here is the unification function `mgu`

that takes two types as input and either returns a successful unified output along with the substitution (as shown in the table above) or an error string explaining the failure (hence, our use of an error monad to describe output.)

```
mgu :: Type -> Type -> Either String Substitution
mgu a b | a == b = return empSubst
mgu ty (VarTy x) = varAsgn x ty
mgu (VarTy x) ty = varAsgn x ty
mgu (FunTy t1 t2) (FunTy t1' t2') = do
s1 <- mgu t1 t1'
s2 <- mgu (subst s1 t2) (subst s1 t2')
return $ s1 `after` s2
mgu t1 t2 = throwError $ "Can't equate " ++ show t1 ++ " with " ++ show t2
```

The function `varAsgn`

attempts to assign a type variable to a type and return that assignment as a subsitution, but throws an error if the variable occurs within the assigned type.

```
varAsgn :: TypeVariable -> Type -> Either String Substitution
varAsgn a t
| t == VarTy a = return empSubst
| a `Set.member` (fv t) = throwError
$ "occur check fails: " ++ show a ++ " in " ++ display t
| otherwise = return $ Subst (Map.singleton a t)
```

The function `fv`

calculates the set of variables that appear inside of a type.

```
fv :: Type -> Set TypeVariable
fv (VarTy v) = Set.singleton v
fv (FunTy t1 t2) = (fv t1) `Set.union` (fv t2)
fv IntTy = Set.empty
fv BoolTy = Set.empty
```

The name `mgu`

stands for *Most-General-Unifier* ; the function is guaranteed to find the most general unification possible (that is not unify `a`

and `b`

to `Int`

via the substitution `a := Int, b:= Int`

.) This property is crucial for showing that type inference returns the most general type possible for any term (that is, `a -> a`

and not `Int -> Int`

for the identity function).

We can solve the entire list of constraints by running the `mgu`

function on each one in sequence and combining the resulting substitutions with `after`

.

```
solve :: [Constraint] -> Either String Substitution
solve = foldM (\s1 (Equal ty1 ty2) -> do
s2 <- mgu (subst s1 ty1) (subst s1 ty2)
return (s2 `after` s1)) empSubst
```

Let's try it out on the examples above:

```
trySolve :: [Constraint] -> IO ()
trySolve cs = case solve cs of
Left err -> putStrLn err
Right (Subst s) -> mapM_ (\(v,ty) ->
putStrLn ([v] ++ " := " ++ display ty)) (Map.assocs s)
```

```
*Main> trySolve [Equal (VarTy 'a') IntTy]
*Main> trySolve [Equal (VarTy 'a') (VarTy 'b')]
*Main> trySolve [Equal (FunTy (VarTy 'a') (VarTy 'b'))
(FunTy (VarTy 'a') (VarTy 'd'))]
*Main> trySolve [Equal IntTy IntTy]
*Main> trySolve [Equal IntTy BoolTy]
*Main> trySolve [Equal IntTy (FunTy (VarTy 'a') (VarTy 'b'))]
*Main> trySolve [Equal (VarTy 'a') (FunTy (VarTy 'a') IntTy)]
```

# Putting it all together

Finally, we have all the pieces necessary to define the `typeInference`

function promised at the beginning.

```
typeInference :: Expression -> Either String Type
typeInference e = do
(ty, constraints) <- genConstraints e
s <- solve constraints
return (subst s ty)
```

```
top :: String -> IO ()
top s = case parseExp s >>= typeInference of
Left err -> putStrLn err
Right t -> putStrLn (s ++ " : " ++ show (pp t))
```

We can now try it out on our running examples:

```
*Main> top example1
*Main> top example2
*Main> top example3
*Main> top example4
```

As well as a few that don't type check:

```
bad1 = "X + 1"
bad2 = "1 + true"
bad3 = "(fun X -> X + 1) true"
bad4 = "fun X -> X X"
```

```
*Main> top bad1
*Main> top bad2
*Main> top bad3
*Main> top bad4
```

# Polymorphism

What about polymorphic types?

We've seen in Haskell that some function have polymorphic types, but we don't have that here (yet).

But we are close; the type inference algorithm that we've developed so far looks like it is almost there. The type of `fun X -> X`

is `a -> a`

, meaning that this function should be applicable at any type.

`example5 = "let rec Y = (fun X -> X) true in (fun X -> X) 3"`

But, unfortunately, if we give the identity function a name, the example no longer works. The variable `I`

has type `a -> a`

in the typing environment, but each use of `I`

adds constraints to `a`

.

`example6 = "let rec I = fun X -> X in let rec Y = I true in I 3"`

The key change is to store 'type schemes' instead of types in the typing environment. Type schemes represent polymorphic types---they indicate that the corresponding expression variable is polymorphic. For example, in Haskell, the identity function has type 'forall a. a -> a'.

`data Scheme = Forall (Set TypeVariable) Type`

Above, we need to update the definition of the type environment to

` type TypeEnv = Map TypeVariable Scheme`

When we have quantified types in the environment, we need to make sure that each use of the a polymorphic term can be to a different instance of the type. Therefore, we need to replace all quantified type variables in the type scheme by fresh type variables.

```
instantiate :: Scheme -> TcMonad Type
instantiate (Forall vs ty) = do
s <- foldM (\s v -> do
x <- fresh
return (Subst (Map.singleton v (VarTy x)) `after` s)) empSubst (Set.toList vs)
return (subst s ty)
```

## Generalization

Let-boudn expression variables are the only ones that may enter the typing environment with polymorphic types. The trick is that when type checking an expression `let rec x = e1 in e2`

, after we have inferred the type of `e1`

, if there are any unconstrained type variables in its type, they can be generalized. For example, in the term

` let rec I = fun X -> X in ...`

We should be able to infer that the type of `e1`

is `a -> a`

. Generalizing this type produces the scheme `forall a. a -> a`

. To do this generalization, we need to solve the constraints produced by checking `e1`

to produce its type. Any type variables in this type were unconstrained by `e1`

. As long as those type variables don't also appear in the typing environment, we know that they could *never* be constrained. So they can be generalized.

```
generalize :: TypeVariable -> TcMonad Type -> TcMonad Scheme
generalize tv m = do
(ty, constraints) <- listen m
equate ty (VarTy tv)
env <- ask
case (solve constraints) of
Left err -> throwError err
Right s -> let vars = fv (subst s ty) `minus` fvEnv (substEnv s env) in
return $ Forall vars (subst s ty)
```

```
minus :: Ord a => Set a -> Set a -> Set a
minus = Set.foldr Set.delete
```

```
substEnv :: Substitution -> TypeEnv -> TypeEnv
substEnv s env = Map.map (substScheme s) env where
```

```
substScheme :: Substitution -> Scheme -> Scheme
substScheme s (Forall vs ty) = (Forall vs (subst s ty))
```

We must also calculate the set of type variables that appear free in the typing environment.

```
fvEnv :: TypeEnv -> Set TypeVariable
fvEnv m = Map.foldr gather Set.empty m where
gather (Forall vs ty) s1 = s1 `Set.union` (fv ty `minus` vs)
```

With this change we can check the behavior of the type checker on a few more examples.

```
example7 = "let rec I = fun X -> X in I I"
example8 = "let rec I = fun X -> let rec Y = X in Y in I I"
```

## News :

Welcome to CIS 552!

See the home page for basic
information about the course, the schedule for the lecture notes
and assignments, the resources for links to the required software
and online references, and the syllabus for detailed information about
the course policies.