Template Haskell

CIS 194 Week 11 13 November 2014

Template Haskell is a compile-time meta-programming facility for Haskell. Let’s pull all that terminology apart:

Template Haskell is a language extension, specific to GHC. Recall that there is a standard for Haskell, called Haskell 2010. (There was another one called Haskell 98.) Any feature in GHC that deviates from this standard is enabled only by a language extension, which must be specified in a file that uses the feature. (Well, almost any feature.)

So, we enable Template Haskell now:

{-# LANGUAGE TemplateHaskell #-}
import Control.Monad ( replicateM )
import Data.Maybe    ( maybeToList )

TH (as we’ll call it) also has a library associated with it:

import Language.Haskell.TH

Somewhat annoyingly (but out of necessity), TH enforces its so-called “stage restriction”, which says that the only functions you can call at compile time must be written in other modules than the one being defined. This does make sense, because otherwise, you could write a recursive function which would be required to write part of itself. Thinking about such things quickly makes a Haskeller go (more) insane, so we forbid outright.

Thus, for these notes, we’ll have a companion module SpliceFunctions that has the compile-time code. However, so that we can see the definitions of the functions as you read this file, they are included here, too. We’ll import the SpliceFunctions module qualified so that we can unambiguously refer to the SpliceFunctions functions in splices.

import qualified SpliceFunctions as SF

Splices

Template Haskell has two modes: splices and quotes. Let’s start with splices. A splice is a chunk of TH code that produces some Haskell which becomes a part of your program. The type of an (expression) splice is Q Exp, where Q and Exp are types from the TH library.

Just to get an idea of where we’re going, let’s take a look at that Exp type:

*Main> :info Exp
data Exp
  = VarE Name
  | ConE Name
  | LitE Lit
  | AppE Exp Exp
  | InfixE (Maybe Exp) Exp (Maybe Exp)
  | UInfixE Exp Exp Exp
  | ParensE Exp
  | LamE [Pat] Exp
  | LamCaseE [Match]
  | TupE [Exp]
  | UnboxedTupE [Exp]
  | CondE Exp Exp Exp
  | MultiIfE [(Guard, Exp)]
  | LetE [Dec] Exp
  | CaseE Exp [Match]
  | DoE [Stmt]
  | CompE [Stmt]
  | ArithSeqE Range
  | ListE [Exp]
  | SigE Exp Type
  | RecConE Name [FieldExp]
  | RecUpdE Exp [FieldExp]

Most expression forms that you can write in Haskell can be encoded into an Exp, short for “expression”, naturally. (Not all expressions fit into Exp. TH often lags behind proper Haskell in some of Haskell’s more esoteric features.)

Now, what about Q?

*Main> :info Q
<snip>
newtype Q a = <snip>
instance Monad Q -- Defined in ‘Language.Haskell.TH.Syntax’
<snip>

Oh – so Q is just a monad. Q is the monad that wraps computations that can be run in the GHC compiler, at compile time. We’ll see more of the capabilities the Q monad grants as we proceed.

Now, let’s see a proper example:

add5 :: Integer -> Q Exp
add5 n = return (AppE (AppE (VarE (mkName "+")) (LitE (IntegerL n)))
                      (LitE (IntegerL 5)))
eleven :: Integer
eleven = $( SF.add5 6 )

The compileTimeAdd5 function produces an expression that applies a variable named “+” to the number provided and the number 5. (Recall that a + b really means (+) a b, which really means (((+) a) b).) TH uses the type Name instead of String to store variable names. This is because there can be multiple variables of the same name (perhaps in different scopes), and TH needs to be able to distinguish among them. Happily, we have a function mkName :: String -> Name that we’ll use often to build Names from Strings.

Then, in eleven, we see the use of a splice. A splice is written between $( and ). It must have type Q Exp. GHC takes the Exp that is produced and uses that as the code to put in place of the splice. You can see this in action by turning on the -ddump-splices option:

*Main> :set -ddump-splices
*Main> :reload
...
Splicing expression
    SF.add5 6 ======> (+) 6 5

What that output is saying is that the code SF.add5 6 is run, producing the expression (+) 6 5, which is then what is used as the definition of eleven.

We can actually do even better: let’s do the actual addition at compile time, which means we don’t have to bother when the application is running.

compileTimeAdd5 :: Integer -> Q Exp
compileTimeAdd5 n = return (LitE (IntegerL (n + 5)))
eleven' :: Integer
eleven' = $( SF.compileTimeAdd5 7 )
*Main> :reload
...
Splicing expression
    SF.compileTimeAdd5 7 ======> 12

Here, note that the result of SF.compileTimeAdd5 7 is just the number 12. No addition needs to happen when our program is running! Cool!

However, adding 5 is a terrible example of what TH is good for. TH shines when you must write a chunk of code, but that code is very repetitive. Say, for example, that you have variables named a1 through a100, and you want to put all of these into a list. It’s easy with TH:

listOfAs :: Q Exp
listOfAs = return (ListE (map VarE [ mkName ('a' : show n) | n <- [1..100] ]))

{- Why do you think this is commented out??
bigList :: [Int]
bigList = $( SF.listOfAs )
-}

As a more practical example, let’s consider the liftM family of functions, the first few I’ll write out.

liftM :: Monad m => (a -> b) -> m a -> m b
liftM f ma = do a <- ma
                return (f a)

liftM2 :: Monad m => (a -> b -> c) -> m a -> m b -> m c
liftM2 f ma mb = do a <- ma
                    b <- mb
                    return (f a b)

liftM3 :: Monad m => (a -> b -> c -> d) -> m a -> m b -> m c -> m d
liftM3 f ma mb mc = do a <- ma
                       b <- mb
                       c <- mc
                       return (f a b c)

That’s boring! We should produce those expressions programmatically.

Let’s look at the DoE constructor for Exp, which we’ll need to work with:

*Main> :info DoE
data Exp = ... | DoE [Stmt] | ...

Hm. Works with Stmts.

*Main> :info Stmt
data Stmt
  = BindS Pat Exp | LetS [Dec] | NoBindS Exp | ParS [[Stmt]]

The part of do notation with the <- is a binding statement, or BindS. On the left of the arrow is a pattern, naturally:

*Main> :info Pat
data Pat
  = LitP Lit
  | VarP Name
  | TupP [Pat]
  | UnboxedTupP [Pat]
  | ConP Name [Pat]
  | InfixP Pat Name Pat
  | UInfixP Pat Name Pat
  | ParensP Pat
  | TildeP Pat
  | BangP Pat
  | AsP Name Pat
  | WildP
  | RecP Name [FieldPat]
  | ListP [Pat]
  | SigP Pat Type
  | ViewP Exp Pat

A variable pattern (what we’ll need) is a VarP. We may have enough to go on now:

-- | Produce the body of a @liftM@ implementation. The parameter is the
-- number of the @liftM@.
liftMBody :: Int -> Q Exp
liftMBody n = let m_names = take n [ mkName ('m' : [x]) | x <- ['a'..] ]
                  names   = take n [ mkName [x]         | x <- ['a'..] ]
                  binds   = zipWith mk_bind m_names names
                  ret     = NoBindS (AppE (VarE (mkName "return"))
                                          (mk_apps (VarE (mkName "f"))
                                                   (map VarE names)))

                  mk_bind :: Name -> Name -> Stmt
                  mk_bind m_name name = BindS (VarP name) (VarE m_name)

                  -- apply one expression to a list
                  mk_apps :: Exp -> [Exp] -> Exp
                  mk_apps f []     = f
                  mk_apps f (x:xs) = mk_apps (AppE f x) xs

                  -- or
                  -- mk_apps = foldl AppE
              in
              return $ LamE (map VarP m_names) (DoE (binds ++ [ret]))

liftM4 :: Monad m => (a -> b -> c -> d -> e)
                  -> m a -> m b -> m c -> m d -> m e
liftM4 f = $( SF.liftMBody 4 )

If it type-checks, it must be right! :)

We can do even better, though. TH works not only for expressions, but also for types. If we use a splice in a context where a type is expected, it should have the type Q Type, and GHC will splice in a type instead of an expression.

*Main> :i Type
data Type
  = ForallT [TyVarBndr] Cxt Type
  | AppT Type Type
  | SigT Type Kind
  | VarT Name
  | ConT Name
  | PromotedT Name
  | TupleT Int
  | UnboxedTupleT Int
  | ArrowT
  | ListT
  | PromotedTupleT Int
  | PromotedNilT
  | PromotedConsT
  | StarT
  | ConstraintT
  | LitT TyLit

Using TH at the type level, we can reduce even more boilerplate.

Before we go on, we’ll start to use one more little bit of TH syntax: we can quote known names. For example, the mkName "return" above is a little silly. Instead, we can write 'return:

returnName :: Name
returnName = 'return

To quote a type in the same way, use two quote marks.

liftMType :: Int -> Q Type
liftMType n = let names = take n [ mkName [x] | x <- ['a'..] ]
                  types = map VarT names
                  m     = mkName "m"
                  res   = mkName "r"
                  resty = VarT res

                  -- make nested arrows
                  mk_arrs :: [Type] -> Type -> Type
                  mk_arrs []     result = result
                  mk_arrs (x:xs) result = AppT (AppT ArrowT x)
                                               (mk_arrs xs result)

                  -- apply "m" to a type
                  app_m :: Type -> Type
                  app_m = AppT (VarT m)
              in
              return $ ForallT (PlainTV m : PlainTV res :
                                map PlainTV names)
                               [ClassP ''Monad [VarT m]]
                               (mk_arrs (mk_arrs types resty : map app_m types)
                                        (app_m resty))

liftM5 :: $( SF.liftMType 5 )
liftM5 f = $( SF.liftMBody 5 )

But disaster strikes at liftM6!

{-
liftM6 :: $( SF.liftMType 6 )
liftM6 f = $( SF.liftMBody 6 )
-}

The problem is that we have not been hygienic. (That’s a technical term!) When creating names in liftMBody and liftMType, we’ve just used single-letter names, without worrying if these names are already in use. Well, f is in use, and so we fall over when we try to use it.

We need a way of creating a name guaranteed to be fresh. Happily, the Q monad has just such a facility – the newName function:

newName :: String -> Q Name

This function is guaranteed always to return a fresh name. The string passed in is just a suggestion for the name. You could always pass the same string in but get different names each time.

-- | Produce the body of a @liftM@ implementation. The parameter is the
-- number of the @liftM@.
liftMBody' :: Int -> Q Exp
liftMBody' n = do
  m_names <- replicateM n (newName "m")
  names   <- replicateM n (newName "a")
  let binds   = zipWith mk_bind m_names names
      ret     = NoBindS (AppE (VarE 'return)
                              (mk_apps (VarE (mkName "f"))
                                       (map VarE names)))

      mk_bind :: Name -> Name -> Stmt
      mk_bind m_name name = BindS (VarP name) (VarE m_name)

      mk_apps :: Exp -> [Exp] -> Exp
      mk_apps = foldl AppE
  return $ LamE (map VarP m_names) (DoE (binds ++ [ret]))

liftM6 :: $( SF.liftMType 6 )
liftM6 f = $( SF.liftMBody' 6 )

Note that we still use mkName to grab f. That’s because liftMBody is expecting that f is bound outside of the TH code. We really do want to be unhygienic here, so mkName is the right choice.

Quotations

TH is more than just a facility for optimizing compile-time calculations. TH also provides a syntax for quoting, where GHC will take an expression you write and convert it into data of the TH Exp type. You write a quotation between [| and |]. A quotation conveniently has the type Q Exp. You can even embed splices within quotations!

To continue our liftM example, we’ll actually use a type quotation, as expression quotations don’t really help us here. A type quotation is in brackets written [t| and |] and has type Q Type. Let’s see one in action:

liftMType' :: Int -> Q Type
liftMType' n = do
  names <- replicateM n (newName "a")
  m     <- newName "m"
  res   <- newName "r"

  let types = map varT names
      mty   = varT m
      resty = varT res

      -- make nested arrows
      mk_arrs :: [Q Type] -> Q Type -> Q Type
      mk_arrs []     result = result
      mk_arrs (x:xs) result = [t| $x -> $(mk_arrs xs result) |]

      -- apply "m" to a type
      app_m :: Q Type -> Q Type
      app_m ty = [t| $mty $ty |]

  forallT (map PlainTV (m : res : names)) (cxt [])
          [t| Monad $mty => $(mk_arrs types resty) ->
                            $(mk_arrs (map app_m types) (app_m resty)) |]

liftM7 :: $( SF.liftMType' 7 )
liftM7 f = $( SF.liftMBody' 7 )

Using quotations made that definition a little less hairy. One thing to notice here is that we’re using lower-case functions like varT instead of uppercase data constructors like VarT. varT has type Name -> Q Type as opposed to VarT with type Name -> Type. Having things wrapped up in the Q monad helps us out when we’re using quotations. To see why, just follow the types! Remember that anything following a $ must be in the Q monad.

Reification

The Q monad provides another very important facility: the ability to inspect predefined types and functions and see their definitions, not unlike the :info command in GHCi. The key function here is reify :: Name -> Q Info, where Info is like this:

*Main> :info Info
data Info
  = ClassI Dec [InstanceDec]
  | ClassOpI Name Type ParentName Fixity
  | TyConI Dec
  | FamilyI Dec [InstanceDec]
  | PrimTyConI Name Arity Unlifted
  | DataConI Name Type ParentName Fixity
  | VarI Name Type (Maybe Dec) Fixity
  | TyVarI Name Type

To inspect the output from reify, it’s helpful to use yet another capability of the Q monad: to use arbitrary I/O. TH exports the function runIO :: IO a -> Q a, which allows any IO operation in the Q monad. This is indeed very powerful, and we’ll discuss some ramifications later. For now, it just means that we can print in the middle of a splice.

$( do info <- reify ''Bool
      runIO $ print info
      return [] )

Compiling the splice above gives us

TyConI (DataD [] GHC.Types.Bool [] [NormalC GHC.Types.False [],NormalC GHC.Types.True []] [])

What’s going on here? First, we have a top-level declaration splice. We’ve seen expression splices and type splices, but this is our first declaration splice. Like the others, a declaration splice allows you to run arbitrary code (in this case, of type Q [Dec]) and splices the result into your program. Because Q is a monad, we can use do notation inside of a splice. Our first action is to call reify on the type Bool, storing the result in the variable info. We then embed a I/O action in the Q monad with runIO. (Recall that print :: Show a => a -> IO ().) Finally, we must result in a [Dec] – a list of Haskell declarations – but in this case we don’t have any declarations to splice, so we just return the empty list.

Looking at the output, we see a TyConI wrapping a DataD. DataD is one of the constructors of the Dec type.

*Main> :info Dec
data Dec
  = FunD Name [Clause]
  | ValD Pat Body [Dec]
  | DataD Cxt Name [TyVarBndr] [Con] [Name]
  | NewtypeD Cxt Name [TyVarBndr] Con [Name]
  | TySynD Name [TyVarBndr] Type
  | ClassD Cxt Name [TyVarBndr] [FunDep] [Dec]
  | InstanceD Cxt Type [Dec]
  | SigD Name Type
  | ForeignD Foreign
  | InfixD Fixity Name
  | PragmaD Pragma
  | FamilyD FamFlavour Name [TyVarBndr] (Maybe Kind)
  | DataInstD Cxt Name [Type] [Con] [Name]
  | NewtypeInstD Cxt Name [Type] Con [Name]
  | TySynInstD Name TySynEqn
  | ClosedTypeFamilyD Name [TyVarBndr] (Maybe Kind) [TySynEqn]
  | RoleAnnotD Name [Role]

A DataD encodes a datatype declaration. The info just tells us that the definition of Bool is like data Bool = False | True.

Larger example

Template Haskell can be very useful for defining boiler-plate instances. For example, consider the following class:

class Sizable a where
  size :: a -> Int
  size _ = 1

Any member of this class has a known size, with a default of 1. We can define instances for some basic types, all of size 1:

instance Sizable Int
instance Sizable Integer
instance Sizable Bool
instance Sizable Char

We could go further and start writing Sizable instances for other types, but the code would be very boring. Instead, let’s write TH functions to do the work for us.

mapMaybeM :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b]
mapMaybeM _ [] = return []
mapMaybeM f (x:xs) = do
  maybe_b <- f x
  bs      <- mapMaybeM f xs
  return $ maybeToList maybe_b ++ bs
  
deriveSizable :: [Name] -> Q [Dec]  -- type is suitable for declaration splice
deriveSizable = mapMaybeM deriveSizable1

deriveSizable1 :: Name -> Q (Maybe Dec)
deriveSizable1 name = do
  info <- reify name
  case info of
    TyConI (DataD _ _name tvbs cons _)   ->
      Just `liftM` deriveSizableData name tvbs cons
    TyConI (NewtypeD _ _name tvbs con _) ->
      Just `liftM` deriveSizableData name tvbs [con]
    _                                    -> do
      reportError $ show name ++ " is not a datatype"
      return Nothing

deriveSizableData :: Name -> [TyVarBndr] -> [Con] -> Q Dec
deriveSizableData name tvbs cons = do
  clauses <- mapM deriveSizableClause cons
  return $ InstanceD context
                     (AppT (ConT ''Sizable)
                      (foldl AppT (ConT name) tvb_tys))
                     [FunD 'size clauses]
  where
    tvb_names = map getTvbName tvbs
    tvb_tys   = map VarT tvb_names
    context   = map (ClassP ''Sizable . (:[]) . VarT) tvb_names

    getTvbName (PlainTV n)    = n
    getTvbName (KindedTV n _) = n

deriveSizableClause :: Con -> Q Clause
deriveSizableClause (NormalC con_name stys) = do
  field_names <- mapM (const $ newName "x") stys
  clause [conP con_name (map varP field_names)]
         (normalB [| 1 + sum $(
           listE (map (\x -> [| size $(varE x) |]) field_names)
         ) |])
         []
deriveSizableClause (RecC con_name vstys)
  = let stys = map strip_fst vstys in
    deriveSizableClause (NormalC con_name stys)
  where
    strip_fst (_,b,c) = (b,c)
deriveSizableClause (InfixC sty1 con_name sty2)
  = deriveSizableClause (NormalC con_name [sty1, sty2])
deriveSizableClause (ForallC _ _ con) = deriveSizableClause con

Now, having done all of that, it is easy to generate new Sizable instances:

$( SF.deriveSizable [''Maybe, ''[], ''Either] )

trySize :: Int
trySize = SF.size (Just (Just "abc"))

This “deriving” use-case of TH is rather prevalent, so much so that GHC allows you to drop the $:

SF.deriveSizable [''Ordering]

Generated 2014-12-04 13:37:45.160867