Template Haskell is a Haskell extension that supports compile­time meta­programming. The purpose of the system is to support the algorithmic construction of programs at compile­time. The ability to generate code at compile time allows the programmer to use programming techniques not available in Haskell itself, such as macro­like expansion, user directed optimization (such as inlining), polytypic programs, generation of supporting data structures and functions from existing data structures and functions. For example, the code

yell file line = fail ($(printf "Error in file %s line %d") file line)

may be translated by TH to

yell file line = fail ((\x1 x2 -> "Error in file "++x1++" line "++show x2) file line)

As another example, the code

data T = A Int String | B Integer | C

$(deriveShow ''T)

may be translated to

data T = A Int String | B Integer | C

instance Show T

  show (A x1 x2) = "A "++show x1++" "++show x2

  show (B x1)    = "B "++show x1

  show C         = "C"

(if you are interested, you can find definitions of printf and deriveShow at the end of this documentation).

 

In TH, Haskell code generated just by ordinary Haskell functions. In order to use TH, you must learn 3 things:

·        How Haskell code represented in TH functions

·        How quotation monad used to supply unique names

·        How TH-generated code inserted in the module

There are also several more advanced topics:

·        Quotation monad

·        Quotation brackets

·        Reification

·        Error reporting and recovery

·        Debugging

I also included some examples of using TH:

·        printf

·        deriveShow

How Haskell code represented in TH functions

In Template Haskell, ordinary algebraic data types represent Haskell program fragments. These types modeled after Haskell language syntax and represents AST (abstract syntax tree) of corresponding Haskell code. There is an Exp type to represent Haskell expressions, Pat – for patterns, Lit – for literals, Dec – for declarations, Type – for data types and so on. You can see definitions of all these types in the module Language.Haskell.TH.Syntax. These types refer to each other according to rules of Haskell syntax, so using them you can construct values representing any possible Haskell program fragments. Just some simple examples:

·        varx = VarE (mkName "x") represents expression x, i.e. simple variable “x”

·        patx = VarP (mkName "x") represents pattern x, i.e. the same variable “x” used in pattern

·        str = LitE (StringL "str") represents constant expression "str"

·         tuple = TupE [varx, str] represents tuple expression (x,"str")

·        LamE [patx] tuple represents lambda form (\x -> (x,"str"))

To make our life easier, all constructors of Exp type have names ending with “E”, of Pat type – ending with “P” and so on. Function mkName, used here, creates value of type Name (representing identifier) from String with name of this identifier.

So, to generate some Haskell code, TH function must just create and return value of type Exp, which serve as representation for this chunk of code. You don’t even need to thoroughly learn Exp and other type’s definitions in order to know how to represent Haskell code you need – in the section Debugging I will say how you can print TH representation of any Haskell code chunk.

How quotation monad used to supply unique names

But TH functions are not pure functions returning values of type Exp. Instead, they are computations executed in special monad Q (called “quotation monad”), which allows to automatically generate unique names for variables using monadic operation newName::String->Q Name. This operation on each call generates unique name with given prefix. This name then may be used as part of pattern (by using constructor VarP::Name->Pat) and expressions (via VarE::Name->Exp).

Let’s write simple TH example – TH function tupleReplicate, which when used as “$(tupleReplicate n) x” will return n-element tuple containing x in all positions (just like replicate does for lists). Please draw attention that “n” is an argument of TH function, while “x” is an argument to anonymous function (lambda form) it generates! I provide the whole module containing this function definition (module Language.Haskell.TH is an “external interface” to TH – it provides all the data types and functions which are used to write TH programs):

module TupleReplicate where
 
import Language.Haskell.TH
 
tupleReplicate :: Int -> Q Exp
tupleReplicate n = do id <- newName "x"
                      return $ LamE (VarP id)
                                    (TupE $ replicate n $ VarE id)

For example, call “tupleReplicate 3 returns Exp equivalent to Haskell expression “(\x -> (x,x,x))”.

How TH-generated code inserted in the module

A splice is written $x, where x is an identifier, or $(...), where the "..." is an arbitrary expression. There must be no space between the "$" and the identifier or parenthesis. This use of "$" overrides its meaning as an infix operator, just as "M.x" overrides the meaning of "." as an infix operator. If you want the infix operator, put spaces around it.

A splice can occur in place of

·        an expression; the spliced expression must have type Q Exp

·        a list of top-level declarations; the spliced expression must have type Q [Dec]. Declarations, generated by splice, have access only to identifiers, whose declarations are textually precede them (which is controversial to usual Haskell practice of global access to all declarations placed in current module)

·        a type; the spliced expression must have type Q Type

Also you must know that

·        You must use compiler flag –fth to enable splices syntax

·        You can only run a function at compile time if it is imported from another module. That is, you can't define a function in a module, and call it from within a splice in the same module.

·        If you are building GHC from source, you need at least a stage-2 bootstrap compiler to run Template Haskell. A stage-1 compiler will reject the TH constructs.

Example of module which uses our tupleReplicate function:

{-# OPTIONS_GHC -fth #-}
module Test where
 
import TupleReplicate
 
main = do print ($(tupleReplicate 2) 1)     -- prints (1,1)
          print ($(tupleReplicate 5) "x")   -- prints ("x","x","x","x","x")

Quotation monad

Because top-level TH functions must return values in Q monad, there are a number of helper functions, which lifts constructors of Exp/Lit/Pat data types into the Q monad: lamE (lifted LamE), varE, appE, varP and so on. Their declarations also use lifted data types: ExpQ = Q Exp, LitQ = Q Lit, PatQ = Q Pat... (you can find all these lifted functions and types in module Language.Haskell.TH.Lib). Using these functions allow to decrease number of cases where do statement is needed.

There is also function lift, which converts any value, which has literal representation, to value of type Exp which represents this literal.

In some rare cases you don’t need unique variable name to be generated; instead, you need to specify the exact name of variable which must be generated in output code. For these cases, there is a (pure) function mkName::String->Name. There is also corresponding helper function “dyn s = return (VarE (mkName s))”, which returns Exp representing variable with exact the given name.

Quotation brackets

While the Exp can represent any Haskell expression, programmatic building of Exp values is not so easy work. In order to address this problem, Template Haskell supports quotation brackets, which is a way to convert literal Haskell code to data value representing it. There are four types of quotation brackets:

·        [| ... |], where the "..." is an expression; the quotation has type Q Exp

·        [p| ... |], where the "..." is a pattern; the quotation has type Q Pat

·        [d| ... |], where the "..." is a list of top-level declarations; the quotation has type Q [Dec]

·        [t| ... |], where the "..." is a type; the quotation has type Q Type

For example, [| \_ -> 0 |] will be translated to (return $ LamE [WildP] (LitE (IntegerL 0))). The quotation has type Q Exp (rather than just Exp), so that it need to be executed in Q monad to return appropriate Exp. This execution stage allows Template Haskell to replace all identifiers, introduced inside quotation brackets, by unique ones, generated internally with help of newName. For example, quotation [| \x -> x |] will be translated to the following code:

(do id <- newName "x"; return $ LamE [VarP id] (VarE id))

 

Moreover, inside quotation brackets we can again use splices, making TH some form of macro preprocessor, where a part of code written literally and part of code generated programmatically. For example, the quotation [| 1 + $(f x) |] will execute (f x) – which must have type Q Exp, translate returned Exp value to literal Haskell code, substitute it instead of splice call, and then reconvert the full expression inside brackets into the code which builds the appropriate Exp. Thanks to automatic renaming of internally used identifiers, the different quotations and even different invocations of the same quotation will never refer to the each others local variables. Consider the following definition:

summ n = summ' n [| 0 |]
summ' 0 code = code
summ' n code = [| \x -> $(summ' (n-1) [|$code+x|] ) |]

This definition generates lambda form with n parameters which sums up all its arguments, for example $(summ 3) -> (\x1 -> \x2 -> \x3 -> 0+x1+x2+x3). Please draw attention that generated code uses three different names for lambda parameters despite the fact that they all were generated by the same quotation. As you can see in this fragment, depth of quotation and splicing brackets can be arbitrary, the only rule is that they must interchange – no quotations inside quotations, and no splices inside splices.

 

The quasi­quote notation is a convenient shorthand for representing Haskell programs, and as such it is lexically scoped. More precisely: every occurrence of a variable is bound to the value that is lexically in scope at the occurrence site in the original source program, before any template expansion. This rule has 3 cases:

·        Quotation brackets prevent “capturing” of local variables, declared in one quotation, by another (like the usual Haskell prevents capturing of local variables, used in closures). I already described how that is accomplished by automatic renaming of all locally introduced identifiers. Only [p| ... |] quotation doesn’t rename variables this pattern introduces. Instead, TH provides function genpat, which generates unique pattern from the given one.

·        Global identifiers, referred inside quotation, “capture” the identifiers available in the environment where this quotation is defined (again, like usual Haskell), so you can pass without any problems value of quotation to functions in other modules, which don’t have these definitions or even have another definitions for the same names. This rule uses internal GHC mechanism of references to symbols in another modules, for example quotation [| map |] may be translated to reference to symbol “Data.List.map” or “++” operation, used in quotation, may be translated to reference to “GHC.Base.++”. If you need to use identifiers, available at place of splicing call, use the $(dyn "str") form.

·        Also inside quotation brackets you can use local variables of currently executed functions. These compile-time variables are run-time constants, so on translating brackets contents TH just substitute current values of these variables as literals. So, in this case [|... x ...|] is converted to [| ... $(lift x) ... |].

 

Splicing and quoting is opposite operations – one translates Exp to Haskell code, another – Haskell code to Exp, so their co-usage is disappear – $([| ... |]) is equivalent to (...), and so [| $(...) |]. This has inimitable value for development of TH programs – in many cases we can think entirely in terms of Haskell code generated, and don’t bother about Exp values it internally uses.

For example, consider the execution of splice $(summ 3). Just replace this call with its body:

$(summ 3) ->
$(summ' 3 [| 0 |]) ->
$([| \x -> $(summ' (3-1) [| $([|0|]) + x |] ) |]) ->

Now, we can kill occurrences of the $([| ... |]) and [| $(...) |], at the same time replacing “x” with unique identifier:

\x1 -> $(summ' (3-1) [|0+x1|]) ->

Again replace call to summ' with its body:

\x1 -> $([| \x -> $(summ' (2-1) [| $([|0+x1|]) + x |] ) |]) ->

And repeat the last two steps until the end:

\x1 -> \x2 -> $(summ' (2-1) [| 0+x1+x2 |]) ->
\x1 -> \x2 -> $([| \x -> $(summ' (1-1) [| $([|0+x1+x2|]) + x |] ) |]) ->
\x1 -> \x2 -> \x3 -> $(summ' (1-1) [| 0+x1+x2+x3 |]) ->
\x1 -> \x2 -> \x3 -> $([| 0+x1+x2+x3 |]) ->
\x1 -> \x2 -> \x3 -> 0+x1+x2+x3

It is interesting, that in this definition left side of lambda form (\x0 -> \x1...) is build recursively right on the stack of calls, while the right side (0+x1+...) is accumulated in the code variable. The same technique is used to implement Example: printf

Reification

Reification is a Template Haskell’s way of allowing the programmer to query the state of the compiler’s internal (symbol) table. The monadic operation “reify::Name->Q Info” returns information about given name: if it’s a global identifier (function, constant, constructor) – you can get it’s type, if it’s a type or class – you can get its structure. By using reify you are get “entry point” to symbol’s table, which then can be used to find information about other types, constructors, classes related to this identifier. You can find definition of type Info in the module Language.Haskell.TH.Syntax.

To get a Name, corresponding to identifier you are interested, you can, theoretically, use function mkName, but this solution is unsafe, because mkName returns unqualified name, which interpretation may be changed depending on context. On the other side, code “VarE id <- [| name |]” is safe, because id will be linked to qualified name (like “My.Own.Module.name”), but too verbose and need monadic context to run. So, Template Haskell supports another lightweight form of quotation: 'identifier returns Name, corresponding to identifier; let id = 'name” is fully equivalent to “VarE id <- [| name |]”. Please note that this syntax construction has type Name (not Q Exp, nor Q Name), so it can be used in contexts where monadic computations are impossible, for example:

f :: Exp ­> Exp

f (App (Var m) e)  |  m=='map  =  ...

This new form is still a quotation construct, just like [| v |], and follows the same rules as quotation brackets. For example, one cannot quote inside quotes, so this is illegal: [| 'v |]. The more important, that it is resolved statically, and returns fully qualified Name, whose meaning will be persistent.

Haskell's name­spaces make things just slightly more complicated. The quotation [| P |] would mean the data constructor P, whereas [t| P |] would mean the type constructor P. So we need the same distinction for lightweight quoting. We use two single-quotes to distinguish the type context:

'v means “The name v interpreted in an expression context”

''v means “The name v interpreted in an type context”

So ''a means the type variable a, for example.

You can find example of using lightweight quoting and reification to automatically generate Show instances in section Example: deriveShow.

 

The reify function can be used to get structure of type, but it cannot be used to get Exp representing the body of already defined function. If you need to reify function body – put declaration of this function in quotation brackets and explore returned result, like this:

$(optimize [d| fib = .... |])

or

fib = $(optimize [| .... |])

Error reporting and recovery

The Q monad makes it possible to report errors, and recover from failures gracefully. Here is the interface:

report :: Bool ­> String ­> Q ()

Report something to the user. If the Bool is True, the something is treated as an error, otherwise it is simply displayed. In both cases, though, execution continues. The difference between the two is seen by recover; if there is no enclosing recover, compilation fails.

giveUp :: Q a

Stop execution; find the enclosing recover.

recover :: Q a ­> Q a ­> Q a

The call (recover h q) runs q. If q executes giveUp, execution resumes with h. If q runs to completion, but has made some calls to report True, the result is discarded and h is run. If q runs to completion with no error report, h is ignored, and q's result is the result of the call to recover.

currentModule :: Q String

Returns the name of the module being compiled.

currentLoc :: Q (FilePath, Int)

Returns the location of the top-level splice being executed.

Debugging

In order to make debugging Template Haskell programs easier, compiler supports flag -ddump-splices, which shows the expansion of all top-level splices as they happen.

Also, you can run computations in Q monad programmatically with help of “runQ::Q a->IO a” and print their results either in form of AST in order to know how you must build such expression:

 

C:\Haskell> ghci –fth

ghci> :m +Language.Haskell.TH

 

ghci> runQ [| \x _ -> x |] >>= print

LamE [VarP x_0,WildP] (VarE x_0)

... or in the form of Haskell code to see what the code will be generated by some splice call:

C:\Haskell> ghci

ghci> :m +Language.Haskell.TH

ghci> :m +Cnst

 

ghci> runQ(cnst 2 "str") >>= putStrLn.pprint

\_ _ -> "str"

This technique can be also used in modules which imports appropriate definitions of functions, written in TH, but then print results of calls via print and pprint instead of splicing them:

{-# OPTIONS_GHC -fglasgow-exts -fth #-}

module Main where

 

import Language.Haskell.TH

import Cnst

 

-- module Cnst defines function `cnst`, which can be used in splices:

 

cnst1 = $(cnst 1 "x")

cnst2 = $(cnst 2 "str")

cnst20 = $(cnst 20 "foo")

 

-- ... but we can also run `cnst` via runQ to see how it works:

main = do runQ(cnst 1 "x") >>= print

          runQ(cnst 2 "str") >>= print

          runQ(cnst 20 "foo") >>= print

          runQ(cnst 1 "x") >>= putStrLn.pprint

          runQ(cnst 2 "str") >>= putStrLn.pprint

          runQ(cnst 20 "foo") >>= putStrLn.pprint

 

This is the module Cnst, used in these examples:

module Cnst where

import Language.Haskell.TH

 

cnst :: Int -> String -> Q Exp

cnst n s = return (LamE (replicate n WildP) (LitE (StringL s)))

Example: printf

That is the definition of function printf, mentioned earlier, together with Main module what uses it. Compile with "ghc -fth --make Main.hs

{- Main.hs -}
module Main where
 
-- Import our template "printf"
import Printf (printf)
 
-- The splice operator $ takes the Haskell source code
-- generated at compile time by "printf" and splices it into
-- the argument of "putStrLn".
main = putStrLn ( $(printf "Error in file %s line %d: %s") "io.cpp" 325 "printer not found" )

 

 

{- Printf.hs -}
module Printf where
 
-- Import Template Haskell interfaces
import Language.Haskell.TH
 
-- Describe a format string
data Format = D            -- represents "%d"
            | S            -- represents "%s"
            | L String     -- represents other parts of format string, printed literally
 
-- Parse a format string.
parse :: String -> String -> [Format]
parse ('%':'s':xs) rest  =  L rest : S : parse xs ""
parse ('%':'d':xs) rest  =  L rest : D : parse xs ""
parse ""           rest  = [L rest]
parse (x:xs)       rest  =  parse xs (rest++[x])
 
-- Generate Haskell source code from a parsed representation
-- of the format string.  This code will be spliced into
-- the module which calls "printf", at compile time.
gen :: [Format] -> ExpQ -> ExpQ
gen [] code = code
gen (D : xs) code = [| \x-> $(gen xs [| $code++show x |]) |]
gen (S : xs) code = [| \x-> $(gen xs [| $code++x |]) |]
gen (L s : xs) code = gen xs [| $code++s |]
 
-- Here we generate the Haskell code for the splice
-- from an input format string.
printf :: String -> ExpQ
printf s = gen (parse s "") [| "" |]

Example: deriveShow

This is the minimal example which shows how TH can be used to automatically generate class instances. It uses ''type notation and reify function to generate Show instance for given data type. To simplify code, I don’t handle here parametric types, types with named fields and other “complex” types.

 

{- Main.hs -}

module Main where

import Derive

 

data T = A Int String | B Integer | C

$(deriveShow ''T)

 

main = print [A 1 "s", B 2, C]  -- prints exactly <<[A 1 "s",B 2,C]>>

 

 

{- Derive.hs -}

module Derive where

 

import Language.Haskell.TH

import Control.Monad

 

data T1 = T1

data T2 a = T2 a

 

deriveShow t = do

  -- Get list of constructors for type t

  TyConI (DataD _ _ _ constructors _)  <-  reify t

 

  -- Make `show` clause for one constructor:

  --   show (A x1 x2) = "A "++show x1++" "++show x2

  let showClause (NormalC name fields) = do

        -- Name of constructor, i.e. "A". Will become string literal in generated code

        let constructorName = nameBase name

        -- Get variables for left and right side of function definition

        (pats,vars) <- genPE (length fields)

        -- Recursively build (" "++show x1++...++"") expression from [x1...] variables list

        let f []       = [| "" |]

            f (v:vars) = [| " " ++ show $v ++ $(f vars) |]

        -- Generate function clause for one constructor

        clause [conP name pats]                                 -- (A x1 x2)

               (normalB [| constructorName ++ $(f vars) |]) []  -- "A "++show x1++" "++show x2

 

  -- Make body for function `show`:

  --   show (A x1 x2) = "A "++show x1++" "++show x2

  --   show (B x1)    = "B "++show x1

  --   show C         = "C"

  showbody <- mapM showClause constructors

 

  -- Generate template instance declaration and then replace

  --   type name (T1) and function body (\x -> "text") with our data

  d <- [d| instance Show T1 where

             show x = "text"

       |]

  let    [InstanceD [] (AppT showt (ConT _T1)) [FunD showf _text]] = d

  return [InstanceD [] (AppT showt (ConT t  )) [FunD showf showbody]]

 

 

-- Generate n unique variables and return them in form of patterns and expressions

genPE n = do

  ids <- replicateM n (newName "x")

  return (map varP ids, map varE ids)