Prelude

When I’m read thorough first paper on Template Haskell (Template metaprogramming for Haskell), I can’t understand it. I see the tricky [|...|] and $(...) expressions, but can’t figure when I should use first construct, when to use second and how they must be combined together. Only after reading last sections with “technical details”, I discovered for myself how this actually works and could understand all previous “high-level” stuff. I try to read another papers on TH, but they was no more helpful. So I decided to write my own TH tutorial – such one, which gives to reader full understanding of what he does on each step of education. It is the result. Feel free to edit this wikipage, especially if you see my misusing of English (I’m not native speaker, after all J). I use curly braces to cite Haskell code in text.

Lesson 1: low-level TH

Template Haskell is macro preprocessor for Haskell, whose macros are written in Haskell itself, and is just ordinary functions. Macros are called (spliced) with syntax $macro or $(macro params...), for example:

zip’ = $mkZip

fst3 = $(cnst 2 “result”)

In this example, mkZip and (cnst 2 “result”) are SPLICE EXPRESSIONS. That means what they will be evaluated at COMPILE TIME and returned value will be converted to ordinary Haskell code and substituted instead of splice construction. Functions used in splice expressions must be defined in another module, imported by current one, because at the moment of this preprocessing functions defined in current module are not even compiled.

The whole splice expression must have type {Q Exp}, so in this example appropriate type for function cnst may be:

cnst :: Int -> String -> Q Exp

Type Q represents Quotation Monad. I will explain its role in next lesson, now we will just use {return} to lift the values of type {Exp} into type {Q Exp}. Type Exp is Template Haskell’s representation of Haskell expressions. It’s not a String, as you can suppose, because such simple representation would create problems for code generation and especially for code transformations. Instead, it’s a recursive structure, representing ABSTRACT SYNTAX TREE of expression. Such representations are usually resulted from syntactic parsing of Haskell programs in compliers, program transformation tools and so on. Here it is used for diametrically opposite goal – to create a right piece of Haskell code.

Value of type Exp can be converted to string containing actual Haskell code by function pprint (Want to know more? Template Haskell itself uses this function to insert generated code back to source files). Moreover, quotation brackets [| ... |] performs the opposite transition – they parse ordinary Haskell code and returns Exp structure representing it! So, before writing any programs we can play a little with it:

C:\Haskell> ghci –fth

ghci> :m +Language.Haskell.TH

Here, we use “-fth” to enable support of quotation brackets, and import module Language.Haskell.TH, which contains definition of Q, Exp, pprint and all other TH stuff. Let’s continue:

ghci> runQ [| 1 |] >>= print

LitE (IntegerL 1)

ghci> runQ [| \x _ -> x |] >>= print

LamE [VarP x_0,WildP] (VarE x_0)

ghci> runQ [| \x _ -> x |] >>= putStrLn.pprint

\x_0 _ -> x_0

We are use {print} to print the structures built by [|...|] and {putStrLn.pprint} to convert these structures back to strings with actual Haskell code. As you can see from last call, it really prints the same code – modulo renaming of variables. So, we discovered excellent TH debugging tools – you can use {runQ [| CODE |] >>= print} to see the structure that must be returned to generate some CODE, and {runQ (macro params...) >>= putStrLn.pprint} to print out the code that $(macro params...) will generate.

You can find definitions of types, used in TH to represent Haskell code, in the module Language.Haskell.TH.Syntax, and I partially cite these definitions here:

data Exp -- represents Haskell expressions

= VarE Name -- { x }

| LitE Lit -- { 5 or 'c'}

| AppE Exp Exp -- { f x }

| LamE [Pat] Exp -- { \ p1 p2 -> e }

| TupE [Exp] -- { (e1,e2) }

| CondE Exp Exp Exp -- { if e1 then e2 else e3 }

| ListE [Exp] -- { [1,2,3] }

| ...

data Pat -- represents Haskell patterns

= LitP Lit -- { 5 or 'c' }

| VarP Name -- { x }

| TupP [Pat] -- { (p1,p2) }

| WildP -- { _ }

| ...

data Lit -- represents Haskell literals

= CharL Char -- ‘a’

| StringL String -- “string”

| IntegerL Integer -- 123

| DoublePrimL Rational -- 12.3

| ...

data Name = ... -- represents Haskell symbols (identifiers and operators)

data Type = ... -- represents Haskell datatypes

As you can see, expressions are composed from other expressions, literals, variables, patterns and so on, in the strict compliance with the Haskell syntax rules.

Getting all this knowledge, you can easily write almost any macro, but let’s start from the simplest one: let’s define macro “cnst”, so that splice $(cnst n str) will generate function (lambda form) which accepts {n} (unused) parameters and returns just {str}. Several examples of code, which should be generated by this macro:

$(cnst 1 “x”) ==> (\_ -> “x”)

$(cnst 2 “str”) ==> (\_ _ -> “str”)

$(cnst 3 “hey”) ==> (\_ _ _ -> “hey”)

We can start from printing values, what should be generated by our function:

ghci> runQ [| \_ -> "x" |] >>= print

LamE [WildP] (LitE (StringL "x"))

ghci> runQ [| \_ _ -> "str" |] >>= print

LamE [WildP,WildP] (LitE (StringL "str"))

ghci> runQ [| \_ _ _ -> "hey" |] >>= print

LamE [WildP,WildP,WildP] (LitE (StringL "hey"))

At this moment, I hope, you can compare printed values with definition of type Exp and check that these values are really representations of code in quotation brackets [|...|]. Writing actual code should be trivial, please does it yourself and compare results with my module (remember to use {return} to lift result into Q monad!):

module Test where

import Language.Haskell.TH

cnst :: Int -> String -> Q Exp

cnst n s = return (LamE (replicate n WildP) (LitE (StringL s)))

Now we can test our macro by loading it into ghci (I assume that you put text of module Test in file test.hs):

C:\Haskell> ghci –fth test.hs

ghci> runQ (cnst 1 "x") >>= print

LamE [WildP] (LitE (StringL "x"))

ghci> runQ (cnst 1 "x") >>= putStrLn.pprint

\_ -> "x"

ghci> runQ (cnst 2 "str") >>= putStrLn.pprint

\_ _ -> "str"

It really works!!! Now let’s try to use this macro to actually define new functions:

ghci> let cnst1 = $(cnst 1 "x")

ghci> :t cnst1

cnst1 :: t -> [Char]

ghci> cnst1 21

"x"

ghci> let cnst2 = $(cnst 2 "str")

ghci> :t cnst2

cnst2 :: t -> t1 -> [Char]

ghci> cnst2 21 34

"str"

ghci> let cnst20 = $(cnst 20 "str")

ghci> :t cnst20

cnst20 :: t -> t1 -> ... -> t19 -> [Char]

And it works too! As homework, try to define variant of {cnst}, which can return any literal constant – be it a String, Char or Double. To do it, you should import module Language.Haskell.TH.Syntax, which contains function {lift}. This function converts Ints, Strings and so on to appropriate values of type Lit.

At the end of this lesson, I will give example of module, which uses our macro to define functions and also prints structure of generated expressions. You can use such modules instead of ghci to debug and test your code:

{-# OPTIONS_GHC -fglasgow-exts -fth #-}

module Main where

import Language.Haskell.TH

import Test

cnst1 = $(cnst 1 "x")

cnst2 = $(cnst 2 "str")

cnst20 = $(cnst 20 "foo")

main = do print (cnst1 11)

print (cnst2 11 12)

runQ(cnst 1 "x") >>= print

runQ(cnst 2 "str") >>= print

runQ(cnst 20 "foo") >>= print

runQ(cnst 1 "x") >>= putStrLn.pprint

runQ(cnst 2 "str") >>= putStrLn.pprint

runQ(cnst 20 "foo") >>= putStrLn.pprint

Lesson 2: generation of unique names and dynamic variables

If you actually tried to write your own TH macros after completing first lesson, you are probably noticed that I don’t explain how to create values of type Name. These values represent variables – both in patterns and expressions. We can’t use just String to represent them because that cannot give guarantees that variables created in different parts of programs, will not use the same name. Moreover, even several calls to one function creating variable with fixed name, can raise problems. Problems with overlapping variable names so seriously beat other macro preprocessors that TH proposed an ultimate solution in this area – Quotation Monad, named Q. This monad supports special operation for generating unique variable names:

newName :: String -> Q Name

Argument to newName will be used as prefix for generated name, followed by “_” and unique number. It should be used to give to generated variables more mnemonic names, but even with the same arguments each call to newName will generate new, unique variable name. As with any other monad, you can use {do} notation to execute monad operations:

somemacro = do var1 <- newName “x”

var2 <- newName “y”

return (...)

Now we are ready to define more interesting macro: $(sel n m) should generate a lambda form, which gets m-component tuple as argument and returns its n’th component, so:

$(sel 1 3) should generate code, equivalent to \( x,_, _) -> x

$(sel 2 4) should generate code, equivalent to \(_,x,_,_) -> x

Let’s start from looking at Exps we must return:

C:\Haskell> ghci –fth

ghci> runQ [| \(x,_,_) -> x |] >>= print

LamE [TupP [VarP x_0,WildP,WildP]] (VarE x_0)

ghci> runQ [| \(_,x,_,_) -> x |] >>= print

LamE [TupP [WildP,VarP x_1,WildP,WildP]] (VarE x_1)

Here we see our old friends, LamE and WildP, together with new ones: x_0 and x_1 represents values of type Name, which we should generate with help of newName, VarP/VarE incorporate these Names into patterns and expressions, respectively, and TupP creates pattern matching a tuple from list of patterns for individual tuple elements. Please draw attention to the following – if you need to refer to the same variable in different parts of generated code, you must generate its Name once by using newName and then use returned value. Different calls to newName, even with the same argument, will give you different variables! Don’t try to generate variables for your code in {let} or {where} clauses!

sel :: Int -> Int -> Q Exp

sel n m = do x <- newName “x”

let wilds = replicate m WildP

return (LamE (replaceAt (n-1) wilds (VarP x)) (VarE x))

-- |Replace n'th element (counted from 0) in `xs` to `x`

replaceAt n xs x = take n xs ++ x : drop (n+1) xs

There are rare cases when we don’t need to generate variables with unique names – on the contrary, we need to specify the exact identifier name which must be used in generated code. For such cases, there is a function mkName, which generates identifier with exact given name. This property can be used to refer to identifiers in outer, hand-written code or to link together independently generated parts of code. This function, like newName, returns value of type Name, which then can be used to construct expressions, patterns or declarations. But mkName, unlike newName, don’t run in Q monad, it’s a pure function:

mkName :: String -> Name

In order to directly compare these functions, try to run the following ghci session:

C:\Haskell> ghci –fth

ghci> :m +Language.Haskell.TH

ghci> runQ (return$ VarE (mkName “x”)) >>= putStrLn.pprint

when we need to control exact variable names that will be used in generated code, for example to refer to identifiers in the outer hand-written code or when we use our own schemes of variable names generation. For such cases, there is a possibility to create variable which will have in generated code the exact name you specified using function dyn:

C:\Haskell> ghci –fth

ghci> :m +Language.Haskell.TH

ghci> runQ (return$ VarE (mkName “x”)) >>= putStrLn.pprint

ghci> runQ (tupE [dyn "x", dyn “x”]) >>= putStrLn.pprint

(x,x)

This example also use new function tupE. What is difference between this function and constructor TupE? Let’s see its definition in module Language.Haskell.TH.Lib:

tupE :: [ExpQ] -> ExpQ

tupE es = do { es1 <- sequence es; return (TupE es1)}

As you can see

Lesson 3: other monadic activities

Besides newName, there are a number of other TH utilities whose results depend on the environment where TH function was executed. It includes error reporting and returning information about place where higher-level TH splice was called:

report :: Bool > String > Q ()

Report something to the user. If the Bool is True, the something is treated as an error, otherwise it is simply displayed. In both cases, though, execution continues. The difference between the two is seen by recover; if there is no enclosing recover, compilation fails.

giveUp :: Q a

Stop execution; find the enclosing recover.

recover :: Q a > Q a > Q a

The call (recover h q) runs q. If q executes giveUp, execution resumes with h. If q runs to completion, but has made some calls to report True, the result is discarded and h is run. If q runs to completion with no error report, h is ignored, and q's result is the result of the call to recover.

currentModule :: Q String

Returns the name of the module being compiled.

currentLoc :: Q (FilePath, Int)

Returns the location of the toplevel splice being executed.

The last two functions may be useful for constructing error messages.

Moreover, because top-level TH functions must return values in Q monad, there are a number of helper functions, which lifts constructors of Exp/Lit/Pat datatypes into the Q monad: litE, varE, appE, varP and so on. Their declarations also use lifted atatypes: ExpQ = Q Exp, LitQ = Q Lit, PatQ = Q Pat... (you can find all these lifted functions and types in module Language.Haskell.TH.Lib). Using these functions allow to decrease number of cases where “do” construct is needed.

Lesson 4: add sugar to your taste

Explicit creation of code with Exp/Pat constructors is much more complicated than writing Haskell code itself. Hopefully, there is a method to translate Haskell code into appropriate expression, which would return value of type Exp. And you already know this method – using quotation brackets [|...|] ! Let’s see how the sel function can be defined with them:

sel 1 2 = [| \(x,_) -> x |]

sel 2 2 = [| \(_,x) -> x |]

sel 1 3 = [| \(x,_,_) -> x |]

...

Quotation brackets [|...|] compiles as the expression of the type {Q Exp}, which upon execution in monad Q would return Exp, representing Haskell code inside brackets – and that it exactly what we need to define sel! We already used quotation brackets to see which expressions we should generate to represent some Haskell code. Now we are using them to directly represent the Exps we should return for each combination of sel arguments. Of course, that is not very interesting. The key to real usage of quotation brackets to define macro functions is their ability to include splice expressions! See, for example, the following recursive cnst definition:

cnst 0 str = lift str

cnst n str = do e <- cnst (n-1) str

[| \_ -> $e |]

Here, cnst called recursive and e receives result of this call – constant function with (n-1) arguments. This value used to build function with n arguments – just by using it in lambda form with one more argument. Result of this recursive calculation for (cnst 3 “foo”) would be {\_ -> (\_ -> (\_ -> “foo”))} what is fully equivalent to previous definition.

Spliced expression $x or $(f ...) inside quotation brackets also should be of type {Q Exp}. Its value is computed, converted to string with Haskell code by using function {pprint} and inserted at the place of call. After evaluating all splice calls inside quotation brackets, the whole Haskell code they contains converted back to the value of type {Q Exp}. There is also another interpretation of this process – that quotation brackets are converted to Haskell expression constructing appropriate Exp, where splice expressions are replaced to using of appropriate variables. We can simplify our definition:

cnst 0 str = [| str |]

cnst n str = [| \_ -> $(cnst (n-1) str) |]

I replaced {lift str} with [| str |] : variables not bound inside quotation brackets are bound to outside variables. But because compile-time variables is run-time constants(!) these outside-bound variables turns into literals of appropriate type – and that is equivalent to converting them via {lift} function. Also I replaced computation of {e} and splicing its value inside quotation brackets with computation of the same value in the splice call itself. Our new definition doesn’t contain any details related to Exp and Pat types, but nevertheless it works! And what is great – in many cases you can create macros without ever thinking about complex syntax trees they generate! Quotation brackets also hide details of using quotation monad Q – spliced expressions inside brackets can have type {Exp} or {Q Exp}, and variables created inside brackets get its own, unique names.

The splice call $(...) and quotation brackets [|...|] does the opposite things – former executes computation of type {Q Exp}, and converts its result to Haskell code using function {pprint :: Exp -> String}, while later gets Haskell code as ordinal String and converts this string into expression of type {Q Exp}, corresponding to this code. Therefore, $( [| XXX |] ) as well as [| $XXX |] can be replaced with just XXX. Using this rule, we can explore splicing of macros without going down to details of generated Exps. Let’s consider the following usage of cnst:

cnst3 = $(cnst 3 “x”)

We can replace call to cnst with right side of the appropriate definition:

cnst3 = $( [|\_ -> $(cnst (3-1) “x”) |] )

Combination of $(...) and [|...|] disappears, so this equivalent to:

cnst3 = \_ -> $(cnst 2 “x”)

Splice $(cnst 2 “x”) in turn can be replaced with result of this call, and so on. $(cnst 0 “x”) will be replaced with $([| “x” |]) which is equivalent to “x”, and the final result will be:

cnst3 = \_ -> ( \_ -> ( \_ -> “x” )))

That is really looks like macro expansion, you agree? But for those cases where expression can’t be built with just splicing and quotation brackets, the whole power of explicit Exp construction ready at your service. For example, definition of sel in previous lesson cannot be repeated in bracket syntax, because there is no way to create tuple with variable number of elements.

More complex example: zipn

useful to see what mkZip generates for a particular n in understanding how it works. When applied to 3, and the object variable(var "ff") it generates a value in the Expr type. Prettyprinting that value as concrete syntax we get:

\ y1 y2 y3 >

case (y1,y2,y3) of

(x1:xs1,x2:xs2,x3:xs3) > (x1,x2,x3) : ff xs1 xs2 xs3

(_,_,_) > []

mkZip :: Int > Expr > Expr

mkZip n name = lam pYs (caseE (tup eYs) [m1,m2])

where

(pXs, eXs) = genPE "x" n

(pYs, eYs) = genPE "y" n

(pXSs,eXSs) = genPE "xs" n

pcons x xs = [p| $x : $xs |]

b = [| $(tup eXs) : $(apps(name : eXSs)) |]

m1 = simpleM (ptup (zipWith pcons pXs pXSs)) b

m2 = simpleM (ptup (copies n pwild)) (con "[]")

More complex example: printf

Generation of declarations and identifiers reification

Until this moment, we only considered using of TH for generation of some Haskell expressions. But TH has more power – it can also create declarations: new functions, new data types, class instances and so on. Declarations in TH represented by type Dec. In order to generate declarations, you must place splice call in the module place where declarations allowed and make this splice to return value of type Q [Dec]. Such splice call may return any number of declarations which then will be substituted instead of splice call.

More complex example: deriving Show