You're already familiar with "value-level" programming, whereby you manipulate values such as42 :: Int or 'a' :: Char. In languages like Haskell, Scala, and many others, type-level programming allows you to manipulate types like Int :: * or Char :: * where * is the kind of a concrete type (Maybe a or [a] are concrete types, but not Maybe or [] which have kind * -> *).
Consider this function
foo :: Char -> Int
foo x = fromEnum x
Here foo takes a value of type Char and returns a new value of type Int using the Enum instance for Char. This function manipulates values.
Now compare foo to this type family, enabled with the TypeFamilies language extension.
type family Foo (x :: *)
type instance Foo Char = Int
Here Foo takes a type of kind * and returns a new type of kind * using the simple mapping Char -> Int. This is a type level function that manipulates types.
This is a very simple example and you might wonder how this could possibly be useful. Using more powerful language tools, we can begin to encode proofs of the correctness of our code at the type level (for more on this, see the Curry-Howard correspondence).
A practical example is a red-black tree that uses type level programming to statically guarantee that the invariants of the tree hold.
A red-black tree has the following simple properties:
- A node is either red or black.
- The root is black.
- All leaves are black. (All leaves are same colour as the root.)
- Every red node must have two black child nodes. Every
path from a given node to any of its descendant leaves contains the
same number of black nodes.
We'll use DataKinds and GADTs, a very powerful type level programming combination.
{-# LANGUAGE DataKinds, GADTS, KindSignatures #-}
import GHC.TypeLits
First, some types to represent the colours.
data Colour = Red | Black -- promoted to types via DataKinds
this defines a new kind Colour inhabited by two types: Red and Black. Note that there are no values (ignoring bottoms) inhabiting these types, but we aren't going to need them anyways.
The red-black tree nodes are represented by the following GADT
-- 'c' is the Colour of the node, either Red or Black
-- 'n' is the number of black child nodes, a type level Natural number
-- 'a' is the type of the values this node contains
data Node (c :: Colour) (n :: Nat) a where
-- all leaves are black
Leaf :: Node Black 1 a
-- black nodes can have children of either colour
B :: Node l n a -> a -> Node r n a -> Node Black (n + 1) a
-- red nodes can only have black children
R :: Node Black n a -> a -> Node Black n a -> Node Red n a
GADT lets us express the Colour of the R and B constructors directly in the types.
The root of the tree looks like this
data RedBlackTree a where
RBTree :: Node Black n a -> RedBlackTree a
Now it is impossible to create a well-typed RedBlackTree that violates any of the 4 properties mentioned above.
- The first constraint is obviously true, there are only 2 types inhabiting
Colour.
- From the definition of
RedBlackTree the root is black.
- From the definition of the
Leaf constructor, all leaves are black.
- From the definition of the
R constructor, both it's children must
be Black nodes. As well, the number of black child nodes of each subtree are equal (the same n is used in the type of both left and right subtrees)
All these conditions are checked at compile time by GHC, meaning that we will never get a runtime exception from some misbehaving code invalidating our assumptions about a red-black tree. Importantly, there is no runtime cost associated with these extra benefits, all the work is done at compile time.