# Red-Black Trees with GADTs

This version of RedBlack trees demonstrates the use of GADTs to statically verify the RedBlack tree invariants.

At this point, GADTs verify that leaves are black and that the root is black. The next step is to show that red nodes have black children.

`{-# LANGUAGE InstanceSigs,GADTs, DataKinds, KindSignatures #-}`

`module RedBlack where`

`import Persistent`

```
import Control.Monad
import Test.QuickCheck hiding (elements)
import Data.Maybe as Maybe
import Data.List (sort, nub)
```

A red-black tree is a binary search tree where every node is additionally marked with a color (red or black).

`data Color = Red | Black deriving (Eq, Show)`

A colored tree, where the index c indicates the color of the top node of the tree. Datatype promotion allows us to use Colors as parameters to type definitions. This is a GADT---a type that is indexed by another type. Each data constructor determines what the type parameter c must be.

```
data CT (c :: Color) (a :: *) where
E :: CT Black a
N :: (SColor c) -> (CT c1 a) -> a -> (CT c2 a) -> CT c a
```

```
color :: CT c a -> SColor c
color (N c _ _ _) = c
color E = B
```

A *top-level definition* that enforces that the root of the tree is black.

`data RBT a = Root (CT Black a) deriving Show`

```
data SColor (c :: Color) where
R :: SColor Red
B :: SColor Black
```

An equality test for singleton colors.

```
(%==%) :: SColor c1 -> SColor c2 -> Bool
R %==% R = True
B %==% B = True
_ %==% _ = False
```

Haskell cannot derive the show instances for `SColor`

and `CT`

now that they are indexed.

```
instance Show (SColor c) where
show R = "R"
show B = "B"
```

```
instance Show a => Show (CT c a) where
show E = "E"
show (N c l x r) = "(N " ++ show c ++ " " ++ show l ++ " " ++ show x ++ " " ++ show r ++ ")"
```

Furthermore, Red Black trees must satisfy the following invariants.

The empty nodes at the leaves are black

The root is always black

From each node, every path to a leaf has the same number of black nodes

Red nodes have black children

The first invariant is true by definition, the others we will

have to maintain as we implement the tree.Together, these invariants imply that every red-black tree is "approximately balanced", in the sense that the longest path to an empty node is no more than twice the length of the shortest.

From this, it follows that all operations will run in O(log_2 n) time.

## Checking the RBT invariants

We can write quickcheck properties for each of the invariants.

- The empty nodes at the leaves are black.

```
-- prop_Rb1 :: Bool
-- prop_Rb1 = color E == B
```

- The root of the tree is Black.

```
-- prop_Rb2 :: RBT Int -> Bool
-- prop_Rb2 (Root t) = color t == B
```

- For all nodes in the tree, all downward paths from the node to a leaf contain the same number of Black nodes.

```
prop_Rb3 :: RBT Int -> Bool
prop_Rb3 (Root t) = fst (aux t) where
aux :: CT c a -> (Bool, Int)
aux E = (True, 0)
aux (N c a x b) = (h1 == h2 && b1 && b2, if c %==% B then h1 + 1 else h1) where
(b1 , h1) = aux a
(b2 , h2) = aux b
```

- All children of red nodes are black.

```
prop_Rb4 :: RBT Int -> Bool
prop_Rb4 (Root t) = aux t where
aux :: CT c a -> Bool
aux E = True
aux (N R a x b) = color a %==% B && color b %==% B && aux a && aux b
aux (N B a x b) = aux a && aux b
```

And satisfies the binary search tree condition.

```
prop_BST :: RBT Int -> Bool
prop_BST t = isSortedNoDups (elements t)
```

```
isSortedNoDups :: Ord a => [a] -> Bool
isSortedNoDups x = nub (sort x) == x
```

To use quickcheck, we need an arbitrary instance. We'll use the one based on `insert`

and `empty`

.

```
instance (Ord a, Arbitrary a) => Arbitrary (RBT a) where
arbitrary = liftM (foldr insert empty) arbitrary
```

RBT proxy for the general set properties.

```
rbt :: Proxy RBT
rbt = Proxy
```

```
main :: IO ()
main = do
```

Make sure the RBT is a set

```
quickCheck $ prop_empty rbt
quickCheck $ prop_insert1 rbt
quickCheck $ prop_insert2 rbt
```

Implementation specific properties.

```
putStrLn "BST property"
quickCheck prop_BST
-- putStrLn "Leaves are black"
-- quickCheck prop_Rb1
-- putStrLn "Root is black"
-- quickCheck prop_Rb2
putStrLn "Black height the same"
quickCheck prop_Rb3
putStrLn "Red nodes have black children"
quickCheck prop_Rb4
```

## Implementation

We then just need to implement the method of the Set class for this data structure.

`instance Set RBT where`

The empty tree is the same as before.

```
empty :: RBT a
empty = (Root E)
```

Membership testing and listing the elements requires just a trivial change.

```
member :: Ord a => a -> RBT a -> Bool
member x (Root t) = aux x t where
aux :: Ord a => a -> CT c a -> Bool
aux x E = False
aux x (N _ a y b)
| x < y = aux x a
| x > y = aux x b
| otherwise = True
```

```
elements :: Ord a => RBT a -> [a]
elements (Root t) = aux t [] where
aux :: Ord a => CT c a -> [a] -> [a]
aux E acc = acc
aux (N _ a x b) acc = aux a (x : aux b acc)
```

Insertion, is, of course a bit trickier.

```
insert :: Ord a => a -> RBT a -> RBT a
insert x (Root t) = Root (blacken (ins x t))
```

After insertion, with the auxilary functio `ins`

, we blacken the top node of the tree to make sure that invariant (2) is always satisfied.

```
blacken :: IR a -> CT Black a
blacken (IN _ l v r) = N B l v r
```

```
data IR a where
IN :: SColor c -> CT c1 a -> a -> CT c2 a -> IR a
```

```
ins :: Ord a => a -> CT c a -> IR a
ins x E = IN R E x E
ins x s@(N c a y b)
| x < y = balanceL c (ins x a) y b
| x > y = balanceR c a y (ins x b)
| otherwise = (IN c a y b)
```

In the recursive calls, before returning the new tree, however, we may need to *rebalance* to maintain the red-black invariants. The code to do this is encapsulated in a helper function `balance`

.

## Balancing

- The key insight in writing the balancing function is that we do not try to rebalance as soon as we see a red node with a red child. That can be fixed just by blackening the root of the tree, so we return this tree as-is. (We call such trees, which violate invariants two and four only at the root "infrared").

The real problem comes when we've inserted a new red node between a black parent and a red child.

i.e., the job of the balance function is to rebalance trees with a black-red-red path starting at the root.

The result of rebalancing maintains the black height by converting to a red parent with black children.

Since the root has two children and four grandchildren, there are four ways in which such a path can happen.

```
balanceL :: SColor c -> IR a -> a -> CT c1 a -> IR a
balanceL B (IN R (N R a x b) y c) z d = IN R (N B a x b) y (N B c z d)
balanceL B (IN R a x (N R b y c)) z d = IN R (N B a x b) y (N B c z d)
balanceL col (IN c1 a x b) z d = IN col (N c1 a x b) z d
```

```
balanceR :: SColor c -> CT c1 a -> a -> IR a -> IR a
balanceR B a x (IN R (N R b y c) z d) = IN R (N B a x b) y (N B c z d)
balanceR B a x (IN R b y (N R c z d)) = IN R (N B a x b) y (N B c z d)
balanceR c a x (IN c1 b z d) = IN c a x (N c1 b z d)
```

## Notes

[0] Matthew Brecknell has a video + source code for using GADTs to guaranteed B-tree invariants.

[1] Anton Durgunov's Monad.Reader article about uses of GADTs in Haskell (including a section inspired by CIS 552!).

[2] Stefan Kahrs, "Red-black trees with types", Journal of functional programming, 11(04), pp 425-432, July 2001

[3] Andrew Appel, "Efficient Verified Red-Black Trees" September 2011. Presents a Coq implementation of a verified Red Black Tree based on Karhs implementation.

## News :

Welcome to CIS 552!

See the home page for basic
information about the course, the schedule for the lecture notes
and assignments, the resources for links to the required software
and online references, and the syllabus for detailed information about
the course policies.