# Red-Black Trees

Using The Haskell type system to guarantee data structure invariants.

This file is a version of the RedBlack tree implementation that we presented in class, modified to enforce two of the red black tree invariants at compile time.

`{-# LANGUAGE GADTs, DataKinds #-}`

`module RedBlack1 where`

`import Persistent`

```
import Control.Monad
import Test.QuickCheck
import Data.Maybe as Maybe
```

- A red-black tree is a binary search tree where every node is additionally marked with a color (red or black) and in which the following invariants are maintained:

The empty nodes at the leaves are black

*enforced by the type system*The root is always black

*enforced by the type system*From each node, every path to a leaf has the same number of black nodes

Red nodes have black children

## Invariants

Together, these invariants imply that every red-black tree is "approximately balanced", in the sense that the longest path to an empty node is no more than twice the length of the shortest.

From this, it follows that all operations will run in O(log_2 n) time.

## Implementation

The data declaration is a straightforward modification of the one for unbalanced trees:

`data Color = Red | Black deriving (Eq, Show)`

This is a GADT---a type that is indexed by another type. Each data constructor determines what the type parameter c must be.

```
data SColor (c :: Color) where
R :: SColor Red
B :: SColor Black
```

An equality test for singleton colors.

```
(%==%) :: SColor c1 -> SColor c2 -> Bool
R %==% R = True
B %==% B = True
_ %==% _ = False
```

A colored tree, where the index c indicates the color of the top node of the tree.

```
data CT c a where
E :: CT Black a -- empty leaves must be black
N :: (SColor c) -> (CT c1 a) -> a -> (CT c2 a) -> CT c a
```

A *top-level definition* that enforces that the root of the tree is black.

`data RBT a = Root (CT Black a) deriving Show`

Haskell cannot derive the show instances for `SColor`

and `CT`

now that they are indexed.

```
instance Show (SColor c) where
show R = "R"
show B = "B"
```

```
instance Show a => Show (CT c a) where
show E = "E"
show (N c l x r) = "(N " ++ show c ++ " " ++ show l ++ " " ++ show x ++ " " ++ show r ++ ")"
```

For any red/black tree, we can determine its color. Note that the color must match the type index.

```
color :: CT c a -> SColor c
color E = B
color (N c _ _ _) = c
```

We then just need to implement the method of the Set class for this data structure.

`instance Set RBT where`

The empty tree is the same as before.

```
-- empty :: RBT a
empty = Root E
```

Membership testing and listing the elements requires just a trivial change.

```
-- member :: Ord a => a -> RBT a -> Bool
member x (Root t) = mem x t where
mem :: Ord a => a -> CT c a -> Bool
mem x E = False
mem x (N _ a y b)
| x < y = mem x a
| x > y = mem x b
| otherwise = True
```

```
-- elements :: Ord a => RBT a -> [a]
elements (Root t) = aux t [] where
aux :: Ord a => CT c a -> [a] -> [a]
aux E xs = xs
aux (N _ a y b) xs = aux a (y : aux b xs)
```

Insertion, is, of course a bit trickier. It uses an auxiliary function, `aux`

, that returns a tree that is a correct red black tree, *except* that the root could be red. Therefore, insert includes an additional step to blacken the top node of the tree.

```
insert x (Root t) = blacken (ins x t) where
blacken (NN _ l y r) = (Root (N B l y r))
-- blacken E = error "can't happen"
```

Insertion is implemented in terms of a recursive function ins, which returns a result of type

```
data Node a where
NN :: SColor c -> CT c1 a -> a -> CT c2 a -> Node a
```

and walks down the tree until...

`ins :: Ord a => a -> CT c a -> Node a`

... it gets to an empty leaf node, in which case it constructs a new (red) node containing the value being inserted...

`ins x E = NN R E x E`

... or discovers that the value being inserted is already in the tree, in which case it returns the input unchanged:

```
ins x s@(N c a y b)
| x < y = balanceL c (ins x a) y b
| x > y = balanceR c a y (ins x b)
| otherwise = NN c a y b
```

In the recursive calls, before returning the new tree, however, we may need to *rebalance* to maintain the red-black invariants. The code to do this is encapsulated in a helper function `balance`

.

## Balancing

The key insight in writing the balancing function is that we do not try to rebalance as soon as we see a red node with a red child. Instead, we return this tree as-is and only rebalance after we've inserted the new red node between the black parent and the red child.

i.e., the job of the balance function is to rebalance trees with a black-red-red path starting at the root.

The result of rebalancing maintains the black height by converting to a red parent with multiple black children.

Since the root has two children and four grandchildren, there are four ways in which such a path can happen.

[ pictures on the chalkboard ]

```
balanceL :: Ord a => SColor c -> Node a -> a -> CT c1 a -> Node a
balanceL B (NN R (N R a x b) y c) z d =
NN R (N B a x b) y (N B c z d)
balanceL B (NN R a x (N R b y c)) z d =
NN R (N B a x b) y (N B c z d)
balanceL col (NN col' a x b) y c = NN col (N col' a x b) y c
```

```
balanceR :: Ord a => SColor c -> CT c1 a -> a -> Node a -> Node a
balanceR B a x (NN R (N R b y c) z d) =
NN R (N B a x b) y (N B c z d)
balanceR B a x (NN R b y (N R c z d)) =
NN R (N B a x b) y (N B c z d)
balanceR col a x (NN col' b y c) = NN col a x (N col' b y c)
```

## Making Sure Invariants are Preserved

We'll use quickcheck to test our tree implementation.

```
instance (Ord a, Arbitrary a) => Arbitrary (RBT a) where
arbitrary = liftM (foldr insert empty) arbitrary
```

Now consider the red black tree invariants:

- The empty nodes at the leaves are black.

```
-- prop_Rb1 :: Bool
-- prop_Rb1 = color E == B
```

- The root of the tree is Black.

```
-- prop_Rb2 :: RBT Int -> Bool
-- prop_Rb2 t = color t == B
```

- For all nodes in the tree, all downward paths from the node to a leaf contain the same number of Black nodes.

```
prop_Rb3 :: RBT Int -> Bool
prop_Rb3 (Root t) = Maybe.isJust (aux t) where
aux :: CT c Int -> Maybe Int
aux E = Just 1
aux (N c l _ r) = do
bh1 <- aux l
bh2 <- aux r
guard (bh1 == bh2)
return $ case c of { R -> bh1 ; B -> bh1 + 1 }
```

- All children of red nodes are black.

```
prop_Rb4 :: RBT Int -> Bool
prop_Rb4 (Root t) = aux t where
aux :: CT c Int -> Bool
aux (N col l _ r) = aux l && aux r
&& if col %==% R
then (color l %==% B && color r %==% B)
else True
aux E = True
```

And satisfies the binary search tree condition.

```
prop_BST :: RBT Int -> Bool
prop_BST (Root t) = aux t where
aux :: CT c Int -> Bool
aux E = True
aux (N _ l x r) =
aux l && aux r &&
maybe True (x >) (tmax l) &&
maybe True (x <) (tmin r)
```

```
maybes :: (a -> a -> a) -> Maybe a -> Maybe a -> Maybe a
maybes f (Just x) (Just y) = Just (f x y)
maybes _ Nothing m = m
maybes _ m Nothing = m
```

```
tmin :: CT c Int -> Maybe Int
tmin E = Nothing
tmin (N _ l x r) =
maybes min (tmin l) (tmin r)
```

```
tmax :: CT c Int -> Maybe Int
tmax E = Nothing
tmax (N _ l x r) =
maybes max (tmax l) (tmax r)
```

```
rbt :: Proxy RBT
rbt = Proxy
```

```
main :: IO ()
main = do
quickCheck prop_BST
quickCheck $ prop_empty rbt
quickCheck $ prop_insert1 rbt
quickCheck $ prop_insert2 rbt
-- quickCheck prop_Rb1
-- quickCheck prop_Rb2
quickCheck prop_Rb3
quickCheck prop_Rb4
```

## Notes

[0] See also persistant Java implementation for comparison. Requires ~350 lines for the same implementation.

[1] Andrew Appel, "Efficient Verified Red-Black Trees" September 2011. Presents a Coq implementation of a verified Red Black Tree.

[2] Matt Might has a blog post on the RBT deletion operation in Racket

## News :

Welcome to CIS 552!

See the home page for basic
information about the course, the schedule for the lecture notes
and assignments, the resources for links to the required software
and online references, and the syllabus for detailed information about
the course policies.