# Red-Black Trees

Adds the 'Valid' type class to ensure that red nodes have black children. This type class as added as a constraint to the `N`

constructor of the `CT`

to enforce the relationship between the color of the node and the colors of its children.

The difficult part of this extension is the definition of `balanceR`

and `balanceL`

. Instead of using a *catch-all* case for trees that do not need rebalancing, we need to explicitly specify the patterns of good trees, so that the type checker can certify that they do not need to be rebalanced.

`{-# LANGUAGE GADTs, DataKinds, KindSignatures, MultiParamTypeClasses, FlexibleInstances #-}`

`module RedBlack2 where`

`import Persistent`

```
import Control.Monad
import Test.QuickCheck
import Data.Maybe as Maybe
```

- A red-black tree is a binary search tree where every node is additionally marked with a color (red or black) and in which the following invariants are maintained:

The empty nodes at the leaves are black

The root is always black

From each node, every path to a leaf has the same number of black nodes

Red nodes have black children

The first invariant will be true by definition, the others we will

have to maintain as we implement the tree.

## Invariants

Together, these invariants imply that every red-black tree is "approximately balanced", in the sense that the longest path to an empty node is no more than twice the length of the shortest.

From this, it follows that all operations will run in O(log_2 n) time.

## Implementation

The data declaration is a straightforward modification of the one for unbalanced trees:

`data Color = Red | Black deriving (Eq, Show)`

```
data SColor c where
R :: SColor Red
B :: SColor Black
```

```
(%==%) :: SColor c1 -> SColor c2 -> Bool
R %==% R = True
B %==% B = True
_ %==% _ = False
```

A colored tree, where the index c indicates the color of the top node of the tree.

```
class Valid (parent :: Color) (left :: Color) (right :: Color)
instance Valid Red Black Black
instance Valid Black lc rc
```

```
data CT c a where
E :: CT Black a
N :: Valid c c1 c2 => (SColor c) -> (CT c1 a) -> a -> (CT c2 a) -> CT c a
```

`data RBT a = Root (CT Black a) deriving Show`

```
instance Show (SColor c) where
show R = "R"
show B = "B"
```

```
instance Show a => Show (CT c a) where
show E = "E"
show (N c l x r) = "(N " ++ show c ++ " " ++ show l ++ " " ++ show x ++ " " ++ show r ++ ")"
```

For any red/black tree, we can determine its color

```
color :: CT c a -> SColor c
color E = B
color (N c _ _ _) = c
```

We then just need to implement the method of the Set class for this data structure.

`instance Set RBT where`

The empty tree is the same as before.

```
-- empty :: RBT a
empty = Root E
```

Membership testing and listing the elements requires just a trivial change.

```
-- member :: Ord a => a -> RBT a -> Bool
member x (Root t) = mem x t where
mem :: Ord a => a -> CT c a -> Bool
mem x E = False
mem x (N _ a y b)
| x < y = mem x a
| x > y = mem x b
| otherwise = True
```

```
-- elements :: Ord a => RBT a -> [a]
elements (Root t) = aux t [] where
aux :: Ord a => CT c a -> [a] -> [a]
aux E xs = xs
aux (N _ a y b) xs = aux a (y : aux b xs)
```

Insertion, is, of course a bit trickier. It uses an auxiliary function, `aux`

, that returns a tree that is a correct red black tree, *except* that the root could be red. Therefore, insert includes an additional step to blacken the top node of the tree.

```
insert x (Root t) = blacken (ins x t) where
blacken (NN _ l y r) = (Root (N B l y r))
-- blacken E = error "can't happen"
```

Insertion is implemented in terms of a recursive function ins, which walks down the tree until...

```
data Node a where
NN :: SColor c -> CT c1 a -> a -> CT c2 a -> Node a
```

`ins :: Ord a => a -> CT c a -> Node a`

... it gets to an empty leaf node, in which case it constructs a new (red) node containing the value being inserted...

`ins x E = NN R E x E`

... or discovers that the value being inserted is already in the tree, in which case it returns the input unchanged:

```
ins x s@(N c a y b)
| x < y = balanceL c (ins x a) y b
| x > y = balanceR c a y (ins x b)
| otherwise = NN c a y b
```

In the recursive calls, before returning the new tree, however, we may need to *rebalance* to maintain the red-black invariants. The code to do this is encapsulated in a helper function `balance`

.

## Balancing

The key insight in writing the balancing function is that we do not try to rebalance as soon as we see a red node with a red child. Instead, we return this tree as-is and only rebalance after we've inserted the new red node between the black parent and the red child.

i.e., the job of the balance function is to rebalance trees with a black-red-red path starting at the root.

The result of rebalancing maintains the black height by converting to a red parent with multiple black children.

Since the root has two children and four grandchildren, there are four ways in which such a path can happen.

[ pictures on the chalkboard ]

```
balanceL :: Ord a => SColor c -> Node a -> a -> CT c1 a -> Node a
balanceL B (NN R (N R a x b) y c) z d =
NN R (N B a x b) y (N B c z d)
balanceL B (NN R a x (N R b y c)) z d =
NN R (N B a x b) y (N B c z d)
```

-- everything works out. Patterns that are not B/R/R

```
balanceL col (NN B a x b) y c = NN col (N B a x b) y c
balanceL col (NN R a@(N B _ _ _) x b@(N B _ _ _)) y c = NN col (N R a x b) y c
balanceL col (NN R a@E x b@(N B _ _ _)) y c = NN col (N R a x b) y c
balanceL col (NN R a@(N B _ _ _) x b@E) y c = NN col (N R a x b) y c
balanceL col (NN R a@E x b@E) y c = NN col (N R a x b) y c
```

```
balanceR :: Ord a => SColor c -> CT c1 a -> a -> Node a -> Node a
balanceR B a x (NN R (N R b y c) z d) =
NN R (N B a x b) y (N B c z d)
balanceR B a x (NN R b y (N R c z d)) =
NN R (N B a x b) y (N B c z d)
```

```
balanceR col a x (NN B b y c) = NN col a x (N B b y c)
balanceR col a x (NN R b@(N B _ _ _) y c@(N B _ _ _)) = NN col a x (N R b y c)
balanceR col a x (NN R b@E y c@(N B _ _ _)) = NN col a x (N R b y c)
balanceR col a x (NN R b@(N B _ _ _) y c@E) = NN col a x (N R b y c)
balanceR col a x (NN R b@E y c@E) = NN col a x (N R b y c)
```

## Making Sure Invariants are Preserved

We'll use quickcheck to test our tree implementation.

```
instance (Ord a, Arbitrary a) => Arbitrary (RBT a) where
arbitrary = liftM (foldr insert empty) arbitrary
```

Now consider the red black tree invariants:

- The empty nodes at the leaves are black.

```
-- prop_Rb1 :: Bool
-- prop_Rb1 = color E == B
```

- The root of the tree is Black.

```
-- prop_Rb2 :: RBT Int -> Bool
-- prop_Rb2 t = color t == B
```

- For all nodes in the tree, all downward paths from the node to a leaf contain the same number of Black nodes.

```
prop_Rb3 :: RBT Int -> Bool
prop_Rb3 (Root t) = Maybe.isJust (aux t) where
aux :: CT c Int -> Maybe Int
aux E = Just 1
aux (N c l _ r) = do
bh1 <- aux l
bh2 <- aux r
guard (bh1 == bh2)
return $ case c of { R -> bh1 ; B -> bh1 + 1 }
```

- All children of red nodes are black.

```
prop_Rb4 :: RBT Int -> Bool
prop_Rb4 (Root t) = aux t where
aux :: CT c Int -> Bool
aux (N col l _ r) = aux l && aux r
&& if col %==% R
then (color l %==% B && color r %==% B)
else True
aux E = True
```

And satisfies the binary search tree condition.

```
prop_BST :: RBT Int -> Bool
prop_BST (Root t) = aux t where
aux :: CT c Int -> Bool
aux E = True
aux (N _ l x r) =
aux l && aux r &&
maybe True (x >) (tmax l) &&
maybe True (x <) (tmin r)
```

```
maybes :: (a -> a -> a) -> Maybe a -> Maybe a -> Maybe a
maybes f (Just x) (Just y) = Just (f x y)
maybes _ Nothing m = m
maybes _ m Nothing = m
```

```
tmin :: CT c Int -> Maybe Int
tmin E = Nothing
tmin (N _ l x r) =
maybes min (tmin l) (tmin r)
```

```
tmax :: CT c Int -> Maybe Int
tmax E = Nothing
tmax (N _ l x r) =
maybes max (tmax l) (tmax r)
```

```
rbt :: Proxy RBT
rbt = Proxy
```

```
main :: IO ()
main = do
quickCheck prop_BST
quickCheck $ prop_empty rbt
quickCheck $ prop_insert1 rbt
quickCheck $ prop_insert2 rbt
-- quickCheck prop_Rb1
-- quickCheck prop_Rb2
quickCheck prop_Rb3
quickCheck prop_Rb4
```

## Notes

[0] See also persistant Java implementation for comparison. Requires ~350 lines for the same implementation.

[1] Andrew Appel, "Efficient Verified Red-Black Trees" September 2011. Presents a Coq implementation of a verified Red Black Tree.

[2] Matt Might has a blog post on the RBT deletion operation in Racket

## News :

Welcome to CIS 552!

See the home page for basic
information about the course, the schedule for the lecture notes
and assignments, the resources for links to the required software
and online references, and the syllabus for detailed information about
the course policies.