Red-Black Trees
================
Adds the 'Valid' type class to ensure that red nodes have black children.
Tracks the black height statically using a type-level function, called Incr.
This file adds a natural number index added to the datatypes `CT` and
`Node` to statically track the black height of the tree at that
point. In the `N` and `NN` constructors respectively, the left and
right children must have identical black heights, which is then
conditionally incremented based on the color of the node.
> {-# LANGUAGE GADTs, DataKinds, KindSignatures, MultiParamTypeClasses, FlexibleInstances, TypeFamilies #-}
> module RedBlack3 where
> import Persistent
> import Control.Monad
> import Test.QuickCheck
> import Data.Maybe as Maybe
* A red-black tree is a binary search tree where every node is
additionally marked with a color (red or black) and in which the
following invariants are maintained:
1. The empty nodes at the leaves are black
2. The root is always black
3. From each node, every path to a leaf
has the same number of black nodes
4. Red nodes have black children
The first invariant will be true by definition, the others we will
have to maintain as we implement the tree.
Invariants
----------
* Together, these invariants imply that every red-black tree is
"approximately balanced", in the sense that the longest path to an
empty node is no more than twice the length of the shortest.
* From this, it follows that all operations will run in O(log_2 n)
time.
Implementation
--------------
The data declaration is a straightforward modification of the one for
unbalanced trees:
> data Color = Red | Black deriving (Eq, Show)
> data Nat = Z | S Nat deriving (Eq, Show)
> data SColor c where
> R :: SColor Red
> B :: SColor Black
> (%==%) :: SColor c1 -> SColor c2 -> Bool
> R %==% R = True
> B %==% B = True
> _ %==% _ = False
A colored tree, where the index c indicates the color of the top node
of the tree.
> class Valid (parent :: Color) (left :: Color) (right :: Color)
> instance Valid Red Black Black
> instance Valid Black lc rc
> type family Incr (c :: Color) (bh :: Nat) :: Nat
> type instance Incr Black bh = S bh
> type instance Incr Red bh = bh
> data CT bh c a where
> E :: CT Z Black a
> N :: Valid c c1 c2 => (SColor c) -> (CT bh c1 a) -> a -> (CT bh c2 a) -> CT (Incr c bh) c a
> data RBT a where
> Root :: (CT bh Black a) -> RBT a
> instance Show (SColor c) where
> show R = "R"
> show B = "B"
> instance Show a => Show (CT bh c a) where
> show E = "E"
> show (N c l x r) = "(N " ++ show c ++ " " ++ show l ++ " " ++ show x ++ " " ++ show r ++ ")"
> instance Show a => Show (RBT a) where
> show (Root t) = "(Root " ++ show t ++ ")"
For any red/black tree, we can determine its color
> color :: CT bh c a -> SColor c
> color E = B
> color (N c _ _ _) = c
We then just need to implement the method of the
Set class for this data structure.
> instance Set RBT where
The empty tree is the same as before.
> -- empty :: RBT a
> empty = Root E
Membership testing and listing the elements
requires just a trivial change.
> -- member :: Ord a => a -> RBT a -> Bool
> member x (Root t) = mem x t where
> mem :: Ord a => a -> CT bh c a -> Bool
> mem x E = False
> mem x (N _ a y b)
> | x < y = mem x a
> | x > y = mem x b
> | otherwise = True
> -- elements :: Ord a => RBT a -> [a]
> elements (Root t) = aux t [] where
> aux :: Ord a => CT bh c a -> [a] -> [a]
> aux E xs = xs
> aux (N _ a y b) xs = aux a (y : aux b xs)
Insertion, is, of course a bit trickier. It uses an auxiliary
function, `aux`, that returns a tree that is a correct red black tree,
*except* that the root could be red. Therefore, insert includes an
additional step to blacken the top node of the tree.
> insert x (Root t) = blacken (ins x t) where
> blacken (NN _ l y r) = (Root (N B l y r))
> -- blacken E = error "can't happen"
Insertion is implemented in terms of a recursive function
ins, which walks down the tree until...
> data Node (n :: Nat) a where
> NN :: SColor c -> CT bh c1 a -> a -> CT bh c2 a -> Node (Incr c bh) a
> ins :: Ord a => a -> CT bh c a -> Node bh a
... it gets to an empty leaf node, in which case
it constructs a new (red) node containing the
value being inserted...
> ins x E = NN R E x E
... or discovers that the value being inserted is
already in the tree, in which case it returns
the input unchanged:
> ins x s@(N c a y b)
> | x < y = balanceL c (ins x a) y b
> | x > y = balanceR c a y (ins x b)
> | otherwise = NN c a y b
In the recursive calls, before returning the new tree, however, we may
need to *rebalance* to maintain the red-black invariants. The code to
do this is encapsulated in a helper function `balance`.
Balancing
---------
* The key insight in writing the balancing function is that we do not
try to rebalance as soon as we see a red node with a red
child. Instead, we return this tree as-is and only rebalance after
we've inserted the new red node between the black parent and the red
child.
* i.e., the job of the balance function is to rebalance trees with a
black-red-red path starting at the root.
* The result of rebalancing maintains the black height by converting
to a red parent with multiple black children.
* Since the root has two children and four grandchildren, there are
four ways in which such a path can happen.
[ pictures on the chalkboard ]
> balanceL :: Ord a => SColor c -> Node bh a -> a -> CT bh c1 a -> Node (Incr c bh) a
> balanceL B (NN R (N R a x b) y c) z d =
> NN R (N B a x b) y (N B c z d)
> balanceL B (NN R a x (N R b y c)) z d =
> NN R (N B a x b) y (N B c z d)
-- everything works out. Patterns that are not B/R/R
> balanceL col (NN B a x b) y c = NN col (N B a x b) y c
> balanceL col (NN R a@(N B _ _ _) x b@(N B _ _ _)) y c = NN col (N R a x b) y c
> -- balanceL col (NN R a@E x b@(N B _ _ _)) y c = NN col (N R a x b) y c
> -- balanceL col (NN R a@(N B _ _ _) x b@E) y c = NN col (N R a x b) y c
> balanceL col (NN R a@E x b@E) y c = NN col (N R a x b) y c
> balanceR :: Ord a => SColor c -> CT bh c1 a -> a -> Node bh a -> Node (Incr c bh) a
> balanceR B a x (NN R (N R b y c) z d) =
> NN R (N B a x b) y (N B c z d)
> balanceR B a x (NN R b y (N R c z d)) =
> NN R (N B a x b) y (N B c z d)
> balanceR col a x (NN B b y c) = NN col a x (N B b y c)
> balanceR col a x (NN R b@(N B _ _ _) y c@(N B _ _ _)) = NN col a x (N R b y c)
> -- balanceR col a x (NN R b@E y c@(N B _ _ _)) = NN col a x (N R b y c)
> -- balanceR col a x (NN R b@(N B _ _ _) y c@E) = NN col a x (N R b y c)
> balanceR col a x (NN R b@E y c@E) = NN col a x (N R b y c)
Making Sure Invariants are Preserved
------------------------------------
We'll use quickcheck to test our tree implementation.
> instance (Ord a, Arbitrary a) => Arbitrary (RBT a) where
> arbitrary = liftM (foldr insert empty) arbitrary
Now consider the red black tree invariants:
1. The empty nodes at the leaves are black.
> -- prop_Rb1 :: Bool
> -- prop_Rb1 = color E == B
2. The root of the tree is Black.
> -- prop_Rb2 :: RBT Int -> Bool
> -- prop_Rb2 t = color t == B
3. For all nodes in the tree, all downward paths from the
node to a leaf contain the same number of Black nodes.
> prop_Rb3 :: RBT Int -> Bool
> prop_Rb3 (Root t) = Maybe.isJust (aux t) where
> aux :: CT bh c Int -> Maybe Int
> aux E = Just 1
> aux (N c l _ r) = do
> bh1 <- aux l
> bh2 <- aux r
> guard (bh1 == bh2)
> return $ case c of { R -> bh1 ; B -> bh1 + 1 }
4. All children of red nodes are black.
> prop_Rb4 :: RBT Int -> Bool
> prop_Rb4 (Root t) = aux t where
> aux :: CT bh c Int -> Bool
> aux (N col l _ r) = aux l && aux r
> && if col %==% R
> then (color l %==% B && color r %==% B)
> else True
> aux E = True
And satisfies the binary search tree condition.
> prop_BST :: RBT Int -> Bool
> prop_BST (Root t) = aux t where
> aux :: CT bh c Int -> Bool
> aux E = True
> aux (N _ l x r) =
> aux l && aux r &&
> maybe True (x >) (tmax l) &&
> maybe True (x <) (tmin r)
> maybes :: (a -> a -> a) -> Maybe a -> Maybe a -> Maybe a
> maybes f (Just x) (Just y) = Just (f x y)
> maybes _ Nothing m = m
> maybes _ m Nothing = m
> tmin :: CT bh c Int -> Maybe Int
> tmin E = Nothing
> tmin (N _ l x r) =
> maybes min (tmin l) (tmin r)
> tmax :: CT bh c Int -> Maybe Int
> tmax E = Nothing
> tmax (N _ l x r) =
> maybes max (tmax l) (tmax r)
> rbt :: Proxy RBT
> rbt = Proxy
> main :: IO ()
> main = do
> quickCheck prop_BST
> quickCheck $ prop_empty rbt
> quickCheck $ prop_insert1 rbt
> quickCheck $ prop_insert2 rbt
> -- quickCheck prop_Rb1
> -- quickCheck prop_Rb2
> -- quickCheck prop_Rb3
> -- quickCheck prop_Rb4
Notes
-----
[0] See also persistant [Java
implementation](http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet)
for comparison. Requires ~350 lines for the same implementation.
[1] Andrew Appel, ["Efficient Verified Red-Black Trees"](http://www.cs.princeton.edu/~appel/papers/redblack.pdf)
September 2011. Presents a Coq implementation of
a verified Red Black Tree.
[2] Matt Might has a blog post on the [RBT deletion operation in
Racket](http://matt.might.net/articles/red-black-delete/)