Red-Black Trees
================
Using The Haskell type system to guarantee data structure invariants.
This file is a version of the [RedBlack](RedBlack.html) tree
implementation that we presented in class, modified to enforce two of the
red black tree invariants at compile time.
> {-# LANGUAGE GADTs, DataKinds, KindSignatures, MultiParamTypeClasses, FlexibleInstances #-}
> module RedBlack1 where
> import Persistent
> import Control.Monad
> import Test.QuickCheck
> import Data.Maybe as Maybe
* A red-black tree is a binary search tree where every node is
additionally marked with a color (red or black) and in which the
following invariants are maintained:
1. The empty nodes at the leaves are black
*enforced by the type system*
2. The root is always black
*enforced by the type system*
3. From each node, every path to a leaf
has the same number of black nodes
4. Red nodes have black children
Invariants
----------
* Together, these invariants imply that every red-black tree is
"approximately balanced", in the sense that the longest path to an
empty node is no more than twice the length of the shortest.
* From this, it follows that all operations will run in O(log_2 n)
time.
Implementation
--------------
The data declaration is a straightforward modification of the one for
unbalanced trees:
> data Color = Red | Black deriving (Eq, Show)
This is a GADT---a type that is indexed by another type. Each data
constructor determines what the type parameter c must be.
> data SColor (c :: Color) where
> R :: SColor Red
> B :: SColor Black
An equality test for singleton colors.
> (%==%) :: SColor c1 -> SColor c2 -> Bool
> R %==% R = True
> B %==% B = True
> _ %==% _ = False
A colored tree, where the index c indicates the color of the top node
of the tree.
> class Valid (croot :: Color) (cl::Color) (cr :: Color)
> instance Valid Red Black Black
> instance Valid Black c1 c2
> data CT c a where
> E :: CT Black a -- empty leaves must be black
> N :: Valid c c1 c2 => (SColor c) -> (CT c1 a) -> a -> (CT c2 a) -> CT c a
A *top-level definition* that enforces that the root of the tree is black.
> data RBT a = Root (CT Black a) deriving Show
Haskell cannot derive the show instances for `SColor` and `CT` now that
they are indexed.
> instance Show (SColor c) where
> show R = "R"
> show B = "B"
> instance Show a => Show (CT c a) where
> show E = "E"
> show (N c l x r) = "(N " ++ show c ++ " " ++ show l ++ " " ++ show x ++ " " ++ show r ++ ")"
For any red/black tree, we can determine its color. Note that the
color must match the type index.
> color :: CT c a -> SColor c
> color E = B
> color (N c _ _ _) = c
We then just need to implement the method of the
Set class for this data structure.
> instance Set RBT where
The empty tree is the same as before.
> -- empty :: RBT a
> empty = Root E
Membership testing and listing the elements
requires just a trivial change.
> -- member :: Ord a => a -> RBT a -> Bool
> member x (Root t) = mem x t where
> mem :: Ord a => a -> CT c a -> Bool
> mem x E = False
> mem x (N _ a y b)
> | x < y = mem x a
> | x > y = mem x b
> | otherwise = True
> -- elements :: Ord a => RBT a -> [a]
> elements (Root t) = aux t [] where
> aux :: Ord a => CT c a -> [a] -> [a]
> aux E xs = xs
> aux (N _ a y b) xs = aux a (y : aux b xs)
Insertion, is, of course a bit trickier. It uses an auxiliary
function, `aux`, that returns a tree that is a correct red black tree,
*except* that the root could be red. Therefore, insert includes an
additional step to blacken the top node of the tree.
> insert x (Root t) = blacken (ins x t) where
> blacken (NN _ l y r) = (Root (N B l y r))
> -- blacken E = error "can't happen"
Insertion is implemented in terms of a recursive function
ins, which returns a result of type
> data Node a where
> NN :: SColor c -> CT c1 a -> a -> CT c2 a -> Node a
and walks down the tree until...
> ins :: Ord a => a -> CT c a -> Node a
... it gets to an empty leaf node, in which case
it constructs a new (red) node containing the
value being inserted...
> ins x E = NN R E x E
... or discovers that the value being inserted is
already in the tree, in which case it returns
the input unchanged:
> ins x s@(N c a y b)
> | x < y = balanceL c (ins x a) y b
> | x > y = balanceR c a y (ins x b)
> | otherwise = NN c a y b
In the recursive calls, before returning the new tree, however, we may
need to *rebalance* to maintain the red-black invariants. The code to
do this is encapsulated in a helper function `balance`.
Balancing
---------
* The key insight in writing the balancing function is that we do not
try to rebalance as soon as we see a red node with a red
child. Instead, we return this tree as-is and only rebalance after
we've inserted the new red node between the black parent and the red
child.
* i.e., the job of the balance function is to rebalance trees with a
black-red-red path starting at the root.
* The result of rebalancing maintains the black height by converting
to a red parent with multiple black children.
* Since the root has two children and four grandchildren, there are
four ways in which such a path can happen.
[ pictures on the chalkboard ]
> balanceL :: Ord a => SColor c -> Node a -> a -> CT c1 a -> Node a
> balanceL B (NN R (N R a x b) y c) z d =
> NN R (N B a x b) y (N B c z d)
> balanceL B (NN R a x (N R b y c)) z d =
> NN R (N B a x b) y (N B c z d)
> balanceL col (NN B a x b) y c = NN col (N B a x b) y c
> balanceL col (NN R a@E x b@E) y c = NN col (N R a x b) y c
> balanceL col (NN R a@(N B _ _ _) x b@E) y c = NN col (N R a x b) y c
> balanceL col (NN R a@E x b@(N B _ _ _)) y c = NN col (N R a x b) y c
> balanceL col (NN R a@(N B _ _ _) x b@(N B _ _ _)) y c = NN col (N R a x b) y c
> balanceR :: Ord a => SColor c -> CT c1 a -> a -> Node a -> Node a
> balanceR B a x (NN R (N R b y c) z d) =
> NN R (N B a x b) y (N B c z d)
> balanceR B a x (NN R b y (N R c z d)) =
> NN R (N B a x b) y (N B c z d)
> -- balanceR col a x (NN col' b y c) = NN col a x (N col' b y c)
Making Sure Invariants are Preserved
------------------------------------
We'll use quickcheck to test our tree implementation.
> instance (Ord a, Arbitrary a) => Arbitrary (RBT a) where
> arbitrary = liftM (foldr insert empty) arbitrary
Now consider the red black tree invariants:
1. The empty nodes at the leaves are black.
> -- prop_Rb1 :: Bool
> -- prop_Rb1 = color E == B
2. The root of the tree is Black.
> -- prop_Rb2 :: RBT Int -> Bool
> -- prop_Rb2 t = color t == B
3. For all nodes in the tree, all downward paths from the
node to a leaf contain the same number of Black nodes.
> prop_Rb3 :: RBT Int -> Bool
> prop_Rb3 (Root t) = Maybe.isJust (aux t) where
> aux :: CT c Int -> Maybe Int
> aux E = Just 1
> aux (N c l _ r) = do
> bh1 <- aux l
> bh2 <- aux r
> guard (bh1 == bh2)
> return $ case c of { R -> bh1 ; B -> bh1 + 1 }
4. All children of red nodes are black.
> prop_Rb4 :: RBT Int -> Bool
> prop_Rb4 (Root t) = aux t where
> aux :: CT c Int -> Bool
> aux (N col l _ r) = aux l && aux r
> && if col %==% R
> then (color l %==% B && color r %==% B)
> else True
> aux E = True
And satisfies the binary search tree condition.
> prop_BST :: RBT Int -> Bool
> prop_BST (Root t) = aux t where
> aux :: CT c Int -> Bool
> aux E = True
> aux (N _ l x r) =
> aux l && aux r &&
> maybe True (x >) (tmax l) &&
> maybe True (x <) (tmin r)
> maybes :: (a -> a -> a) -> Maybe a -> Maybe a -> Maybe a
> maybes f (Just x) (Just y) = Just (f x y)
> maybes _ Nothing m = m
> maybes _ m Nothing = m
> tmin :: CT c Int -> Maybe Int
> tmin E = Nothing
> tmin (N _ l x r) =
> maybes min (tmin l) (tmin r)
> tmax :: CT c Int -> Maybe Int
> tmax E = Nothing
> tmax (N _ l x r) =
> maybes max (tmax l) (tmax r)
> rbt :: Proxy RBT
> rbt = Proxy
> main :: IO ()
> main = do
> quickCheck prop_BST
> quickCheck $ prop_empty rbt
> quickCheck $ prop_insert1 rbt
> quickCheck $ prop_insert2 rbt
> -- quickCheck prop_Rb1
> -- quickCheck prop_Rb2
> quickCheck prop_Rb3
> quickCheck prop_Rb4
Notes
-----
[0] See also persistant [Java
implementation](http://wiki.edinburghhacklab.com/PersistentRedBlackTreeSet)
for comparison. Requires ~350 lines for the same implementation.
[1] Andrew Appel, ["Efficient Verified Red-Black Trees"](http://www.cs.princeton.edu/~appel/papers/redblack.pdf)
September 2011. Presents a Coq implementation of
a verified Red Black Tree.
[2] Matt Might has a blog post on the [RBT deletion operation in
Racket](http://matt.might.net/articles/red-black-delete/)