# Persistent Data Structures

```
{-# LANGUAGE KindSignatures, ScopedTypeVariables #-}

module Persistent where
```

```
import Control.Monad
import Test.QuickCheck hiding (elements)
import Data.Maybe as Maybe
import Data.List (sort,nub)
```

## Persistent vs. Ephemeral

- An *ephemeral* data structure is one for which only one version is available at a time: after an update operation, the structure as it existed before the update is lost.

For example, an array is ephemeral. Once a location in the array is overwritten, the old value is no longer available.

- A *persistent* structure is one where multiple versions are simultaneously accessible: after an update, both the old and new versions are available. For example, a binary tree can be implemented persistently, so that after an insertion, the old version of the tree is still available.
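Haskell's built-in lists make this concrete: "updating" a list builds a new version while the old one remains intact. A tiny illustrative example:

```haskell
main :: IO ()
main = do
  let version1 = [2, 3]        -- the original version
      version2 = 1 : version1  -- an "update": a new list that shares version1
  print version1               -- the old version is still intact: [2,3]
  print version2               -- the new version: [1,2,3]
```

No copying happens here: `version2` is just a new cons cell pointing at `version1`, which is why persistent updates can be cheap.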

Persistent data structures can be more expensive than their ephemeral counterparts in terms of computational complexity, but that cost is often small compared to their benefits:

- better integration with concurrent programming, naturally lock-free
- simpler, more declarative programming
- better semantics for equality/hashing/etc.
- access to *all* old versions (git for everything)

We'll talk about the implementation of some *simple* persistent data structures in class. These lectures demonstrate that functional programming is adept at implementing sophisticated data structures. In particular, datatypes and pattern matching make the implementation of persistent tree-like data structures remarkably straightforward. These examples are drawn from Chris Okasaki's excellent book Purely Functional Data Structures.

However, we'll only scratch the surface. There are many industrial-strength persistent data structures out there.

- Finger trees/Ropes, see Data.Sequence
- Size balanced trees, see Data.Map
- Big-endian patricia trees, see Data.IntMap
- Hash array mapped tries, used in the Clojure language
- and many more

## A Set interface

Let's think about what the interface to a persistent set should look like. We can tell that any implementation of this interface must be persistent just by looking at the types of the operations: `insert` returns a new set rather than modifying its argument.

```
class Set s where
   empty    :: s a
   member   :: Ord a => a -> s a -> Bool
   insert   :: Ord a => a -> s a -> s a
   elements :: Ord a => s a -> [a]
```

For example, one trivial implementation of sets is with lists.

```
instance Set [] where
   empty      = []
   member     = elem
   insert x s = if x `elem` s then s else x : s
   elements   = sort . nub
```

When we define an abstract data structure like `Set` above, we should also specify properties that *all* implementations should satisfy.

For each of these properties, we will use a `Proxy` argument to tell QuickCheck exactly which implementation it should be testing. We could use a type annotation instead (except for `prop_empty`), but the `Proxy` argument is a little bit easier to use.

`data Proxy (s :: * -> *) = Proxy`

For example, we can define a proxy for the list type.

```
list :: Proxy []
list = Proxy
```

The empty set has no elements.

```
prop_empty :: forall s. (Set s) => Proxy s -> Bool
prop_empty _ = null (elements (empty :: s Int))
```

The elements of the set are distinct and sorted, and each of them is a member of the set.

```
prop_elements :: (Set s) => Proxy s -> s Int -> Bool
prop_elements _ x = elements x == nub (sort (elements x)) &&
                    all (\y -> member y x) (elements x)
```

When we insert an element into the set, we want to make sure that it is contained in the result.

```
prop_insert1 :: (Set s) => Proxy s -> Int -> s Int -> Bool
prop_insert1 _ x t = member x (insert x t)
```

And that the new set also contains all of the original elements.

```
prop_insert2 :: (Set s) => Proxy s -> Int -> s Int -> Bool
prop_insert2 _ x t = all (\y -> member y t') (elements t) where
   t' = insert x t
```

```
*Persistent> quickCheck $ prop_empty list
*Persistent> quickCheck $ prop_elements list
*Persistent> quickCheck $ prop_insert1 list
*Persistent> quickCheck $ prop_insert2 list
```

## Binary Search Trees

See the Binary Search Trees lecture and its implementation in BST.lhs.
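As a reminder of the idea, here is a small standalone sketch (with illustrative names; the full development is in BST.lhs):

```haskell
data Tree a = Leaf | Branch (Tree a) a (Tree a)
  deriving (Eq, Show)

-- Insertion returns a *new* tree. Unchanged subtrees are shared with the
-- old tree, so both versions remain available after the update.
insertTree :: Ord a => a -> Tree a -> Tree a
insertTree x Leaf = Branch Leaf x Leaf
insertTree x t@(Branch l y r)
  | x < y     = Branch (insertTree x l) y r
  | x > y     = Branch l y (insertTree x r)
  | otherwise = t

main :: IO ()
main = do
  let t1 = insertTree 1 (insertTree (3 :: Int) Leaf)
      t2 = insertTree 2 t1  -- t1 is unchanged
  print t1
  print t2
```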

## Balanced Trees

If our sets grow large, we may find that the simple binary tree implementation is not fast enough: in the worst case, each insert or member operation may take O(n) time!

We can do much better by keeping the trees balanced.

There are many ways of doing this. Let's look at one fairly simple (but still very fast) one that you may have seen before in an imperative setting: *red-black trees*.
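As a preview, the datatype itself is tiny (this sketch follows Okasaki's presentation; the balancing invariants and the insertion algorithm are the interesting part):

```haskell
data Color = Red | Black
  deriving (Eq, Show)

data RBT a = E | N Color (RBT a) a (RBT a)
  deriving (Eq, Show)

-- The two invariants that keep the tree balanced:
--   1. No Red node has a Red child.
--   2. Every path from the root to an empty node (E) passes through the
--      same number of Black nodes.
-- Together these bound the height by O(log n), so member and insert
-- both run in O(log n) time.

main :: IO ()
main = print (N Black (N Red E 1 E) (2 :: Int) (N Red E 3 E))
```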
