# A Poor Man's Concurrency Monad

`{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, UndecidableInstances, FlexibleContexts, TypeSynonymInstances #-}`

`module Concurrency where`

```
import Data.Monoid
import Control.Monad
import Control.Monad.Writer
import Control.Monad.Trans
import System.IO
import Data.IORef
import System.Environment (getArgs)
```

To use this module, you must first install the "network" package with

`cabal install network`

`import Network (withSocketsDo, PortNumber, PortID(..), listenOn, accept)`

This lecture talks about embedding concurrency in a single-threaded language. It isn't the same as "real" concurrency, where multiple processors run in parallel. Instead it is "time-sharing" concurrency, where multiple threads of execution share a single uniprocessor. These threads are not pre-emptive; they will run as long as they wish, only relinquishing control at certain points.

The basis for this lecture is a paper published in 1999, a time when laptops and personal computers were uniformly uniprocessors.

`Koen Claessen, A Poor Man's Concurrency Monad, JFP 1999.`

## Why would you want such concurrency, today?

To hide latency, e.g. with non-blocking file IO. Disk access is slow, and the processor could be doing useful work while waiting for the disk to respond.

To express concurrent applications in a single threaded language, such as Python or JavaScript. The world is naturally concurrent. A processor has many forms of interaction, from the network, to the disk, to the display, to the keyboard/mouse, etc, and a single application may wish to support these interactions, even while doing perhaps long-running computations.

To avoid locks/etc. Since this is "time-sharing" concurrency, which is going to run on a single processor, we know that each thread can be interrupted *only* at specific points. So we don't need locking to guarantee atomicity, which significantly simplifies concurrent programming. Furthermore, this library is all user code, so it gives *you* access to concurrency "internals" such as the thread-scheduling algorithm. If your application requires specific control over how thread scheduling works, then you can modify the implementation of the monad.

## How do we simulate concurrency on a uniprocessor?

The standard technique is *interleaving*: running the first part of one thread, suspending it, and then allowing another thread to run.

To suspend a thread we need access to its "future computation"; in programming language terminology, this is often referred to as a *continuation*.

By programming in Continuation-passing style (CPS), we ensure that all computations have access to their continuations. We can make programming in CPS more convenient by packaging the continuation in a monad.
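For instance, here is a tiny example (with made-up names, just for illustration) contrasting direct style with CPS, before any monadic packaging: in CPS, a function never returns a result to its caller; it hands the result to the continuation `k`.

```
-- Direct style: the function returns its result to the caller.
addDirect :: Int -> Int -> Int
addDirect x y = x + y

-- CPS: the function instead passes its result to a continuation k.
addCPS :: Int -> Int -> (Int -> r) -> r
addCPS x y k = k (x + y)

main :: IO ()
main = addCPS 1 2 (\s -> addCPS s 3 print)   -- prints 6
```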

A computation in the Concurrency monad is a function whose first argument is its continuation, i.e. a function that the computation will call when it wants to "return" its value, passing it to the next step in the computation.

We're going to divide up computations into slices called actions, so all computations in this monad should themselves be `Action`s, and their continuations should likewise produce new `Action`s. An action can be an atomic action, a concurrency operation (such as forking a new action), or the terminal action that halts the thread. We'll get time sharing by running actions from multiple threads in a round-robin fashion.

```
data Action =
    Atom Action          -- do an atomic computation, returning a new action
  | Fork Action Action   -- fork computation into two different actions
  | Stop                 -- halt the computation
```

Putting this all together, we get the following definition for the concurrency monad:

`type C a = (a -> Action) -> Action`

We can build up larger computations using bind and return.

To sequence computations, we first abstract the current continuation `k`, then run the first computation `m`, giving it a continuation that next runs `f` with the value produced by `m` and the original continuation.

When we are done, we 'return' the final answer by giving it to the continuation.

```
type C a = (a -> Action) -> Action

instance Monad C where
  (>>=) :: C a -> (a -> C b) -> C b
  m >>= f  = \k -> m (\a -> f a k)

  return :: a -> C a
  return x = \k -> k x
```

However, there is one more wrinkle. We want to parameterize the Action type by a monad so that the atomic computation can be monadic. This is necessary, for example, if we want to do IO operations.

```
data Action m =
    Atom (m (Action m))          -- do some work
  | Fork (Action m) (Action m)   -- create a new thread
  | Stop                         -- finish the thread
```

Therefore, our definition of the monad is also parameterized by `m`, the underlying monad of atomic computations.

`newtype C m a = C { runC :: (a -> Action m) -> Action m }`

We can show that `C m` is a monad using essentially the same code as above, after taking account of the newtype constructor.

```
instance Monad (C m) where
  m >>= f  = C $ \k -> runC m (\a -> runC (f a) k)
  return x = C $ \k -> k x
```

The `atom` function turns an arbitrary computation in the monad `m` into an atomic action in `C m`. An atomic action is one that runs the monadic computation and then passes its result to the continuation. (We know that the monadic computation `m` will not be interrupted; that is why this is called "atomic".)

```
atom :: Monad m => m a -> C m a
atom m = C $ \k -> Atom (liftM k m)
```

The atom operation is what makes the Concurrency monad a monad transformer. It is how we lift operations from one monad into the concurrency monad.

```
instance MonadTrans C where
  -- lift :: Monad m => m a -> C m a
  lift = atom
```

We next define other concurrency operations.

The `stop` function discards any continuation, ending a computation.

```
stop :: Monad m => C m a
stop = C $ \_ -> Stop
```

For concurrency, we have two variants. The first operation is symmetric and combines two concurrent computations into one by forking and passing the same continuation to each part.

```
par :: Monad m => C m a -> C m a -> C m a
par m1 m2 = C $ \k -> Fork (runC m1 k) (runC m2 k)
```

The second operation is more similar to the standard fork. It turns its argument into a "top-level" action and then continues the current computation.

```
fork :: Monad m => C m () -> C m ()
fork m = C $ \k -> Fork (action m) (k ())
```

Above, the operation `action` transforms computations (of type `C m a`) into actions. All it does is give the computation the top-level `Stop` continuation, which terminates the thread execution.

```
action :: Monad m => C m a -> Action m
action m = runC m (\_ -> Stop)
```

# Running Threads

At any moment, the status of the computation is modelled by a collection of (concurrently running) threads. Each thread is represented by its current Action.

For simplicity, we'll represent that collection of threads as a list and, for round-robin scheduling, treat that list as a queue. (In a high-performance system we would use a more efficient queue implementation.)

The job of the thread scheduler is to run the threads in the queue.

If the first thread is an atomic action, the scheduler runs it to its next state. It then puts the new state at the back of the queue and continues to process the rest of the queue.

If the action is a `Fork`, the thread scheduler puts both sub-actions at the back of the queue and continues. If the action is `Stop`, the current thread has finished its computation, and the thread scheduler just continues with the rest of the threads in the queue.

```
sched :: Monad m => [Action m] -> m ()
sched [] = return ()
sched (a : as) = case a of
  Atom am    -> do { a' <- am ; sched (as ++ [a']) }
  Fork a1 a2 -> sched (as ++ [a1, a2])
  Stop       -> sched as
```

Running a computation is merely turning it into an action and then giving it to the thread scheduler.

```
run :: Monad m => C m a -> m ()
run m = sched [ action m ]
```

# Example - Concurrent Output

Now let's see some examples of concurrency! The first example involves concurrent output: two threads writing to the screen at the same time. To make sure that we get the full effect of concurrency, we'll first turn off buffering on the standard input and output handles:

*Main> hSetBuffering stdout NoBuffering

*Main> hSetBuffering stdin NoBuffering

Next, because we are working with monad transformers, we need to define classes that characterize the operations of the monads that we wish to use. For example, consider the class of monads that support text output. These are the ones that have a `write` operation.

```
class Monad m => Output m where
  write :: String -> m ()
```

For example, we can make the 'IO' monad a member of this class.

```
instance Output IO where
  write = putStr
```

Now, here is an infinite loop that just writes its argument.

```
loop :: Output m => String -> m ()
loop s = do write s
            loop s
```

If we run this loop from the ghci toplevel (in the IO monad) we don't get to do anything else.

`*Main> loop "CIS 552"`

But with concurrency, we can make this loop run in parallel with other computations. To do that, we need to run loop in the concurrency monad. Therefore, we need to make the concurrency monad a member of the output class. Because 'C' is a monad transformer, that is not too difficult to do.

```
instance Output m => Output (C m) where
  write s = lift (write s)
```

```
example :: Output m => C m ()
example = do write "start!"
             fork (loop "dog\n")
             loop "cat\n"
```

We run this computation by giving it to the scheduler.

`*Main> run example`

Note that our implementation of write for the concurrency monad determines how much interleaving is possible between two different simultaneous writes. Each use of lift creates an atomic action that cannot be interrupted.

```
-- instance Output m => Output (C m) where
--   write []     = lift (write [])
--   write (x:xs) = lift (write [x]) >> write xs
```

Alternatively, we can define a new operation that breaks up the write into character-sized chunks.

```
writeLetterByLetter :: (Output m) => String -> m ()
writeLetterByLetter (c:cs) = write [c] >> writeLetterByLetter cs
writeLetterByLetter [] = return ()
```

# Concurrent Input and Output

Now suppose we would like threads to read as well as write. To do that we need a class for *asynchronous* (i.e. nonblocking) input. The method in this class reads some input as long as there is some ready. If there is no input available, it immediately returns with `Nothing`.

```
class Monad m => Input m where
  input :: m (Maybe String)
```

To implement nonblocking input in the IO monad, we first test whether the standard input is ready (using `hReady`) before we use the standard blocking operation (`getLine`).

```
instance Input IO where
  input = do x <- hReady stdin
             if x then liftM Just getLine else return Nothing
```

For example, we can write a loop that prints out a string until a line is entered at the keyboard.

```
ioloop :: (Input m, Output m) => String -> m String
ioloop s = do i <- input
              case i of
                Just x  -> return ("Thread " ++ s ++ ":" ++ x)
                Nothing -> do write s
                              ioloop s
```

`*Main> ioloop "CIS 552"`

We can run this thread concurrently with other threads by making the concurrency monad an instance of the Input class.

```
instance Input m => Input (C m) where
  input = lift input
```

```
example2 :: (Input m, Output m) => C m ()
example2 = do
  fork (ioloop "a" >>= write)
  ioloop "b" >>= write
```

```
example3 :: (Input m, Output m) => C m ()
example3 = do
  x <- par (ioloop "a") (ioloop "b")
  write "done!"
  write x
```

# Shared State

Sometimes threads may wish to communicate with each other by passing messages through some shared state. An abstraction designed for that purpose is an 'MVar'. An MVar is a potentially empty memory location. Initially, the location is empty, but it can be updated to contain a value. When the location is read, the data is removed (atomically).

`type MVar a = IORef (Maybe a)`

```
class Monad m => MVarMonad m where
  newMVar   :: m (MVar a)
  writeMVar :: MVar a -> a -> m ()
  takeMVar  :: MVar a -> m (Maybe a)
```

```
instance MVarMonad IO where
  newMVar       = newIORef Nothing
  writeMVar v a = writeIORef v (Just a)
  takeMVar v    = do a <- readIORef v
                     writeIORef v Nothing
                     return a
```
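As a quick sanity check on these semantics, the write/take behaviour can be exercised directly in `IO` (a self-contained sketch; the primed alias is hypothetical, chosen to avoid clashing with the `MVar` defined above):

```
import Data.IORef

type MVar' a = IORef (Maybe a)

main :: IO ()
main = do
  v <- newIORef Nothing :: IO (MVar' Int)
  writeIORef v (Just 42)   -- writeMVar fills the cell
  mx <- readIORef v        -- takeMVar reads the contents ...
  writeIORef v Nothing     -- ... and empties the cell
  my <- readIORef v        -- a second take would see Nothing
  print (mx, my)           -- prints (Just 42,Nothing)
```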

```
instance MVarMonad m => MVarMonad (C m) where
  newMVar       = lift newMVar
  writeMVar v a = lift (writeMVar v a)
  takeMVar v    = lift (takeMVar v)
```

Blocking message receiving: the function loops until the MVar contains a value. This operation *requires* concurrency to do anything interesting...

```
readMVar :: (MVarMonad m) => MVar a -> m a
readMVar v = do mv <- takeMVar v
                case mv of
                  Just a  -> return a
                  Nothing -> readMVar v
```

Now here is an example using message passing. We have two threads that communicate via messages. One thread will be running a "simulation", the other will be the interface.

First, we'll define a short language of messages to send back and forth....

```
data Msg =
Add | Reset | Print | Quit
```

The simulation just manages the value of an integer, either incrementing it, resetting it, or printing it based on the messages received from the interface.

```
simulation :: MVar Msg -> Integer -> C IO ()
simulation mv i = do
  x <- takeMVar mv
  case x of
    Just Add   -> do write "Adding...\n"
                     simulation mv (i+1)
    Just Reset -> do write "Resetting...\n"
                     simulation mv 0
    Just Print -> do write ("Current value is " ++ show i ++ "\n")
                     simulation mv i
    Just Quit  -> write "Done\n"
    Nothing    -> simulation mv i
```

The interface reads keys from the keyboard and parses them into messages for the simulation.

```
interface :: MVar Msg -> C IO (Maybe String) -> C IO ()
interface mv getInput = loop where
  loop = do
    maybeKey <- getInput
    case maybeKey of
      Just "a" -> writeMVar mv Add   >> loop
      Just "r" -> writeMVar mv Reset >> loop
      Just "p" -> writeMVar mv Print >> loop
      Just "q" -> writeMVar mv Quit
      Just s   -> write ("Unknown command: " ++ s ++ "\n") >> loop
      Nothing  -> loop
```

We put the two together by creating an MVar and then running the simulation concurrently with the interface.

```
example6 :: C IO ()
example6 = do
  mv <- newMVar
  fork (simulation mv 0)
  interface mv input
```

## What about multiple sources of inputs?

What if we wanted to have multiple interfaces to the same simulation? For example, what if we wanted to enable "Remote Desktop" and send commands via the network?

We could do that with an additional network interface. This code sets up a socket to listen for commands sent via the code in Client.hs.

```
network :: PortNumber -> MVar Msg -> C IO ()
network port mv = do
  handle <- lift (withSocketsDo $ do
              socket <- listenOn (PortNumber port)
              (handle, host, port) <- accept socket
              hSetBuffering handle NoBuffering
              return handle)
  let network_input = do x <- hReady handle
                         if x then liftM Just (hGetLine handle)
                              else return Nothing
  interface mv (lift network_input)
  lift (hClose handle)
```

Then, we can run this code concurrently with the original simulation and interface.

```
example7 :: PortNumber -> C IO ()
example7 port = do
  mv <- newMVar
  fork (simulation mv 0)
  fork (interface mv input)
  network port mv
```

## How do you merge an infinite list of infinite lists?

Why did we make C a monad transformer when we only used IO as the underlying monad? What other monads can we use "concurrently"?

Consider a simple, academic question: how do you merge an infinite list of infinite lists?

For example, here are some infinite lists:

```
ones = '1' : ones
twos = '2' : twos
threes = '3' : threes
```

*Main> take 10 ones

*Main> take 20 twos

And here is an infinite list of infinite lists.

```
allnums :: [String]
allnums = map nums [ 1 .. ] where
  nums i = show i ++ nums i
```

*Main> take 20 (map (take 20) allnums)

How could we merge `allnums` to create a single list that contains all of the elements of all of the infinite lists? Spoiler: even though `concat` typechecks, it won't work.
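To see the problem concretely (a self-contained sketch): `concat` must produce every element of its first list before any element of the second, so an infinite first list starves all the others.

```
ones, twos :: String
ones = '1' : ones
twos = '2' : twos

main :: IO ()
main = putStrLn (take 10 (concat [ones, twos]))
-- prints "1111111111"; no '2' can ever appear
```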

The trick is to "write" all of the strings concurrently, letter by letter.

First, we make the Writer monad an instance of the Output monad.

```
instance Output (Writer String) where
  write = tell
```

Next, we'll create a computation that writes each string letter by letter, in parallel.

```
parallelWrite :: (Output m) => [String] -> C m ()
parallelWrite = foldr (par . writeLetterByLetter) stop
```

Finally, we can merge these infinite strings by running the parallelWrite computation.

```
merge :: [String] -> String
merge ss = snd (runWriter (run (parallelWrite ss)))
```

*Main> take 20 (merge allnums)

## How expensive is that list in the thread scheduler?

The thread scheduler above keeps track of the waiting threads in a list (and uses append every time a thread is moved to the end of the queue). That means that each cycle of the thread scheduler takes time proportional to the number of threads in the queue!

How expensive is that? And are we forced to use a mutable data structure to do better?

First, let's do some profiling. We'll start with a loop that just increments a reference cell until it gets "really big".

```
bloop :: IORef Int -> C IO ()
bloop v = do b <- lift (do x <- readIORef v
                           writeIORef v (x + 1)
                           return $ x > 100000)
             if b then return () else bloop v
```

Then, we will create a concurrent computation that forks n copies of this loop, as well as one last one that prints out the final version of the reference cell before exiting.

```
crazy :: Int -> C IO ()
crazy n = do v <- lift (newIORef (0 :: Int))
             sequence_ $
               foldr (\_ -> (fork (bloop v) :))
                     [last v]
                     [1 .. n]
  where
    last :: IORef Int -> C IO ()
    last v = do bloop v
                x <- lift (readIORef v)
                lift (putStrLn (show x))
```

`-- main = run $ crazy 1000`

To see what is going on, we can compile this program using profiling:

% ghc -O2 --make concurrency.lhs -prof -auto-all -caf-all -fforce-recomp -rtsopts

% ./concurrency +RTS -p

After running it, we look at the profiling output "concurrency.prof" to see that the majority of the time is spent in the operation of the function "sched".

# A purely functional queue

We can do better, while still using a purely functional data structure, by representing a queue as a pair of lists.

`data Queue a = Queue [a] [a]`

The two lists together represent the queue. The second list is the back of the queue, where we put new elements; we can do this easily in constant time. Because this is a functional queue, we return the new queue as the result of the enqueue operation.

```
enqueue :: a -> Queue a -> Queue a
enqueue x (Queue front back) = Queue front (x : back)
```

The first list is the front of the queue. If there is an element available, then we can dequeue in constant time, and return the new queue with that element removed.

```
dequeue :: Queue a -> Maybe (a, Queue a)
dequeue (Queue (x:xs) back) = Just (x, Queue xs back)
dequeue (Queue []     [])   = Nothing
dequeue (Queue []     back) = dequeue (Queue (reverse back) [])
```

If there are no elements remaining in the queue, we can fail.

However, if there are elements in the back of the queue, we can move them to the front and then try to dequeue again.

Why is this better? What is the running time of the dequeue operation? In the best case, it is constant time, but sometimes we need to reverse the back list, and that could take time linear in the length of the queue. However, we know that a single element in the queue only needs to move from the back list to the front list once. Once it has been reversed, it will not need to be reversed again. So this queue implementation has "amortized" constant time for the dequeue operation.
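The FIFO behaviour, including the implicit reversal, can be checked with a small self-contained sketch (the primed names are hypothetical, chosen to avoid clashing with the `Queue` above):

```
data Queue' a = Queue' [a] [a]

enq :: a -> Queue' a -> Queue' a
enq x (Queue' front back) = Queue' front (x : back)

deq :: Queue' a -> Maybe (a, Queue' a)
deq (Queue' (x:xs) back) = Just (x, Queue' xs back)
deq (Queue' []     [])   = Nothing
deq (Queue' []     back) = deq (Queue' (reverse back) [])

main :: IO ()
main = do
  let q0 = enq (3 :: Int) (enq 2 (enq 1 (Queue' [] [])))
  let Just (a, q1) = deq q0   -- this dequeue pays for one reversal
  let Just (b, _)  = deq q1   -- subsequent dequeues are constant time
  print (a, b)                -- prints (1,2): arrival order preserved
```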

Now let's rewrite the scheduler using the functional queue and reprofile.

```
qsched :: Monad m => Queue (Action m) -> m ()
qsched q = case dequeue q of
  Nothing -> return ()
  Just (a, as) -> case a of
    Atom am    -> do { a' <- am ; qsched (enqueue a' as) }
    Fork a1 a2 -> qsched (enqueue a2 (enqueue a1 as))
    Stop       -> qsched as
```

```
single :: a -> Queue a
single x = Queue [x] []
```

```
qrun :: Monad m => C m a -> m ()
qrun m = qsched (single (action m))
```

```
main :: IO ()
main = run (crazy 1000)
-- main = qrun (crazy 1000)
```

## Efficiency

- Intuition: a dequeue may require O(n) cons operations (to reverse the back list), but this cannot happen too often.

In more detail:

- Each element can participate in at most one list reversal during its "lifetime" in the queue.
- When an element is enqueued, we can "charge two tokens" for two cons operations. One of these is performed immediately; the other we put "in the bank."
- At every moment, the number of tokens in the bank is equal to the length of the back list.
- When we find we need to reverse the back list to perform a dequeue, we will always have just enough tokens in the bank to pay for all of the cons operations involved.

So we can say that the amortized cost of each enqueue operation is two conses.

## Moral

We can implement a persistent queue data structure whose operations have the same (asymptotic, amortized) efficiency as the standard (ephemeral, double-pointer) imperative implementation.
