Haskell logo CIS 5520: Advanced Programming

Fall 2024

  • Home
  • Schedule
  • Homework
  • Resources
  • Software
  • Style guide

A Parser for Lu

For this problem, you will implement a parser for the Lu programming language.

> module LuParser where

Make sure that you read the LuSyntax module that describes the syntax of Lu before continuing. If you have questions, you can also consult the Lu manual.

This problem also uses definitions from the Parsers module from the lecture notes, gathered together in the module Parser.hs. Operations such as chainl1 and filter are imported as P.chainl1 and P.filter. You should also familiarize yourself with this module before continuing.

The goal of this part of the exercise is to give you practice with the operations in the Control.Applicative library. As a result the Parser type is not given a monad instance, so you will not be able use do notation with it. Furthermore, you may not edit the Parser module, and you do not have access to the constructor for the Parser type, so you will not be able to define your own monad instance either.

> import Control.Applicative
> import qualified Data.Char as Char
> import LuSyntax
> import Parser (Parser)
> import qualified Parser as P
> import Test.HUnit  (runTestTT,Test(..),Assertion, (~?=), (~:), assert, Counts)
> import qualified Test.QuickCheck as QC

Tesing your Parser

Your primary method of testing your parser should be using QuickCheck, though you will also want to define your own unit tests as you go.

In particular, the following "round tripping" properties should be satisfied by your implementation. These properties state that given an arbitrary Value/Expression/Statement, if we pretty print it

> prop_roundtrip_val :: Value -> Bool
> prop_roundtrip_val v = P.parse valueP (pretty v) == Right v
> prop_roundtrip_exp :: Expression -> Bool
> prop_roundtrip_exp e = P.parse expP (pretty e) == Right e
> prop_roundtrip_stat :: Statement -> Bool
> prop_roundtrip_stat s = P.parse statementP (pretty s) == Right s

More Parser combinators

As a warm-up, let's define a few helper functions that we can use later.

In general, so that our parsers are flexible about spaces that appear in source programs, all of the parsers will need to skip over any trailing white space.

First, define a parser combinator which takes a parser, runs it, then skips over any whitespace characters occurring afterwards. HINT: you'll need the space parser from the Parser library.

> wsP :: Parser a -> Parser a
> wsP p = undefined
> test_wsP :: Test
> test_wsP = TestList [
>   P.parse (wsP P.alpha) "a" ~?= Right 'a',
>   P.parse (many (wsP P.alpha)) "a b \n   \t c" ~?= Right "abc"
>   ]
> -- >>> runTestTT test_wsP

Use this to define a parser that accepts only a particular string s and consumes any white space that follows. The last test case ensures that trailing whitespace is being treated appropriately.

> stringP :: String -> Parser ()
> stringP = undefined
> test_stringP :: Test
> test_stringP = TestList [
>   P.parse (stringP "a") "a" ~?= Right (),
>   P.parse (stringP "a") "b" ~?= Left "No parses",
>   P.parse (many (stringP "a")) "a  a" ~?= Right [(),()]
>   ]
> -- >>> runTestTT test_stringP

Define a parser that will accept a particular string s, returning a given value x, and also and consume any white space that follows.

> constP :: String -> a -> Parser a
> constP _ _ = undefined
> test_constP :: Test
> test_constP = TestList [
>   P.parse (constP "&" 'a')  "&  " ~?=  Right 'a',
>   P.parse (many (constP "&" 'a'))  "&   &" ~?=  Right "aa"
>   ]
> -- >>> runTestTT test_constP

We will also use stringP for some useful operations that parse between delimiters, consuming additional whitespace.

> parens :: Parser a -> Parser a
> parens x = P.between (stringP "(") x (stringP ")")
> braces :: Parser a -> Parser a
> braces x = P.between (stringP "{") x (stringP "}")
> -- >>> P.parse (many (brackets (constP "1" 1))) "[1] [  1]   [1 ]"
> -- Right [1,1,1]
> brackets :: Parser a -> Parser a
> brackets x = undefined

Parsing Constants

Now let's write parsers for the Value type, except for table constants (which we won't parse).

> valueP :: Parser Value
> valueP = intValP <|> boolValP <|> nilValP <|> stringValP

To do so, fill in the implementation of the four parsers above. As above, these four parsers should consume any following whitespace. You can make sure that happens by testing 'many' uses of the parser in a row.

> -- >>> P.parse (many intValP) "1 2\n 3"
> -- Right [IntVal 1,IntVal 2,IntVal 3]
> intValP :: Parser Value
> intValP = undefined
> -- >>> P.parse (many boolValP) "true false\n true"
> -- Right [BoolVal True,BoolVal False,BoolVal True]
> boolValP :: Parser Value
> boolValP = undefined
> -- >>> P.parse (many nilValP) "nil nil\n nil"
> -- Right [NilVal,NilVal,NilVal]
> nilValP :: Parser Value
> nilValP = undefined

Lu literal strings are sequences of non-quote characters between double quotes. For simplicity, Lu does not allow escaped quote characters to appear in literal strings.

> stringValP :: Parser Value
> stringValP = undefined
> test_stringValP :: Test
> test_stringValP = TestList [
>   P.parse stringValP "\"a\"" ~?= Right (StringVal "a"),
>   P.parse stringValP "\"a\\\"\"" ~?= Right (StringVal "a\\"),
>   P.parse (many stringValP) "\"a\"   \"b\"" ~?= Right [StringVal "a",StringVal "b"],
>   P.parse (many stringValP) "\" a\"   \"b\"" ~?= Right [StringVal " a",StringVal "b"]
>   ]
> -- >>> runTestTT test_stringValP

At this point you should be able to quickcheck the prop_roundtrip_val property. You'll need to do this in the terminal.

QC.quickCheck prop_roundtrip_val

Parsing Expressions

Next, let's parse some Lu expressions.

We've already stratified the grammar for you, so that we'll get the appropriate precedence and associativity for the binary and unary operators. Make sure to read the end of the parsers lecture to understand how this code works.

However, this code won't work until you complete all the parts of this section.

> expP :: Parser Expression
> expP    = compP where
>   compP   = catP `P.chainl1` opAtLevel (level Gt)
>   catP    = sumP `P.chainl1` opAtLevel (level Concat)
>   sumP    = prodP `P.chainl1` opAtLevel (level Plus)
>   prodP   = uopexpP `P.chainl1` opAtLevel (level Times)
>   uopexpP = baseP
>       <|> Op1 <$> uopP <*> uopexpP 
>   baseP = tableConstP
>        <|> Var <$> varP
>        <|> parens expP
>        <|> Val <$> valueP
> 
> -- | Parse an operator at a specified precedence level
> opAtLevel :: Int -> Parser (Expression -> Expression -> Expression)
> opAtLevel l = flip Op2 <$> P.filter (\x -> level x == l) bopP

A variable is a prefix followed by some number of indexing terms or just a name. We've also done this one for you.

> -- >>>  P.parse (many varP) "x y z"
> -- Right [Name "x", Name "y", Name "z"]
> -- >>> P.parse varP "(x.y[1]).z"
> -- Right (Dot (Var (Proj (Var (Dot (Var (Name "x")) "y")) (Val (IntVal 1)))) "z")
> varP :: Parser Var
> varP = mkVar <$> prefixP <*> some indexP <|> Name <$> nameP
>   where
>     mkVar :: Expression -> [Expression -> Var] -> Var
>     mkVar e l = foldr1 (\f p u -> p (Var (f u))) l e 
>     prefixP :: Parser Expression
>     prefixP = parens expP <|> Var . Name <$> nameP
>     indexP :: Parser (Expression -> Var)
>     indexP =  flip Dot <$> (P.string "." *> nameP)
>          <|> flip Proj <$> brackets expP

Define an expression parser for names. Names can be any sequence of upper and lowercase letters, digits and underscores, not beginning with a digit and not being a reserved word. Your parser should also consume any trailing whitespace characters.

> reserved :: [String]
> reserved = [ "and", "break","do","else","elseif","end"
>  ,"false","for","function","goto","if","in"
>  ,"local","nil","not","or","repeat","return"
>  ,"then","true","until","while"]
> -- >>> P.parse (many nameP) "x sfds _ nil"
> -- Right ["x","sfds", "_"]
> nameP :: Parser Name
> nameP = undefined

Now write parsers for the unary and binary operators. Make sure you check out the manual or the LuSyntax module for the list of all possible operators. The tests are not exhaustive.

> -- >>> P.parse (many uopP) "- - #"
> -- Right [Neg,Neg,Len]
> uopP :: Parser Uop
> uopP = undefined
> -- >>> P.parse (many bopP) "+ >= .."
> -- Right [Plus,Ge,Concat]
> bopP :: Parser Bop
> bopP = undefined

Finally write a parser for table construction:

> -- >>> P.parse tableConstP "{ x = 2, [3] = false }"
> -- Right (TableConst [FieldName "x" (Val (IntVal 2)),FieldKey (Val (IntVal 3)) (Val (BoolVal False))])
> tableConstP :: Parser Expression
> tableConstP = undefined

At this point you should be able to quickcheck the prop_roundtrip_exp property.

Parsing Statements

Finally, define a parser for statements ...

> statementP :: Parser Statement
> statementP = undefined

... and one for blocks.

> blockP :: Parser Block
> blockP = undefined

At this point you should be able to quickcheck the prop_roundtrip_stat property.

Parsing Expressions and Files

Finally, we'll export these convenience functions for calling the parser.

> parseLuExp :: String -> Either P.ParseError Expression
> parseLuExp = P.parse expP 
> parseLuStat :: String -> Either P.ParseError Statement
> parseLuStat = P.parse statementP
> parseLuFile :: String -> IO (Either P.ParseError Block)
> parseLuFile = P.parseFromFile (const <$> blockP <*> P.eof) 

File-based tests

> tParseFiles :: Test
> tParseFiles = "parse files" ~: TestList [
>                 "fact"  ~: p "lu/fact.lu"  wFact,
>                 "test"  ~: p "lu/test.lu"  wTest,
>                 "abs"   ~: p "lu/abs.lu"   wAbs,
>                 "times" ~: p "lu/times.lu" wTimes,
>                 "table" ~: p "lu/table.lu" wTable,
>                 "bfs"   ~: p "lu/bfs.lu"   wBfs] where
>   p fn ast = do
>     result <- parseLuFile fn
>     case result of
>       (Left _) -> assert False
>       (Right ast') -> assert (ast == ast')

Unit Tests

These unit tests summarize the tests given above.

> test_comb = "parsing combinators" ~: TestList [
>  P.parse (wsP P.alpha) "a" ~?= Right 'a',
>  P.parse (many (wsP P.alpha)) "a b \n   \t c" ~?= Right "abc",
>  P.parse (stringP "a") "a" ~?= Right (),
>  P.parse (stringP "a") "b" ~?= Left "No parses",
>  P.parse (many (stringP "a")) "a  a" ~?= Right [(),()],
>  P.parse (constP "&" 'a')  "&  " ~?=  Right 'a',
>  P.parse (many (constP "&" 'a'))  "&   &" ~?=  Right "aa",
>  P.parse (many (brackets (constP "1" 1))) "[1] [  1]   [1 ]" ~?= Right [1,1,1]
>  ]
> test_value = "parsing values" ~: TestList [
>  P.parse (many intValP) "1 2\n 3" ~?= Right [IntVal 1,IntVal 2,IntVal 3],
>  P.parse (many boolValP) "true false\n true" ~?= Right [BoolVal True,BoolVal False,BoolVal True],
>  P.parse (many nilValP) "nil nil\n nil" ~?= Right [NilVal,NilVal,NilVal],
>  P.parse stringValP "\"a\"" ~?= Right (StringVal "a"),
>  P.parse stringValP "\"a\\\"\"" ~?= Right (StringVal "a\\"),
>  P.parse (many stringValP) "\"a\"   \"b\"" ~?= Right [StringVal "a",StringVal "b"],
>  P.parse (many stringValP) "\" a\"   \"b\"" ~?= Right [StringVal " a",StringVal "b"]
>  ]
> test_exp = "parsing expressions" ~: TestList [
>  P.parse (many varP) "x y z" ~?= Right [Name "x", Name "y", Name "z"],
>  P.parse varP "(x.y[1]).z" ~?=  Right (Dot (Var (Proj (Var (Dot (Var (Name "x")) "y")) (Val (IntVal 1)))) "z"),
>  P.parse (many nameP) "x sfds _ nil" ~?= Right ["x","sfds", "_"],
>  P.parse (many uopP) "- - #" ~?=  Right [Neg,Neg,Len],
>  P.parse (many bopP) "+ >= .." ~?= Right [Plus,Ge,Concat],
>  P.parse tableConstP "{ x = 2, [3] = false }" ~?=
>     Right (TableConst [FieldName "x" (Val (IntVal 2)),FieldKey (Val (IntVal 3)) (Val (BoolVal False))])
>  ]
> test_stat = "parsing statements" ~: TestList [
>  P.parse statementP ";" ~?= Right Empty,
>  P.parse statementP "x=3" ~?= Right (Assign (Name "x") (Val (IntVal 3))),
>  P.parse statementP "if x then y=nil else end" ~?=
>     Right (If (Var (Name "x")) (Block [Assign (Name "y") (Val NilVal)]) (Block [])),
>  P.parse statementP "while nil do end" ~?=
>     Right (While (Val NilVal) (Block [])),
>  P.parse statementP "repeat ; ; until false" ~?=
>     Right (Repeat (Block [Empty, Empty]) (Val (BoolVal False)))
>    ]
> -- >>> runTestTT test_stat

Testing summary

> test_all :: IO Counts
> test_all = runTestTT $ TestList [ test_comb, test_value, test_exp, test_stat, tParseFiles ]
> -- >>> test_all
> qc :: IO ()
> qc = do
>   putStrLn "roundtrip_val"
>   QC.quickCheck prop_roundtrip_val
>   putStrLn "roundtrip_exp"
>   QC.quickCheck prop_roundtrip_exp
>   putStrLn "roundtrip_stat"
>   QC.quickCheck prop_roundtrip_stat
Design adapted from Minimalistic Design | Powered by Pandoc and Hakyll