HW 1 - List Processing, Recursion and working with Higher-order Functions

This is the first homework assignment for CIS 5520. The first two problems provide practice with the basic built-in data structures of Haskell, including lists, tuples and maybes, as well as recursion and pattern matching. The second two problems provide practice with higher-order functions in Haskell. The last problem is a design exercise that puts everything together.

If you have not read the Basics module, and completed the associated quiz, you should do that first. The homework assignment also draws from the HigherOrder module.

This page is a "literate" Haskell program, meaning that explanation is interspersed with actual Haskell code. To complete your assignment, access your private hw01 repo, edit HW01.hs and submit it through Gradescope.

You can load the project into ghci with the command stack ghci. That will give you interactive access to all definitions in this module as well as the main function in the Main module that runs the test cases. Alternatively, you can run the test cases with the command line stack run. (For each of these commands, make sure that you are in the hw01 subdirectory in the terminal.)

> module HW01 where
> import Prelude hiding (reverse, concat, zip, (++), takeWhile, all)
> import Test.HUnit
>    ( (@?=), (~:), (~=?), (~?=), runTestTT, Assertion, Test(TestList), assertFailure )
> -- libraries for Kata problem (only)
> import Text.Read (readMaybe)
> import qualified Data.List as List
> import qualified Data.Char as Char
> import qualified Data.Maybe as Maybe

This file starts by first declaring that we are creating a module called HW01 and are using functions defined in the modules Prelude, Test.HUnit, Data.List Data.Char and Text.Read

The Prelude line imports all except for the functions listed (which you will write). The module Prelude is special in that it is always imported by default, so the the point of this line is not to import more functions, but rather to exclude a few functions. (Haskell does not allow functions to be redefined in the same module.)

The Test.HUnit line imports the specified functions and definitions from that module. Recall that operators (i.e. binary functions with symbolic names) must be enclosed in parentheses when they are named. Similarly, the import from Text.Read only brings the readMaybe function into scope. The last three lines import all functions from the respective modules, but makes them available with qualified names, such as List.intersperse, etc. We import them in this way so that you have the freedom to use any functions from these modules in your solution to the Kata problem.

> --------------------------------------------------------------------------------
> -- Problem (Good Style)
> --------------------------------------------------------------------------------

This is the test case for this problem. Recall that in Haskell, definitions can be in any order. This test has four subparts, defined below.

> testStyle :: Test
> testStyle = "testStyle" ~:
>    TestList [ tabc , tarithmetic, treverse, tzip ]

All of the following Haskell code does what it is supposed to do (i.e. the tests pass), but it is difficult to read. Rewrite the following expressions so that they exactly follow the style guide. Be careful: the style guide includes quite a few rules, and we've broken most of them in what follows! (You don't need to rewrite the test following each part, but you do need to make sure that you don't break the code as you refactor it!)

NOTE: Do not change the name of any of the top level declarations below, even if you think that they aren't very good (they aren't). We will be using automatic testing to ensure that you do not break anything when you rewrite these functions. On the other hand, local variables (such as function parameters and those bound by let and where) can and should be renamed.

NOTE: If you have set up VSCode and hlint correctly, your IDE should give you a few hints on how to improve these functions. But, it won't tell you everything.

> abc x y z =
>   if x then if y then True else
>        if (x && z) then True else False
>   else False

> tabc :: Test
> tabc = "abc" ~: TestList [abc True False True  ~?= True,
>                           abc True False False ~?= False,
>                           abc False True True  ~?= False]

> arithmetic :: ((Int, Int), Int) -> ((Int,Int), Int) -> (Int, Int, Int)
> arithmetic x1 x2 =
>      let a = fst (fst x1) in
>      let b = snd (fst x1) in
>      let c = snd x1 in
>      let d = fst (fst x2) in
>      let e = snd (fst x2) in
>      let f = snd x2
>        in
>        ((((((b*f) - (c*e)), ((c*
>        d) - (a*f)
>        ), ((a*e)-(b*d))))))

> tarithmetic :: Test
> tarithmetic = "arithmetic" ~:
>    TestList[ arithmetic ((1,2),3) ((4,5),6) ~?= (-3,6,-3),
>              arithmetic ((3,2),1) ((4,5),6) ~?= (7,-14,7) ]

> reverse l  = reverseAux l [] where
>   reverseAux l acc =
>     if null l then acc
>        else reverseAux (tail l) (head l : acc)

> treverse :: Test
> treverse = "reverse" ~: TestList
>     [reverse [3,2,1] ~?= ([1,2,3] :: [Int]),
>      reverse [1]     ~?= ([1]     :: [Int]) ]

> zip xs ys = g 0 xs ys where
>   g n xs ys = if n == length xs || n == length ys then [] else
>           (xs !! n, ys !! n) : g (n + 1) xs ys

> tzip :: Test
> tzip = "zip" ~:
>   TestList [ zip "abc" [True,False,True] ~?= [('a',True),('b',False), ('c', True)],
>              zip "abc" [True] ~?= [('a', True)],
>              zip [] [] ~?= ([] :: [(Int,Int)]) ]

> --------------------------------------------------------------------------------
> -- Problem (List recursion)
> --------------------------------------------------------------------------------

Now define, debug and test the following functions that work with lists. Some of these functions are part of the Haskell standard Prelude or standard libraries like Data.List. Their solutions are readily available online. You should not google for this code: instead, implement them yourself!

For each part of this problem, you should replace the testcase for that part based on the description in the comments. Make sure to test with multiple inputs using TestList. We will be grading your test cases as well as the correctness and style of your solutions! HINT: your testing code should include any tests that we give you in the the comments!

Do not use any library functions in this problem. These include all functions from the Prelude or from Data.List that take arguments or returns a result with a list type. However, (:) and [] are data constructors for the list type, not functions, so you are free to use them. Please also avoid list comprehension syntax, as it actually de-sugars into list library functions!

> testLists :: Test
> testLists = "testLists" ~: TestList
>   [tminimumMaybe, tstartsWith, tendsWith, ttranspose, tcountSub]

> -- | The 'minimumMaybe` function computes the mininum value of a
> -- nonempty list. If the list is empty, it returns Nothing.
> --
> -- >>> minumumMaybe []
> -- Nothing
> -- >>> minumumMaybe [2,1,3]
> -- Just 1 
> minimumMaybe :: [Int] -> Maybe Int
> minimumMaybe = undefined

> tminimumMaybe :: Test
> tminimumMaybe =
>    "minimumMaybe" ~: (assertFailure "testcases for minimumMaybe" :: Assertion)

> -- | The 'startsWith' function takes two strings and returns 'True'
> -- iff the first is a prefix of the second.
> --
> -- >>> "Hello" `startsWith` "Hello World!"
> -- True
> --
> -- >>> "Hello" `startsWith` "Wello Horld!"
> -- False
> startsWith :: String -> String -> Bool
> startsWith = undefined

> tstartsWith :: Test
> tstartsWith = "startsWith" ~: (assertFailure "testcase for startsWith" :: Assertion)

> -- | The 'endsWith' function takes two lists and returns 'True' iff
> -- the first list is a suffix of the second. The second list must be
> -- finite.
> --
> -- >>> "ld!" `endsWith` "Hello World!"
> -- True
> --
> -- >>> "World" `endsWith` "Hello World!"
> -- False

> endsWith :: String -> String -> Bool
> endsWith = undefined

> tendsWith :: Test
> tendsWith = "endsWith" ~: (assertFailure "testcase for endsWith" :: Assertion)

> -- | The 'transpose' function transposes the rows and columns of its argument.
> -- If the inner lists are not all the same length, then the extra elements
> -- are ignored. Note, this is *not* the same behavior as the library version
> -- of transpose (i.e. the version of transpose from Data.List).
> --
> -- >>> transpose [[1,2,3],[4,5,6]]
> -- [[1,4],[2,5],[3,6]]
> -- >>> transpose [] 
> -- []
> -- >>> transpose [[]] 
> -- []
> -- >>> transpose [[3,4,5]]
> -- [[3],[4],[5]]
> -- >>> transpose [[1,2],[3,4,5]]
> -- [[1,3],[2,4]]
> -- (WARNING: this one is tricky!)
> transpose :: [[a]] -> [[a]]
> transpose = undefined

> ttranspose :: Test
> ttranspose = "transpose" ~: (assertFailure "testcase for transpose" :: Assertion)

> -- | The 'countSub' function returns the number of (potentially overlapping)
> -- occurrences of a substring sub found in a string.
> --
> -- >>> countSub "aa" "aaa"
> -- 2
> -- >>> countSub "" "aaac"
> -- 5

> countSub :: String -> String -> Int
> countSub = undefined
> tcountSub :: Test
> tcountSub = "countSub" ~: (assertFailure "testcase for countSub" :: Assertion)

> --------------------------------------------------------------------------------
> -- Problem (Defining higher-order functions)
> --------------------------------------------------------------------------------

Define, debug and test the following operations that take higher-order functions as arguments. (For extra practice, you may define these operations using foldr, but that is not required.) Other than foldr, you may not use any list library functions for this problem.

> testHO :: Test
> testHO = TestList [ttakeWhile, tfind, tall, tmap2, tmapMaybe]

> -- | `takeWhile`, applied to a predicate `p` and a list `xs`,
> -- returns the longest prefix (possibly empty) of `xs` of elements
> -- that satisfy `p`.
> --
> -- >>> takeWhile (< 3) [1,2,3,4,1,2,3,4]
> -- [1,2]
> -- >>> takeWhile (< 9) [1,2,3]
> -- [1,2,3]
> -- >>> takeWhile (< 0) [1,2,3]
> -- []

> takeWhile :: (a -> Bool) -> [a] -> [a]
> takeWhile = undefined
> ttakeWhile :: Test
> ttakeWhile = "takeWhile" ~: (assertFailure "testcase for takeWhile" :: Assertion)

> -- | `find pred lst` returns the first element of the list that
> -- satisfies the predicate. Because no element may do so, the
> -- answer is returned in a `Maybe`.
> --
> -- >>> find odd [0,2,3,4]
> -- Just 3

> find :: (a -> Bool) -> [a] -> Maybe a
> find = undefined
> tfind :: Test
> tfind = "find" ~: (assertFailure "testcase for find" :: Assertion)

> -- | `all pred lst` returns `False` if any element of `lst`
> -- fails to satisfy `pred` and `True` otherwise.
> --
> -- >>> all odd [1,2,3]
> -- False

> all  :: (a -> Bool) -> [a] -> Bool
> all = undefined
> tall :: Test
> tall = "all" ~: (assertFailure "testcase for all" :: Assertion)

> -- | `map2 f xs ys` returns the list obtained by applying `f` to
> -- to each pair of corresponding elements of `xs` and `ys`. If
> -- one list is longer than the other, then the extra elements
> -- are ignored.
> -- i.e.
> --   map2 f [x1, x2, ..., xn] [y1, y2, ..., yn, yn+1]
> --        returns [f x1 y1, f x2 y2, ..., f xn yn]
> --
> -- >>> map2 (+) [1,2] [3,4]
> -- [4,6]
> --
> -- NOTE: `map2` is called `zipWith` in the Prelude

> map2 :: (a -> b -> c) -> [a] -> [b] -> [c]
> map2 = undefined

> tmap2 :: Test
> tmap2 = "map2" ~: (assertFailure "testcase for map2" :: Assertion)

> -- | Apply a partial function to all the elements of the list,
> -- keeping only valid outputs.
> --
> -- >>> mapMaybe root [0.0, -1.0, 4.0]
> -- [0.0,2.0]
> --
> -- (where `root` is defined below.)
> mapMaybe :: (a -> Maybe b) -> [a] -> [b]
> mapMaybe = undefined

> tmapMaybe :: Test
> tmapMaybe = "mapMaybe" ~: (assertFailure "testcase for mapMaybe" :: Assertion)

> root :: Double -> Maybe Double
> root d = if d < 0.0 then Nothing else Just $ sqrt d

> --------------------------------------------------------------------------------
> -- Problem (map and foldr practice for lists)
> --------------------------------------------------------------------------------

For the next group of functions, you are not allowed to use explicit recursion in your solutions. Instead, you must define them using one of the higher-order functions map, foldr or para (see below). These are the only list library functions that you may use on this problem. If you need any additional helper functions you may define them, but any helper functions should also use map, foldr or para instead of explicit recursion.

> testFoldr :: Test
> testFoldr = TestList [ tconcat,  tstartsWithHO, tendsWithHO, ttails, tcountSubHO]

> -- | The concatenation of all of the elements of a list of lists
> --
> -- >>> concat [[1,2,3],[4,5,6],[7,8,9]]
> -- [1,2,3,4,5,6,7,8,9]
> --

NOTE: remember you cannot use any list functions from the Prelude or Data.List for this problem, even for use as a helper function. Instead, define it yourself.

> concat :: [[a]] -> [a]
> concat = undefined

> tconcat :: Test
> tconcat = "concat" ~: (assertFailure "testcase for concat" :: Assertion)

> -- | The 'startsWithHO' function takes two strings and returns 'True'
> -- iff the first is a prefix of the second. This is the same as `startsWith` above
> -- except this time you need to use `foldr` to define it.
> --
> -- >>> "Hello" `startsWithHO` "Hello World!"
> -- True
> --
> -- >>> "Hello" `startsWithHO` "Wello Horld!"
> -- False

NOTE: use foldr for this one, but it is tricky! (Hint: the value returned by foldr can itself be a function.)

> startsWithHO :: String -> String -> Bool
> startsWithHO = undefined
> tstartsWithHO = "tstartsWithHO" ~: (assertFailure "testcase for startsWith" :: Assertion)

> -- INTERLUDE: para

Now consider a variant of foldr called para. In the case of cons, foldr provides access to the head of the list and the result of the fold over the tail of the list. The para function should do the same, but should also provide access to the tail of the list (before it has been processed).

> -- | foldr variant that provides access to each tail of the list
> para :: (a -> [a] -> b -> b) -> b -> [a] -> b
> para _ b [] = b
> para f b (x:xs) = f x xs (para f b xs)

For example, consider the tails function.

> -- | The 'tails' function calculates all suffixes of a give list and returns them
> -- in decreasing order of length. For example:
> --
> -- >>> tails "abc"
> -- ["abc", "bc", "c", ""],
> --
> tails :: [a] -> [[a]]
> tails []     = [[]]
> tails (x:xs) = (x:xs) : tails xs

It is a natural fit to implement tails using para. See if you can redefine the function above so that the test cases still pass.

> tails' = undefined

> ttails :: Test
> ttails = "tails" ~: TestList [
>     "tails0" ~: tails' "abc" ~?= ["abc", "bc", "c", ""],
>     "tails1" ~: tails' ""    ~?= [""],
>     "tails2" ~: tails' "a"   ~?= ["a",""] ]

> -- | The 'endsWith' function takes two lists and returns 'True' iff
> -- the first list is a suffix of the second. The second list must be
> -- finite.
> --
> -- >>> "ld!" `endsWith` "Hello World!"
> -- True
> --
> -- >>> "World" `endsWith` "Hello World!"
> -- False

NOTE: use para for this one!

> endsWithHO :: String -> String -> Bool
> endsWithHO = undefined

> tendsWithHO :: Test
> tendsWithHO = "endsWithHO" ~: (assertFailure "testcase for endsWithHO" :: Assertion)

> -- | The 'countSubHO' function returns the number of (potentially overlapping)
> -- occurrences of a substring sub found in a string.
> --
> -- >>> countSubHO "aa" "aaa"
> -- 2
> -- >>> countSubHO "" "aaac"
> -- 5

(You may use the para and startsWithHO functions in countSubHO.)

> countSubHO  :: String -> String -> Int
> countSubHO = undefined
> tcountSubHO = "countSubHO" ~: (assertFailure "testcase for countSubHO" :: Assertion)

> --------------------------------------------------------------------------------
> -- Data Munging Kata
> --------------------------------------------------------------------------------

A Code Kata is an exercise that helps an experienced programmer hone their skills. The coding exercise is usually not difficult---what is important is the analysis and design of the problem as well and the practice and repetition that lead to good coding habits. This exercise comes from website devoted to Code Katas and is not specific to Haskell.

Unlike the exercises above, for this problem you are allowed to use functions from Haskell's standard libraries. In particular, you may use list functions from the Prelude, or from Data.List in your solution. You may also use functions from Data.Char and Data.Maybe.

This problem is an exercise in three parts to do with real world data. For that reason, we aren't expecting you to produce a robust solution. You can expect input that is in a similar format to the data files that we provide (same number and ordering of columns, same header and footer layout). However, your code should be able to deal with reasonable changes (different number of days in a month, different number of teams in the league).

However, remember that you shouldn't use partial functions. There shouldn't be an input that causes your program to error. Definitely avoid functions such as (!!), read, or minimum.

This problem also is about refactoring, so try hard not to read ahead---do each of the three parts below in turn and then reflect on your experience.

> -- Part One: Weather

In jul22.dat (in the dat subdirectory) you'll find daily weather data for Philadelphia, PA for July 2022. This data is taken from NOAA.

Your job is to write a program to output the day number (column one) with the smallest temperature spread (the maximum temperature is the second column, the minimum the third column). If there are multiple days with the same smallest spread, your program should return the first one.

> -- >>> weatherProgram "dat/jul22.dat"
> -- "26"

We've given you the I/O parts of the program---opening the file and then printing the final result. You need to write the weather function below, that takes the string containing the text of the file and processes it to find the answer. Your program should work for any text file with the same format as this one. If the format is different, and your program cannot parse the data, then it should return Nothing. (We will discuss better approaches to error handling later in the semester.)

> weather :: String -> Maybe String
> weather str = error "unimplemented"

> weatherProgram :: String -> IO String
> weatherProgram file = do
>   str <- readFile file
>   return (case weather str of
>     Just result -> result
>     Nothing     -> "Cannot read file")

Hints: You should use the words and lines functions from the Haskell Prelude to split up the lines and columns in a manner that is robust to whitespace characters. You should also use the (overloaded) Read.readMaybe function to help you convert strings into integers. We've given it a new name and type signature to make it easier for you to use.

> -- | Use this function to parse Ints
> readInt :: String -> Maybe Int
> readInt = readMaybe

Here is the test case for this part. If this test fails because it cannot find the input file, you need to use the :cd command in ghci to make sure that you are in the right directory.

> testWeather :: Test
> testWeather = TestList [
>   "jul22" ~: do str <- readFile "dat/jul22.dat"
>                 weather str @?= Just "26", 
>   "jul21" ~: do str <- readFile "dat/jul21.dat"
>                 weather str @?= Just "18", 
>   "jul20" ~: do str <- readFile "dat/jul20.dat"
>                 weather str @?= Just "10", 
>   "jul19" ~: do str <- readFile "dat/jul19.dat"
>                 weather str @?= Just "8"
>   ]

> -- >>> runTestTT testWeather
> -- Counts {cases = 4, tried = 4, errors = 0, failures = 0}

> -- Part Two: Soccer League Table

The file dat/soccer21.dat contains the results from the English Premier League for the 2021/2022 season. This data is taken from SkySports.The columns labeled "W" and "L" contain the total number of wins and losses for each team in that season (so Liverpool won 28 games against opponents and lost twice). Write a program to find the name of the team with the smallest (absolute) difference in "W" and "L". If there are multiple teams with the smallest difference, your program should return the first one.

> soccer :: String -> Maybe String
> soccer = error "unimplemented"

> soccerProgram :: String -> IO String
> soccerProgram file = do
>   str <- readFile file
>   return $ case soccer str of
>     Just result -> result
>     Nothing     -> "Cannot read file"

Your program should work with all similar input files (same columns, same info in footer).

> testSoccer :: Test
> testSoccer = TestList [ 
>   "soccer21" ~: do
>       str <- readFile "dat/soccer21.dat"
>       soccer str @?= Just "Leicester City",
>   "soccer20" ~: do
>       str <- readFile "dat/soccer20.dat"
>       soccer str @?= Just "Aston Villa",
>   "soccer19" ~: do
>       str <- readFile "dat/soccer19.dat"
>       soccer str @?= Just "Burnley",
>   "soccer18" ~: do
>       str <- readFile "dat/soccer18.dat"
>       soccer str @?= Just "Everton"
>   ]

> -- >>> runTestTT testSoccer
> -- Counts {cases = 4, tried = 4, errors = 0, failures = 0}

> -- Part Three: DRY Fusion

Now, take the two programs written previously and factor out as much common code as possible, leaving you with two smaller programs and some kind of shared functionality.

> weather2 :: String -> Maybe String
> weather2 = undefined

> soccer2 :: String -> Maybe String
> soccer2 = undefined

> -- Kata Questions

Fill in the strings below with your answers.

> -- To what extent did the design decisions you made when writing the original
> -- programs make it easier or harder to factor out common code?

> shortAnswer1 :: String
> shortAnswer1 = "Fill in your answer here"

> -- Was the way you wrote the second program influenced by writing the first?

> shortAnswer2 :: String
> shortAnswer2 = "Fill in your answer here"

> -- Is factoring out as much common code as possible always a good thing? Did the
> -- readability of the programs suffer because of this requirement? How about the
> -- maintainability?

> shortAnswer3 :: String
> shortAnswer3 = "Fill in your answer here"

CIS 5520: Advanced Programming

Fall 2022

HW 1 - List Processing, Recursion and working with Higher-order Functions