CIS 120 OCaml Testing Guide

Testing in OCaml

Table of Contents:

Why do we care about testing?

CIS 121 answers this well, so we share their answer here:

Why do we care so much about testing? After all, we write tests for homework assignments, and you aren’t likely to come back to a homework assignment in six months and add breaking features to it. However, we are in the business of teaching good habits and practices. There are several benefits of writing tests properly:

  • Unit tests help you determine code correctness. How else would you know if your assignment is ready to submit unless you had written a full unit test suite?

  • Unit tests provide a mental framework for code design. After writing out your test suite, you should have a better understanding of the flows of data within your program, as well as the outputs that your program should emit in response to certain inputs.

  • Unit tests are an easy way of letting other people jump into a codebase. While this does not apply to CIS 121, in real life, codebases are shared between people. Not everyone can have a complete understanding of a software project; even Linus Torvalds, inventor of Linux, doesn’t fully understand the entire operating system and its ecosystem at this point. It is ostensible that someone can add a new feature or make a change to the code base that accidentally breaks existing functionality. A full test suite would catch this. What if an engineer came to work on a project years after you left that project? She wouldn’t be able to call you and ask you for your knowledge of the codebase. Therefore, unit tests are therefore an artifact of your knowledge.

  • Unit testing saves your organization money. While this also does not apply to CIS 121, this is probably the most important point, and definitely the most overlooked point. People write software to accomplish some business goal, whether that is creating a product, consulting for a client, performing academic research, or automating a manual process. If you write code that breaks something, you cost your organization money. For example, your customers might not be able to buy an item, you could lose a consulting client and damage your firm’s reputation, you could waste grant money, or you could perform some task incorrectly. Unit and integration tests are incorporated into all modern build systems; put simply, you will not be physically able to deploy code to production servers if any automated tests fail. Therefore, you are hedged against a large cause of performance regressions, and you save your organization money.

Testing in OCaml

Test from the bottom up.

CIS 110, CIS 120, and CIS 121 all recommend bottom-up testing. This means you should write and debug simpler tests before more complex tests. In OCaml, most of your solutions will rely on recursion. Many of your homeworks will guide you through the process of writing simple, short functions and combining them into larger functions that solve big problems.

It is better to make sure the small, short functions work before you test the larger functions, since bugs in the small, short functions may propagate in unexpected ways all throughout your entire homework assignment. Similarly, it is better to make sure the smaller, simpler parts of your function (like the base case) work before you test the function on more complex inputs.

You should order “small” test cases (edge cases and base cases) before large test cases. This way, if your code has a bug, you can narrow down the location of the bug as close to its source as possible.

Start by reading the function or method description.

What should the function do at a high level? As a starting point, write the tests that easily come to mind to make sure the function behaves correctly.

Think about the inputs and outputs of the function.

Think about what unique arguments or outputs your function might deal with in the context of the problem. Write tests for each unique, relevant structure of input and output.

Example:

Here are the description and method header for the coins function you implement in HW01:

    (* Your job in this problem is to calculate the smallest number of pennies, nickels, and dimes that 
    can be used to add up to the given amount. For example, to make 7 cents, a total of 3 coins are needed 
    (two pennies and a nickel); to make 99 cents, 14 coins are needed (9 dimes, 1 nickel, and 4 pennies). *)

    let rec coins (amount: int) : int = 
        failwith "unimplemented"

What sort of integer inputs could coins get in the context of the problem?

  • Zero, meaning no coins needed
  • A multiple of ten, meaning only dimes needed
  • A multiple of five less than ten, meaning only nickels needed
  • A multiple of one less than five, meaning only pennies needed
  • A value meaning nickels and dimes needed
  • A value meaning nickels and pennies needed
  • … and so on.

In this example, testing all of the relevant inputs in the context of the problem also covers all of the relevant outputs - no coins, multiple coins, etc.

Think about your solution.

If you have already written your solution, read it over. Make sure your tests reach each line of code. This approach to testing is called code coverage: make sure your test suite will reach every possible line of executable code.

Keep in mind that sometimes a single line of code can cause different things to happen! For example, an if-statement on a single line could evaluate to two or more different values. Test for the boolean condition and its outcomes too.

Example:

Imagine a simple pattern-match statement…

begin match l with
  | [] -> []
  | hd :: tl -> if cond then A else if other_cond then B
                else C
end

Write three test cases to make sure cond and A, other_cond and B, and neither condition and C occur together. To make sure you cover each line, write a test case for the base case too!

Think about recursion.

Recursion will be one of the most common tools you work with in OCaml.

Here is a list of things it might be useful to test when dealing with a recursive function:

  • The base case(s)
  • Regular cases
    • Tree
      • Leaf or Empty
      • Node(lt, _, rt)
        • lt and rt could be Leafs, Empty, or Nodes themselves
      • Trees of depth > 2
    • List
      • []
      • [hd]
      • hd :: tl
      • If relevant, hd :: snd :: tl, etc.

It’s also good to ask yourself:

  • Will the recursion ever terminate early, before the base case?
    • For example, HW01’s rainfall function should terminate at -999 if it appears before you reach the end of the list.
    • Test each of the early termination conditions.
  • Does my solution involve if-statements or boolean operators?
    • When should the if-statement go to the then or the else branch?
    • Should the function answer a question for all (&&) or at least one (||)?
    • Test that it does properly.
  • What will be the most complex parts of my solution?
    • Write tests as if you (or a hacker) were deliberately trying to break your function with tricky arguments.

Example:

Here are the description and method header we give you for the rainfall function we ask you to implement in HW01:

(* Design and implement a function called `rainfall` that consumes a list of ints representing daily 
rainfall readings. The list may contain the number -999 indicating the end of the data of interest.

Produce the average of the non-negative values in the list up to the first -999 (if it shows up). 
There may be negative numbers other than -999 in the list, representing faulty readings; 
these should be skipped.  If you cannot compute an average, for whatever reason, return -1.  *)

The recursion should terminate early at -999, so we should write a test case to check that. The -999 could appear at the start, middle, or end of the list, so we should write tests to check those cases. The function should ignore negative numbers, so we should write a test case to check that. If we can’t compute an average, we’ll have to return -1. That sounds like it’ll involve an if-statement, so we should write a test case to check that.

let test () : bool =
    rainfall [1; -999; 3; 4; 5; 6] = 1
;; run_test "rainfall: -999 in the middle of the list" test

let test () : bool =
    rainfall [-999; 1; 5; 6] = -1
;; run_test "rainfall: -999 starts the list" test

let test () : bool =
    rainfall [1; 2; 3; -999] = 2
;; run_test "rainfall: -999 ends the list" test 

let test () : bool =
    rainfall [6; 2; -6] = 4
;; run_test "rainfall: ignores negative numbers" test

let test () : bool =
    rainfall [] = -1
;; run_test "rainfall: empty list produces -1" test

let test () : bool =
    rainfall [-1; -2; -3; -4] = -1
;; run_test "rainfall: list of negatives produces -1" test

Think about the data structures the function works with.

We introduce the concept of an “interface” and “invariants” ahead of hw03, so don’t worry if you’re a bit unfamiliar with the jargon in this section. Still give this a read-through - this approach is useful for testing in hw01 and hw02.

Many of the homeworks in CIS 120 deal with storing information in data structures. For these homeworks, testing individual functions directly can be challenging. Instead, we can leverage “invariants” and use an approach called “property-based testing.”

When we are working with invariants, we need to test that functions preserve the properties (or invariants) of the data structure we are working with.

Our test suite should check whether the data structure, or output, produced by a function is valid - that it meets all the invariants, or has all of the desired properties, of that data structure. This is necessary to test, but it’s not enough to ensure correctness. Consider a situation where you ask your roommate if they want to Venmo you for half the cost of this month’s utilities, and they say no. This is a valid answer, but it is not the correct answer. To complement checks for validity, your tests should check for correctness as well.

How do we go about writing tests for data structures that test validity and correctness? In OCaml, we will create and modify data structures by calling functions, so we’ll test post-conditions: a statement that should be true after calling a function.

For example, let’s say we are working with a sorted list data structure. We can add elements to the sorted list using the function add. One post-condition is: after calling add with a new element, the resulting sorted list should be sorted (checking validity) and it should contain the new element (checking correctness).

So, we should approach test that our functions preserve validity and behave correctly by following these steps:

1. Translate plain-language requirements for the data structure into post-condition statements about the interface.

Example: Imagine we have a sorted list interface with the following values:

empty     : 'a sorted_list
add       : 'a -> 'a sorted_list -> 'a sorted_list
equals    : 'a sorted_list -> 'a sorted_list -> bool
is_sorted : 'a sorted_list -> bool
contains  : 'a -> 'a sorted_list -> bool

A plain-language requirement for a sorted list is that the list must be sorted. We can translate this into several post-condition statements about the interface.

  • After adding the same elements in different orders to two equal sorted lists, the two resulting lists must remain equal.
  • After adding elements to a sorted list, the resulting list must be sorted.
  • After adding a new element to a sorted list, the resulting list must contain the new element.

These post-conditions can be directly tested because they’re in terms of the interface, which lets us test the plain-language requirement of the data structure.

let test () : bool =
    equals (add 2 (add 3 empty)) (add 3 (add 2 empty))
;; run_test "sorted_list: diff. add orders still equal" test

let test () : bool =
    is_sorted (add 2 (add 3 (add 1 empty)))
;; run_test "sorted_list: add unsorted elts is sorted" test

let test () : bool =
    let current_set = add 2 (add 3 (add 1 empty))
    contains 1 current_set && contains 2 current_set 
    && contains 3 current_set
;; run_test "sorted_list: list contains new added elt" test

2. Write tests for the “interesting” instances of the statement about the interface.

Consider our statement about the sorted list interface: regardless of the order in which we add elements to the list, the sorted list must be sorted.

We tested this above with equals (add x (add y sorted_lst)) (add y (add x sorted_lst)) where x and y were 2 and 3 and sorted_lst was empty.

What are some more interesting instances of this statement?

  • What if sorted_lst is an empty list? A nonempty list?
  • What if x = y? x > y? x < y?
  • Keep in mind that what is an “interesting instance” will depend on the invariants of your data structure.

Testing interface functions with other interface functions:

Finally, we’ll address a common question that often arises when testing a data structure through its interface. How do I write tests for an interface, when I can only test its functions using other functions in the interface? How can I avoid this “circular” testing?

The answer to this question is: write test cases that check each function in multiple different ways. Consider the test cases for add that we wrote up above: “sorted_list: diff. add orders still equal”, “sorted_list: add unsorted elts is sorted”, and “sorted_list: list contains new added elt”. We used contains, is_sorted, and equals to test add.

If we just used “sorted_list: diff. add orders still equal”:

let test () : bool =
    equals (add 2 (add 3 empty)) (add 3 (add 2 empty))
;; run_test "sorted_list: diff. add orders still equal" test

it’s possible that add might work incorrectly but still create two equal sets. In that case, our test that uses contains:

let test () : bool =
    let current_set = add 2 (add 3 (add 1 empty))
    contains 1 current_set && contains 2 current_set 
    && contains 3 current_set
;; run_test "sorted_list: list contains new added elt" test

would catch that error for us.

By checking the correctness of each function in multiple ways (with multiple different functions), we use “circular” testing to our advantage to thoroughly check the correctness of each function.

More Resources

If you’re interested in learning more about property-based testing, you can read this paper by John Hughes, the person who wrote QuickCheck - a famous Haskell property-based testing library, giving advice on how to write property based tests. We paraphrase from sections 4.1 and 4.2 of the paper here.