
Homework 2: Computing Human Evolution
To get started: Visit the CIS 120 course in Codio and select the “hw02: Computing Human Evolution” project. The only files you need to modify are dna.ml and dnaTest.ml. Once you are finished, be sure create a zipfile, download your zip to your local disk and then submit your homework.
The files for this homework are already available in the Codio project, but you can download a fresh copy if needed using this link: hw02.zip.
FAQ
We have created an FAQ answering the most common questions for this assignment. This can be found here.
Grading
- There are 95 points possible on the autograded part of this assignment, plus 5 points for your test cases that will be added during a later phase of manual grading. The functions for which we will specifically grade your tests are as follows:
- hamming_distance
- decreasing_similarity
- acids_of_helix
- helix_of_tree
- unlabel_tree
- parent_child_hamming
- Although we will grade only the tests for the problems defined above, you must be sure to always exhaustively test all functions in your program.
- You may submit without penalty up to THREE times.
- Each extra submission costs you 5 points.
Background
Biologists use evolutionary trees (Figure A below) to show species evolving from ancestor species. To practice using enumerations and recursion over lists and trees, we will write a program that automatically generates hypothesis trees like the one in Figure A.
Figure A: Evolutionary Tree for Apes

- Lar Gibbon (Hylobates lar)
- Pileated Gibbon (Hylobates pileatus)
- Siamang (Symphalangus syndactylus)
- White-Cheeked Gibbon (Nomascus siki)
- Orangutan (Pongo abelii)
- Gorilla (Gorilla gorilla)
- Chimpanzee (Pan troglodytes)
- Human (Homo sapiens)
Our program input will be real samples of DNA, the genetic code that describes how to build organisms (Figure B). DNA has two complementary helices, each a sequence of the nucleotides adenine, thymine, guanine, and cytosine. Adenine always appears opposite thymine; same for guanine and cytosine.
Figure B: Modeling the DNA Double Helix
Diagram | Model |
---|---|
![]() |
![]() |
The assignment file “dna.ml” contains the DNA sequences for 8 ape species. This is real data from the Entrez Nucleotide database. You will use this data to
- Generate all possible evolutionary trees with the ape DNA at the leaves. Internal nodes in these trees correspond to ancestor species.
- Estimate each tree’s complexity.
- Choose the simplest tree as a candidate.
- Compare your candidate with the tree in Figure A.
For example, for the four “present-day” DNA helices “GCAT”, “TCGT”, “TAGA”, and “GAGA”, one possible evolutionary tree is shown in Figure C below. This constitutes an evolutionary tree for the given helices, since these helices are at the leaves of the tree, while the internal nodes have as labels helices that are in some sense as close as possible to the labels of their children. Specific details on constructing evolutionary trees are provided in the homework file.
Figure C: Example of a Completely Labeled Tree

Supplementary Links
Browse these links for more background information:
We derive our images from Wikimedia sources available under Creative Commons licenses.