In question 2, when we say "flavor of BLAST" we mean one of blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST, or PHI-BLAST. Choose one of these. Choose an assumption that is made when performing this algorithm. Give an example of a problem where this assumption is not valid and show how BLAST will give you misleading results. Repeat.
The assumption does not have to be specific to the flavor of BLAST you choose.
Question 4 is somewhat badly worded. Normally, you don't design a positional weight matrix (PWM) de novo; rather, you attempt to learn one from data. This question is asking you to predict the sort of PWM you would get if the motif you end up learning has the characteristics listed in each part.
To make this more concrete, pretend you are learning the PWM from data that is distributed accordingly.
* In part A, your training set would contain equal numbers of -A-G and -A-T sequences, where '-' can be any base.
* In part B, your training set would contain equal numbers of CCGA and ---T sequences (half CCGA, half sequences that end with T)
* In part C, your training set would contain equal numbers of each of the four sequences.
So when we say your PWM should assign the highest possible probability to the motif, we really mean that the PWM should do as well as it can on the strings of interest taken together. We do not mean that there is a single highest probability the PWM can assign and that each string of interest should be able to score that highly.
Don't worry about smoothing the counts, unless you want to (if you don't know what smoothing refers to, check out the supplementary handout on motif finding).
Also, don't worry about taking the log of the probabilities -- the sequences are too short to worry about that.
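To make "learning a PWM from data" concrete, here is a minimal sketch, assuming unsmoothed counts and no logs as described above. The training sequences are a made-up toy set, not any of the sets from parts A to C, and the function names `learn_pwm` and `sequence_probability` are just illustrative.

```python
from collections import Counter

BASES = "ACGT"

def learn_pwm(sequences):
    """Return one dict per position, mapping each base to its relative frequency."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all training sequences must have the same length")
    pwm = []
    for i in range(length):
        # Count how often each base appears at position i (no smoothing).
        counts = Counter(s[i] for s in sequences)
        pwm.append({b: counts[b] / len(sequences) for b in BASES})
    return pwm

def sequence_probability(pwm, seq):
    """Multiply the per-position probabilities (no logs, since sequences are short)."""
    p = 1.0
    for column, base in zip(pwm, seq):
        p *= column[base]
    return p

# Hypothetical toy training set, chosen only for illustration.
training = ["ACGT", "ACGA", "TCGT", "ACGT"]
pwm = learn_pwm(training)

for position, column in enumerate(pwm):
    print(position, column)
print(sequence_probability(pwm, "ACGT"))  # 0.75 * 1.0 * 1.0 * 0.75
print(sequence_probability(pwm, "TCGA"))  # 0.25 * 1.0 * 1.0 * 0.25
```

Note that the two strings get different probabilities even though both appear in the training set. That is the sense in which a PWM does "as well as it can" on a set of strings rather than giving each one the same top score.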
Sensitivity vs Specificity is always kind of confusing when you attempt to juggle around the equations. Basically, sensitivity measures how well your algorithm performs in classifying things that should be positive; specificity measures how well your algorithm performs in classifying things that should be negative.
The formulae are as follows:
Sens = TP / (TP + FN)
Spec = TN / (TN + FP)
Let's break these down. Sensitivity is concerned with things that should be classified as positive. These are the ones you classified correctly (TP = true positives) plus the ones that you classified as negative when you should have classified them as positive (FN = false negatives). Thus, sensitivity is the fraction you got right among the things that you should have classified as positive.
Specificity is concerned with things that should be classified as negative. These are the ones you classified correctly (TN = true negatives) plus the ones you classified incorrectly (FP = false positives, labelled as positive when they should have been labelled as negative). Specificity is the fraction you got right among the things you should have classified as negative.
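As a quick sanity check on the two formulae, here is a small sketch that computes both from the four counts. The counts themselves are invented for illustration.

```python
def sensitivity(tp, fn):
    """Fraction of the truly positive items that were classified as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of the truly negative items that were classified as negative."""
    return tn / (tn + fp)

# Hypothetical counts: 50 items that should be positive, 50 that should be negative.
tp, fn = 40, 10   # of the 50 real positives, 40 were found and 10 were missed
tn, fp = 45, 5    # of the 50 real negatives, 45 were rejected and 5 were wrongly accepted

print("Sens =", sensitivity(tp, fn))
print("Spec =", specificity(tn, fp))
```

Each denominator uses only one row of the truth: sensitivity never looks at the negatives, and specificity never looks at the positives.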
Specificity can be a very difficult thing to calculate, given your problem formulation, since it's not always easy to enumerate your true negatives. A different measure, called Precision, is calculated using the following:
Precision = TP / (TP + FP)
It gets at sort of the same thing as Specificity, except it's actually the fraction correct out of everything that you classified as positive. This can be an easier quantity to obtain, because it never requires counting the true negatives.
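The sketch below shows why Precision can be easier to obtain: it needs only the list of things you called positive and the list of things that really are positive. The sequence identifiers and the helper name `precision_and_sensitivity` are hypothetical.

```python
def precision_and_sensitivity(predicted_positive, actual_positive):
    """Compute Precision and Sensitivity from two collections of item ids.

    No true-negative count is needed for either quantity.
    Assumes both collections are non-empty (otherwise a denominator is zero).
    """
    predicted = set(predicted_positive)
    actual = set(actual_positive)
    tp = len(predicted & actual)   # called positive and really positive
    fp = len(predicted - actual)   # called positive but really negative
    fn = len(actual - predicted)   # really positive but missed
    precision = tp / (tp + fp)
    sens = tp / (tp + fn)
    return precision, sens

# Hypothetical example: four reported hits, five items that are truly positive.
hits = ["seq1", "seq2", "seq3", "seq7"]
true_members = ["seq1", "seq2", "seq3", "seq4", "seq5"]

precision, sens = precision_and_sensitivity(hits, true_members)
print("Precision =", precision)  # 3 correct out of 4 reported
print("Sens =", sens)            # 3 found out of 5 real positives
```

Everything that was neither reported nor truly positive (the true negatives) never enters the calculation, which is what makes this measure practical when the negatives are hard to enumerate.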
For the homework, do report Sensitivity, but as for Specificity vs Precision, report whichever one is easier, but make it clear which one you're reporting -- actually list the equation you use. Unfortunately, not everyone uses the same equation for Specificity, so it's always best to make it explicit what you're reporting.