Due Friday, October 16th, 2009 at 11:00am.
Congratulations! You have gotten a job as a Junior Text Processing Specialist with Business Corp., Inc. (Hey, in this economy, you take what you can get.) Will you have what it takes to get that big promotion to Senior Text Processing Specialist? Let's find out...
First, download hw4.jar. Then create a new Java project in Eclipse and add hw4.jar to the classpath: right-click on the project and choose Properties --> Java Build Path --> Libraries --> Add External JARs.
In your Java project, you should create a class
called TextProcessor; this is the file you will
submit for this assignment.
Of course, you will also want to test your assignment. Starting this week, the tests we provide will be broken up into a number of simpler classes containing normal JUnit tests. Here are the tester files you'll need this week:
Your first task is to decrypt some old meeting minutes which were encrypted for some reason; no one remembers why (or what they said).
A Caesar cipher is a simple type of cipher named for Julius Caesar, who used it to communicate secretly with his generals. At least, it was secret back then; today, his puny cipher will be no match for your programming skills.
To encrypt some text using a Caesar cipher, each letter is shifted by a certain amount. For example, if using a shift of 3, the letter A becomes D, B becomes E, C becomes F, ... and so on. Letters at the end of the alphabet "wrap around", so, for example, with a shift of 3, X becomes A, Y becomes B, and Z becomes C.
Decrypting a message encoded with a Caesar cipher is easy: just apply another shift so the total shift is 26. For example, to decrypt a message encrypted with a shift of 3, we would apply another shift of 23.
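The wrap-around arithmetic described above can be sketched for a single uppercase letter as follows. This is only an illustration: the class ShiftSketch and the method shiftUpper are hypothetical names, not part of the assignment, and the sketch assumes an uppercase input letter and a non-negative shift.

```java
// Hypothetical helper class; the names are illustrative, not part of the assignment.
final class ShiftSketch {
    // Shifts one uppercase letter by a non-negative amount, wrapping past 'Z'.
    // Assumes 'A' <= letter <= 'Z' and shift >= 0.
    static char shiftUpper(char letter, int shift) {
        int offset = letter - 'A';            // 0 for 'A', 25 for 'Z'
        int shifted = (offset + shift) % 26;  // wrap around the alphabet
        return (char) ('A' + shifted);        // explicit cast back to char
    }

    public static void main(String[] args) {
        char encrypted = shiftUpper('X', 3);         // shift of 3
        char decrypted = shiftUpper(encrypted, 23);  // 3 + 23 = 26, so this undoes it
        System.out.println(encrypted + " " + decrypted);
    }
}
```

Applying a shift of 3 and then a shift of 23 returns the original letter, because the total shift of 26 is a full trip around the alphabet.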
Your job for the first problem is to write a method
public static String caesar(String input, int shift)
which applies a Caesar cipher shift to a given input string. For
example, TextProcessor.caesar("The quick brown fox!", 7) should
return "Aol xbpjr iyvdu mve!". Note that capital letters
remain capital letters, lowercase letters remain lowercase, and non-letter
characters (such as spaces or punctuation) remain unchanged. Note: We will
only test your code with positive shift values.
For more details, see the javadocs.
Every char value has a corresponding int value, which can be obtained by casting. In fact, char values can simply be used as if they were of type int, and Java will automatically convert them. However, to convert in the other direction, one must use an explicit (char) cast. This is because int has a larger range than char, so Java cannot know whether an int to char conversion is safe; some information may be lost.
The int values corresponding to the characters 'A' through 'Z' are all consecutive and in increasing order, as are the values for 'a' through 'z'.
The remainder operator, %, gives the remainder when one number is divided by another.
The CEO of Business Corp., Inc., Mr. I. M. Bizy, has a habit of using certain words (such as "synergistically", "leverage", and "actionable") far too often. In an attempt to synergistically leverage your core competencies to produce an actionable artifact alerting him to this fact, you have been tasked with analyzing some documents and labeling the occurrences of certain words with how many times those words have been used.
In particular, you should write the following method:
public static String numberOccurrences(String input, String word)
which consecutively numbers all the occurrences
of word in input. For example,
TextProcessor.numberOccurrences("How much wood would a wood chuck chuck if a
wood chuck could chuck wood?", "chuck")
should yield
"How much wood would a wood chuck1 chuck2 if a wood chuck3 could
chuck4 wood?"
You may assume that you only need to count exact matches;
for example, if the foregoing example contained "Chuck"
it would not be counted, since "Chuck"
and "chuck" are not identical. (However, see
the extra credit.)
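One possible way to approach the numbering is to search repeatedly with String.indexOf, copying the text up to and including each match and then appending a counter. The sketch below is an assumption about how this could be done, not the required solution: CountSketch and numberExact are hypothetical names, and the sketch treats every exact substring occurrence of word as a match.

```java
// Hypothetical sketch; the class and method names are illustrative only.
final class CountSketch {
    // Appends 1, 2, 3, ... after each exact (case-sensitive) occurrence of word.
    // Assumption: any substring occurrence counts as a match.
    static String numberExact(String input, String word) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        int from = 0;  // start of the text not yet copied to the output
        int hit = input.indexOf(word, from);
        while (hit >= 0 && !word.isEmpty()) {
            int end = hit + word.length();
            out.append(input, from, end);  // text up to and including the match
            out.append(++count);           // then the running number
            from = end;
            hit = input.indexOf(word, from);
        }
        out.append(input, from, input.length());  // whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberExact("a chuck and a Chuck and a chuck", "chuck"));
    }
}
```

Because indexOf compares characters exactly, "Chuck" is skipped here, which matches the exact-match rule stated above.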
You have now been tasked with preparing some company documents for public release. Some of the documents contain sensitive company secrets (such as the amount of coffee consumed by employees per day (hint: it is measured in hectoliters)), so your job is to remove secrets from the documents first. Actually, distinguishing between secrets and non-secrets is too hard, so you should just remove all the words.
Specifically, write a method
public static String redact(String input)
which replaces every word in the string input with a
single asterisk. For the purposes of this assignment, a "word"
consists of a consecutive sequence of letters or apostrophes;
for example, "don't" is one word, whereas "don,t" is two words
separated by a comma. (A real text processing application
would be slightly more precise; see the extra credit.) Spaces and
punctuation should be left unchanged.
For example, TextProcessor.redact("The quick, brown'd fox!") should
result in "* *, * *!".
Note that
char c = ''';
is not valid Java code (why?). Instead, use the following:
char c = '\'';
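The definition of a "word" given above (a consecutive run of letters or apostrophes) can be sketched with a small predicate and a single pass over the string. This is a rough illustration under that definition only; RedactSketch, isWordChar and redactSketch are hypothetical names, and the approach shown is one option among several.

```java
// Hypothetical sketch; names are illustrative and not part of the assignment.
final class RedactSketch {
    // A word character is a letter or an apostrophe (note the escaped literal).
    static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '\'';
    }

    // Replaces each maximal run of word characters with a single asterisk.
    static String redactSketch(String input) {
        StringBuilder out = new StringBuilder();
        boolean inWord = false;  // true while we are inside a run of word characters
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isWordChar(c)) {
                if (!inWord) {
                    out.append('*');  // emit one asterisk at the start of each word
                }
                inWord = true;
            } else {
                out.append(c);        // spaces and punctuation are copied unchanged
                inWord = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(redactSketch("The quick, brown'd fox!"));
    }
}
```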
Getting input from a String is kind of limiting: in
fact, there are lots of other places you could get input, such as from
a file, over the network, from a pigeon carrying a sheet of aluminum
stamped with Braille... wouldn't it be nice if your code worked in all
of these scenarios?
This final task has two parts.
First, write a method
public static String redactStream(Reader input) throws IOException
which performs the same function as redact, but gets its
input from a Reader instead of a String.
Note: in order to use Reader and the other classes
you'll need for this problem, you should add this to the top
of TextProcessor.java:
import java.io.*;
(Although if you forget, it's not that big of a deal: Eclipse will offer to do it for you!)
What is a Reader? Reader is an abstract class for reading streams of characters; classes that read from many different kinds of sources (such as Strings, files, network interfaces, or pigeons) can extend Reader. See the Reader javadocs.
What's with the throws IOException stuff? Some of the Reader methods you'll be using can throw an IOException if anything goes wrong (such as if there is a network error, or the pigeon dies). So for now, you just need to declare that your redactStream method can throw one as well.
Now rewrite redact to call redactStream (hint: see StringReader). Don't duplicate functionality!
Finally, write a method
public static String redactFile(String fileName) throws IOException
which takes as an argument the name of a file, and returns its redacted contents. (Hint: take a look at the FileReader class.) To test this code, you will need the tester files above, plus testDocument.txt, which must be in the same directory as your .java files.
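As a rough illustration of how characters can be pulled from a Reader one at a time, the sketch below copies everything a Reader supplies into a String. It is not a solution to the task: ReaderSketch, readAll and readAllFromString are hypothetical names, and the choice to wrap the IOException from an in-memory StringReader is just one assumption about how a String-based method might delegate to a Reader-based one.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch; names are illustrative and not part of the assignment.
final class ReaderSketch {
    // Reads characters until read() returns -1, which signals end of input.
    static String readAll(Reader input) throws IOException {
        StringBuilder out = new StringBuilder();
        int next = input.read();      // read() returns an int, not a char
        while (next != -1) {
            out.append((char) next);  // explicit cast back to char
            next = input.read();
        }
        return out.toString();
    }

    // Reuses the Reader-based method for String input instead of duplicating it.
    static String readAllFromString(String text) {
        try {
            return readAll(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAllFromString("don't panic"));
    }
}
```

The same readAll method would accept a FileReader or any other Reader subclass, which is the point of programming against Reader rather than String.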
Congratulations, you got that promotion to Senior Text Processing Specialist! But do you have what it takes to become a Senior Text Processing Consultant, with your very own desk (in the basement)?
Update your numberOccurrences method so that it also
counts inexact matches that differ only in case. For
example, numberOccurrences("The the THE ThE tHE springtime",
"THe") should result in "The1 the2 THE3 ThE4 tHE5
springtime".
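For matches that differ only in case, one option is String.regionMatches with its ignoreCase flag set, which compares a slice of one string against another without building lowercase copies. The sketch below shows only the search step; CaseSketch and indexOfIgnoreCase are hypothetical names, and other approaches would work equally well.

```java
// Hypothetical sketch; names are illustrative and not part of the assignment.
final class CaseSketch {
    // Returns the first index >= from where word occurs in input, ignoring case,
    // or -1 if there is no such occurrence.
    static int indexOfIgnoreCase(String input, String word, int from) {
        for (int i = from; i + word.length() <= input.length(); i++) {
            // regionMatches(true, ...) compares the two regions case-insensitively
            if (input.regionMatches(true, i, word, 0, word.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfIgnoreCase("The the THE", "THe", 1));
    }
}
```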
Update your redact method so that it handles
apostrophes correctly. For the purposes of the assignment, you were
told to assume that an apostrophe always counts as part of a word;
but really, an apostrophe is only part of a word if it occurs in
between two letters. Otherwise, it is punctuation. For
example, redact("I don't 'think' so") should
yield "* * '*' *": the first apostrophe is part of a
word, since it occurs between n and t, but the second and third
apostrophes are punctuation, and are therefore copied unchanged to
the redacted output.
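The refined apostrophe rule can be expressed as a check on a character's neighbors: an apostrophe counts as part of a word only when the characters immediately before and after it are both letters. The sketch below shows just that check; ApostropheSketch and isWordCharAt are hypothetical names, and how the check is wired into redact is left open.

```java
// Hypothetical sketch; names are illustrative and not part of the assignment.
final class ApostropheSketch {
    // True if the character at index i belongs to a word under the refined rule.
    static boolean isWordCharAt(String input, int i) {
        char c = input.charAt(i);
        if (Character.isLetter(c)) {
            return true;
        }
        if (c != '\'') {
            return false;
        }
        // An apostrophe is a word character only between two letters.
        boolean letterBefore = i > 0 && Character.isLetter(input.charAt(i - 1));
        boolean letterAfter = i + 1 < input.length() && Character.isLetter(input.charAt(i + 1));
        return letterBefore && letterAfter;
    }

    public static void main(String[] args) {
        String text = "I don't 'think' so";
        System.out.println(isWordCharAt(text, 5));  // the apostrophe in don't
        System.out.println(isWordCharAt(text, 8));  // the apostrophe before think
    }
}
```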