CS3134 Homework #4
Due on November 11, 2004 at 11:00am
There are two parts to this homework: a written component worth
10 points, and a programming assignment worth 15 points. See
the homework submission
instructions on how to hand it in and for important notes on
programming style and structure.
Written questions
- (7 points) In class, we discussed a variety of very fast sorts (mergesort,
radix sort, and soon quicksort). There's another interesting sort
that's actually faster than all three of these, and it's called
bucket sort. The general idea is that you establish a
domain for the elements to be sorted; for example, if you are
sorting an array of numbers that may range from 0 to 999, you create
1,000 buckets, arrange the elements into those buckets, and then
pull them out of the buckets back into the array.
- (3 points) Describe an algorithm that we can use, given this basic
idea, to sort the contents of an array of integers ranging from 1 to
999.
- (2 points) Analyze the worst-case running time of your bucket sort
algorithm. It should, for full credit, be faster than O(n
lg n). If it's not, beat on the algorithm a little more.
- (2 points) If it's so fast, why don't we use bucket sort all the
time? Well, there are several major downsides. State at
least one disadvantage of bucket sort as compared to the other sorts
we've seen.
- (3 points; paraphrased from book programming 6.1) Suppose you had a
language without a multiply operator.
- (2 points) Write a method called mult
that takes a pair of ints, repeatedly
calls itself with smaller operands, and uses addition to combine the
results. In other words, the method should have no loop
constructs and should not use the multiply (*) operator anywhere.
Write it in Java; your syntax doesn't need to be perfect, but make it as
close as you can.
- (1 point) Analyze the worst-case running time of this algorithm.
You can assume that addition is a constant-time (i.e., O(1)) operation.
State the result in terms of O(expression), and make sure to
explain how the variable(s) in expression correspond to the
original multiplication operands.
Programming problems
In this assignment, we're going to write a program that implements the
beginnings of a
dictionary abstraction, i.e., you will build data structures and tools to
handle large numbers of words (Strings). You will use
this file as input for the program;
it's a scrambled version of a "dictionary" of words (without
definitions) as distributed with a particular Linux distribution. (To download it
in IE/Mozilla/Netscape, right-click and choose Save Target As or
Save Link To Disk.) The file only contains words with alphabetic
characters, both upper- and lower-case, one per line. You are
to process the file, ignoring but preserving case, and
support the operations as described below. You will use an
array-backed list to support this "dictionary".
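As a small illustration of "ignoring but preserving case", the sketch below compares words with String.compareToIgnoreCase while leaving the stored Strings untouched. The class and method names (CaseSketch, comesBefore) are made up for this example and are not part of the required interface.

```java
class CaseSketch {
    // Returns true if a sorts before b alphabetically when case is ignored.
    // Neither String is modified, so the original capitalization survives.
    static boolean comesBefore(String a, String b) {
        return a.compareToIgnoreCase(b) < 0;
    }

    public static void main(String[] args) {
        String[] words = { "Zebra", "apple" };
        // Ignoring case, "apple" comes before "Zebra".
        System.out.println(comesBefore(words[1], words[0]));
        // A plain compareTo would order uppercase 'Z' ahead of lowercase 'a'.
        System.out.println(words[1].compareTo(words[0]) < 0);
        // The stored word still has its capital letter.
        System.out.println(words[0]);
    }
}
```

Lowercasing every word on input would also make comparisons simple, but it would lose the original case, which the assignment asks you to keep.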
- (13 points) Build an ArrayBackedList class
that uses an array to store the words. The constructor for
the class must take one parameter: the number of words to be
stored, which serves as a capacity property for the array.
You must then implement the following methods in your
ArrayBackedList:
- (1 point) public boolean insert(String
s): this takes a String and inserts it at the
bottom of the (occupied part of the) array. It also
updates an object-level variable called
longestWord on each insert, so that by the end
of input, longestWord contains the length of
the longest word. You should return true
unless the array is full, in which case return
false.
- (1 point) public String elementAt(int
index): this returns the element at the specified
index, or null if no such index exists.
- (1 point) public int size(): this
returns the number of elements in the array.
- (4 points) public int mergeSort():
this does a mergesort of the array. You can use the book's or
the class's mergesort code; the former is downloadable from
here. However,
you must modify the mergesort code from the book to work inside this
existing class and to return an int representing the number of "merges", i.e., the number of
elements copied from the source array to the target array during
merges. Additionally, mergesort needs to work with Strings,
not ints.
- (6 points) public int radixSort(): this does
an (iterative) alphabetic radix sort of the array. The
strategy is similar to, but not the same as, the one used when sorting
numbers. First of all, there will be 27 groups (26
letters plus "too-short" words), not 10.
Second, words aren't "right-aligned", but rather
"left-aligned". In other words, you will start the
radix sort at the last character of the longest
String, but only words that are that long will
be grouped appropriately; all other words will be thrown
into the "zero" group that holds "too-short" words.
Future passes then go through every group and sort by
the second-to-last character position of the longest String,
then the third-to-last, etc., and throw the
result into the appropriate new group (note that
you need a new "set" of groups for every pass!). Once
you get to the "zeroth" character, you will finally have
a configuration where there is no data in the "zero"
group, but data in the remaining groups are in
order. Read the groups starting with the "a" group,
and copy the elements back into the array.
You will use a doubly linked-list structure to
store each individual group, and you'll use an array
of linked lists to store the collection of groups.
Instead of having to modify the book's linked list to handle a different datatype,
we'll do something different: you will use the
LinkedList class as supplied by Java in the
java.util package. (Note that this is the
only java.util data structure you should be
using for this assignment.)
Your radix sort method will return an integer: the
number of "operations" in and out of groups. That is,
any element inserted into a linked list acts as one
operation, and any element read out of a linked list
acts as another. Copying from one group into another
group acts as two operations. Add all of these up and
return it from the radix sort.
- (2 points) Implement an ArrayBackedListApp class
with a main() method that does the following. (There are 45,372 words in the file; you can
create static-sized arrays for the purposes of this
assignment.)
- Use a BufferedReader to read the words
from the aforementioned words.txt file into two new instances of ArrayBackedList.
- Mergesort the first instance and print out the number of merges.
- Radix sort the second instance and print out the number of
group operations.
- Present a small user
interface (at a ">" prompt) with the following commands
(don't worry about invalid input).
- d1 count: Dumps the first
count elements to screen from the mergesort-sorted list. If count is
0, print all the elements to screen;
- d2 count: Dumps the first
count elements to screen from the radix-sorted list. If count is
0, print all the elements to screen;
- i1 index: Print out the
element referenced by index in the mergesort-sorted list. 0 would
imply the first element. If no such element
exists, print out not found.
- i2 index: Print out the
element referenced by index in the radix-sorted list. 0 would
imply the first element. If no such element
exists, print out not found.
- q: Quit.
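A minimal sketch of the file-reading step, under a few assumptions: it fills a plain String array instead of your ArrayBackedList, and the names WordLoaderSketch and load are invented for this example. In your program you would wrap a FileReader for words.txt and call insert() on both list instances for each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class WordLoaderSketch {
    // Reads one word per line into a fixed-capacity array and returns how
    // many words were stored. Reading stops once the array is full.
    static int load(Reader source, String[] target) {
        BufferedReader in = new BufferedReader(source);
        int count = 0;
        try {
            String line;
            while (count < target.length && (line = in.readLine()) != null) {
                target[count++] = line;
            }
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // A StringReader stands in for new FileReader("words.txt").
        String[] words = new String[4];
        int n = load(new StringReader("pear\nApple\nfig\n"), words);
        System.out.println(n + " words, first is " + words[0]);
    }
}
```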
Tips:
- The LinkedList documentation in Java is
here. It's not too difficult to use, although it's a little
different from the ones we used in class. First, it stores any type of
Object inside a LinkedList, so when you remove objects from the LinkedList,
you need to cast them as the datatype that you want to manipulate
(e.g., (String)list.removeFirst()).
Also, avoid using the get() method;
an arbitrary index access in a LinkedList, of course, is O(n), so doing lots
of get() operations will make your radix sort very slow.
Instead, use removeFirst, an iterator, or something similar.
- One of the tricky things is converting an individual character into an
array index to figure out which group a word is going into. Java makes
this much easier by supporting "character" math, e.g., given the appropriate
char in the variable called c, you can say radixGroups[c - 'a' + 1], which
makes 'a' == 1, 'b' == 2, etc. Remember the character in c should be
lowercase before you do this.
- Radix sort is easily the hardest part of this homework. Make sure
to strategize and fully understand what the homework is asking before
you attempt writing it. In fact, I suggest you do mergesort first
and get that working.
- Obviously, the d1/d2 and i1/i2 operations should print out the same
result if your sorts work -- they're there to enable the TAs to test that each of
your sorts works correctly.
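The first two tips can be sketched together as follows. This is an illustration only, not a full radix sort: it shows how a character might be mapped to a group index, how an array of 27 lists might be set up, and how a group can be emptied with removeFirst() instead of get(). The names RadixGroupSketch, groupIndex, newGroups, and drain are invented for this example. It also assumes a Java version with generics (LinkedList<String>), in which case the (String) cast mentioned above is not needed; without generics you would keep the cast.

```java
import java.util.LinkedList;

class RadixGroupSketch {
    static final int GROUPS = 27; // group 0 = "too-short", 1..26 = 'a'..'z'

    // Maps the character at position pos of word to a group index.
    static int groupIndex(String word, int pos) {
        if (pos >= word.length()) {
            return 0; // the word has no character at this position
        }
        char c = Character.toLowerCase(word.charAt(pos));
        return c - 'a' + 1; // 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
    }

    // Creates a fresh set of 27 empty groups, as needed for each pass.
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] newGroups() {
        LinkedList<String>[] groups = new LinkedList[GROUPS];
        for (int i = 0; i < GROUPS; i++) {
            groups[i] = new LinkedList<String>();
        }
        return groups;
    }

    // Empties one group into out, starting at index next, using
    // removeFirst() so that no O(n) get(i) calls are made.
    // Returns the next free index in out.
    static int drain(LinkedList<String> group, String[] out, int next) {
        while (!group.isEmpty()) {
            out[next++] = group.removeFirst();
        }
        return next;
    }

    public static void main(String[] args) {
        String[] words = { "Bob", "at", "cab" };
        LinkedList<String>[] groups = newGroups();
        for (String w : words) {
            groups[groupIndex(w, 2)].addLast(w); // distribute by third character
        }
        String[] out = new String[words.length];
        int next = 0;
        for (int g = 0; g < GROUPS; g++) {
            next = drain(groups[g], out, next);
        }
        for (String w : out) {
            System.out.println(w);
        }
    }
}
```

Note that this sketch does not count group operations; in your radixSort, every insertion into and removal from a list would add to the total you return.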
3 points extra credit: If you do this, make
sure to clearly indicate you've done so in your README.
You may have observed that, as stated above,
radix sort is rather inefficient -- we've got a few long
words for which we have to keep on scanning through lots and
lots of short words. Radix sort is best when we have words
with similar length, not with such a heterogeneous
collection as you might find in a dictionary. However,
there is a modification that will make radix sort faster
with a spelling dictionary:
- First, create a set of groups that are arranged by
length. You'll have m groups, one for words of
each possible length (where m is bounded by the maximum
length over all the words). In the first pass, you will
walk through the list of words and throw each one into one of
these m groups based on length.
- Now, as you do the radix sort, start with the
mth character by grabbing all the words from the
group that has words that are m characters long,
and put them into the alphabetically-sorted groups.
(Note that you will no longer need the "0" group,
although you're welcome to leave it alone.) After
that's done, continue looping to the m-1th
character, grab the words from the m-1-length
group, and combine it with the words in the
alphabetically-sorted groups. Repeat this process over
and over until we reach (and finish) length 1 words, at
which point the alphabetic groups will have all the
words sorted. Make sure to update the code that handles
the number of group operations -- a read or write from
any kind of group should add to this total.
If you choose to do this, make sure to implement it in a
separate method (call it smartRadixSort), and
modify your ArrayBackedListApp code to load the
words into three separate arrays, sort them via mergesort, radix
sort, and smart radix sort, and display the number of merges or group operations for each.
(Remaining operations can use just the first array as specified earlier in
the homework.)
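The first pass of the extra-credit variant (grouping by length) might be sketched as below. This covers only that first pass, and the names LengthGroupSketch and groupByLength are invented for this example; maxLen is assumed to be the length of the longest word, which your longestWord variable already tracks.

```java
import java.util.LinkedList;

class LengthGroupSketch {
    // Distributes the first count words into groups indexed by word length.
    // Index 0 stays empty because every word has at least one character.
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] groupByLength(String[] words, int count, int maxLen) {
        LinkedList<String>[] byLength = new LinkedList[maxLen + 1];
        for (int i = 0; i <= maxLen; i++) {
            byLength[i] = new LinkedList<String>();
        }
        for (int i = 0; i < count; i++) {
            // In smartRadixSort, each insertion here would count as one
            // group operation toward the returned total.
            byLength[words[i].length()].addLast(words[i]);
        }
        return byLength;
    }

    public static void main(String[] args) {
        String[] words = { "a", "bee", "ox", "I" };
        LinkedList<String>[] groups = groupByLength(words, words.length, 3);
        for (int len = 1; len < groups.length; len++) {
            System.out.println(len + ": " + groups[len]);
        }
    }
}
```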