CS3134 Homework #4
Due on November 11, 2004 at 11:00am

There are two parts to this homework: a written component worth 10 points, and a programming assignment worth 15 points. See the homework submission instructions on how to hand it in and for important notes on programming style and structure.

Written questions

  1. (7 points) In class, we discussed a variety of very fast sorts (mergesort, radix sort, and soon quicksort).  There's another interesting sort that's actually faster than all three of these, and it's called bucket sort.  The general idea is that you establish a domain for the elements to be sorted; for example, if you are sorting an array of numbers which may range from 0 to 999, you create 1,000 buckets, arrange the elements into those buckets, and then pull them out of the buckets back into the array.
    1. (3 points) Describe an algorithm that we can use, given this basic idea, to sort the contents of an array of integers ranging from 1 to 999.
    2. (2 points) Analyze the worst-case running time of your bucket sort algorithm.  It should, for full credit, be faster than O(n lg n).  If it's not, beat on the algorithm a little more.
    3. (2 points) If it's so fast, why don't we use bucket sort all the time?  Well, there are several major downsides.  State at least one disadvantage of bucket sort as compared to the other sorts we've seen.
  2. (3 points; paraphrased from book programming 6.1) Suppose you had a language without a multiply operator.
    1. (2 points) Write a method called mult that takes a pair of ints, repeatedly calls itself with smaller operands, and uses addition to combine the results.  In other words, the method should have no loop constructs and should not use the multiply (*) operator anywhere.  Write it in Java; your syntax doesn't need to be perfect, but make it as close as you can.
    2. (1 point) Analyze the worst-case running time of this algorithm.  You can assume that addition is a constant-time (i.e., O(1)) operation.  State the result in terms of O(expression), and make sure to explain how the variable(s) in expression correspond to the original multiplication operands.
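
As an illustration of the general bucket idea from question 1 (create one bucket per possible value, drop the elements in, then pull them back out), here is a minimal sketch. It is only one possible reading of the idea, not a model answer: the class name BucketSketch, the method bucketSort, and the choice to represent each bucket as a simple counter are assumptions made for this example.

```java
class BucketSketch {
    // Sorts values assumed to lie in the range [0, domain).
    // Each "bucket" here is just a counter for one possible value (an assumption of this sketch).
    static void bucketSort(int[] a, int domain) {
        int[] buckets = new int[domain];
        // Pass 1: arrange the elements into the buckets.
        for (int value : a) {
            buckets[value]++;
        }
        // Pass 2: pull the elements out of the buckets, in bucket order, back into the array.
        int next = 0;
        for (int v = 0; v < domain; v++) {
            for (int c = 0; c < buckets[v]; c++) {
                a[next++] = v;
            }
        }
    }
}
```

A call such as BucketSketch.bucketSort(data, 1000) would cover the 0 to 999 example. Note that the sketch touches every bucket even when the array is small.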

Programming problems

In this assignment, we're going to write a program that implements the beginnings of a dictionary abstraction, i.e., you will build data structures and tools to handle large numbers of words (Strings). You will use this file (words.txt) as input for the program; it's a scrambled version of a "dictionary" of words (without definitions) as distributed with a particular Linux distribution. (To download it in IE/Mozilla/Netscape, right-click and choose Save Target As or Save Link To Disk.) The file contains only words made of alphabetic characters, both upper- and lower-case, one per line. You are to process the file, ignoring but preserving case, and support the operations as described below.  You will use an array-backed list to support this "dictionary".
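
To make "one word per line" and "ignoring but preserving case" concrete, here is a small sketch. The names WordReadSketch, readWords, and comesBefore are hypothetical, and wrapping IOException in an unchecked exception is a simplification for this example. The words are stored exactly as read, and case is ignored only at comparison time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

class WordReadSketch {
    // Reads one word per line into a fixed-capacity array; returns how many were stored.
    static int readWords(BufferedReader in, String[] words) {
        int count = 0;
        try {
            String line;
            while (count < words.length && (line = in.readLine()) != null) {
                words[count++] = line;   // stored unchanged, so the original case is preserved
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);   // simplification for this sketch
        }
        return count;
    }

    // True if a sorts strictly before b when case is ignored.
    static boolean comesBefore(String a, String b) {
        return a.compareToIgnoreCase(b) < 0;
    }
}
```

With a real file, the reader would typically be built as new BufferedReader(new FileReader("words.txt")).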

  1. (13 points) Build an ArrayBackedList class that uses an array to store the words. The constructor for the class must take one parameter: the number of words to be stored, which serves as a capacity property for the array. You must then implement the following methods in your ArrayBackedList:
    1. (1 point) public boolean insert(String s): this takes a String and inserts it at the bottom of the (occupied part of the) array. It also updates an object-level variable called longestWord on each insert, so that by the end of input, longestWord contains the length of the longest word. You should return true unless the array is full, in which case return false.
    2. (1 point) public String elementAt(int index): this returns the element at the specified index, or null if no such index exists.
    3. (1 point) public int size(): this returns the number of elements in the array.
    4. (4 points) public int mergeSort(): this does a mergesort of the array.  You can use the book's or the class's mergesort code; the former is downloadable from here.  However, you must modify the mergesort code from the book to work inside this existing class and to return an int representing the number of "merges", i.e., the number of elements copied from the source array to the target array during merges.  Additionally, mergesort needs to work with Strings, not ints.
    5. (6 points) public int radixSort(): this does an (iterative) alphabetic radix sort of the array. The strategy is similar to, but not the same as, the one used when sorting numbers. First of all, there will be 27 groups (26 characters plus "too-short" words), not 10.

      Second, words aren't "right-aligned", but rather "left-aligned". In other words, you will start the radix sort at the last character of the longest String, but only words that are that long will be grouped appropriately; all other words will be thrown into the "zero" group that holds "too-short" words. Future passes then go through every group and sort by the second-to-last character position of the longest String, then the third-to-last, etc., and throw the result into the appropriate new group (note that you need a new "set" of groups for every pass!). Once you have processed the "zeroth" character, you will finally have a configuration where there is no data in the "zero" group, and the data in the remaining groups is in order. Read the groups starting with the "a" group, and copy the elements back into the array.

      You will use a doubly linked list to store each individual group, and you'll use an array of linked lists to store the collection of groups. Instead of having to modify the book's linked list code to handle a different datatype, we'll do something different: you will use the LinkedList class as supplied by Java in the java.util package. (Note that this is the only java.util data structure you should be using for this assignment.)

      Your radix sort method will return an integer: the number of "operations" in and out of groups. That is, any element inserted into a linked list acts as one operation, and any element read out of a linked list acts as another. Copying from one group into another group acts as two operations. Add all of these up and return it from the radix sort.
  2. (2 points) Implement an ArrayBackedListApp class with a main() method that does the following.  (There are 45,372 words in the file; you can create static-sized arrays for the purposes of this assignment.)
    1. Use a BufferedReader to read the words from the aforementioned words.txt file into two new instances of ArrayBackedList;
    2. Mergesort the first instance and print out the number of merges.
    3. Radix sort the second instance and print out the number of group operations.
    4. Present a small user interface (at a ">" prompt) with the following commands (don't worry about invalid input).
      • d1 count: Dumps the first count elements to screen from the mergesort-sorted list. If count is 0, print all the elements to screen;
      • d2 count: Dumps the first count elements to screen from the radix-sorted list. If count is 0, print all the elements to screen;
      • i1 index: Print out the element referenced by index in the mergesort-sorted list. 0 would imply the first element. If no such element exists, print out not found.
      • i2 index: Print out the element referenced by index in the radix-sorted list. 0 would imply the first element. If no such element exists, print out not found.
      • q: Quit.
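
The pass structure described for radixSort above (27 groups, a fresh set of groups per pass, counting every insert into and read out of a linked list) can be sketched roughly as follows. This is a sketch under stated assumptions rather than the required implementation: the class RadixSketch and the helpers groupFor and newGroups are hypothetical names, the method works on a bare array plus a count instead of living inside ArrayBackedList, and the initial copy from the array into the first set of groups is counted as one operation per word.

```java
import java.util.LinkedList;

class RadixSketch {
    // Group 0 holds "too-short" words; groups 1..26 hold 'a'..'z' (case ignored).
    static int groupFor(String word, int pos) {
        if (pos >= word.length()) return 0;
        return Character.toLowerCase(word.charAt(pos)) - 'a' + 1;
    }

    @SuppressWarnings("unchecked")
    static LinkedList<String>[] newGroups() {
        LinkedList<String>[] groups = new LinkedList[27];
        for (int g = 0; g < groups.length; g++) groups[g] = new LinkedList<String>();
        return groups;
    }

    // Sorts words[0..count) in place and returns the number of group operations.
    static int radixSort(String[] words, int count, int longestWord) {
        if (count == 0 || longestWord == 0) return 0;
        int ops = 0;
        // First pass: distribute from the array on the last character of the longest word.
        LinkedList<String>[] groups = newGroups();
        for (int i = 0; i < count; i++) {
            groups[groupFor(words[i], longestWord - 1)].addLast(words[i]);
            ops++;                                   // one insert into a group
        }
        // Later passes: move every word into a fresh set of groups, one position to the left.
        for (int pos = longestWord - 2; pos >= 0; pos--) {
            LinkedList<String>[] next = newGroups();
            for (LinkedList<String> group : groups) {    // group 0 first, so order is kept stable
                while (!group.isEmpty()) {
                    String w = group.removeFirst();
                    next[groupFor(w, pos)].addLast(w);
                    ops += 2;                        // one read plus one insert
                }
            }
            groups = next;
        }
        // Read the groups in order and copy the words back into the array.
        int i = 0;
        for (LinkedList<String> group : groups) {
            while (!group.isEmpty()) {
                words[i++] = group.removeFirst();
                ops++;                               // one read out of a group
            }
        }
        return ops;
    }
}
```

Under these counting assumptions every word is touched on every pass, so the total comes to 2 x count x longestWord operations regardless of how short most words are.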

Tips:

 

3 points extra credit: If you do this, make sure to clearly indicate you've done so in your README. 

You may have observed that, as stated above, radix sort is rather inefficient -- we've got a few long words for which we have to keep on scanning through lots and lots of short words. Radix sort is best when we have words with similar length, not with such a heterogeneous collection as you might find in a dictionary. However, there is a modification that will make radix sort faster with a spelling dictionary:

  1. First, create a set of groups that are arranged by length. You'll have m groups, one for words of each possible length (where m is bounded by the maximum length over all the words). In the first pass, you will walk through the list of words and throw each word into one of these m groups based on its length.
  2. Now, as you do the radix sort, start with the mth character by grabbing all the words from the group that has words that are m characters long, and put them into the alphabetically-sorted groups. (Note that you will no longer need the "0" group, although you're welcome to leave it alone.) After that's done, continue looping to the (m-1)th character, grab the words from the (m-1)-length group, and combine them with the words in the alphabetically-sorted groups. Repeat this process over and over until we reach (and finish) length 1 words, at which point the alphabetic groups will have all the words sorted. Make sure to update the code that handles the number of group operations -- a read or write from any kind of group should add to this total.
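
The two steps above might look roughly like the sketch below. Several details are assumptions rather than part of the specification: the names SmartRadixSketch, newGroups, and letterGroup are hypothetical, every word is assumed to be non-empty and purely alphabetic, and on each pass the newly introduced words (those whose length equals the current position) are distributed before the words already in the alphabetic groups, so that a shorter word ends up ahead of longer words sharing its prefix.

```java
import java.util.LinkedList;

class SmartRadixSketch {
    @SuppressWarnings("unchecked")
    static LinkedList<String>[] newGroups(int n) {
        LinkedList<String>[] groups = new LinkedList[n];
        for (int g = 0; g < n; g++) groups[g] = new LinkedList<String>();
        return groups;
    }

    // Groups 0..25 hold 'a'..'z' (case ignored); no "too-short" group is needed.
    static int letterGroup(String w, int pos) {
        return Character.toLowerCase(w.charAt(pos)) - 'a';
    }

    // Sorts words[0..count) in place and returns the number of group operations.
    static int smartRadixSort(String[] words, int count, int longestWord) {
        int ops = 0;
        // Step 1: throw each word into the group for its length.
        LinkedList<String>[] byLength = newGroups(longestWord + 1);
        for (int i = 0; i < count; i++) {
            byLength[words[i].length()].addLast(words[i]);
            ops++;
        }
        // Step 2: walk positions right to left, introducing words only when they are long enough.
        LinkedList<String>[] alpha = newGroups(26);
        for (int pos = longestWord - 1; pos >= 0; pos--) {
            LinkedList<String>[] next = newGroups(26);
            LinkedList<String> fresh = byLength[pos + 1];   // words with exactly pos + 1 characters
            while (!fresh.isEmpty()) {
                String w = fresh.removeFirst();
                next[letterGroup(w, pos)].addLast(w);
                ops += 2;                                   // read from length group, insert into alpha group
            }
            for (LinkedList<String> group : alpha) {
                while (!group.isEmpty()) {
                    String w = group.removeFirst();
                    next[letterGroup(w, pos)].addLast(w);
                    ops += 2;                               // read plus insert between alpha groups
                }
            }
            alpha = next;
        }
        // Copy the sorted words back into the array.
        int i = 0;
        for (LinkedList<String> group : alpha) {
            while (!group.isEmpty()) {
                words[i++] = group.removeFirst();
                ops++;
            }
        }
        return ops;
    }
}
```

With this way of counting, each word costs 2 + 2 x (its length) operations, so short words no longer pay for the passes over positions they do not have.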

If you choose to do this, make sure to implement it in a separate method (call it smartRadixSort), and modify your ArrayBackedListApp code to load the words into three separate arrays, sort them via mergesort, radix sort, and smart radix sort, and display the count returned by each (merges for mergesort, group operations for the two radix sorts). (Remaining operations can use just the first array as specified earlier in the homework.)