CS3134 Homework #5
Due on Tuesday, November 30, 2004 at 11:00am

There are two parts to this homework: a written component worth 12 points, and a programming assignment worth 13 points. See the homework submission instructions for how to hand it in and for important notes on programming style and structure.

Written questions

  1. (9 points) You're given the following list of numbers to work with.
    48, 21, 45, 1, 93, 87, 55, 100, 34, 97
    1. (1 point) Insert the numbers into a binary search tree.
    2. (1 point) Using the book's convention, draw the resulting tree when 48 is deleted.
    3. (1 point) Using the inorder predecessor instead of the inorder successor, draw the resulting tree when 48 is deleted.
    4. (1 point) Insert the numbers into an 11-element array-backed hash table using linear probing. Use the hash function key % 11. Draw the resulting table.
    5. (3 points) Insert the numbers into a 13-element array-backed hash table using double hashing. The first hash function is key % 13, and the second hash function is 5 - (key % 5). Draw the resulting table.
    6. (2 points) From part 5, state the number of initial collisions (i.e., collisions under the first hash function), and compute the average "found" probe length (i.e., find the number of probes needed to find each of the numbers in the list in the double-hashing table, and average these over the set of 10 numbers).
  2. (3 points) You are to work out the steps to generate a Huffman code for the string "SUNNING IN MISSISSIPPI" (without quotes).
    1. (2 points) Using the algorithm outlined in the book and in class, create a Huffman tree for this string. Make sure to show the steps involved in the Huffman tree's creation (e.g., starting with singleton trees in a priority queue).
    2. (1 point) Given the Huffman tree from part 1, encode the aforementioned string. If we would ordinarily use 7 bits per character, what is the total savings in bits?
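
As a reminder of how double-hashing probes and the "found" probe length work, here is a minimal Java sketch. It is not the book's hash table code: the class name DoubleHashSketch, its methods, and the 7-slot table with keys 10, 17, 24 are all made up for illustration, and it assumes non-negative integer keys and a prime table size. The second hash has the same shape as the one in the question (a constant minus key mod that constant), so the step size is never zero.

```java
// Hypothetical illustration class; not the book's hash table code.
class DoubleHashSketch {
    private final Integer[] slots;  // null marks an empty slot
    private final int stepMod;      // constant used by the second hash function

    DoubleHashSketch(int size, int stepMod) {
        this.slots = new Integer[size];
        this.stepMod = stepMod;
    }

    // First hash: the home slot for a non-negative key.
    int home(int key) {
        return key % slots.length;
    }

    // Second hash: the step size, always in 1..stepMod and never 0.
    int step(int key) {
        return stepMod - (key % stepMod);
    }

    // Inserts key and returns how many slots were probed (1 means no collision).
    int insert(int key) {
        int index = home(key);
        int step = step(key);
        for (int probes = 1; probes <= slots.length; probes++) {
            if (slots[index] == null) {
                slots[index] = key;
                return probes;
            }
            index = (index + step) % slots.length;  // jump by the key's own step
        }
        throw new IllegalStateException("table is full");
    }

    // Returns how many slots are probed to find key, or -1 if it is absent.
    int probesToFind(int key) {
        int index = home(key);
        int step = step(key);
        for (int probes = 1; probes <= slots.length; probes++) {
            if (slots[index] == null) {
                return -1;
            }
            if (slots[index] == key) {
                return probes;
            }
            index = (index + step) % slots.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        DoubleHashSketch table = new DoubleHashSketch(7, 3);
        int[] keys = {10, 17, 24};  // all three hash to slot 3
        int total = 0;
        for (int key : keys) {
            total += table.insert(key);
        }
        System.out.println("average probes: " + (double) total / keys.length);
    }
}
```

In this toy table all three keys share home slot 3, but their different step sizes send 17 to slot 4 and 24 to slot 6, so the probe counts are 1, 2, and 2. Linear probing is the special case where the step is always 1.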

Programming problem

The goal of this programming exercise is to develop a search tool for email using a tree-based dictionary structure. There are two major parts to this assignment: modifying the Tree data structure to support this data, and writing an app that parses an email folder file and inserts the relevant data into the tree, thereby allowing lookup. The keys are the words in each email body, and the value associated with each key is the email header. In other words, you'll build the email header from the header fields, and then insert it under every word in that email's body. Since a word may appear in multiple emails, we'll use a linked list to store the email headers associated with each key word.
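
To make the shape of the data concrete, here is a rough sketch of a word-to-headers dictionary. It is only an illustration: java.util.TreeMap stands in for the modified book Tree, headers are plain Strings, and the class and method names (WordIndexSketch, addEmail, lookup) are hypothetical. Skipping a header that was just added for the same word is my own choice so that a repeated word does not list the same email twice; the assignment does not require it.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; TreeMap stands in for the modified book Tree.
class WordIndexSketch {
    // Each word maps to a linked list of the headers of emails containing it.
    private final Map<String, LinkedList<String>> index = new TreeMap<>();

    // Inserts the header under every whitespace-separated word of the body.
    void addEmail(String header, String body) {
        for (String word : body.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;  // an empty body yields one empty token
            }
            LinkedList<String> headers =
                    index.computeIfAbsent(word, k -> new LinkedList<>());
            // Assumption: list each email once per word, even if the word repeats.
            if (!header.equals(headers.peekLast())) {
                headers.add(header);
            }
        }
    }

    // Returns the headers stored under the word, or an empty list if none.
    List<String> lookup(String word) {
        return index.getOrDefault(word, new LinkedList<>());
    }

    public static void main(String[] args) {
        WordIndexSketch idx = new WordIndexSketch();
        idx.addEmail("header-1", "welcome back to campus");
        idx.addEmail("header-2", "campus closed Friday");
        System.out.println(idx.lookup("campus"));
    }
}
```

Note that lookup here is case-sensitive and treats "campus." and "campus" as different keys, since punctuation is left attached to words.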

The mail format that we're going to read is the UNIX mbox format - this is the format your mail is stored in if you use CUBMail or Pine.  If you look on your CUNIX account, there may be a mail directory containing all your mail, with one file per mail folder.  Your INBOX isn't stored there, though -- that may be stored in a special file called mbox in your home directory, or may be located in a more esoteric location.  You're welcome to test your code against your mail, but I've also provided a sample mailbox here -- it's a collection of the public emails I've gotten from Dean Zvi Galil, of SEAS, from the last year or so.

The mbox format can be described as follows (look at the sample mailbox I've provided).  For a full technical description, try typing "man 5 mbox" on CUNIX.
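
A rough sketch of how such a file might be walked, assuming the common mbox convention: each message starts with a separator line beginning with "From " (the word From followed by a space) at the start of the file or after a blank line, header lines run until the first blank line, and the body follows. The class and method names are hypothetical, the sketch only pulls out Subject lines, and it ignores headers folded across several lines and "\r\n" line endings. It also relies on the mail software having escaped any body line that begins with "From ", which is not guaranteed in every variant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of scanning mbox-style text for Subject headers.
class MboxSketch {
    // Returns the Subject of each message in order ("" if a message has none).
    static List<String> subjects(String mboxText) {
        List<String> subjects = new ArrayList<>();
        boolean inHeaders = false;
        boolean previousBlank = true;  // the start of the file counts as blank
        for (String line : mboxText.split("\n", -1)) {
            if (previousBlank && line.startsWith("From ")) {
                inHeaders = true;      // separator line: a new message begins
                subjects.add("");      // placeholder until a Subject is seen
            } else if (inHeaders && line.isEmpty()) {
                inHeaders = false;     // blank line ends the header block
            } else if (inHeaders && line.startsWith("Subject: ")) {
                subjects.set(subjects.size() - 1,
                        line.substring("Subject: ".length()));
            }
            previousBlank = line.isEmpty();
        }
        return subjects;
    }

    public static void main(String[] args) {
        String sample =
                "From a@x.edu Mon Nov  1 10:00:00 2004\n"
                + "From: a@x.edu\n"
                + "Subject: First\n"
                + "\n"
                + "body one\n"
                + "\n"
                + "From b@x.edu Tue Nov  2 11:00:00 2004\n"
                + "Subject: Second\n"
                + "\n"
                + "body two\n";
        System.out.println(subjects(sample));
    }
}
```

The same loop structure extends to the other header fields and to collecting body words: anything seen while inHeaders is false belongs to the current message's body.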

So, what are you going to do with this?

  1. (2 points) Write an EmailHeader class that will store four Strings: the From address, the To address, the Subject, and the Date.  Note that you can store the date as a String without any side-effects for this assignment.  Also write a constructor that takes those four parameters and stores them, and a toString method that generates a String representation of the email header (recommendation: use a tab whitespace character, e.g., "\t", between each field so that they appear nicely in columns when the EmailHeaders are later printed out).
  2. (4 points) Modify the Tree code as supplied with the book (downloadable from here) as follows.
  3. (7 points) Write an EmailSearcher app class that reads mail files, inserts the matches into the tree, and lets us search for the results.  You will only need one main method.
  4. (2 points extra credit) Handle punctuation by stripping it from each word before inserting the word into the tree, so that the limitation as described at the beginning of this section is no longer an issue.  Make sure to indicate in your README if you have done so.
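
For reference, one possible shape for the class in step 1 is sketched below. The field names and the order of fields in toString are assumptions; the assignment only fixes the four stored Strings, the four-argument constructor, and the tab-separated output.

```java
// A minimal sketch of step 1; field names and field order are assumptions.
class EmailHeader {
    private final String from;
    private final String to;
    private final String subject;
    private final String date;  // kept as a String, as the assignment allows

    EmailHeader(String from, String to, String subject, String date) {
        this.from = from;
        this.to = to;
        this.subject = subject;
        this.date = date;
    }

    // Tab-separated so that printed headers line up roughly in columns.
    @Override
    public String toString() {
        return from + "\t" + to + "\t" + subject + "\t" + date;
    }

    public static void main(String[] args) {
        EmailHeader header = new EmailHeader(
                "dean@example.edu", "students@example.edu",
                "Welcome", "Mon, 1 Nov 2004");
        System.out.println(header);
    }
}
```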

Tips: