Email System
This homework assignment (roughly speaking, to implement something like an email system) is the first of three linked assignments. Homework 4 will be to add some security features; for Homework 5, students get to attack each other's assignments…
The assignment consists of two programs. One is a mail "reception", parsing, and preparation program; the other does delivery. The full reason for the split will become more apparent in the next assignment, but it is close to how real email systems operate. Real mail systems often have at least three components: a mail receiver (or multiple receivers, for different types of submission), a central processing/parsing section, and a "local delivery agent" or forwarder. I'm asking you to combine the first two, but have a separate delivery agent.
You will note that this assignment seems light on security content and heavy on string processing. The latter is intentional: per the 10/15 lecture, secure string processing is not easy, and mailers are notoriously hard to write securely. So this assignment does have a lot of security content. Note well: if you wish to find, install, and use an open-source string-handling library, this is permissible. I am not requiring it, but it is acceptable. See the procedures discussed in the first class for using such packages.
Because of the requirement for interoperability, I'm specifying certain things that I might not otherwise require. First: all execution must take place in a directory that is empty except as specified here. This directory must have the following subdirectories: bin, mail, inputs, and tmp. It may have a lib subdirectory. All executables must be in the bin directory. The two executables you must supply are called mail-in and mail-out, the input/parser and the delivery agent. Any other run-time information they need must be in the lib directory, which you may populate as you wish.
The inputs directory has your test data. Each test input is a single file; the filenames must be 5-digit numbers with leading zeros. The following four lines of shell will run the test cases in order:
for i in inputs/*
do
    bin/mail-in <"$i"
done
which is the reason for the 5-digit numbers: the shell expands inputs/* in sorted order, so zero-padded names are processed in numeric order. The format of these test input files is described below.
The tmp directory is for any temporary files you may need to create; I don't know that you do need them, but I don't know that you don't. If you do, that's where they have to live. The layout—file names, subdirectories, etc.—is entirely up to you.
The last mandatory directory is mail. This directory, of course, holds the mailboxes. Each mailbox is itself a directory. All mailboxes must be created at the start of the test run. You must create these mailboxes and no others—these are random words taken from /usr/share/dict/words, and have no other semantic intent.
Received mail messages are individual files; each must also be given a 5-digit number. Note that numbering must be consecutive within each mailbox. If the sequence of test data involves sending a message to durwaun, one to repine, and then another to durwaun, the files that are created are durwaun/00001, durwaun/00002, and repine/00001.
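As a rough illustration of the numbering rule (not a required design), the next free name in a mailbox could be found by probing upward from 00001. The function name next_name is made up for this sketch, and it assumes nothing else is writing to the mailbox at the same time.

```shell
# next_name MAILBOX_DIR: print the first unused 5-digit file name.
# Hypothetical helper; not safe against concurrent writers.
next_name() {
    n=1
    while [ -e "$1/$(printf '%05d' "$n")" ]; do
        n=$((n + 1))
    done
    printf '%05d\n' "$n"
}
```

Because existing files are numbered consecutively, the first unused number is also the next one in sequence.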
The format of the input test files approximates what is received on a network mail connection; however, I've simplified it. An input file may contain more than one mail message. Messages are delimited by a period alone on a line. On all other lines, a leading period must be deleted; this means that a line containing only
..
will be output as a single period. A message consists of a MAIL FROM line, one or more RCPT TO lines, the body of the message, and a message-ending line of a single period. Even the last message in a file must have a period line at the end.
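The period rules can be sketched in a few lines of shell. This is only an illustration: read_body is a hypothetical name, and a shell read loop silently drops a final line that lacks a newline, which a real implementation would have to handle.

```shell
# read_body: copy one message body from stdin to stdout.
# A line containing only "." ends the message; on every other line one
# leading "." is removed. Returns 0 if the ending period was seen and
# 1 if the input ran out first.
read_body() {
    while IFS= read -r line; do
        case $line in
            .)  return 0 ;;                     # end of message
            .*) printf '%s\n' "${line#.}" ;;    # ".." becomes "."
            *)  printf '%s\n' "$line" ;;
        esac
    done
    return 1
}
```

The non-zero return for a missing final period lets the caller treat an unterminated last message as an error.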
The first line must be of the form
MAIL FROM:<username>
where username is replaced by some valid mailbox name. If the username is invalid, the mail must be rejected. The two angle brackets must be present. MAIL FROM is case-insensitive.
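One way to pull the username out of such a line is sketched below. The name control_arg is hypothetical, and the sketch assumes the keyword is written with exactly one space and is followed immediately by the colon; checking that the username is a real mailbox is a separate step.

```shell
# control_arg KEYWORD LINE: if LINE is "KEYWORD:<name>", with the keyword
# compared case-insensitively, print name and return 0; else return 1.
# KEYWORD must be given in upper case, e.g. "MAIL FROM".
control_arg() {
    head=$(printf '%s' "${2%%:*}" | tr '[:lower:]' '[:upper:]')
    rest=${2#*:}
    [ "$head" = "$1" ] || return 1
    case $rest in
        "<"?*">") ;;          # both brackets present, name not empty
        *) return 1 ;;
    esac
    name=${rest#"<"}
    name=${name%">"}
    printf '%s\n' "$name"
}
```

For example, control_arg "MAIL FROM" "mail from:<durwaun>" prints durwaun, while a line without the angle brackets is refused.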
There must be one or more RCPT TO lines:
RCPT TO:<username>
If the username is invalid, an error message should be printed to stderr; however, a message with at least one valid recipient is still valid and should be delivered. Again, RCPT TO is case-insensitive.
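The recipient rule might be sketched as follows. Both function names are hypothetical, the error text is made up, and the sketch assumes it runs from the top of the tree with the mailboxes already present under mail. Refusing names that contain a slash or start with a period is my own precaution, so that something like ../bin can never be treated as a mailbox.

```shell
# is_mailbox NAME: succeed only if NAME is an existing mailbox directory.
is_mailbox() {
    case $1 in
        ""|.*|*/*) return 1 ;;    # empty, dot-leading, or contains "/"
    esac
    [ -d "mail/$1" ]
}

# valid_recipients NAME...: print the valid names one per line, report the
# others on stderr, and return non-zero if no name was valid.
valid_recipients() {
    found=1
    for name in "$@"; do
        if is_mailbox "$name"; then
            printf '%s\n' "$name"
            found=0
        else
            printf 'mail-in: unknown recipient: %s\n' "$name" >&2
        fi
    done
    return "$found"
}
```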
The control lines are ended by a line containing only the case-insensitive word
DATA
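To make the format concrete, here is a sketch of a helper that writes one sample input file. The helper name sample_input and the message text are invented for illustration; the two mailbox names are the ones used as examples earlier.

```shell
# sample_input FILE: write a one-message test input to FILE.
# Hypothetical content; note the mixed-case control line and the ".." line.
sample_input() {
    cat > "$1" <<'EOF'
MAIL FROM:<durwaun>
RCPT TO:<repine>
rcpt to:<durwaun>
DATA
Hello,
..this line is delivered with a single leading period.
.
EOF
}
```

Called as sample_input inputs/00001, it produces a file with three control lines, the DATA line, a two-line body, and the ending period.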
Any other control lines, or control lines out of sequence, should generate an error message and a skip to the end-of-message indicator or end-of-file.
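The sequencing rule amounts to a small state machine. The sketch below is one possible way to express it; the function name next_state and the state names start, from, rcpt, body, and error are all invented here. It only checks the keyword and the order; the username is checked elsewhere.

```shell
# next_state STATE LINE: print the state reached after control line LINE.
# start -> from (MAIL FROM), from/rcpt -> rcpt (RCPT TO), rcpt -> body (DATA).
# Anything else yields "error", after which the caller skips ahead to the
# end-of-message period or end-of-file.
next_state() {
    upper=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    case "$1/$upper" in
        start/"MAIL FROM:"*)               echo from ;;
        from/"RCPT TO:"*|rcpt/"RCPT TO:"*) echo rcpt ;;
        rcpt/DATA)                         echo body ;;
        *)                                 echo error ;;
    esac
}
```

Requiring the rcpt state before DATA enforces the rule that there must be at least one RCPT TO line.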
An output message should consist of header lines, an empty line, and the body of the message exactly as received. Note that there are no limits on input line length. The format of the headers is:
From: username
To: username, username, ...
with no angle-brackets.
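A sketch of producing these headers follows. The name write_headers is hypothetical, and the caller is assumed to pass the sender first and then whichever recipient names are to appear on the To line (at least one).

```shell
# write_headers SENDER RECIPIENT...: print the From and To lines followed
# by the empty line that separates headers from body.
write_headers() {
    sender=$1
    shift
    printf 'From: %s\n' "$sender"
    printf 'To: %s' "$1"
    shift
    for rcpt in "$@"; do
        printf ', %s' "$rcpt"
    done
    printf '\n\n'
}
```

The body would then be appended unchanged after the empty line.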
The deliverables are the two executables, plus a script to create the directory tree. This script must be named create-tree; it must take one argument, the name of the tree. It should start
rm -rf "$1"
mkdir "$1"
to delete any existing tree of that name and create the new, empty directory. From there, go on to create the other mandatory directories. All references to files, directories, etc., in your script and code must be relative, not absolute.
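A minimal sketch of such a script is below, wrapped in a function so it is easy to exercise. It assumes create-tree is also where the mailboxes get created. MAILBOXES is a placeholder holding only the two names used as examples in this assignment, so substitute the required list; lib is optional and included here only for illustration.

```shell
#!/bin/sh
# Sketch of create-tree. MAILBOXES is a placeholder, not the real list.
MAILBOXES="durwaun repine"

create_tree() {
    rm -rf "$1"
    mkdir "$1" || return 1
    # The four mandatory directories plus the optional lib.
    for dir in bin mail inputs tmp lib; do
        mkdir "$1/$dir" || return 1
    done
    for box in $MAILBOXES; do
        mkdir "$1/mail/$box" || return 1
    done
}

# Run only when exactly one argument (the tree name) was given.
if [ "$#" -eq 1 ]; then
    create_tree "$1"
fi
```

Every path is built from the argument, so nothing in the script is absolute unless the caller passes an absolute name.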
As always, think hard about your test cases…
Note: it is legal for a tester to compile your code with the
-Wformat-overflow=0 -Wstringop-overflow=0
flags. These will turn off lots of the compiler protections against buffer overflows. I strongly suggest that you compile this way, too, when testing, so that you can see if you have any such flaws.