Email System
This homework assignment (roughly speaking, to implement something like an email system) is the first of three linked assignments. Homework 4 will be to add some security features; for Homework 5, students get to attack each other's assignments…
The assignment consists of two programs. One is a mail "reception", parsing, and preparation program; the other does delivery. While the full reason for that will become more apparent in the next assignment, this is close to how real email systems operate. Real mail systems often have at least three components: a mail receiver (or multiple receivers, for different types of submission), a central processing/parsing section, and a "local delivery agent" or forwarder. I'm asking you to combine the first two, but have a separate delivery agent.
You will note that this assignment seems light on security content and heavy on string processing. The latter is intentional: per the 10/15 lecture, secure string processing is not easy, and mailers are notoriously hard to write securely. So this assignment does have a lot of security content. Note well: if you wish to find, install, and use an open-source string-handling library, this is permissible. I am not requiring it, but it is acceptable. See the procedures discussed in the first class for using such packages.
Because of the requirement for interoperability, I'm specifying certain things that I might not otherwise require. Again: this is an interoperability requirement. I should be able to use A's test script, B's executable for mail-in, C's mail-out, and D's test data. Why? Because of hw5.
First: all execution must take place in a directory that is empty except as specified here. This directory must have the following subdirectories: bin, mail, inputs, and tmp. It may have a lib subdirectory. All executables must be in the bin directory. The two executables you must supply are called mail-in and mail-out, the input/parser and the delivery agent. Any other run-time information they need must be in the lib directory, which you may populate as you wish.
To install your executables in the bin directory, it's simplest to let make do it. Here's how.
In your Makefile, have a rule like
install: mail-in mail-out
	cp mail-in mail-out $(DEST)
And then do (naming the install target explicitly, in case it is not the first rule in your Makefile)
make install DEST=destination-directory
The inputs directory has your test data. Each file has one or more messages; the filenames must be 5-digit numbers with leading zeros. The following four lines of shell will run the test cases in order:
for i in inputs/*
do
bin/mail-in <$i
done
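which is the reason for the 5-digit numbers: the shell expands inputs/* in sorted order, so zero-padded names of equal length run in numeric order. The format of these test input files is described below.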
The tmp directory is for any temporary files you may need to create; I don't know that you do need them, but I don't know that you don't. If you do, that's where they have to live. The layout—file names, subdirectories, etc.—is entirely up to you.
The last mandatory directory is mail. This directory, of course, holds the mailboxes. Each mailbox is itself a directory. All mailboxes must be created at the start of the test run. You must create these mailboxes and no others—these are random words taken from /usr/share/dict/words, and have no other semantic intent.
Do not have a separate list of recipient names; it's bad style. (And reading a directory is pretty easy.)
It is reasonable to assume that mailbox names do not contain \n, \0, <, >, or /. If you wish to restrict it more, you must support the following characters: upper and lower case letters, digits, +, -, and _. It's also reasonable to insist that the first character must be a letter.
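A minimal sketch of the stricter rule in C, assuming you choose to restrict names to the required character set and to insist on a leading letter. The function name valid_mailbox_name and the length cap MAX_NAME_LEN are hypothetical; the assignment specifies neither.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap; the assignment does not set a maximum name length. */
#define MAX_NAME_LEN 255

/* ASCII-only tests, so the result does not depend on the locale. */
static bool is_ascii_letter(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Return true if name is non-empty, starts with a letter, is at most
 * MAX_NAME_LEN characters long, and otherwise contains only letters,
 * digits, '+', '-', or '_'. */
bool valid_mailbox_name(const char *name)
{
    if (name == NULL || !is_ascii_letter((unsigned char)name[0]))
        return false;
    for (size_t i = 1; name[i] != '\0'; i++) {
        unsigned char c = (unsigned char)name[i];
        if (i >= MAX_NAME_LEN)
            return false;
        if (!is_ascii_letter(c) && !is_ascii_digit(c) &&
            c != '+' && c != '-' && c != '_')
            return false;
    }
    return true;
}
```

Because / and . are rejected, a name that passes this check cannot escape the mail directory when it is used as a path component.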
Received mail messages are individual files; each must also be given a 5-digit number. Note that numbering must be consecutive within each mailbox. If the sequence of test data involves sending a message to durwaun, one to repine, and then another to durwaun, the files that are created are durwaun/00001, durwaun/00002, and repine/00001.
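One possible way to pick the next file name, sketched in C under the assumption that the mailbox directory already exists: try 00001, 00002, ... with O_CREAT | O_EXCL until a creation succeeds. The function name create_next_message and the file mode 0600 are choices made for this illustration, not requirements of the assignment.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

#define MAX_MESSAGES 99999  /* largest number that fits in 5 digits */

/* Create the lowest-numbered unused message file in mailbox_dir.
 * On success, returns a write-only descriptor and leaves the file's
 * path in path_out. Returns -1 on any failure, including a full
 * mailbox or a path that does not fit in path_len bytes. */
int create_next_message(const char *mailbox_dir, char *path_out,
                        size_t path_len)
{
    for (int n = 1; n <= MAX_MESSAGES; n++) {
        int len = snprintf(path_out, path_len, "%s/%05d", mailbox_dir, n);
        if (len < 0 || (size_t)len >= path_len)
            return -1;              /* path would have been truncated */

        /* O_EXCL makes creation fail if the name already exists, so an
         * existing message is never overwritten. */
        int fd = open(path_out, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;              /* some error other than "name taken" */
    }
    return -1;                      /* all 99999 names are in use */
}
```

The linear probe is simple rather than fast; scanning the directory for the highest existing number is an alternative, but it needs extra care if two deliveries can run at once.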
The format of the input test files approximates what is received on a network mail connection; however, I've simplified it. An input file may contain more than one mail message. Messages are delimited by a period alone on a line. On all other lines, a leading period must be deleted; this means that a line containing only
..
will be output as a single period. A message consists of a MAIL FROM line, one or more RCPT TO lines, the body of the message, and a message-ending line of a single period. Even the last message in a file must have a period line at the end.
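The period rule can be sketched as two small C helpers. This assumes each line is a NUL-terminated string that ends in \n (or has no newline, for a final unterminated line); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if the line consists of a single period, with or without a
 * trailing newline: the end-of-message marker. */
bool is_end_of_message(const char *line)
{
    return strcmp(line, ".") == 0 || strcmp(line, ".\n") == 0;
}

/* Return the text to write for a body line: one leading period, if
 * present, is skipped. Call is_end_of_message() first; this helper
 * does not recognize the end-of-message marker itself. */
const char *unstuff_line(const char *line)
{
    return line[0] == '.' ? line + 1 : line;
}
```

With these, a line containing only .. comes out as a single period, and a line without a leading period is passed through unchanged.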
The first line must be of the form
MAIL FROM:<username>
where username is replaced by some valid mailbox name. If the username is invalid (and mail-in must check this), the mail must be rejected. The two angle brackets must be present. MAIL FROM is case-insensitive.
There must be one or more RCPT TO lines:
RCPT TO:<username>
If the username is invalid, an error message should be printed to stderr; however, a message is valid, and should be delivered, as long as there is at least one valid recipient. Again, RCPT TO is case-insensitive. Only mail-out can validate the legitimacy of a recipient; mail-in does not know.
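Both control lines have the shape KEYWORD:<username>, so one parser can serve for both. The C sketch below assumes a strict reading: no whitespace between the colon and <, and nothing after > except an optional newline. The function name parse_control_line is hypothetical, and strncasecmp is the POSIX function from <strings.h>.

```c
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Match line against keyword (for example "MAIL FROM:" or "RCPT TO:"),
 * ignoring case, and copy the text between '<' and '>' into name_out.
 * Returns false if the line has any other shape, if the name is empty,
 * or if the name does not fit in name_len bytes including the NUL. */
bool parse_control_line(const char *line, const char *keyword,
                        char *name_out, size_t name_len)
{
    size_t klen = strlen(keyword);
    if (strncasecmp(line, keyword, klen) != 0 || line[klen] != '<')
        return false;

    const char *start = line + klen + 1;
    const char *end = strchr(start, '>');
    if (end == NULL)
        return false;               /* closing bracket is missing */

    /* Only an optional newline may follow the closing bracket. */
    if (end[1] != '\0' && !(end[1] == '\n' && end[2] == '\0'))
        return false;

    size_t n = (size_t)(end - start);
    if (n == 0 || n >= name_len)
        return false;               /* empty name, or too long to store */

    memcpy(name_out, start, n);
    name_out[n] = '\0';
    return true;
}
```

This only extracts the name. Whether the name is acceptable is a separate check, and for recipients the final say belongs to mail-out.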
Blank control lines should generate an error.
The control lines are ended by a line containing only the case-insensitive word
DATA
You may, if you wish, limit message size to 1 GB.
Any other control lines, or control lines out of sequence, should generate an error message and a skip to the end-of-message indicator (or to end-of-file, if there is none). Note: after skipping to end-of-message, carry on with the next message; just because one email in a batch is invalid doesn't mean you can flush the rest of the batch.
An output message should consist of header lines, an empty line, and the body of the message exactly as received. Note that there are no limits on input line length. The format of the headers is:
From: username
To: username, username, ...
with no angle-brackets. A message with no body is acceptable.
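A short C sketch of the header block, assuming the sender and recipient names have already been extracted and checked. The function name write_headers is hypothetical, and which recipients belong in the To: list is left to the caller.

```c
#include <stdio.h>

/* Write "From: sender", "To: r1, r2, ...", and the empty line that
 * separates the headers from the body. Returns 0 on success and -1 if
 * any write fails. */
int write_headers(FILE *out, const char *sender,
                  const char *const *recipients, size_t count)
{
    if (fprintf(out, "From: %s\nTo: ", sender) < 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        /* Every recipient after the first is preceded by ", ". */
        if (fprintf(out, "%s%s", i > 0 ? ", " : "", recipients[i]) < 0)
            return -1;
    }
    if (fputs("\n\n", out) == EOF)
        return -1;
    return 0;
}
```

For the sender repine and the recipients durwaun and repine, this writes From: repine, then To: durwaun, repine, then an empty line; the body follows.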
The deliverables are the two executables, the source code, the test script, and a script to create the directory tree. This script must be named create-tree; it must take one argument, the name of the tree. It should start
rm -rf "$1"
mkdir "$1"
to delete any existing tree of that name and create the new, empty directory. From there, go on to create the other mandatory directories. All references to files, directories, etc., in your script and code must be relative, not absolute.
The mail delivery agent, mail-out, actually writes messages to the mailbox. It takes exactly one argument, the recipient; the mail message itself is read from stdin and is terminated by end-of-file. Note carefully that the output message has to be prepared by mail-in: only it knows all of the recipients. However, mail-out is the only program that can know if a recipient is valid. Furthermore, it may not print an error message; only mail-in may do that. You therefore need some way to signal back from mail-out to mail-in. The easiest way to do that is via an error return value from mail-out, that is, the argument to exit() or the value returned from main(). (Why this rule? This is an emulation of a real mail system, where the mail delivery agent does not have direct contact with the sender. At best, it could mail back a message. But don't do that here.)
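One way mail-in might run mail-out and collect its exit status is a pipe plus fork, exec, and waitpid. The C sketch below is only an illustration under stated assumptions: the function name run_delivery is hypothetical, the whole message is assumed to be in memory, and 127 is used here (by convention, not by requirement) when the child cannot be started.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "program recipient" with msg on its stdin. Returns the child's
 * exit status, or -1 if it could not be run, was killed by a signal,
 * or reported success although the message could not be fully written. */
int run_delivery(const char *program, const char *recipient,
                 const char *msg, size_t len)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* If the child exits early, write() should fail with EPIPE rather
     * than kill this process. Note: this changes the process-wide
     * SIGPIPE disposition. */
    signal(SIGPIPE, SIG_IGN);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                         /* child */
        signal(SIGPIPE, SIG_DFL);
        close(fds[1]);
        if (fds[0] != STDIN_FILENO) {
            if (dup2(fds[0], STDIN_FILENO) < 0)
                _exit(127);
            close(fds[0]);
        }
        /* execl passes the recipient as one argument; no shell is
         * involved, so the name is never reinterpreted. */
        execl(program, program, recipient, (char *)NULL);
        _exit(127);                         /* exec failed */
    }

    close(fds[0]);                          /* parent */
    int write_failed = 0;
    while (len > 0) {
        ssize_t n = write(fds[1], msg, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            write_failed = 1;
            break;
        }
        msg += n;
        len -= (size_t)n;
    }
    close(fds[1]);                          /* child now sees end-of-file */

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return (code == 0 && write_failed) ? -1 : code;
}
```

A call such as run_delivery("bin/mail-out", "durwaun", msg, len) returning nonzero is then mail-in's cue to print the error message itself. Using exec directly, rather than system() or popen(), avoids handing an untrusted recipient name to a shell.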
Although in this assignment only mail-in calls mail-out, you can't rely on that assumption. In an attack scenario, i.e., in hw5, the attacker might try to invoke mail-out directly. It needs to be safe or (for hw5, not this assignment) you have to ensure that no one else can run it. Similarly, for this assignment mail-out does not need to verify the contents of the input file.
As always, think hard about your test cases… Remember that the tester may try to trigger buffer overflows.
Note: it is legal for a tester to compile your code with the
-Wformat-overflow=0 -Wstringop-overflow=0
flags. These will turn off many of the compiler's warnings about buffer overflows. I strongly suggest that you compile this way, too, when testing, so that you can see if you have any such flaws.