This project analyzes message headers such as From, Return-Path, Received, List-Id and Message-Id, which contain information such as e-mail addresses, IP addresses, host domain names, mailing list information and the message identifier. This data is used as input to several tests that help differentiate the characteristics of spam from those of non-spam messages. The code operates on three sources of messages:
The classes MailHeaderParser, SpamArchive and IETF represent the parsers operating on the above datasets respectively.
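The header extraction described above can be illustrated with a minimal sketch. It is not the project's actual parser code (which relies on mail.jar); it uses only the Java standard library, and the class name HeaderSketch and its methods parseHeaders and extractIPv4 are hypothetical names chosen for illustration. It assumes the raw message is available as a string, unfolds continuation lines, and pulls IPv4 addresses out of Received values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the project's parser classes.
final class HeaderSketch {

    // Matches dotted-quad candidates; the 0-255 octet range is checked separately.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\b");

    // Parses the header block (everything before the first empty line) into a map
    // from lower-cased field name to its values, in order of appearance.
    static Map<String, List<String>> parseHeaders(String rawMessage) {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        String currentName = null;
        StringBuilder currentValue = new StringBuilder();
        for (String line : rawMessage.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && currentName != null) {
                // Folded header: append the continuation to the current value.
                currentValue.append(' ').append(line.trim());
                continue;
            }
            store(headers, currentName, currentValue);
            currentValue.setLength(0);
            int colon = line.indexOf(':');
            String name = colon > 0 ? line.substring(0, colon) : "";
            if (name.isEmpty() || name.contains(" ") || name.contains("\t")) {
                currentName = null; // not a well-formed header line, skip it
                continue;
            }
            currentName = name.toLowerCase(Locale.ROOT);
            currentValue.append(line.substring(colon + 1).trim());
        }
        store(headers, currentName, currentValue);
        return headers;
    }

    private static void store(Map<String, List<String>> headers,
                              String name, StringBuilder value) {
        if (name != null) {
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value.toString());
        }
    }

    // Returns every syntactically valid IPv4 address found in the text.
    static List<String> extractIPv4(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = IPV4.matcher(text);
        while (m.find()) {
            boolean valid = true;
            for (int g = 1; g <= 4; g++) {
                if (Integer.parseInt(m.group(g)) > 255) {
                    valid = false;
                }
            }
            if (valid) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample message using documentation-only names and addresses.
        String raw = "Return-Path: <alice@example.org>\r\n"
                + "Received: from mail.example.org (mail.example.org [192.0.2.10])\r\n"
                + "\tby mx.example.net; Mon, 1 Jan 2007 10:00:00 -0500\r\n"
                + "From: Alice <alice@example.org>\r\n"
                + "Message-Id: <1234@example.org>\r\n"
                + "\r\n"
                + "Body text.\r\n";
        Map<String, List<String>> headers = parseHeaders(raw);
        System.out.println("From: " + headers.get("from"));
        for (String received : headers.get("received")) {
            System.out.println("Received IPs: " + extractIPv4(received));
        }
    }
}
```

Field names are lower-cased in the map because header names are case-insensitive, and values are kept in a list because fields such as Received normally occur several times in one message.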
The project was coded in Java under Windows XP and modified slightly so that it works under Linux. The MailHeaderParser project requires the libraries activation.jar and mail.jar to compile and run. To test hosts for reachability, fping must be installed, and it needs root access to execute.
To install the fping utility, type the following:
$ tar -xvf fping.tar
$ cd fping-2.4b2_to
$ ./configure
$ make
$ sudo make install
Extract the compressed project archive by typing:
$ tar -xvzf spam-analysis.tar.gz
Change into the project directory by typing:
$ cd spam-archive
To compile the CUNIX parser, type:
$ make buildcunix
To compile the IETF parser, type:
$ make buildietf
To compile the SpamArchive parser, type:
$ make buildsa
To compile all parsers, type:
$ make buildall
To execute the CUNIX parser, type:
$ make runcunix
To determine the number of hosts reachable by ping for the CUNIX parser, type:
$ fping -c 1 -f MailHeaderParser/cunix_folderName_ping 2>/dev/null | wc -l
To determine the total number of hosts counted in the reachability test for the CUNIX parser, type:
$ wc -l MailHeaderParser/cunix_folderName_ping
To determine the number of hosts reachable by ping for the IETF parser, type:
$ fping -c 1 -f IETF/ietf_ping 2>/dev/null | wc -l
To determine the total number of hosts counted in the reachability test for the IETF parser, type:
$ wc -l IETF/ietf_ping
To determine the number of hosts reachable by ping for the SpamArchive parser, type:
$ fping -c 1 -f SpamArchive/spam_archive_ping 2>/dev/null | wc -l
To determine the total number of hosts counted in the reachability test for the SpamArchive parser, type:
$ wc -l SpamArchive/spam_archive_ping
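The two counts produced by the commands above (reachable hosts and total hosts) can be combined into a reachability percentage. The following is only a rough sketch of how that might look in Java without fping. The class ReachabilitySketch and its methods are hypothetical and not part of the project, and the host file is assumed to contain one host per line. Note that InetAddress.isReachable typically uses ICMP only when the process has the necessary privileges and may otherwise try a TCP connection to port 7, so its results can differ from those of fping.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the project sources.
final class ReachabilitySketch {

    // Keeps one trimmed host per non-blank input line.
    static List<String> parseHosts(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            String host = line.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    // Percentage of reachable hosts; 0.0 when no host was tested.
    static double percentReachable(int reachable, int total) {
        return total == 0 ? 0.0 : 100.0 * reachable / total;
    }

    // Treats unresolvable hosts and network errors as unreachable.
    static boolean isReachable(String host, int timeoutMillis) {
        try {
            return InetAddress.getByName(host).isReachable(timeoutMillis);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ReachabilitySketch <host-file>");
            return;
        }
        List<String> hosts =
                parseHosts(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        int reachable = 0;
        for (String host : hosts) {
            if (isReachable(host, 1000)) { // 1 second timeout, an arbitrary choice
                reachable++;
            }
        }
        System.out.printf("%d of %d hosts reachable (%.1f%%)%n",
                reachable, hosts.size(), percentReachable(reachable, hosts.size()));
    }
}
```

It could be run against one of the generated host files, for example `java ReachabilitySketch IETF/ietf_ping`.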
To execute the IETF parser, type:
$ make runietf
To execute the SpamArchive parser, type:
$ make runsa
To delete the class files and generated output files for the CUNIX parser, type:
$ make cleancunix
To delete the class files and generated output files for the IETF parser, type:
$ make cleanietf
To delete the class files and generated output files for the SpamArchive parser, type:
$ make cleansa
To delete the class files and generated output files for all parsers, type:
$ make cleanall
On execution, all projects print debugging output to standard output and write statistics, successful outcomes, failed outcomes and other output files to the project folder.
Detailed information about class members is available in the generated Javadoc files.
ip4: type) since IPv6 isn't widely assigned or used in the SPF records.
Further work may consist of the following enhancements:
I would like to thank Prof. Henning Schulzrinne for his guidance and continued support on this project.