In the standalone approach, email messages from existing mailboxes were used for analysis, and the tests were run through a graphical user interface. The user was asked to select folders in the mailbox and categorize them as spam, non-spam, or sent folders. The headers of all messages in the selected folders were then retrieved over IMAP, and the tests were performed on those headers to obtain the corresponding results. The tests covered 22 mailboxes from Gmail and Cubmail (Columbia University's email service), which together contained 25,653 non-spam messages and 2,929 spam messages (28,582 in total).
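The retrieval step described above can be sketched with Python's standard imaplib and email modules. This is only a minimal illustration under stated assumptions, not the tool used in the study: the folder names in FOLDER_CATEGORIES are hypothetical examples of what a user might pick in the GUI, and the helper names (parse_headers, fetch_headers, collect) are invented for this sketch.

```python
import imaplib
from email.parser import BytesHeaderParser

# Hypothetical folder-to-category mapping, standing in for the choices
# the user would make in the GUI (assumed names, not from the report).
FOLDER_CATEGORIES = {
    "INBOX": "non_spam",
    "[Gmail]/Spam": "spam",
    "[Gmail]/Sent Mail": "sent",
}


def parse_headers(raw):
    """Parse a raw header block (bytes) into a message object."""
    return BytesHeaderParser().parsebytes(raw)


def fetch_headers(conn, folder):
    """Yield the parsed headers of every message in one folder."""
    # Quote the name so folders containing spaces are accepted;
    # readonly=True avoids changing any message flags.
    status, _ = conn.select('"%s"' % folder, readonly=True)
    if status != "OK":
        return
    status, data = conn.search(None, "ALL")
    if status != "OK":
        return
    for num in data[0].split():
        # BODY.PEEK[HEADER] returns only the headers, not the body.
        status, parts = conn.fetch(num, "(BODY.PEEK[HEADER])")
        if status != "OK":
            continue
        for part in parts:
            if isinstance(part, tuple):
                yield parse_headers(part[1])


def collect(conn, categories=FOLDER_CATEGORIES):
    """Group parsed headers by the category assigned to each folder."""
    result = {}
    for folder, category in categories.items():
        result.setdefault(category, []).extend(fetch_headers(conn, folder))
    return result
```

With a real account, conn would be an authenticated connection, for example one created with imaplib.IMAP4_SSL("imap.gmail.com") followed by conn.login(user, password). The tests would then run over the header objects in each category.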
The following table lists the mailboxes used in the tests:
Mailbox # | Mailbox | # of mails | # of non-spam mails | # of spam mails |
1 | aditi_columbia | 1818 | 1818 | 0 |
2 | aditi_gmail | 593 | 497 | 96 |
3 | deepti_columbia | 1174 | 1174 | 0 |
4 | deepti_gmail | 641 | 576 | 65 |
5 | dhrumin_gmail | 5105 | 5002 | 103 |
6 | pinank_gmail | 1682 | 1418 | 264 |
7 | Preetinarayan_columbia | 1230 | 1230 | 0 |
8 | Preetinarayan_gmail | 1992 | 1788 | 204 |
9 | sneha_gmail | 360 | 133 | 227 |
10 | spinank_gmail | 879 | 524 | 355 |
11 | vasa_columbia | 168 | 168 | 0 |
12 | dms2169_columbia | 1322 | 1301 | 21 |
13 | nirav_gmail | 1408 | 1360 | 48 |
14 | nns_2108 | 934 | 934 | 0 |
15 | manish_gmail | 459 | 414 | 45 |
16 | pragni_gmail | 2183 | 1999 | 184 |
17 | preetimalik_columbia | 527 | 527 | 0 |
18 | preetimalik_gmail | 380 | 380 | 0 |
19 | sak2144 | 749 | 749 | 0 |
20 | shradha_columbia | 140 | 140 | 0 |
21 | shradha_gmail | 1522 | 1151 | 371 |
22 | vasa_gmail | 3316 | 2370 | 946 |
Total | | 28582 | 25653 | 2929 |
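As a quick consistency check, the totals row can be recomputed from the per-mailbox counts. The sketch below copies the non-spam and spam columns from the table; the names MAILBOXES and summarize are introduced here purely for illustration.

```python
# (mailbox, non_spam, spam) rows copied from the table above.
MAILBOXES = [
    ("aditi_columbia", 1818, 0),
    ("aditi_gmail", 497, 96),
    ("deepti_columbia", 1174, 0),
    ("deepti_gmail", 576, 65),
    ("dhrumin_gmail", 5002, 103),
    ("pinank_gmail", 1418, 264),
    ("Preetinarayan_columbia", 1230, 0),
    ("Preetinarayan_gmail", 1788, 204),
    ("sneha_gmail", 133, 227),
    ("spinank_gmail", 524, 355),
    ("vasa_columbia", 168, 0),
    ("dms2169_columbia", 1301, 21),
    ("nirav_gmail", 1360, 48),
    ("nns_2108", 934, 0),
    ("manish_gmail", 414, 45),
    ("pragni_gmail", 1999, 184),
    ("preetimalik_columbia", 527, 0),
    ("preetimalik_gmail", 380, 0),
    ("sak2144", 749, 0),
    ("shradha_columbia", 140, 0),
    ("shradha_gmail", 1151, 371),
    ("vasa_gmail", 2370, 946),
]


def summarize(rows):
    """Return overall counts and the spam share across all mailboxes."""
    non_spam = sum(row[1] for row in rows)
    spam = sum(row[2] for row in rows)
    total = non_spam + spam
    return {
        "mailboxes": len(rows),
        "total": total,
        "non_spam": non_spam,
        "spam": spam,
        "spam_share": spam / total,
    }


summary = summarize(MAILBOXES)
print(summary)
```

The recomputed counts match the totals row (28,582 messages, 25,653 non-spam, 2,929 spam), so spam makes up roughly 10% of the messages tested.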
The following tests were run on the above mailboxes; the tests and the resulting observations are discussed in this report:
2.1 Email Source Analysis
2.2 Attachment Analysis
Last updated: 2008-08-19 by Nirav Shah