java.lang.Object spamarchive.SpamArchive
public class SpamArchive
SpamArchive is the main class used to parse the e-mail headers of recent messages from the SpamArchive.org website. The data has been preprocessed to remove errors and deviations from specifications wherever necessary.
Constructor Summary | |
---|---|
SpamArchive()
Creates a new instance of SpamArchive and initializes the member variables. |
Method Summary | |
---|---|
void |
checkDSL()
Checks if the source IP address of the message is statically or dynamically assigned by querying the SORBS DUHL. |
int |
checkSPFExists()
Queries DNS for the existence of an SPF record belonging to the domain of the earliest Received host. |
int |
checkSPFMatches()
Checks if the domain argument's SPF record permits messages
to be sent from a host whose address is the ipAddress argument. |
boolean |
checkWellFormed()
Checks whether the current message is well formed. |
void |
clearHeaders()
Clears relevant message headers before a new message is about to be parsed. |
java.lang.String |
expandBinary(java.lang.String str)
Expands the binary str argument to an eight bit binary string. |
java.lang.String |
expandMacro(java.lang.String inputStr,
java.lang.String ipAddress,
java.lang.String domain)
Performs macro expansion on the inputStr argument according to
section 8 of RFC 4408. |
java.lang.String |
expandMacroTerm(java.lang.String macroTerm,
java.lang.String ipAddress,
java.lang.String domain)
Expands the macroTerm argument as specified by
section 8 of RFC 4408. |
java.util.Date |
getDate(java.lang.String str)
Retrieves the timestamp of the Received header passed as its argument. |
java.lang.String |
getEarliestReceivedHost()
Gets the first host that adds its Received header to the
message. |
java.lang.String |
getSourceIP(boolean publicIP)
Gets the IP address of the machine from which the message was sent. |
boolean |
isCountryCodeTLD(java.lang.String str)
Determines whether the domain component represented by the str
argument is a country code top level domain. |
boolean |
isGenericTLD(java.lang.String str)
Determines whether the domain component represented by the str
argument is a generic top level domain. |
boolean |
isPrivateAddress(java.lang.String address)
Checks whether the address argument is a private address. |
static void |
main(java.lang.String[] args)
The main entry point into the SpamArchive class. |
java.lang.String |
normalize(java.lang.String str)
Removes redundant spaces and other unwanted symbols from the Received
headers. |
void |
printSPFFailure(java.lang.String reason,
java.lang.String domain,
java.lang.String spfRecord)
Prints relevant message headers so that cases where SPF verification against the source IP address failed can be analyzed, along with what led to the failure. |
void |
printSPFSuccess(java.lang.String reason,
java.lang.String domain,
java.lang.String spfRecord)
Prints relevant message headers so that cases where SPF verification against the source IP address succeeded can be analyzed, along with what led to the success. |
void |
printStatistics()
Prints relevant statistics to standard output and to the file spam_archive_statistics after processing every input data file. |
void |
processFiles()
Reads data files from the SpamArchive.org website line by line and extracts values from relevant headers into its member variables. |
java.lang.String |
reverseMacro(java.lang.String text)
Reverses the representation of the given text splitting at dot boundaries. |
void |
sortReceivedHeaders()
Reverses the order of the Received headers so that they are
arranged from earliest to last. |
int |
spfLookUp(java.lang.String ipAddress,
java.lang.String domain)
Performs verification of the ipAddress argument against the
SPF record of the domain argument. |
boolean |
testIpMatch(java.lang.String ipAddress,
java.lang.String ipAddressRange)
Checks whether the ipAddress argument falls in the range of the
ipAddressRange argument. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SpamArchive() throws java.lang.Exception
Creates a new instance of SpamArchive and initializes the member variables.
Throws:
java.io.FileNotFoundException - if the file exists but is a directory rather than a regular file, does not exist but cannot be created, or cannot be opened for any other reason.
javax.naming.NamingException - if a naming exception is encountered.
java.lang.Exception
Method Detail |
---|
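public void clearHeaders()
Clears relevant message headers before a new message is about to be parsed.
public void printStatistics()
Prints relevant statistics to standard output and to the file spam_archive_statistics after processing every input data file.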
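public void printSPFFailure(java.lang.String reason, java.lang.String domain, java.lang.String spfRecord)
Prints relevant message headers so that cases where SPF verification against the source IP address failed can be analyzed, along with what led to the failure.
Parameters:
reason - the reason that led to SPF verification failure.
domain - the domain currently being looked up.
spfRecord - the SPF record of the current domain.
public void printSPFSuccess(java.lang.String reason, java.lang.String domain, java.lang.String spfRecord)
Prints relevant message headers so that cases where SPF verification against the source IP address succeeded can be analyzed, along with what led to the success.
Parameters:
reason - the reason that led to SPF verification success.
domain - the domain currently being looked up.
spfRecord - the SPF record of the current domain.
public java.util.Date getDate(java.lang.String str) throws java.lang.Exception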
Retrieves the timestamp of the Received header passed as its argument.
Parameters:
str - the Received header.
Returns:
the Date object corresponding to the timestamp present in str in the format "d MMM yyyy HH:mm:ss Z"; null if the timestamp in the Received header doesn't match the regex pattern.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.IllegalArgumentException - if the pattern describing the date and time format is invalid.
java.lang.IllegalStateException - if no match has yet been attempted or the previous match operation has failed.
java.lang.Exception
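As an illustration of the behavior described for getDate, the following is a minimal sketch that pulls a timestamp out of a Received header and parses it with the documented format "d MMM yyyy HH:mm:ss Z". The class name ReceivedDateSketch, the method name, and the regular expression are assumptions made for this example; the pattern actually used by SpamArchive is not shown in this documentation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReceivedDateSketch {
    // Assumed shape of the timestamp: day, month abbreviation, year,
    // time of day, numeric zone offset (for example "5 Sep 2006 14:03:21 -0400").
    private static final Pattern TIMESTAMP = Pattern.compile(
        "(\\d{1,2} [A-Za-z]{3} \\d{4} \\d{2}:\\d{2}:\\d{2} [+-]\\d{4})");

    /** Returns the parsed timestamp, or null if the header has no parseable timestamp. */
    static Date parseReceivedDate(String header) {
        Matcher m = TIMESTAMP.matcher(header);
        if (!m.find()) {
            return null; // mirrors the documented "null if no match" behavior
        }
        // Locale.ENGLISH so month abbreviations parse regardless of the default locale.
        SimpleDateFormat format = new SimpleDateFormat("d MMM yyyy HH:mm:ss Z", Locale.ENGLISH);
        try {
            return format.parse(m.group(1));
        } catch (ParseException e) {
            return null; // shape matched but content was invalid, e.g. an unknown month
        }
    }
}
```

Because the zone offset is part of the pattern, two headers that name the same instant in different zones parse to equal Date objects.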
public java.lang.String getEarliestReceivedHost() throws java.lang.Exception
Gets the first host that adds its Received header to the message.
Returns:
the first host that adds its Received header to this message; "by missing" if the earliest Received header doesn't contain "by"; "received missing" if no Received headers exist.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.Exception
public java.lang.String getSourceIP(boolean publicIP) throws java.lang.Exception
Gets the IP address of the machine from which the message was sent.
Parameters:
publicIP - flag indicating whether a public IP address is strictly necessary.
Returns:
the source IP address; "received missing" if no Received headers are found or none contain the source IP address.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.IllegalStateException - if no match has yet been attempted or the previous match operation failed.
java.lang.IndexOutOfBoundsException - if there is no capturing group in the pattern with the given index.
java.lang.Exception
public boolean isPrivateAddress(java.lang.String address) throws java.lang.Exception
Checks whether the address argument is a private address.
Parameters:
address - the IP address that needs to be checked.
Returns:
true if the address argument is a private address; false otherwise.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.NumberFormatException - if the string doesn't contain a parseable integer.
java.lang.Exception
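A private-address check of this kind can be sketched as follows. This example assumes that "private" means the three IPv4 ranges reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); whether SpamArchive also treats loopback or link-local addresses as private is not stated here. The class name PrivateAddressSketch is hypothetical.

```java
final class PrivateAddressSketch {
    /**
     * Returns true if the dotted-quad IPv4 address lies in an RFC 1918 range.
     * Integer.parseInt throws NumberFormatException for non-numeric octets,
     * which matches the exception listed in the documentation above.
     */
    static boolean isPrivateAddress(String address) {
        String[] parts = address.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + address);
        }
        int first = Integer.parseInt(parts[0]);
        int second = Integer.parseInt(parts[1]);
        if (first == 10) {
            return true;                           // 10.0.0.0/8
        }
        if (first == 172 && second >= 16 && second <= 31) {
            return true;                           // 172.16.0.0/12
        }
        return first == 192 && second == 168;      // 192.168.0.0/16
    }
}
```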
public boolean checkWellFormed() throws java.lang.Exception
Checks whether the current message is well formed.
- [...] Received host.
- [...] From header of such messages is typically the e-mail address of the mailing list.
- [...] Received headers or the "by" part of the earliest Received header.
Returns:
true if none of the above conditions are true; false otherwise.
Throws:
java.lang.Exception
public java.lang.String normalize(java.lang.String str)
Removes redundant spaces and other unwanted symbols from the Received headers.
Parameters:
str - the Received header to be normalized.
Returns:
the normalized Received header.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
public void sortReceivedHeaders() throws java.lang.Exception
Reverses the order of the Received headers so that they are arranged from earliest to last.
Throws:
java.lang.Exception
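Each mail server that handles a message prepends its own Received header, so the headers appear in the raw message from most recent to earliest. The reordering described above can be sketched as a simple list reversal. The class and method names below are hypothetical, and the sketch returns a copy rather than modifying member variables as SpamArchive does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ReceivedOrderSketch {
    /**
     * Takes Received headers in the order they appear in the message
     * (latest hop first) and returns them earliest hop first.
     * The input list is left untouched.
     */
    static List<String> earliestFirst(List<String> headersAsTheyAppear) {
        List<String> copy = new ArrayList<>(headersAsTheyAppear);
        Collections.reverse(copy);
        return copy;
    }
}
```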
public boolean isGenericTLD(java.lang.String str)
Determines whether the domain component represented by the str argument is a generic top level domain.
Parameters:
str - the domain component to be checked.
Returns:
true if str is a generic TLD; false otherwise.
public boolean isCountryCodeTLD(java.lang.String str)
Determines whether the domain component represented by the str argument is a country code top level domain.
Parameters:
str - the domain component to be checked.
Returns:
true if str is a country code TLD; false otherwise.
public int checkSPFExists() throws java.lang.Exception
Queries DNS for the existence of an SPF record belonging to the domain of the earliest Received host.
Returns:
-6 if the ServiceUnavailableException is encountered;
-5 if the InvalidNameException is encountered;
-4 if the NameNotFoundException is encountered;
-3 if the CommunicationException is encountered;
-2 if the earliest Received host has an invalid top level domain;
-1 if the domain hasn't been encountered previously and lacks an SPF record in its DNS entry;
0 if the domain has been encountered previously and lacks an SPF record in its DNS entry;
1 if the domain hasn't been encountered previously and has an SPF record in its DNS entry;
2 if the domain has been encountered previously and has an SPF record in its DNS entry.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
javax.naming.CommunicationException - if the client cannot communicate with the server and a timeout occurs.
javax.naming.NameNotFoundException - if a component of the name cannot be resolved because it is not bound.
javax.naming.InvalidNameException - if the name doesn't conform to the naming syntax of the naming system.
javax.naming.ServiceUnavailableException - if the directory or naming service is unavailable.
java.lang.Exception
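The return codes above can be illustrated with a rough sketch built on the JNDI DNS provider, which is consistent with the javax.naming exceptions listed. Everything named here (SpfExistsSketch, classify, hasSpfRecord, the set of seen domains) is an assumption for the example and not the class's actual implementation; the -2 case for invalid top level domains is left out.

```java
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Locale;
import java.util.Set;
import javax.naming.CommunicationException;
import javax.naming.Context;
import javax.naming.InvalidNameException;
import javax.naming.NameNotFoundException;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.ServiceUnavailableException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

final class SpfExistsSketch {
    private final Set<String> seenDomains = new HashSet<>();

    /** True if a TXT value looks like an SPF version 1 record. */
    static boolean isSpfRecord(String txt) {
        String t = txt.replace("\"", "").trim().toLowerCase(Locale.ROOT);
        return t.equals("v=spf1") || t.startsWith("v=spf1 ");
    }

    /** Maps (first time seen?, has SPF?) to the documented codes -1, 0, 1, 2. */
    int classify(String domain, boolean hasSpf) {
        boolean firstTime = seenDomains.add(domain.toLowerCase(Locale.ROOT));
        if (hasSpf) {
            return firstTime ? 1 : 2;
        }
        return firstTime ? -1 : 0;
    }

    /** Performs the lookup and maps naming exceptions to the documented codes. */
    int check(String domain) {
        try {
            return classify(domain, hasSpfRecord(domain));
        } catch (ServiceUnavailableException e) {
            return -6;
        } catch (InvalidNameException e) {
            return -5;
        } catch (NameNotFoundException e) {
            return -4;
        } catch (CommunicationException e) {
            return -3;
        } catch (NamingException e) {
            throw new IllegalStateException("unexpected naming failure", e);
        }
    }

    /** Queries the TXT records of the domain through the JDK's DNS provider. */
    static boolean hasSpfRecord(String domain) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute txt = ctx.getAttributes(domain, new String[] {"TXT"}).get("TXT");
            if (txt == null) {
                return false;
            }
            NamingEnumeration<?> values = txt.getAll();
            while (values.hasMore()) {
                if (isSpfRecord(String.valueOf(values.next()))) {
                    return true;
                }
            }
            return false;
        } finally {
            ctx.close();
        }
    }
}
```

Only check and hasSpfRecord touch the network; classify and isSpfRecord are pure and can be exercised offline.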
public java.lang.String expandBinary(java.lang.String str)
Expands the binary str argument to an eight bit binary string.
Parameters:
str - the binary string that is to be expanded.
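One plausible reading of this method is left-padding with zeros, which is what is needed when an octet is converted with Integer.toBinaryString (that call drops leading zeros). The sketch below works under that assumption; the class name ExpandBinarySketch is hypothetical.

```java
final class ExpandBinarySketch {
    /** Left-pads a binary string with zeros until it is eight characters long. */
    static String expandBinary(String bits) {
        if (bits.length() > 8) {
            throw new IllegalArgumentException("more than eight bits: " + bits);
        }
        StringBuilder padded = new StringBuilder();
        for (int i = bits.length(); i < 8; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }
}
```

For example, Integer.toBinaryString(10) yields "1010", which this sketch expands to "00001010".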
public boolean testIpMatch(java.lang.String ipAddress, java.lang.String ipAddressRange) throws java.lang.Exception
Checks whether the ipAddress argument falls in the range of the ipAddressRange argument.
Parameters:
ipAddress - the IP address to be tested.
ipAddressRange - the range of IP addresses that the ipAddress argument is to be checked against. The range can be either an IP address or an IP address with a network prefix (CIDR notation).
Returns:
true if the ipAddress argument falls in the ipAddressRange argument's range; false otherwise.
Throws:
java.lang.NumberFormatException - if the string doesn't contain a parseable integer.
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.Exception
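The range test described above can be sketched with integer bit masks. This is an IPv4-only illustration under assumed names (IpMatchSketch, toInt); the documented method appears to work on binary strings instead, so treat this as an equivalent technique and not the original code.

```java
final class IpMatchSketch {
    /** Packs a dotted-quad IPv4 address into a 32-bit int. */
    static int toInt(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part); // NumberFormatException on bad input
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if ipAddress equals the range, or falls inside it when the range is in CIDR form. */
    static boolean testIpMatch(String ipAddress, String ipAddressRange) {
        int slash = ipAddressRange.indexOf('/');
        String base = slash < 0 ? ipAddressRange : ipAddressRange.substring(0, slash);
        int prefix = slash < 0 ? 32 : Integer.parseInt(ipAddressRange.substring(slash + 1));
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("bad prefix length: " + prefix);
        }
        // Java masks shift distances to 5 bits, so a prefix of 0 needs a special case.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(ipAddress) & mask) == (toInt(base) & mask);
    }
}
```

A bare address is treated as a /32, so it matches only itself.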
public java.lang.String reverseMacro(java.lang.String text)
Reverses the representation of the given text, splitting at dot boundaries. For example, if the text argument is "aw.bx.cy.dz", the text returned is "dz.cy.bx.aw".
Parameters:
text - the text to be reversed.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
public java.lang.String expandMacroTerm(java.lang.String macroTerm, java.lang.String ipAddress, java.lang.String domain) throws java.lang.Exception
Expands the macroTerm argument as specified by section 8 of RFC 4408.
The following macro letters are expanded in term arguments:
s - the sender's e-mail address.
l - the local-part of the sender's e-mail address.
o - the domain-part of the sender's e-mail address.
d - the domain-part of the sender's e-mail address.
i - the IP address of the SMTP client.
p - the validated domain name of the SMTP client's IP address.
v - "in-addr" if the IP address is ipv4; "ip6" if the IP address is ipv6.
h - the domain name supplied on HELO/EHLO, normally the domain name of the sending SMTP server.
c - the IP address of the SMTP client.
r - the domain name of the host performing the check, normally the receiving MTA.
t - current timestamp (number of seconds since midnight, January 1, 1970, UTC).
Parameters:
macroTerm - the macro term to be expanded.
ipAddress - the IP address of the host where the current message originated.
domain - the domain part of the Return-Path header.
Returns:
"invalid headers" if the Return-Path contains a syntax error; "null" if the PTR query doesn't succeed.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.IllegalStateException - if no match has yet been attempted or the previous match operation failed.
javax.naming.CommunicationException - if the client is unable to communicate with the directory or naming service.
javax.naming.NameNotFoundException - if a component of the name cannot be resolved, because it is not bound.
java.lang.Exception
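To make the macro letters more concrete, here is a deliberately small sketch of term expansion. It handles only the letters s, l, o, d, i and v, plus the trailing "r" transformer from RFC 4408 that reverses the value at dot boundaries (as reverseMacro does). Digit transformers, custom delimiters, IPv6 nibble format, and the letters that need DNS or session data (p, h, c, r, t) are omitted. The class name, the method names, and the explicit sender parameter are assumptions for this example; the documented method takes the sender from the message headers instead.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class MacroTermSketch {
    /** "aw.bx.cy.dz" becomes "dz.cy.bx.aw". */
    static String reverseAtDots(String text) {
        List<String> parts = Arrays.asList(text.split("\\."));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    /**
     * Expands the text found between "%{" and "}", for example "d" or "ir".
     * The first character is the macro letter; a trailing "r" requests reversal.
     */
    static String expandTerm(String term, String sender, String ipAddress, String domain) {
        int at = sender.lastIndexOf('@');
        if (term.isEmpty() || at < 0) {
            throw new IllegalArgumentException("empty term or sender without '@'");
        }
        String value;
        switch (Character.toLowerCase(term.charAt(0))) {
            case 's': value = sender; break;
            case 'l': value = sender.substring(0, at); break;
            case 'o': value = sender.substring(at + 1); break;
            case 'd': value = domain; break;
            case 'i': value = ipAddress; break; // IPv4 only in this sketch
            case 'v': value = ipAddress.contains(":") ? "ip6" : "in-addr"; break;
            default:
                throw new IllegalArgumentException("unsupported macro letter in: " + term);
        }
        boolean reverse = term.length() > 1
            && term.substring(1).toLowerCase(Locale.ROOT).endsWith("r");
        return reverse ? reverseAtDots(value) : value;
    }
}
```

With these pieces, a macro string such as "%{ir}.%{v}._spf.%{d}" would be expanded term by term; for the client 192.0.2.3 the first term becomes "3.2.0.192".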
public java.lang.String expandMacro(java.lang.String inputStr, java.lang.String ipAddress, java.lang.String domain) throws java.lang.Exception
Performs macro expansion on the inputStr argument according to section 8 of RFC 4408.
Parameters:
inputStr - the text to be macro-expanded.
ipAddress - the IP address of the host where the current message originated.
domain - the domain part of the Return-Path header.
Returns:
the expanded inputStr if successful; "invalid headers" if the Return-Path contains a syntax error; "null" if the PTR query doesn't succeed.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.Exception
public int spfLookUp(java.lang.String ipAddress, java.lang.String domain) throws java.lang.Exception
Performs verification of the ipAddress argument against the SPF record of the domain argument.
Parameters:
ipAddress - the IP address of the host where the current message originated.
domain - the domain-part of the Return-Path header.
Returns:
-6 if infinite recursion is avoided due to the SPF record of the domain argument containing itself as an included domain;
-5 if the CommunicationException is encountered;
-4 if the NameNotFoundException is encountered;
-3 if the InvalidNameException is encountered;
-2 if the ServiceUnavailableException is encountered;
-1 if the domain argument doesn't have an SPF record in its DNS entry;
0 if the SPF record for the domain argument doesn't permit messages from the ipAddress argument;
1/10 if the SPF record for the domain argument permits messages from the ipAddress argument.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
javax.naming.CommunicationException - if the client cannot communicate with the server and a timeout occurs.
javax.naming.NameNotFoundException - if a component of the name cannot be resolved because it is not bound.
javax.naming.InvalidNameException - if the name doesn't conform to the naming syntax of the naming system.
javax.naming.ServiceUnavailableException - if the directory or naming service is unavailable.
java.lang.Exception
public int checkSPFMatches() throws java.lang.Exception
Checks if the domain argument's SPF record permits messages to be sent from a host whose address is the ipAddress argument.
Returns:
-5 if getSourceIP returns "NULL";
-4 if Received headers are missing;
-3 if the SPF verification test couldn't be performed;
-2 if the From header's syntax is invalid;
-1 if the Return-Path header's syntax is invalid;
0 if the SPF record for the domain argument doesn't permit messages from the ipAddress argument;
1 if the SPF record for the domain argument permits messages from the ipAddress argument.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.IllegalStateException - if no match has yet been attempted or the previous match operation failed.
java.lang.Exception
public void checkDSL() throws java.lang.Exception
Checks if the source IP address of the message is statically or dynamically assigned by querying the SORBS DUHL. It also outputs the source IP addresses to the file spam_archive_ping, which is later used as input to fping to determine which of the hosts are reachable via ping.
Throws:
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
javax.naming.CommunicationException - if the client cannot communicate with the server and a timeout occurs.
javax.naming.NameNotFoundException - if a component of the name cannot be resolved because it is not bound.
javax.naming.InvalidNameException - if the name doesn't conform to the naming syntax of the naming system.
javax.naming.ServiceUnavailableException - if the directory or naming service is unavailable.
java.lang.Exception
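DNS-based lists such as the SORBS DUHL are conventionally queried by reversing the octets of the IPv4 address and appending the list's zone; an answer to the resulting name means the address is listed. The sketch below only builds the query name and performs no lookup. The zone string "dul.dnsbl.sorbs.net" is an assumption on my part, as is the class name; this documentation does not say which zone SpamArchive queries.

```java
final class DnsblNameSketch {
    // Assumed zone for the SORBS dynamic user/host list; not confirmed by this documentation.
    static final String ASSUMED_DUHL_ZONE = "dul.dnsbl.sorbs.net";

    /** Builds the DNSBL query name: reversed octets followed by the zone. */
    static String queryName(String ipv4Address, String zone) {
        String[] octets = ipv4Address.trim().split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4Address);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + "." + zone;
    }
}
```

The resulting name could then be resolved through the same JNDI DNS context that raises the javax.naming exceptions listed above, with a NameNotFoundException indicating that the address is not listed.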
public void processFiles() throws java.lang.Exception
Reads data files from the SpamArchive.org website line by line and extracts values from relevant headers into its member variables. After every input data file is processed, printStatistics() is called. The process then repeats for all the files in the input data set.
Throws:
java.lang.SecurityException - if a security manager exists and its SecurityManager.checkRead(String) method denies read access to the directory.
java.io.FileNotFoundException - if the file does not exist or is a directory rather than a regular file.
java.io.IOException - if an I/O error occurs.
java.util.regex.PatternSyntaxException - if the regular expression's syntax is invalid.
java.lang.Exception
public static void main(java.lang.String[] args) throws java.lang.Exception
The main entry point into the SpamArchive class. Creates an instance of this class and runs its processing tasks.
Parameters:
args - the command line arguments.
Throws:
java.lang.Exception