Email Notification System (ENS)
Aleksandr Voskoboynik
Columbia University
New York, NY 10027
USA
av69@columbia.edu
Yuliya Averbukh
Columbia University
New York, NY 10027
USA
ya39@columbia.edu
Abstract
Email Notification System (ENS) is a convenient way for a Web
user to keep up with changes to the Web sites that interest him
without having to check them regularly. It is also a good way
for a Web site owner to make sure that his updates don't go unnoticed.
Any Web site owner interested in participating in the service
can put a link to the ENS CGI script on his page. Having followed
this link, a user subscribes to the service by completing a simple
form with his name and email address. A user can subscribe
to an unlimited number of Web sites. Once subscribed, he is notified
regularly (daily or weekly) about changes to all of the sites
he is interested in. To unsubscribe from the service fully or partially,
a user clicks the link included in each notification
report he receives.
Introduction
Our system is easy to install and use; it is scalable, efficient
and reliable. Below we outline the steps that Web site
owners and users follow to use ENS.
A Web site owner who wants to publicize the changes
made to his directory puts a link on his HTML page to the CGI
script subscribe.cgi. The script takes the URL of that directory as an argument.
Note that all the directories located under the specified one
are checked for changes as well.
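As an illustration, a Web site owner could generate the subscription link as follows. This is a Python sketch (ENS itself is written in Perl); the script location and the query parameter name `url` are hypothetical, since ENS only specifies that subscribe.cgi receives the URL as an argument. Percent-encoding matters because the watched URL itself contains `:` and `/`.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical location of subscribe.cgi; the real path depends on the installation.
SUBSCRIBE_CGI = "http://www.example.edu/cgi-bin/subscribe.cgi"


def subscription_link(watched_url: str, text: str = "Notify me of changes") -> str:
    """Build an HTML link that passes watched_url to subscribe.cgi via GET.

    The query parameter name "url" is an assumption, not part of ENS.
    """
    query = urlencode({"url": watched_url})  # percent-encodes ':' and '/'
    href = f"{SUBSCRIBE_CGI}?{query}"
    return f'<a href="{escape(href)}">{escape(text)}</a>'


print(subscription_link("http://www.example.edu/news/"))
```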
A user clicks the link on the site he is interested in.
He is then asked to fill out a form with the following
fields: URL, user email address and user name. The form is not
processed unless all of the fields are completed and valid.
The URL field is filled in automatically by the CGI script with the owner-specified
URL, although the user can change it. The user also
chooses how often he would like to receive
modification reports for this site: currently, daily or weekly. The
same user can choose the "weekly" option for some Web
sites and the "daily" option for others.
The notification system is started by a "daily" cron
job for daily updates and by a "weekly" cron job for
weekly ones. If at least one of the sites a user subscribes to has
been modified, a notification message is sent to that user. A
file modification is announced only if the fractional change in
its size is greater than or equal to the threshold defined by the page owner.
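The text does not define "fractional change" precisely. A natural reading, assumed here, is the size change relative to the old size, abs(new size - old size) / old size, compared with the owner's threshold. A minimal Python sketch of that test (the function name is hypothetical; ENS itself is written in Perl):

```python
def is_significant_change(old_size: int, new_size: int, threshold: float) -> bool:
    """Return True if a file's size change should be announced.

    Assumes fractional change = |new - old| / old, compared with an
    owner-defined threshold such as 0.1 (a 10% change).
    """
    if old_size == 0:
        # Avoid dividing by zero: treat any growth of an empty file as significant.
        return new_size > 0
    return abs(new_size - old_size) / old_size >= threshold
```

Shrinking a file counts the same as growing it under this reading, because the absolute difference is used.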
The system sends the user a formatted email notification message
containing individual reports on the modified sites
he subscribes to. The report for a site contains
links to the newly created or modified pages, using each page's HTML
title to identify it. The message footer contains the
link for unsubscribing from the service.
"Daily" subscribers get update reports every morning if at least one of the Web sites they are interested in has changed. "Weekly" subscribers get notified on Monday mornings. For each changed site, the report has the following format:
<title> - <URL>
<action/modification time>
..................
<title> - <URL>
<action/modification time>
where action tells the user whether the file was created or modified.
In addition, there is an unsubscribe link at the end of each report.
Since most email clients recognize URLs and make them
clickable, the subscriber can visit the updated Web
pages by clicking the URLs in the message, and can
unsubscribe from the service by following
the link at the bottom of the message.
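The report layout above can be sketched in Python (ENS itself is written in Perl). The "Site:" header line, the date format, the unsubscribe.cgi location and its `email` parameter are all assumptions for illustration; the per-page lines follow the documented `<title> - <URL>` / `<action/modification time>` pattern, and sites with no changes are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List
from urllib.parse import urlencode

# Hypothetical unsubscribe.cgi location; the GET parameter name "email" is assumed.
UNSUBSCRIBE_CGI = "http://www.example.edu/cgi-bin/unsubscribe.cgi"


@dataclass
class PageChange:
    title: str        # page title, or "Untitled"
    url: str          # URL of the created or modified page
    action: str       # "created" or "modified"
    mtime: datetime   # modification time


def format_site_report(site_url: str, changes: List[PageChange]) -> str:
    """Report for one site: '<title> - <URL>' then '<action> <time>' per page."""
    lines = [f"Site: {site_url}"]  # assumed header line, not specified by ENS
    for change in changes:
        lines.append(f"{change.title} - {change.url}")
        lines.append(f"{change.action} {change.mtime:%Y-%m-%d %H:%M}")
    return "\n".join(lines)


def format_message(reports: Dict[str, List[PageChange]], email: str) -> str:
    """Join the reports of changed sites and append the unsubscribe footer."""
    sections = [format_site_report(site, changes)
                for site, changes in reports.items() if changes]
    query = urlencode({"email": email})
    footer = f"To unsubscribe: {UNSUBSCRIBE_CGI}?{query}"
    return "\n\n".join(sections + [footer])
```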
After a user clicks the unsubscribe link at the end
of his update report, a CGI script generates a form that lists
all of the URLs to which the user is subscribed, with a
checkbox next to each URL. Initially, all the boxes
are checked. To unsubscribe from a site, the user unchecks
its box. If all the boxes are left checked, the user remains
subscribed to all of the URLs in the list; if none of the boxes
remain checked, the user unsubscribes from the service altogether.
Architecture
ENS maintains two databases, one for daily and one for weekly subscriptions. Daily information is stored in the "daily" subdirectory, and weekly information in the "weekly" subdirectory. Each line of a database holds one user's record in the following format:
<user email>::<name>::<URL>::...::<URL>
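Reading and writing one such record can be sketched in Python (ENS itself is written in Perl; the function names are hypothetical). The sketch assumes that neither names nor URLs ever contain "::", which would break the split (an IPv6 literal such as `http://[::1]/` would).

```python
from typing import List, Tuple

SEPARATOR = "::"


def parse_record(line: str) -> Tuple[str, str, List[str]]:
    """Split '<email>::<name>::<URL>::...::<URL>' into (email, name, urls)."""
    fields = line.rstrip("\n").split(SEPARATOR)
    if len(fields) < 3:
        raise ValueError(f"malformed record: {line!r}")
    email, name, *urls = fields
    return email, name, urls


def format_record(email: str, name: str, urls: List[str]) -> str:
    """Inverse of parse_record: one newline-terminated database line."""
    return SEPARATOR.join([email, name, *urls]) + "\n"
```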
ENS also automatically creates a state file for each URL
requested by users. Depending on the requested frequency, the
file goes into either the "daily" or the "weekly" subdirectory.
Each state file records the current
state of the corresponding directory (and all of its subdirectories):
the size and modification time of each file in it. To see whether a particular site has been
modified, it is enough to compare the newly obtained state with
the one stored in the corresponding file.
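The comparison can be sketched in Python (ENS itself is written in Perl), assuming each state is loaded as a mapping from file path to its (size, modification time) pair; the type and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileState(NamedTuple):
    size: int    # file size in bytes
    mtime: str   # modification time as recorded in the state file


def diff_states(old: Dict[str, FileState],
                new: Dict[str, FileState]) -> Tuple[List[str], List[str]]:
    """Compare two snapshots of a directory tree.

    Returns (created, modified): paths present only in the new snapshot,
    and paths whose size or modification time changed. Deleted files are
    ignored, since the reports only mention created or modified pages.
    """
    created = sorted(path for path in new if path not in old)
    modified = sorted(path for path in new
                      if path in old and new[path] != old[path])
    return created, modified
```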
Note: All of the following modules have been written in
Perl.
The Notification module sends each user a formatted email notification message
containing individual reports on the modified sites
he subscribes to. The report for a site contains links to the newly
created or modified pages, using each page's HTML title to identify
it. The message footer contains the link for unsubscribing
from the service. A file modification is announced only if the
fractional change in its size is greater than or equal to the threshold defined by the page
owner.
The check for modifications compares the old
state file, which holds the previous state of the directory (and
all of its subdirectories), with the new one. The state files
are produced by the "find" utility with the -ls and -follow
options. Once a site has been processed, the old state file
is replaced by the new one. Because several users may be subscribed
to the same URL, the system checks whether
the site has already been processed for another user before it
compares the state files.
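Both steps can be sketched in Python (ENS itself is written in Perl; the function names are hypothetical). The parser assumes the usual eleven-column `find -ls` layout (inode, blocks, mode, links, owner, group, size, month, day, time or year, path), with the path as the last column. The second function processes each distinct URL once, whatever the number of subscribers.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def parse_find_ls(output: str) -> Dict[str, Tuple[int, str]]:
    """Map each path in `find DIR -follow -ls` output to (size, mtime)."""
    state = {}
    for line in output.splitlines():
        # Fields: inode, blocks, mode, links, owner, group, size,
        # month, day, time-or-year, path (the path may contain spaces).
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[6].isdigit():
            continue  # skip blank or malformed lines
        state[fields[10]] = (int(fields[6]), " ".join(fields[7:10]))
    return state


def check_all_sites(subscriptions: Iterable[Tuple[str, List[str]]],
                    check_site: Callable[[str], object]) -> Dict[str, object]:
    """Call check_site once per distinct URL, however many users share it."""
    results: Dict[str, object] = {}
    for _email, urls in subscriptions:
        for url in urls:
            if url not in results:  # not yet processed for another user
                results[url] = check_site(url)
    return results
```

The `find` output itself could be captured with `subprocess.run(["find", directory, "-follow", "-ls"], capture_output=True, text=True).stdout`.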
The title of a page is the string enclosed in the <TITLE>
and </TITLE> tags, which are matched regardless of case.
If the page has no title, the default "Untitled"
is used.
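A case-insensitive regular expression is enough for this rule. The following is a Python sketch (ENS itself is written in Perl); the function name is hypothetical, and treating an empty `<TITLE></TITLE>` as untitled is an added assumption.

```python
import re

# <TITLE> ... </TITLE>, matched case-insensitively and across line breaks.
TITLE_RE = re.compile(r"<title(?:\s[^>]*)?>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def page_title(html: str) -> str:
    """Return the page title, or "Untitled" if there is none."""
    match = TITLE_RE.search(html)
    if match:
        title = " ".join(match.group(1).split())  # collapse runs of whitespace
        if title:  # assumption: an empty <TITLE></TITLE> also counts as untitled
            return title
    return "Untitled"
```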
An X-Email_Notification_To header identifying the subscriber's address is added to each notification message
so that bounced email can be filtered.
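Adding the custom header is a single assignment with Python's standard `email.message.EmailMessage`. This is a sketch only: ENS itself is written in Perl, and the function name, sender address and subject line are assumptions.

```python
from email.message import EmailMessage


def build_notification(to_addr: str, from_addr: str, body: str) -> EmailMessage:
    """Create a notification message tagged for bounce filtering."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = "Web site update report"  # assumed subject line
    # Records the subscriber's address so that a bounce can be traced back to it.
    msg["X-Email_Notification_To"] = to_addr
    msg.set_content(body)
    return msg
```

With a local mail server, the result could be sent with `smtplib.SMTP("localhost").send_message(msg)`.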
The Subscribe module adds the necessary information about a particular user to one of the two databases. Two CGI scripts, "subscribe.cgi" and "add.cgi", work together to accomplish this task.
The "subscribe.cgi" script uses the GET method to obtain the URL to which the user wants to subscribe, which lets the Web site owner specify that URL in the link. It then displays a form to be completed by the user and automatically puts the argument it receives in the URL field, although the user can change it to any other URL. The user's email and name are required, and the user also chooses the notification frequency (daily or weekly). JavaScript functions from the "FormChek.js" library check the validity of the fields, and the form is submitted only when all its fields are completed and valid. The submitted form is passed to another CGI script, "add.cgi".
The "add.cgi" script performs the actual addition of the user-specified information to the database. It uses the POST method to obtain the information from the form and, depending on the requested frequency, adds the user to either the daily or the weekly database. If the user is already in that database, only the new URLs are added to the existing record. For each URL added, the script records the current state of the corresponding directory (and all of its subdirectories): file names, sizes and modification times.
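The merge rule (append only the new URLs to an existing record, otherwise create a record) can be sketched in Python over the database lines (ENS itself is written in Perl). The function name is hypothetical, exact email comparison is an assumption, and the state-file snapshot for each added URL is left out.

```python
from typing import List


def add_subscription(lines: List[str], email: str, name: str,
                     new_urls: List[str]) -> List[str]:
    """Return updated database lines with new_urls added for email.

    Record layout: <email>::<name>::<URL>::...::<URL>. An existing record
    gains only the URLs it lacks; otherwise a new record is appended.
    Emails are compared exactly (an assumption; ENS may normalize them).
    """
    wanted = list(dict.fromkeys(new_urls))  # drop duplicates, keep order
    result, found = [], False
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec_email, rec_name, *urls = line.rstrip("\n").split("::")
        if rec_email == email:
            found = True
            urls += [url for url in wanted if url not in urls]
            line = "::".join([rec_email, rec_name, *urls]) + "\n"
        result.append(line)
    if not found:
        result.append("::".join([email, name, *wanted]) + "\n")
    return result
```

In a real deployment this read-modify-write would run while holding the database lock.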
The Unsubscribe module contains the scripts that unsubscribe a user from the service. A user can unsubscribe partially (if he still wishes to be notified about some sites but not others) or fully. If the user is no longer subscribed to any site, his record is deleted from the database altogether. Two CGI scripts, "unsubscribe.cgi" and "delete.cgi", accomplish this.
The "unsubscribe.cgi" script is called when the user clicks the link at the bottom of a notification message. It receives the user's email as its argument (using the GET method), searches both the "daily" and "weekly" databases, and displays a form listing all the URLs to which the user is subscribed. Each URL has a checkbox, checked by default; to unsubscribe from a URL, the user unchecks its box. When the user submits the form, the "delete.cgi" script is called.
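Collecting a user's subscriptions from both databases and rendering the pre-checked form can be sketched in Python (ENS itself is written in Perl). The form field names `email` and `keep` and the `frequency:URL` checkbox value are hypothetical.

```python
from html import escape
from typing import Dict, List, Tuple


def find_subscriptions(databases: Dict[str, List[str]],
                       email: str) -> List[Tuple[str, str]]:
    """Collect (frequency, URL) pairs for email from every database."""
    found = []
    for frequency, lines in databases.items():
        for line in lines:
            fields = line.rstrip("\n").split("::")
            if fields[0] == email:
                found.extend((frequency, url) for url in fields[2:])
    return found


def unsubscribe_form(email: str, subscriptions: List[Tuple[str, str]]) -> str:
    """Render one pre-checked checkbox per subscribed URL."""
    rows = [f'<input type="hidden" name="email" value="{escape(email)}">']
    for frequency, url in subscriptions:
        value = escape(f"{frequency}:{url}")
        rows.append(f'<label><input type="checkbox" name="keep" value="{value}" checked> '
                    f'{escape(url)} ({frequency})</label><br>')
    rows.append('<input type="submit" value="Update subscriptions">')
    return '<form method="POST" action="delete.cgi">\n' + "\n".join(rows) + "\n</form>"
```

Browsers submit only checked boxes, so delete.cgi receives exactly the URLs the user wants to keep.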
The "delete.cgi" script receives the form (using the POST method) and removes
from the database all the URLs in which the user is no longer
interested. If no URLs remain, the user's
entry is deleted from the database. For
each URL deleted from the database, the script also checks
whether any other user is still interested in that site. If
not, the state file for that URL is also
deleted as part of the cleanup.
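The removal and cleanup checks can be sketched in Python (ENS itself is written in Perl; the function names are hypothetical). Since state files live in the "daily" or "weekly" subdirectory, the orphan check is assumed to run per database. Deleting the state file of each orphaned URL is left to the caller, because the mapping from URL to file name is not documented.

```python
from typing import Iterable, List, Set, Tuple


def apply_unsubscribe(lines: List[str], email: str,
                      keep: Set[str]) -> Tuple[List[str], List[str]]:
    """Keep only the URLs in `keep` for email; drop the record if none remain.

    Returns (updated lines, URLs removed from this user's record).
    """
    updated, removed = [], []
    for line in lines:
        fields = line.rstrip("\n").split("::")
        if fields[0] == email:
            urls = [url for url in fields[2:] if url in keep]
            removed += [url for url in fields[2:] if url not in keep]
            if not urls:
                continue  # fully unsubscribed: delete the whole entry
            line = "::".join(fields[:2] + urls) + "\n"
        updated.append(line)
    return updated, removed


def orphaned_urls(removed: Iterable[str], lines: List[str]) -> List[str]:
    """Removed URLs that no remaining record mentions; their state files can go."""
    still_used = set()
    for line in lines:
        still_used.update(line.rstrip("\n").split("::")[2:])
    return sorted(set(removed) - still_used)
```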
The Bounced email module handles bounced email messages. It works together
with a procmail filter: procmail calls it on bounced messages
that were originally produced by the
notification system. Procmail identifies those messages by looking
for the special "X-Email_Notification_To" header of the original
message, which is quoted in the body of the bounce. The module reads the bounced message
from STDIN (fed by procmail) and looks for the X-Email_Notification_To:
header, which identifies the undeliverable address. It
then removes all records for the undeliverable address
from both the "daily" and "weekly" databases.
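The bounce handler can be sketched in Python (ENS itself is written in Perl). The function names are hypothetical, and the databases are modeled as in-memory lists of record lines keyed by frequency. `handle_bounce` takes any readable text stream, so procmail's feed would be passed as `sys.stdin`.

```python
import re
from typing import List, Optional, TextIO

# The header that the notification module adds to every outgoing message.
HEADER_RE = re.compile(r"^X-Email_Notification_To:[ \t]*(\S+)",
                       re.IGNORECASE | re.MULTILINE)


def bounced_address(bounce_text: str) -> Optional[str]:
    """Find the undeliverable address in the quoted original message."""
    match = HEADER_RE.search(bounce_text)
    return match.group(1) if match else None


def remove_user(lines: List[str], email: str) -> List[str]:
    """Drop every record whose first field is email."""
    return [line for line in lines if line.split("::", 1)[0] != email]


def handle_bounce(stream: TextIO, databases: dict) -> Optional[str]:
    """Read a bounce (e.g. sys.stdin fed by procmail) and purge its address."""
    address = bounced_address(stream.read())
    if address is not None:
        for frequency in list(databases):
            databases[frequency] = remove_user(databases[frequency], address)
    return address
```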
The Common module contains a number of routines that are shared by the various modules of the system. The module includes routines which:
Synchronization routines coordinate concurrent
transactions so that the database and state files stay consistent
and concurrent transactions follow a serializable schedule. If a
process tries to obtain a lock while another process already holds
it, the process blocks until the other process releases the lock.
Once the process exits the critical section, it releases
the lock to give another process a chance to execute.
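A blocking exclusive lock of this kind can be sketched in Python on a Unix system (ENS itself is written in Perl, whose built-in `flock` offers the same operation). The context-manager name and the idea of a dedicated lock file are assumptions. `fcntl.flock` with `LOCK_EX` waits until any other holder releases the lock, which matches the blocking behavior described above.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked(lock_path: str):
    """Hold an exclusive lock on lock_path for the duration of the block.

    flock() with LOCK_EX blocks until any other holder releases the lock.
    """
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield  # critical section: read and rewrite databases or state files
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A caller would wrap each read-modify-write in `with locked(path):` so that concurrent CGI scripts and cron jobs take turns.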
Program Documentation
Task List
Aleksandr Voskoboynik:
Implemented Notification, Bounced email, and Common modules, as
well as synchronization. Developed database layout. Created the
installation script.
Yuliya Averbukh:
Implemented Subscribe and Unsubscribe modules.
Copied from DevEdge Online - JavaScript:
FormChek.js library
References