Serge Shamis
Columbia University
shamis@ms.com
212-762-2139
In the modern world, there is a growing need to integrate Internet services with the "old", accepted technologies of everyday life. The AudioMail system takes one step in that direction by integrating regular phone voice mail with Internet e-mail. AudioMail is a pluggable component that provides an interface for reading a user's e-mail over the phone. Just like a regular voice-mail service, it takes dual-tone multifrequency (DTMF) signals as user input, while its output is e-mail messages read aloud.
Let us take a real-life example. Suppose you are going on vacation to Spain for two weeks. Even though vacations are not meant for this, you would like to check your messages at work periodically, just to see if there is anything urgent to which you might have to respond. Of course, your company has a free dial-in number in Spain which you can call to check your voice mail. But this is the 90's, the Internet Age. Most messages you now receive are e-mail, not voice. Well, you can take a laptop with you, since your company also has a dial-in modem pool where you can connect to check e-mail, but that means that you have to drag along a laptop for the sole purpose of reading your messages. What do you do?
One solution is a voice e-mail system. The idea is that you can check your e-mail over the phone, much like you would check your voice mail. The e-mail messages are simply read to you aloud. The AudioMail system provides the core software engine needed for a voice e-mail system. The user input in AudioMail consists of dual-tone multifrequency (DTMF) signals, which are the sounds generated by buttons of touch-tone phones. The output of the system is a run-time generated voiced rendition of e-mail messages. The AudioMail system can be used in combination with a telephone gateway to provide a complete voice e-mail service. Another application of AudioMail is as an alternative e-mail interface for blind and vision-impaired users. A simple pocket tone dialer that generates DTMF signals can be used as the input device in this case.
Note: Due to the current unavailability of an analog phone line interface (it is currently in use by another group), AudioMail does not attempt integration with an actual telephone gateway, but that can be easily achieved once the line becomes available.
Similar projects are concurrently being done by several other people at Columbia University. Two other teams -- one including Jack Hsu and Jeff Stutz and the other including Jeremy Blumenfeld and Miriam Tauil -- are also working on "e-mail by phone" systems. Francesco Caruso and Xin Jin are working on related projects which provide Web pages by phone.
The AudioMail system is made up of several components:
DTMFDetector.H DTMFDetector.C tabxlaw.h tabxlaw.c
This is a DTMF recognition component, based on Oertel DTMF detection functions, which are part of their Call Server software.
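To illustrate what a DTMF recognition component has to do, here is a minimal sketch based on the Goertzel algorithm, a common technique for measuring the power of a few known frequencies. It is not the Oertel code used by AudioMail. The function names, the use of linear floating-point samples (rather than the companded audio that the tabxlaw files suggest), and the power threshold are all assumptions made for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Power of a single target frequency in a block of samples (Goertzel recurrence).
double goertzelPower(const std::vector<double>& samples, double freqHz, double sampleRateHz) {
    const double pi = 3.14159265358979323846;
    const double coeff = 2.0 * std::cos(2.0 * pi * freqHz / sampleRateHz);
    double s1 = 0.0, s2 = 0.0;
    for (double x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Returns the key whose row and column tones are strongest,
// or '\0' if either tone is weaker than minPower (hypothetical threshold).
char detectDtmfKey(const std::vector<double>& samples, double sampleRateHz, double minPower) {
    static const std::array<double, 4> rowHz = {{697.0, 770.0, 852.0, 941.0}};
    static const std::array<double, 4> colHz = {{1209.0, 1336.0, 1477.0, 1633.0}};
    static const char keys[4][4] = {{'1', '2', '3', 'A'},
                                    {'4', '5', '6', 'B'},
                                    {'7', '8', '9', 'C'},
                                    {'*', '0', '#', 'D'}};
    std::size_t bestRow = 0, bestCol = 0;
    double rowPower = -1.0, colPower = -1.0;
    for (std::size_t i = 0; i < 4; ++i) {
        const double r = goertzelPower(samples, rowHz[i], sampleRateHz);
        if (r > rowPower) { rowPower = r; bestRow = i; }
        const double c = goertzelPower(samples, colHz[i], sampleRateHz);
        if (c > colPower) { colPower = c; bestCol = i; }
    }
    if (rowPower < minPower || colPower < minPower) return '\0';
    return keys[bestRow][bestCol];
}

// Test helper: synthesizes the sum of two sine tones, each with amplitude 0.5.
std::vector<double> makeDtmfTone(double lowHz, double highHz, double sampleRateHz, std::size_t count) {
    const double pi = 3.14159265358979323846;
    std::vector<double> samples(count);
    for (std::size_t i = 0; i < count; ++i) {
        const double t = static_cast<double>(i) / sampleRateHz;
        samples[i] = 0.5 * std::sin(2.0 * pi * lowHz * t) + 0.5 * std::sin(2.0 * pi * highHz * t);
    }
    return samples;
}
```

A real detector would also check tone duration and reject speech that happens to contain energy at these frequencies; the sketch only picks the strongest row and column tone in one block.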
MailBox.H MailBox.C MailMessage.H MailMessage.C
This class is responsible for parsing the user's e-mail and creating an internal data structure to contain relevant headers and bodies of all of the e-mail messages. In the current implementation, it assumes there is a mail spool file for the user, as identified by the environment variable MAIL. Providing that file if one does not already exist would be external to the program. This could be achieved using some available software, such as movemail or fetchmail utilities, and would provide more flexibility for the system.
AudioMail.H AudioMail.C
This class contains the "main loop" of the program. It integrates all the pieces together and provides the user interface. The AudioMail system uses the TTS (Text-To-Speech) software from Bell Labs for reading the mail messages aloud -- that is, for generating audio output from a text buffer. TTS is server-based: a TTS daemon, which has to run on the same or a different machine, is responsible for the text-to-speech conversion. The AudioMail class uses the TTSC (TTS Client) Library API to communicate with the TTS daemon.
main.c
The main program is very simple -- it simply creates an object of class AudioMail, which is then responsible for all the rest.
All source code is written in C++. The system contains about 1100 lines of code.
The source code for the software can be found at the following location: http://www.cs.columbia.edu/~serge/E6998-03/src/.
A C++ compiler with the Standard C++ Library and Standard Template Library (STL). The system has been tested with the g++ compiler, but it should work with other C++ compilers as well.
A Makefile is provided with the source code files. Simply run make to build the executable.
Solaris (SunOS 5.5.1) workstation with audio input/output capabilities (a microphone and a speaker).
Before running AudioMail, the TTS daemon has to be started by running the startTTS script. A TTS daemon can be running on the same machine as AudioMail or on a different machine. If AudioMail is to use a TTS daemon running on a different machine, the environment variable TTS_SERVER has to be set in order to connect to that daemon.
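As a small illustration of this configuration rule, the sketch below chooses the daemon host from the value of TTS_SERVER and falls back to the local machine when the variable is unset or empty. The function name resolveTtsHost and the assumption that TTS_SERVER holds a plain host name are hypothetical; the actual TTSC library may interpret the variable differently.

```cpp
#include <cstdlib>
#include <string>

// Picks the TTS daemon host. envValue is expected to be the result of
// std::getenv("TTS_SERVER"); a null or empty value means "same machine".
// Treating the value as a bare host name is an assumption of this sketch.
std::string resolveTtsHost(const char* envValue) {
    if (envValue == nullptr || *envValue == '\0') {
        return "localhost";
    }
    return std::string(envValue);
}
```

A caller would typically write resolveTtsHost(std::getenv("TTS_SERVER")) once at startup.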
The executable file that is produced is called audioMail. It takes no command-line parameters.
Both input and output are audio-based. Any DTMF signal device, such as a telephone keypad or a tone dialer, can be used for input. The user is presented with options to read a message, go to the next message, go to the previous message, read the header of the current message, cancel reading, read all headers, etc. "Online help", in the form of audio instructions (the list of available options) can be accessed at any point in time.
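The options listed above could be dispatched from detected keys roughly as in the sketch below. The key assignments are invented for illustration, since the actual AudioMail key layout is not given here; the Command enumeration and both function names are likewise hypothetical.

```cpp
#include <cstddef>

// Commands corresponding to the options described in the text.
enum class Command {
    ReadMessage,
    NextMessage,
    PreviousMessage,
    ReadHeader,
    CancelReading,
    ReadAllHeaders,
    Help,
    Unknown
};

// Maps a DTMF key to a command. This key layout is an assumption.
Command commandForKey(char key) {
    switch (key) {
        case '1': return Command::ReadMessage;
        case '2': return Command::NextMessage;
        case '3': return Command::PreviousMessage;
        case '4': return Command::ReadHeader;
        case '5': return Command::ReadAllHeaders;
        case '#': return Command::CancelReading;
        case '0': return Command::Help;
        default:  return Command::Unknown;
    }
}

// Returns the new current-message index after a navigation command,
// staying within [0, count - 1]. Other commands leave the index unchanged.
std::size_t applyNavigation(Command command, std::size_t current, std::size_t count) {
    if (command == Command::NextMessage && current + 1 < count) {
        return current + 1;
    }
    if (command == Command::PreviousMessage && current > 0) {
        return current - 1;
    }
    return current;
}
```

Keeping the key-to-command mapping separate from the navigation logic makes it easy to change the key layout without touching the main loop.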
Text quoted from an earlier message is often indented with '>' characters, indicating that it is someone's original message. TTS reads these characters as "greater than" signs, which can be quite irritating to the user. Some customization would be helpful -- so that the indented text can be declared as the original message.
All work on this project has been done by Serge Shamis (shamis@ms.com).
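One possible way to handle the "greater than" problem described above is to preprocess the message body before it is sent to TTS. The sketch below strips the leading '>' markers and announces where quoted text begins and ends. This is a suggested customization, not something AudioMail currently does; the function name prepareForSpeech and the announcement wording are assumptions.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Rewrites a message body so quoted lines are announced instead of being
// read with literal '>' characters. A line counts as quoted when its first
// character is '>'; leading '>' and space characters are then removed.
std::string prepareForSpeech(const std::string& body) {
    std::istringstream in(body);
    std::ostringstream out;
    std::string line;
    bool inQuote = false;
    while (std::getline(in, line)) {
        const bool quoted = !line.empty() && line[0] == '>';
        if (quoted && !inQuote) out << "Begin quoted message.\n";
        if (!quoted && inQuote) out << "End quoted message.\n";
        inQuote = quoted;
        if (quoted) {
            std::size_t pos = 0;
            while (pos < line.size() && (line[pos] == '>' || line[pos] == ' ')) {
                ++pos;
            }
            out << line.substr(pos) << '\n';
        } else {
            out << line << '\n';
        }
    }
    if (inQuote) out << "End quoted message.\n";
    return out.str();
}
```

Nested quotes are flattened into a single quoted region here; a fuller version could announce each quoting level separately.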
The AudioMail system as presented here is only a simple prototype. Possible future enhancements and extensions might include:
Bell Labs Text-To-Speech System
Columbia CS Department's telephone gateway